Accessing Vantage SQL Functions | SQLAlchemy | Teradata Python Package - Accessing Vantage SQL Functions

Accessing Vantage SQL Functions | SQLAlchemy | Teradata Python Package - Accessing Vantage SQL Functions - Teradata Package for Python

Teradata® Package for Python User Guide

Product

Teradata Package for Python

Release Number

17.10

Published

May 2022

Language

English (United States)

Last Update

2022-08-18

dita:mapPath

rsu1641592952675.ditamap

dita:ditavalPath

ayr1485454803741.ditaval

dita:id

B700-4006

lifecycle

Product Category

Teradata Vantage

teradataml supports SQL functions with the SQLAlchemy extension. For lists of these functions, see Supported SQL Functions.

To access these functions in teradataml, you only need to know the following SQLAlchemy statement, the rest of the functionality is managed by teradataml and Vantage.

from sqlalchemy import func

You can use this imported func module to call Vantage SQL functions and pass those to DataFrame.assign() function to process the data.

For more details about these functions, refer to Teradata Vantage™ - SQL Functions, Expressions, and Predicates, B035-1145.

Example

In this example, you will calculate the standard average of column 'gpa' and 'admitted' of the "admissions_trains" dataset using the avg/average/ave function in Vantage.

The avg/average/ave function has the following syntax, and returns the arithmetic average of all values in value_expression.

Avg(value_expression)

Create connection to Vantage and load example dataset.

# Get the connection to the Vantage using create_context()
>>> from teradataml import *
>>>td_context = create_context(host=getpass.getpass("Hostname: "), username=getpass.getpass("Username: "), password=getpass.getpass("Password: "))

# Load the example dataset.
load_example_data("GLM", ["admissions_train"])

Prepare example dataset.

# Create the DataFrame on 'admissions_train' table
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train

   masters   gpa     stats programming  admitted
id                                              
13      no  4.00  Advanced      Novice         1
9       no  3.82  Advanced    Advanced         1
33      no  3.55    Novice      Novice         1
6      yes  3.50  Beginner    Advanced         1
7      yes  2.33    Novice      Novice         1
22     yes  3.46    Novice    Beginner         0
37      no  3.52    Novice      Novice         1
35      no  3.68    Novice    Beginner         1
27     yes  3.96  Advanced    Advanced         0
4      yes  3.50  Beginner      Novice         1

Import func module from SQLAlchemy package.

# Import func from SQLAlchemy to use the same for executing aggregate functions
>>> from sqlalchemy import func

Use the func module to generate a function object "agg_func_".
```
>>> agg_func_ = func.avg(admissions_train.gpa.expression)
```
Where:
- func: sqlalchemy module that is imported.
- avg: the Vantage function name.
  The function name is case insensitive.
  You can pass the function name in lowercase (avg), uppercase (AVG) or in mixed cases (Avg).
- admissions_train.gpa.expression: an expression that is passed to the function, specifying the column to use for calculating average.
  Where:
  
  admissions_train: the teradataml DataFrame;
  
  gpa: a column name of the teradataml DataFrame;
  
  Together, admissions_train.gpa forms a ColumnExpression (teradataml DataFrame Column).
  
  expression: a property of teradataml DataFrame column, which is passed as the required expression to this function object.

View the type of the function object "agg_func_".

>>> type(agg_func_)

sqlalchemy.sql.functions.Function

Pass the function object "agg_func_", with other function objects for calculating average value of the 'admitted' column, to DataFrame.assign() function.

>>> df = admissions_train.assign(True, avg_gpa_=agg_func_, average_admitted_=func.average(admissions_train.admitted.expression), ave_admitted_=func.ave(admissions_train.admitted.expression))

>>> print_variables(df, ["avg_gpa_", "average_admitted_", "ave_admitted_"])

Equivalent SQL: select ave(admitted) AS ave_admitted_, average(admitted) AS average_admitted_, avg(gpa) AS avg_gpa_ from "admissions_train"


 ************************* DataFrame ********************* 
   ave_admitted_  average_admitted_  avg_gpa_
0           0.65               0.65   3.54175



 ************************* DataFrame.dtypes ********************* 
ave_admitted_        float
average_admitted_    float
avg_gpa_             float



 'avg_gpa_' Column Type: FLOAT
 'average_admitted_' Column Type: FLOAT
 'ave_admitted_' Column Type: FLOAT

Different function names "AVERAGE" and "AVE" that Vantage offers to calculate average are used along with 'avg'.

"print_variables()" is a private function used to display example outputs here. This function uses "DataFrame.show_query()" to print the query, "print(dataframe)" to display dataframe content, and "DataFrame.dtypes" to display DataFrame column types. It is not part of the teradataml package.