Accessing Vantage SQL Functions | SQLAlchemy | Teradata Python Package - Accessing Vantage SQL Functions - Teradata Package for Python

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.10
Published
May 2022
Language
English (United States)
Last Update
2022-08-18
dita:mapPath
rsu1641592952675.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
B700-4006
lifecycle
previous
Product Category
Teradata Vantage

teradataml supports SQL functions with the SQLAlchemy extension. For lists of these functions, see Supported SQL Functions.

To access these functions in teradataml, you only need to know the following SQLAlchemy statement, the rest of the functionality is managed by teradataml and Vantage.
from sqlalchemy import func
You can use this imported func module to call Vantage SQL functions and pass those to DataFrame.assign() function to process the data.

For more details about these functions, refer to Teradata Vantage™ - SQL Functions, Expressions, and Predicates, B035-1145.

Example

In this example, you will calculate the standard average of column 'gpa' and 'admitted' of the "admissions_trains" dataset using the avg/average/ave function in Vantage.

The avg/average/ave function has the following syntax, and returns the arithmetic average of all values in value_expression.
Avg(value_expression)
  • Create connection to Vantage and load example dataset.
    # Get the connection to the Vantage using create_context()
    >>> from teradataml import *
    >>>td_context = create_context(host=getpass.getpass("Hostname: "), username=getpass.getpass("Username: "), password=getpass.getpass("Password: "))
    
    # Load the example dataset.
    load_example_data("GLM", ["admissions_train"])
  • Prepare example dataset.
    # Create the DataFrame on 'admissions_train' table
    >>> admissions_train = DataFrame("admissions_train")
    >>> admissions_train
       masters   gpa     stats programming  admitted
    id                                              
    13      no  4.00  Advanced      Novice         1
    9       no  3.82  Advanced    Advanced         1
    33      no  3.55    Novice      Novice         1
    6      yes  3.50  Beginner    Advanced         1
    7      yes  2.33    Novice      Novice         1
    22     yes  3.46    Novice    Beginner         0
    37      no  3.52    Novice      Novice         1
    35      no  3.68    Novice    Beginner         1
    27     yes  3.96  Advanced    Advanced         0
    4      yes  3.50  Beginner      Novice         1
  • Import func module from SQLAlchemy package.
    # Import func from SQLAlchemy to use the same for executing aggregate functions
    >>> from sqlalchemy import func
  • Use the func module to generate a function object "agg_func_".
    >>> agg_func_ = func.avg(admissions_train.gpa.expression)
    
    Where:
    • func: sqlalchemy module that is imported.
    • avg: the Vantage function name.
      The function name is case insensitive.

      You can pass the function name in lowercase (avg), uppercase (AVG) or in mixed cases (Avg).

    • admissions_train.gpa.expression: an expression that is passed to the function, specifying the column to use for calculating average.

      Where:

      admissions_train: the teradataml DataFrame;

      gpa: a column name of the teradataml DataFrame;

      Together, admissions_train.gpa forms a ColumnExpression (teradataml DataFrame Column).

      expression: a property of teradataml DataFrame column, which is passed as the required expression to this function object.

  • View the type of the function object "agg_func_".
    >>> type(agg_func_)
    sqlalchemy.sql.functions.Function
  • Pass the function object "agg_func_", with other function objects for calculating average value of the 'admitted' column, to DataFrame.assign() function.
    >>> df = admissions_train.assign(True, avg_gpa_=agg_func_, average_admitted_=func.average(admissions_train.admitted.expression), ave_admitted_=func.ave(admissions_train.admitted.expression))
    
    >>> print_variables(df, ["avg_gpa_", "average_admitted_", "ave_admitted_"])
    Equivalent SQL: select ave(admitted) AS ave_admitted_, average(admitted) AS average_admitted_, avg(gpa) AS avg_gpa_ from "admissions_train"
    
    
     ************************* DataFrame ********************* 
       ave_admitted_  average_admitted_  avg_gpa_
    0           0.65               0.65   3.54175
    
    
    
     ************************* DataFrame.dtypes ********************* 
    ave_admitted_        float
    average_admitted_    float
    avg_gpa_             float
    
    
    
     'avg_gpa_' Column Type: FLOAT
     'average_admitted_' Column Type: FLOAT
     'ave_admitted_' Column Type: FLOAT
    Different function names "AVERAGE" and "AVE" that Vantage offers to calculate average are used along with 'avg'.
    "print_variables()" is a private function used to display example outputs here. This function uses "DataFrame.show_query()" to print the query, "print(dataframe)" to display dataframe content, and "DataFrame.dtypes" to display DataFrame column types. It is not part of the teradataml package.