Example: Accessing Vantage SQL Function with SQLAlchemy - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

In this example, you calculate the standard average of column 'gpa' and 'admitted' of the "admissions_trains" dataset using the avg/average/ave function in Vantage.

The avg/average/ave function has the following syntax, and returns the arithmetic average of all values in value_expression.
Avg(value_expression)
  • Create connection to Vantage and load example dataset.
    # Get the connection to the Vantage using create_context()
    >>> from teradataml import *
    >>>td_context = create_context(host=getpass.getpass("Hostname: "), username=getpass.getpass("Username: "), password=getpass.getpass("Password: "))
    
    # Load the example dataset.
    load_example_data("GLM", ["admissions_train"])
  • Prepare example dataset.
    # Create the DataFrame on 'admissions_train' table
    >>> admissions_train = DataFrame("admissions_train")
    >>> admissions_train
       masters   gpa     stats programming  admitted
    id                                              
    13      no  4.00  Advanced      Novice         1
    9       no  3.82  Advanced    Advanced         1
    33      no  3.55    Novice      Novice         1
    6      yes  3.50  Beginner    Advanced         1
    7      yes  2.33    Novice      Novice         1
    22     yes  3.46    Novice    Beginner         0
    37      no  3.52    Novice      Novice         1
    35      no  3.68    Novice    Beginner         1
    27     yes  3.96  Advanced    Advanced         0
    4      yes  3.50  Beginner      Novice         1
  • Import func module from SQLAlchemy package.
    # Import func from SQLAlchemy to use the same for executing aggregate functions
    >>> from sqlalchemy import func
  • Use the func module to generate a function object "agg_func_".
    >>> agg_func_ = func.avg(admissions_train.gpa.expression)
    
    Where:
    • func: sqlalchemy module that is imported.
    • avg: the Vantage function name.
      The function name is case insensitive.

      You can pass the function name in lowercase (avg), uppercase (AVG) or in mixed cases (Avg).

    • admissions_train.gpa.expression: an expression that is passed to the function, specifying the column to use for calculating average.

      Where:

      admissions_train: the teradataml DataFrame;

      gpa: a column name of the teradataml DataFrame;

      Together, admissions_train.gpa forms a ColumnExpression (teradataml DataFrame Column).

      expression: a property of teradataml DataFrame column, which is passed as the required expression to this function object.

  • View the type of the function object "agg_func_".
    >>> type(agg_func_)
    sqlalchemy.sql.functions.Function
  • Pass the function object "agg_func_", with other function objects for calculating average value of the 'admitted' column, to DataFrame.assign() function.
    >>> df = admissions_train.assign(True, avg_gpa_=agg_func_, average_admitted_=func.average(admissions_train.admitted.expression), ave_admitted_=func.ave(admissions_train.admitted.expression))
    
    >>> print_variables(df, ["avg_gpa_", "average_admitted_", "ave_admitted_"])
    Equivalent SQL: select ave(admitted) AS ave_admitted_, average(admitted) AS average_admitted_, avg(gpa) AS avg_gpa_ from "admissions_train"
    
    
     ************************* DataFrame ********************* 
       ave_admitted_  average_admitted_  avg_gpa_
    0           0.65               0.65   3.54175
    
    
    
     ************************* DataFrame.dtypes ********************* 
    ave_admitted_        float
    average_admitted_    float
    avg_gpa_             float
    
    
    
     'avg_gpa_' Column Type: FLOAT
     'average_admitted_' Column Type: FLOAT
     'ave_admitted_' Column Type: FLOAT
    Different function names "AVERAGE" and "AVE" that Vantage offers to calculate average are used along with 'avg'.
    "print_variables()" is a private function used to display example outputs here. This function uses "DataFrame.show_query()" to print the query, "print(dataframe)" to display dataframe content, and "DataFrame.dtypes" to display DataFrame column types. This function is not part of the teradataml package.