teradataml supports SQL functions with the SQLAlchemy extension. For lists of these functions, see Supported SQL Functions.
from sqlalchemy import funcYou can use this imported func module to call Vantage SQL functions and pass those to DataFrame.assign() function to process the data.
For more details about these functions, refer to Teradata Vantage™ - SQL Functions, Expressions, and Predicates, B035-1145.
Example
In this example, you will calculate the standard average of column 'gpa' and 'admitted' of the "admissions_trains" dataset using the avg/average/ave function in Vantage.
Avg(value_expression)
- Create connection to Vantage and load example dataset.
# Get the connection to the Vantage using create_context() >>> from teradataml import * >>>td_context = create_context(host=getpass.getpass("Hostname: "), username=getpass.getpass("Username: "), password=getpass.getpass("Password: "))
# Load the example dataset. load_example_data("GLM", ["admissions_train"])
- Prepare example dataset.
# Create the DataFrame on 'admissions_train' table >>> admissions_train = DataFrame("admissions_train") >>> admissions_train
masters gpa stats programming admitted id 13 no 4.00 Advanced Novice 1 9 no 3.82 Advanced Advanced 1 33 no 3.55 Novice Novice 1 6 yes 3.50 Beginner Advanced 1 7 yes 2.33 Novice Novice 1 22 yes 3.46 Novice Beginner 0 37 no 3.52 Novice Novice 1 35 no 3.68 Novice Beginner 1 27 yes 3.96 Advanced Advanced 0 4 yes 3.50 Beginner Novice 1
- Import func module from SQLAlchemy package.
# Import func from SQLAlchemy to use the same for executing aggregate functions >>> from sqlalchemy import func
- Use the func module to generate a function object "agg_func_".
>>> agg_func_ = func.avg(admissions_train.gpa.expression)
Where:- func: sqlalchemy module that is imported.
- avg: the Vantage function name.The function name is case insensitive.
You can pass the function name in lowercase (avg), uppercase (AVG) or in mixed cases (Avg).
- admissions_train.gpa.expression: an expression that is passed to the function, specifying the column to use for calculating average.
Where:
admissions_train: the teradataml DataFrame;
gpa: a column name of the teradataml DataFrame;
Together, admissions_train.gpa forms a ColumnExpression (teradataml DataFrame Column).
expression: a property of teradataml DataFrame column, which is passed as the required expression to this function object.
- View the type of the function object "agg_func_".
>>> type(agg_func_)
sqlalchemy.sql.functions.Function
- Pass the function object "agg_func_", with other function objects for calculating average value of the 'admitted' column, to DataFrame.assign() function.
>>> df = admissions_train.assign(True, avg_gpa_=agg_func_, average_admitted_=func.average(admissions_train.admitted.expression), ave_admitted_=func.ave(admissions_train.admitted.expression)) >>> print_variables(df, ["avg_gpa_", "average_admitted_", "ave_admitted_"])
Equivalent SQL: select ave(admitted) AS ave_admitted_, average(admitted) AS average_admitted_, avg(gpa) AS avg_gpa_ from "admissions_train" ************************* DataFrame ********************* ave_admitted_ average_admitted_ avg_gpa_ 0 0.65 0.65 3.54175 ************************* DataFrame.dtypes ********************* ave_admitted_ float average_admitted_ float avg_gpa_ float 'avg_gpa_' Column Type: FLOAT 'average_admitted_' Column Type: FLOAT 'ave_admitted_' Column Type: FLOAT
Different function names "AVERAGE" and "AVE" that Vantage offers to calculate average are used along with 'avg'."print_variables()" is a private function used to display example outputs here. This function uses "DataFrame.show_query()" to print the query, "print(dataframe)" to display dataframe content, and "DataFrame.dtypes" to display DataFrame column types. It is not part of the teradataml package.