In this example, you calculate the standard average of column 'gpa' and 'admitted' of the "admissions_trains" dataset using the avg/average/ave function in Vantage.
The avg/average/ave function has the following syntax, and returns the arithmetic average of all values in value_expression.
Avg(value_expression)
- Create connection to Vantage and load example dataset.
# Get the connection to the Vantage using create_context() >>> from teradataml import * >>>td_context = create_context(host=getpass.getpass("Hostname: "), username=getpass.getpass("Username: "), password=getpass.getpass("Password: "))
# Load the example dataset. load_example_data("GLM", ["admissions_train"])
- Prepare example dataset.
# Create the DataFrame on 'admissions_train' table >>> admissions_train = DataFrame("admissions_train") >>> admissions_train
masters gpa stats programming admitted id 13 no 4.00 Advanced Novice 1 9 no 3.82 Advanced Advanced 1 33 no 3.55 Novice Novice 1 6 yes 3.50 Beginner Advanced 1 7 yes 2.33 Novice Novice 1 22 yes 3.46 Novice Beginner 0 37 no 3.52 Novice Novice 1 35 no 3.68 Novice Beginner 1 27 yes 3.96 Advanced Advanced 0 4 yes 3.50 Beginner Novice 1
- Import func module from SQLAlchemy package.
# Import func from SQLAlchemy to use the same for executing aggregate functions >>> from sqlalchemy import func
- Use the func module to generate a function object "agg_func_".
>>> agg_func_ = func.avg(admissions_train.gpa.expression)
Where:- func: sqlalchemy module that is imported.
- avg: the Vantage function name.The function name is case insensitive.
You can pass the function name in lowercase (avg), uppercase (AVG) or in mixed cases (Avg).
- admissions_train.gpa.expression: an expression that is passed to the function, specifying the column to use for calculating average.
Where:
admissions_train: the teradataml DataFrame;
gpa: a column name of the teradataml DataFrame;
Together, admissions_train.gpa forms a ColumnExpression (teradataml DataFrame Column).
expression: a property of teradataml DataFrame column, which is passed as the required expression to this function object.
- View the type of the function object "agg_func_".
>>> type(agg_func_)
sqlalchemy.sql.functions.Function
- Pass the function object "agg_func_", with other function objects for calculating average value of the 'admitted' column, to DataFrame.assign() function.
>>> df = admissions_train.assign(True, avg_gpa_=agg_func_, average_admitted_=func.average(admissions_train.admitted.expression), ave_admitted_=func.ave(admissions_train.admitted.expression)) >>> print_variables(df, ["avg_gpa_", "average_admitted_", "ave_admitted_"])
Equivalent SQL: select ave(admitted) AS ave_admitted_, average(admitted) AS average_admitted_, avg(gpa) AS avg_gpa_ from "admissions_train" ************************* DataFrame ********************* ave_admitted_ average_admitted_ avg_gpa_ 0 0.65 0.65 3.54175 ************************* DataFrame.dtypes ********************* ave_admitted_ float average_admitted_ float avg_gpa_ float 'avg_gpa_' Column Type: FLOAT 'average_admitted_' Column Type: FLOAT 'ave_admitted_' Column Type: FLOAT
Different function names "AVERAGE" and "AVE" that Vantage offers to calculate average are used along with 'avg'."print_variables()" is a private function used to display example outputs here. This function uses "DataFrame.show_query()" to print the query, "print(dataframe)" to display dataframe content, and "DataFrame.dtypes" to display DataFrame column types. This function is not part of the teradataml package.