Teradata Package for Python Function Reference | 17.10 - Explore - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.10

Published

April 2022

Language

English (United States)

Last Update

2022-08-19

Product Category

Teradata Vantage

Explore

Functions
Explore(data, columns=None, bins=10, bin_style='bins', max_comb_values=10000,
        max_unique_char_values=100, max_unique_num_values=20, min_comb_rows=25000,
        restrict_freq=True, restrict_threshold=1, statistical_method='population',
        stats_options=None, distinct=False, filter=None, gen_sql=False)

DESCRIPTION:
    Function performs basic statistical analysis on a teradataml DataFrame,
    or on selected columns from a teradataml DataFrame. It stores results from
    four fundamental types of analysis based on simplified versions of the
    Descriptive Statistics analysis:
        1. Values
        2. Statistics
        3. Frequency
        4. Histogram
    An output teradataml DataFrame is produced for each type of analysis.

PARAMETERS:
    data:
        Required Argument.
        Specifies the input data on which to perform basic statistical analysis.
        Types: teradataml DataFrame

    columns:
        Optional Argument.
        Specifies the name(s) of the column(s) to analyze.
        Types: str OR list of Strings (str)

    bins:
        Optional Argument.
        Specifies the number of equal width bins to create for Histogram analysis.
        Default Value: 10
        Types: int

    bin_style:
        Optional Argument.
        Specifies the bin style for Histogram analysis.
        Permitted Values: 'bins', 'quantiles'
        Default Value: 'bins'
        Types: str

    max_comb_values:
        Optional Argument.
        Specifies the maximum number of combined values for frequency or
        histogram analysis.
        Default Value: 10000
        Types: int

    max_unique_char_values:
        Optional Argument.
        Specifies the maximum number of unique character values for
        unrestricted frequency analysis.
        Default Value: 100
        Types: int

    max_unique_num_values:
        Optional Argument.
        Specifies the maximum number of unique date or numeric values for
        frequency analysis.
        Default Value: 20
        Types: int

    min_comb_rows:
        Optional Argument.
        Specifies the minimum number of rows required before frequency or
        histogram combining is attempted.
        Default Value: 25000
        Types: int

    restrict_freq:
        Optional Argument.
        Specifies whether frequency processing is restricted to prominent values.
        Default Value: True
        Types: bool

    restrict_threshold:
        Optional Argument.
        Specifies the minimum percentage of rows a value must occur in, for
        inclusion in results.
        Default Value: 1
        Types: int

    statistical_method:
        Optional Argument.
        Specifies the method for calculating the statistics.
        Permitted Values: 'population', 'sample'
        Default Value: 'population'
        Types: str

    stats_options:
        Optional Argument.
        Specifies the basic statistics to be calculated for the Statistics analysis.
        Permitted Values:
            * all
            * count (cnt)
            * minimum (min)
            * maximum (max)
            * mean
            * standarddeviation (std)
            * skewness (skew)
            * kurtosis (kurt)
            * standarderror (ste)
            * coefficientofvariance (cv)
            * variance (var)
            * sum
            * uncorrectedsumofsquares (uss)
            * correctedsumofsquares (css)
        Types: str OR list of Strings (str)

    distinct:
        Optional Argument.
        Specifies whether to compute the count of unique values for each
        selected column. The count is computed when this argument is set to True.
        Default Value: False
        Types: bool

    filter:
        Optional Argument.
        Specifies the clause to filter rows selected for data exploration.
        For example,
            filter = "cust_id > 0"
        Types: str

    gen_sql:
        Optional Argument.
        Specifies whether to store and return the generated function SQL.
        When set to True, the function SQL is generated as well as executed,
        and can be accessed using the show_query() method. Otherwise, the SQL
        is executed but not returned.
        Default Value: False
        Types: bool

RETURNS:
    An instance of Explore.
    Output teradataml DataFrames can be accessed using attribute references,
    such as ExploreObj.<attribute_name>.
    Output teradataml DataFrame attribute names are:
        1. frequency_output
        2. histogram_output
        3. statistics_output
        4. values_output

RAISES:
    TeradataMlException, TypeError, ValueError

EXAMPLES:
    # Notes:
    #   1. To execute Vantage Analytic Library functions,
    #       a. import "valib" object from teradataml.
    #       b. set 'configure.val_install_location' to the database name where
    #          Vantage Analytic Library functions are installed.
    #   2. Datasets used in these examples can be loaded using the Vantage
    #      Analytic Library installer.

    # Import valib object from teradataml to execute this function.
    from teradataml import valib

    # Set the 'configure.val_install_location' variable.
    from teradataml import configure
    configure.val_install_location = "SYSLIB"

    # Create required teradataml DataFrame.
    # A connection to Vantage must already exist at this point.
    from teradataml import DataFrame
    df = DataFrame("customer")
    print(df)

    # Example 1: Shows data exploration with default values.
    obj = valib.Explore(data=df)

    # Print the frequency results.
    print(obj.frequency_output)

    # Print the histogram results.
    print(obj.histogram_output)

    # Print the statistics results.
    print(obj.statistics_output)

    # Print the values results.
    print(obj.values_output)

    # Example 2: Generate SQL for the function and execute the same.
    obj = valib.Explore(data=df, gen_sql=True)

    # Print the generated SQL.
    print(obj.show_query("sql"))

    # Print both generated SQL and stored procedure call.
    print(obj.show_query("both"))

    # Print the stored procedure call.
    print(obj.show_query())
    print(obj.show_query("sp"))
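
The meaning of the statistical_method and bins arguments can be illustrated locally, without a Vantage connection. The sketch below uses only the Python standard library and assumes the conventional definitions: 'population' divides the sum of squared deviations by n, 'sample' divides by n - 1, and equal width bins split the range from the minimum to the maximum value into intervals of the same size. It is not the SQL that Explore generates, and Vantage may handle boundaries and NULLs differently. The sample values and the helper names equal_width_edges and histogram_counts are hypothetical.

```python
import statistics

# Hypothetical values standing in for one numeric column of the input data.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Assumed meaning of statistical_method:
#   'population' -> divide by n, 'sample' -> divide by n - 1.
population_std = statistics.pstdev(values)
sample_std = statistics.stdev(values)


def equal_width_edges(data, bins):
    """Return bins + 1 boundaries that split [min, max] into equal width bins."""
    low, high = min(data), max(data)
    width = (high - low) / bins
    return [low + i * width for i in range(bins + 1)]


def histogram_counts(data, edges):
    """Count values per bin; the last bin also includes its upper boundary."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    return counts


edges = equal_width_edges(values, bins=7)
counts = histogram_counts(values, edges)

print("population std:", population_std)
print("sample std:", sample_std)
print("bin edges:", edges)
print("bin counts:", counts)
```

The sample standard deviation is always the larger of the two for the same data, and the gap shrinks as the number of rows grows, so the choice matters mostly for small inputs.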