- Values(data, columns=None, exclude_columns=None, group_columns=None, distinct=False, filter=None, gen_sql_only=False)
- DESCRIPTION:
Use Values analysis as the first type of analysis performed on unknown data.
It determines the nature and quality of the data: for example, whether the data
is categorical or continuously numeric, how many null values it contains, and
so on.
A Values analysis provides a count of rows, rows with non-null values, rows with
null values, rows with the value 0, rows with a positive value, rows with a negative
value, and rows containing blanks in the given column. It can also count unique
values. Because this calculation can be expensive, it is controlled by the
"distinct" argument and is not performed by default.
For a column of non-numeric type, the zero, positive, and negative counts are
always zero; for example, a character value of '000' is not counted as 0. A Values
analysis can be performed on columns of any data type, though the measures reported
vary according to the column type.
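The measures listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the Vantage Analytic Library implementation: the hypothetical helper `values_measures` treats `None` as SQL NULL, treats `int` and `float` values as numeric, and counts a string as blank when it is empty or contains only whitespace (the exact blank rule used by Values is assumed here).

```python
from typing import Any, Dict, Iterable, Optional


def values_measures(values: Iterable[Any],
                    count_distinct: bool = False) -> Dict[str, Optional[int]]:
    """Approximate the per-column measures of a Values analysis.

    Assumptions (hypothetical, not the VAL implementation): None stands for
    SQL NULL, int/float values are numeric, and a blank is a string that is
    empty or contains only whitespace.
    """
    vals = list(values)
    non_null = [v for v in vals if v is not None]
    # Only numeric values take part in the sign counts. bool is excluded
    # because it is a subclass of int in Python.
    numeric = [v for v in non_null
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return {
        "count": len(vals),
        "non_null": len(non_null),
        "null": len(vals) - len(non_null),
        # A string such as "000" is never numeric here, so it is not a zero.
        "zero": sum(1 for v in numeric if v == 0),
        "positive": sum(1 for v in numeric if v > 0),
        "negative": sum(1 for v in numeric if v < 0),
        "blank": sum(1 for v in non_null if isinstance(v, str) and not v.strip()),
        # Unique counting is optional, in the spirit of the "distinct" argument.
        "unique": len(set(non_null)) if count_distinct else None,
    }


print(values_measures([0, 5, -2, None, 5], count_distinct=True))
# {'count': 5, 'non_null': 4, 'null': 1, 'zero': 1, 'positive': 2, 'negative': 1, 'blank': 0, 'unique': 3}
print(values_measures(["000", " ", "M", None]))
# {'count': 4, 'non_null': 3, 'null': 1, 'zero': 0, 'positive': 0, 'negative': 0, 'blank': 1, 'unique': None}
```

In the second call no value is numeric, so the zero, positive, and negative counts stay at 0 even though "000" looks like a zero.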
PARAMETERS:
data:
Required Argument.
Specifies the input data to perform Values analysis.
Types: teradataml DataFrame
columns:
Required Argument.
Specifies the name(s) of the column(s) to analyze. It also accepts pre-defined
strings that select all columns, all numeric columns, or all character columns.
Permitted Values:
* Name(s) of the column(s) in "data".
* Pre-defined strings:
* 'all' - all columns
* 'allnumeric' - all numeric columns
* 'allcharacter' - all character columns
Types: str OR list of Strings (str)
exclude_columns:
Optional Argument.
Specifies the name(s) of the column(s) to exclude from the analysis, if a
column specifier such as 'all', 'allnumeric', 'allcharacter' is used in the
"columns" argument.
Types: str OR list of Strings (str)
group_columns:
Optional Argument.
Specifies the name(s) of the column(s) by which to group the data. A separate
analysis is performed for each group.
Types: str OR list of Strings (str)
distinct:
Optional Argument.
Specifies whether to count the unique values in each selected column.
Default Value: False
Types: bool
filter:
Optional Argument.
Specifies a conditional expression, as used in a SQL WHERE clause, to filter the
rows selected for analysis.
For example,
filter = "cust_id > 0"
Types: str
gen_sql_only:
Optional Argument.
Specifies whether to generate only the SQL for the function.
When set to True, the function SQL is generated but not executed; it can be
accessed using the show_query() method. Otherwise, the SQL is executed but not
returned.
Default Value: False
Types: bool
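How "columns", "exclude_columns", "group_columns", and "filter" shape what gets analyzed can be sketched with two hypothetical helpers, `select_columns` and `grouped_counts`. Values itself does this work in SQL inside Vantage; here the schema is assumed to be a dict mapping each column name to "numeric" or "character" (which glosses over how Vantage actually classifies SQL types), and the filter is modeled as a Python predicate.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, Union

Row = Dict[str, Any]
# Hypothetical mapping of pre-defined strings to a column kind (None = any kind).
SPECIFIERS = {"all": None, "allnumeric": "numeric", "allcharacter": "character"}


def select_columns(schema: Dict[str, str],
                   columns: Union[str, List[str]],
                   exclude_columns: Optional[Union[str, List[str]]] = None) -> List[str]:
    """Resolve "columns" and "exclude_columns" against an assumed schema."""
    if isinstance(columns, str) and columns in SPECIFIERS:
        kind = SPECIFIERS[columns]
        if isinstance(exclude_columns, str):
            exclude_columns = [exclude_columns]
        excluded = set(exclude_columns or [])
        # Exclusions only matter when a pre-defined specifier is used.
        return [name for name, k in schema.items()
                if (kind is None or k == kind) and name not in excluded]
    # Explicit column names are used as given.
    return [columns] if isinstance(columns, str) else list(columns)


def grouped_counts(rows: Sequence[Row], column: str,
                   group_columns: Sequence[str] = (),
                   row_filter: Callable[[Row], bool] = lambda row: True
                   ) -> Dict[Tuple[Any, ...], Dict[str, int]]:
    """Count rows and NULLs of `column` separately for each group."""
    groups: Dict[Tuple[Any, ...], Dict[str, int]] = defaultdict(
        lambda: {"count": 0, "null": 0})
    for row in rows:
        # Rows rejected by the filter never reach the analysis.
        if not row_filter(row):
            continue
        key = tuple(row[g] for g in group_columns)
        groups[key]["count"] += 1
        if row[column] is None:
            groups[key]["null"] += 1
    return dict(groups)


schema = {"gender": "character", "income": "numeric", "marital_status": "character"}
rows = [
    {"gender": "F", "income": 50000, "marital_status": "1"},
    {"gender": "F", "income": 0, "marital_status": None},
    {"gender": "M", "income": 42000, "marital_status": None},
    {"gender": "M", "income": None, "marital_status": "2"},
]

print(select_columns(schema, "allcharacter", exclude_columns="gender"))
# ['marital_status']
# Rough analogue of group_columns="gender", filter="income > 0".
print(grouped_counts(rows, "marital_status", ["gender"],
                     lambda r: r["income"] is not None and r["income"] > 0))
# {('F',): {'count': 1, 'null': 0}, ('M',): {'count': 1, 'null': 1}}
```

The predicate drops the row whose income is None, mirroring how a SQL condition such as income > 0 evaluates to unknown for NULL and excludes that row.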
RETURNS:
An instance of Values.
The output teradataml DataFrame can be accessed using an attribute reference of
the form ValuesObj.<attribute_name>.
The output teradataml DataFrame attribute name is: result.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLES:
# Notes:
# 1. To execute Vantage Analytic Library functions,
# a. import "valib" object from teradataml.
# b. set 'configure.val_install_location' to the database name where Vantage
# analytic library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic Library
# installer.
# Import valib object from teradataml to execute this function.
from teradataml import valib
# Set the 'configure.val_install_location' variable.
from teradataml import configure
configure.val_install_location = "SYSLIB"
# Create the required teradataml DataFrame.
from teradataml import DataFrame
df = DataFrame("customer")
print(df)
# Example 1: Perform Values analysis on the 'income' and 'marital_status'
# columns using default argument values.
obj = valib.Values(data=df, columns=["income", "marital_status"])
# Print the results.
print(obj.result)
# Example 2: Perform Values analysis on 'income' column with values grouped by
# 'gender' and only for rows with income greater than 0.
obj = valib.Values(data=df, columns="income", group_columns="gender", filter="income > 0")
# Print the results.
print(obj.result)
# Example 3: Generate the SQL for the function without executing it.
obj = valib.Values(data=df,
                   columns=["income", "marital_status"],
                   filter="cust_id > 0",
                   distinct=False,
                   gen_sql_only=True)
# Print the generated SQL.
print(obj.show_query("sql"))
# Print both generated SQL and stored procedure call.
print(obj.show_query("both"))
# Print the stored procedure call, first with the default argument and then
# with "sp" passed explicitly.
print(obj.show_query())
print(obj.show_query("sp"))