td_statistics_valib - Teradata Package for R Function Reference

Teradata® Package for R Function Reference

Product
Teradata Package for R
Release Number
17.00
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B700-4007-090K
Language
English (United States)

Description

Statistics analysis provides both common and less common statistical measures for numeric data columns. Extended options add further analyses and measures: Values, Modes, Quantiles, and Ranks. Use these measures to understand the characteristics and properties of each numeric column, and to look for outlying values and anomalies.

Statistics analysis can be performed on columns of numeric or date data type. For columns of type DATE, statistics other than count, minimum, maximum, and mean are calculated by first converting to the number of days since 1900.
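
The DATE conversion described above can be pictured with a small base R sketch. It is an illustration only, not the library's implementation: the exact epoch is assumed to be January 1, 1900 (the text says only "since 1900"), and the helper days_since_1900 is hypothetical.

```r
# Assumption: the epoch is 1900-01-01; the reference text says only "since 1900".
epoch <- as.Date("1900-01-01")

# Hypothetical helper: convert dates to a day count relative to the assumed epoch.
days_since_1900 <- function(d) as.numeric(as.Date(d) - epoch)

dates <- as.Date(c("2021-07-01", "2021-07-03", "2021-07-08"))
days <- days_since_1900(dates)

# A statistic such as the standard deviation is then computed on the day counts
# (population form shown here, dividing by n).
sd_days <- sqrt(mean((days - mean(days))^2))
print(days)
print(sd_days)
```

Statistics computed this way are expressed in days, so a standard deviation on a DATE column reads as a spread in days rather than as a date.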

Usage

td_statistics_valib(data, columns, ...)

Arguments

data

Required Argument.
Specifies the input data on which to perform statistical analysis.
Types: tbl_teradata

columns

Required Argument.
Specifies the name(s) of the column(s) to analyze. Alternatively, a pre-defined string can be passed to select all columns, all numeric columns, or all numeric and date columns.
Permitted Values:

  1. Name(s) of the column(s) in "data".

  2. Pre-defined strings:

    1. 'all' - all columns

    2. 'allnumeric' - all numeric columns

    3. 'allnumericanddate' - all numeric and date columns

Types: character OR vector of Strings (character)

...

Specifies other arguments supported by the function as described in the 'Other Arguments' section.

Value

Function returns an object of class "td_statistics_valib", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result.

Other Arguments

exclude.columns

Optional Argument.
Specifies the name(s) of the column(s) to exclude from the analysis, if a column specifier such as 'all', 'allnumeric', 'allnumericanddate' is used in the "columns" argument.
Types: character OR vector of Strings (character)
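
As a minimal sketch of how "exclude.columns" combines with a column specifier, the wrapper below analyzes every numeric column except the ones listed. The wrapper stats_all_numeric is hypothetical and not part of the package, and the default of skipping 'cust_id' assumes the 'customer' table used in the Examples section has such a column.

```r
# Hypothetical helper (not part of the package): analyze all numeric columns of a
# tbl_teradata, leaving out the columns named in `skip`.
# Assumption: 'cust_id' is an identifier column that is not worth summarizing.
stats_all_numeric <- function(data, skip = "cust_id") {
  tdplyr::td_statistics_valib(data = data,
                              columns = "allnumeric",
                              exclude.columns = skip)
}

# Usage, assuming `df` is the tbl_teradata created in the Examples section:
# obj <- stats_all_numeric(df)
# print(obj$result)
```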

extended.options

Optional Argument.
Specifies the extended analyses to perform in addition to the basic statistics.
Permitted Values: 'all', 'none', 'modes', 'quantiles', 'values', 'rank'
Default Value: 'none'
Types: character OR vector of Strings (character)

group.columns

Optional Argument.
Specifies the name(s) of the column(s) to group by. A separate analysis is performed for each group.
Types: character OR vector of Strings (character)

statistical.method

Optional Argument.
Specifies whether sample or population statistics are calculated.
Permitted Values: 'sample', 'population'
Default Value: 'population'
Types: character
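
The difference between the two methods can be illustrated in base R. This sketch assumes the usual textbook convention, in which 'population' divides the sum of squared deviations by n and 'sample' divides by n - 1; the reference text does not spell out the formulas, so treat this as an illustration rather than the function's exact computation.

```r
# Illustrative data, not taken from any Teradata table.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sum of squared deviations from the mean.
ss <- sum((x - mean(x))^2)

# Assumed convention: population divides by n, sample divides by n - 1.
sd_population <- sqrt(ss / n)
sd_sample <- sqrt(ss / (n - 1))

print(c(population = sd_population, sample = sd_sample))
```

Base R's sd() uses the n - 1 form, so it corresponds to the 'sample' choice under this assumption. The two values converge as the row count grows.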

stats.options

Optional Argument.
Specifies the basic statistics to be calculated.
Permitted Values:

  • all

  • count (cnt)

  • minimum (min)

  • maximum (max)

  • mean

  • standarddeviation (std)

  • skewness (skew)

  • kurtosis (kurt)

  • standarderror (ste)

  • coefficientofvariance (cv)

  • variance (var)

  • sum

  • uncorrectedsumofsquares (uss)

  • correctedsumofsquares (css)

Default Value: c('cnt', 'min', 'max', 'mean', 'std')
Types: character OR vector of Strings (character)
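
For readers less familiar with the two sum-of-squares options, the base R sketch below uses their standard textbook definitions: the uncorrected sum of squares is the sum of the squared values, and the corrected sum of squares is the sum of squared deviations from the mean. That the function computes them exactly this way is an assumption.

```r
# Illustrative data, not taken from any Teradata table.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

stats <- c(
  cnt  = n,
  sum  = sum(x),
  mean = mean(x),
  uss  = sum(x^2),               # uncorrected sum of squares
  css  = sum((x - mean(x))^2)    # corrected sum of squares
)
print(stats)

# The two are related by css = uss - n * mean^2.
identity_holds <- isTRUE(all.equal(stats[["css"]],
                                   stats[["uss"]] - n * stats[["mean"]]^2))
```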

filter

Optional Argument.
Specifies the clause used to filter the rows selected for analysis.
For example,
filter = "cust_id > 0"
Types: character

Examples

# Notes:
#   1. To execute Vantage Analytic Library functions, set the option 'val.install.location'
#      to the name of the database where the Vantage Analytic Library functions are installed.
#   2. Datasets used in these examples can be loaded using the Vantage Analytic Library installer.

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

# Get remote data source connection (assumes a context has already been created,
# for example with td_create_context()).
con <- td_get_context()$connection

# Create an object of class "tbl_teradata".
df <- tbl(con, "customer")
print(df)

# Example 1: Perform Statistics analysis using default values on 'income' column.
obj <- td_statistics_valib(data=df, columns="income")

# Print the results.
print(obj$result)

# Example 2: Perform Statistics analysis on 'income' column with values grouped
#            by 'gender' and only for rows with income greater than 0.
obj <- td_statistics_valib(data=df,
                           columns="income",
                           group.columns="gender",
                           filter="income > 0")

# Print the results.
print(obj$result)

# Example 3: Perform Statistics analysis requesting all statistical measures
#            and extended options.
obj <- td_statistics_valib(data=df,
                           columns="income",
                           stats.options="all",
                           extended.options="all")

# Print the results.
print(obj$result)

# Example 4: Perform Statistics analysis requesting specific statistical measures
#            and extended options and return sample statistics.
obj <- td_statistics_valib(data=df,
                           columns="income",
                           stats.options=c("cnt", "max", "min", "mean",
                                           "css", "uss", "kurt", "skew"),
                           extended.options=c("modes", "rank"),
                           statistical.method="sample")

# Print the results.
print(obj$result)