Description
Statistics analysis provides several common and less common statistical measures
for numeric data columns. Extended options include additional analyses and measures
such as Values, Modes, Quantiles, and Ranks. Use statistical measures to understand
the characteristics and properties of each numeric column, and to look for outlying
values and anomalies.
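As a rough local illustration of what the extended measures describe, the following base R sketch computes a distinct-value count, modes, quantiles, and ranks for a small made-up vector. It runs entirely in local R and does not call Vantage. The vector income, the chosen quantile points, and the tie-handling rule are assumptions for illustration only; the in-database analysis may define or report these measures differently.

```r
# Hypothetical sample data; not taken from the 'customer' table.
income <- c(10, 20, 20, 30, 40, 40, 50)

# Values: shown here as a count of distinct values (an assumed interpretation).
n_unique <- length(unique(income))

# Modes: the most frequently occurring value(s).
freq <- table(income)
modes <- as.numeric(names(freq)[freq == max(freq)])

# Quantiles: minimum, quartiles, and maximum (R's default interpolation).
q <- quantile(income, probs = c(0, 0.25, 0.5, 0.75, 1))

# Ranks: ascending order, with tied values sharing the lowest rank.
r <- rank(income, ties.method = "min")

print(n_unique)
print(modes)
print(q)
print(r)
```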
Statistics analysis can be performed on columns of numeric or date data type. For
columns of type DATE, statistics other than count, minimum, maximum, and mean are
calculated by first converting each date to the number of days since 1900.
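To illustrate the date conversion, the sketch below turns a few dates into day counts in local base R and computes a standard deviation on them. It assumes the reference point is 1900-01-01, which the description above does not state precisely, and the dates are made up.

```r
# Assumed reference date; the description only says "since 1900".
origin <- as.Date("1900-01-01")

# Hypothetical DATE column values.
d <- as.Date(c("2000-01-01", "2000-01-11", "2000-01-21"))

# Convert each date to a number of days since the assumed origin.
days <- as.numeric(d - origin)

# A statistic such as the (population) standard deviation is then
# computed on the day counts rather than on the dates themselves.
pop_sd <- sqrt(mean((days - mean(days))^2))

print(days)
print(pop_sd)
```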
Usage
td_statistics_valib(data, columns, ...)
Arguments
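data
Required Argument.
Specifies the input data to analyze.
Types: tbl_teradata
columns
Required Argument.
Specifies the name(s) of the column(s) to analyze. A column specifier
such as 'all', 'allnumeric', or 'allnumericanddate' can be used
instead of column names.
Types: character OR vector of Strings (character)
...
Specifies other arguments supported by the function as described in the 'Other Arguments' section.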
Value
Function returns an object of class "td_statistics_valib",
which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator
using the name: result.
Other Arguments
exclude.columns
Optional Argument.
Specifies the name(s) of the column(s) to exclude from the
analysis, if a column specifier such as 'all', 'allnumeric',
'allnumericanddate' is used in the "columns" argument.
Types: character OR vector of Strings (character)
extended.options
Optional Argument.
Specifies the extended options for calculating statistics.
Permitted Values: 'all', 'none', 'modes', 'quantiles', 'values', 'rank'
Default Value: 'none'
Types: character OR vector of Strings (character)
group.columns
Optional Argument.
Specifies the name(s) of the column(s) to group by. A separate
analysis is performed for each group.
Types: character OR vector of Strings (character)
statistical.method
Optional Argument.
Specifies whether sample or population statistics are calculated.
Permitted Values: 'sample', 'population'
Default Value: 'population'
Types: character
stats.options
Optional Argument.
Specifies the basic statistics to be calculated.
Permitted Values:
all
count (cnt)
minimum (min)
maximum (max)
mean
standarddeviation (std)
skewness (skew)
kurtosis (kurt)
standarderror (ste)
coefficientofvariance (cv)
variance (var)
sum
uncorrectedsumofsquares (uss)
correctedsumofsquares (css)
Default Value: c('cnt', 'min', 'max', 'mean', 'std')
Types: character OR vector of Strings (character)
filter
Optional Argument.
Specifies the clause used to filter the rows selected for analysis.
For example,
filter = "cust_id > 0"
Types: character
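The difference between the 'sample' and 'population' settings of "statistical.method", and the meaning of some of the less common measures in "stats.options", can be sketched in local base R. The formulas below are the conventional textbook definitions and are an assumption about what the in-database function computes. In particular, whether the coefficient of variance is reported as a ratio or as a percentage is not stated here. The vector x is made up.

```r
# Hypothetical data.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Uncorrected sum of squares (uss) and corrected sum of squares (css).
uss <- sum(x^2)
css <- sum((x - mean(x))^2)

# Variance and standard deviation: population divides by n, sample by n - 1.
var_pop <- css / n
var_samp <- css / (n - 1)   # same as var(x)
std_pop <- sqrt(var_pop)
std_samp <- sqrt(var_samp)  # same as sd(x)

# Standard error of the mean and coefficient of variation (as a ratio),
# shown here for the sample method.
ste <- std_samp / sqrt(n)
cv <- std_samp / mean(x)

print(c(uss = uss, css = css))
print(c(std_pop = std_pop, std_samp = std_samp))
print(c(ste = ste, cv = cv))
```

The two methods differ only in the divisor, so the gap between them shrinks as the number of rows grows.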
Examples
# Notes:
# 1. To execute Vantage Analytic Library functions, set the option 'val.install.location' to
#    the name of the database where the Vantage Analytic Library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic Library installer.
# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")
# Get remote data source connection.
con <- td_get_context()$connection
# Create an object of class "tbl_teradata".
df <- tbl(con, "customer")
print(df)
# Example 1: Perform Statistics analysis using default values on 'income' column.
obj <- td_statistics_valib(data=df, columns="income")
# Print the results.
print(obj$result)
# Example 2: Perform Statistics analysis on 'income' column with values grouped
# by 'gender' and only for rows with income greater than 0.
obj <- td_statistics_valib(data=df,
                           columns="income",
                           group.columns="gender",
                           filter="income > 0")
# Print the results.
print(obj$result)
# Example 3: Perform Statistics analysis requesting all statistical measures
# and extended options.
obj <- td_statistics_valib(data=df,
                           columns="income",
                           stats.options="all",
                           extended.options="all")
# Print the results.
print(obj$result)
# Example 4: Perform Statistics analysis requesting specific statistical measures
# and extended options and return sample statistics.
obj <- td_statistics_valib(data=df,
                           columns="income",
                           stats.options=c("cnt", "max", "min", "mean",
                                           "css", "uss", "kurt", "skew"),
                           extended.options=c("modes", "rank"),
                           statistical.method="sample")
# Print the results.
print(obj$result)