Description
The function performs basic statistical analysis on one or more selected
tbl_teradata objects, or on selected columns from a tbl_teradata. It stores
the results of four fundamental types of analysis, each a simplified version
of the corresponding Descriptive Statistics analysis:
Values
Statistics
Frequency
Histogram
An output tbl_teradata is produced for each type of analysis.
Usage
td_explore_valib(data, ...)
Arguments
data
Required Argument.
Specifies the input tbl_teradata to analyze.
...
Specifies other arguments supported by the function, as described in the 'Other Arguments' section.
Value
Function returns an object of class "td_explore_valib",
which is a named list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator
using names as follows:
frequency.output
histogram.output
statistics.output
values.output
Other Arguments
columns
Optional Argument.
Specifies the name(s) of the column(s) to analyze.
Types: character OR vector of Strings (character)
bins
Optional Argument.
Specifies the number of equal-width bins to create for
Histogram analysis.
Default Value: 10
Types: integer
bin.style
Optional Argument.
Specifies the bin style for Histogram analysis.
Permitted Values: 'bins', 'quantiles'
Default Value: 'bins'
Types: character
max.comb.values
Optional Argument.
Specifies the maximum number of combined values
for frequency or histogram analysis.
Default Value: 10000
Types: integer
max.unique.char.values
Optional Argument.
Specifies the maximum number of unique
character values for unrestricted frequency
analysis.
Default Value: 100
Types: integer
max.unique.num.values
Optional Argument.
Specifies the maximum number of unique date
or numeric values for frequency analysis.
Default Value: 20
Types: integer
min.comb.rows
Optional Argument.
Specifies the minimum number of rows required before frequency
or histogram combining is attempted.
Default Value: 25000
Types: integer
restrict.freq
Optional Argument.
Specifies whether to use restricted frequency processing,
which includes only prominent values.
Default Value: TRUE
Types: logical
restrict.threshold
Optional Argument.
Specifies the minimum percentage of rows a value
must occur in, for inclusion in results.
Default Value: 1
Types: integer
statistical.method
Optional Argument.
Specifies the method for calculating the
statistics.
Permitted Values: 'population', 'sample'
Default Value: 'population'
Types: character
stats.options
Optional Argument.
Specifies the basic statistics to be calculated for
the Statistics analysis.
Permitted Values:
all
count (cnt)
minimum (min)
maximum (max)
mean
standarddeviation (std)
skewness (skew)
kurtosis (kurt)
standarderror (ste)
coefficientofvariance (cv)
variance (var)
sum
uncorrectedsumofsquares (uss)
correctedsumofsquares (css)
Types: character OR vector of Strings (character)
distinct
Optional Argument.
Specifies whether to count the unique values for each selected
column. The count is computed when this argument is set to TRUE.
Default Value: FALSE
Types: logical
filter
Optional Argument.
Specifies the clause to filter rows selected for data
exploration.
For example,
filter = "cust_id > 0"
Types: character
Examples
# Notes:
# 1. To execute Vantage Analytic Library functions, set option
# 'val.install.location' to the name of the database where Vantage
# Analytic Library functions are installed.
# 2. Datasets used in these examples can be loaded using the Vantage
# Analytic Library installer.
# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")
# Get remote data source connection.
con <- td_get_context()$connection
# Create an object of class "tbl_teradata".
cust <- tbl(con, "customer_analysis")
print(cust)
# Example 1: Shows data exploration with default values.
obj <- td_explore_valib(data=cust)
# Print the frequency results.
print(obj$frequency.output)
# Print the histogram results.
print(obj$histogram.output)
# Print the statistics results.
print(obj$statistics.output)
# Print the values results.
print(obj$values.output)
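# Example 2: A sketch of data exploration restricted to selected columns,
# with a non-default number of histogram bins and a subset of statistics.
# Assumption: the columns "income" and "age" exist in "customer_analysis";
# they are illustrative only, so substitute column names from your own table.
obj <- td_explore_valib(data=cust,
                        columns=c("income", "age"),
                        bins=20L,
                        statistical.method="sample",
                        stats.options=c("count", "minimum", "maximum", "mean"))
# Print the statistics results.
print(obj$statistics.output)
# Print the histogram results.
print(obj$histogram.output)
# Example 3: A sketch of data exploration on a filtered set of rows, with
# unique value counts requested for each selected column.
# The filter clause is the one shown in the description of the 'filter'
# argument.
obj <- td_explore_valib(data=cust,
                        filter="cust_id > 0",
                        distinct=TRUE)
# Print the values results.
print(obj$values.output)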