Teradata® Package for R Function Reference

Product
Teradata Package for R
Release Number
17.00
Published
July 2021
Language
English (United States)
Last Update
2023-08-08
dita:id
B700-4007
NMT
no
Product Category
Teradata Vantage
Descriptive Statistics Function: Explore

Description

The function performs basic statistical analysis on a set of selected tbl_teradata objects, or on selected columns from a tbl_teradata. It stores the results of four fundamental types of analysis, each based on a simplified version of the Descriptive Statistics analysis:

  1. Values

  2. Statistics

  3. Frequency

  4. Histogram

An output tbl_teradata is produced for each type of analysis.

Usage

td_explore_valib(data, ...)

Arguments

data

Required Argument.
Specifies the input data on which to perform basic statistical analysis.
Types: tbl_teradata

...

Specifies other arguments supported by the function as described in the 'Other Arguments' section.

Value

The function returns an object of class "td_explore_valib", which is a named list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator using names as follows:

  1. frequency.output

  2. histogram.output

  3. statistics.output

  4. values.output

Other Arguments

columns

Optional Argument.
Specifies the name(s) of the column(s) to analyze.
Types: character OR vector of Strings (character)

bins

Optional Argument.
Specifies the number of equal width bins to create for Histogram analysis.
Default Value: 10
Types: integer

bin.style

Optional Argument.
Specifies the bin style for Histogram analysis.
Permitted Values: 'bins', 'quantiles'
Default Value: 'bins'
Types: character
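
To illustrate the difference between the two bin styles, here is a minimal base R sketch that runs locally and does not call Vantage. It assumes that 'bins' produces equal-width intervals and that 'quantiles' produces intervals holding roughly equal numbers of rows; the exact edge rules used by the Vantage Analytic Library may differ. The vector x and all variable names are made up for illustration.

```r
# Illustrative local data; not taken from any Vantage table.
x <- c(1, 2, 2, 3, 4, 5, 8, 13, 21, 34)
bins <- 4

# Equal-width style: every interval spans the same range of values.
width_edges <- seq(min(x), max(x), length.out = bins + 1)
width_counts <- table(cut(x, breaks = width_edges, include.lowest = TRUE))

# Quantile style: edges sit at quantiles, so row counts are roughly equal.
quantile_edges <- quantile(x, probs = seq(0, 1, length.out = bins + 1),
                           names = FALSE)
quantile_counts <- table(cut(x, breaks = quantile_edges,
                             include.lowest = TRUE))

print(width_counts)
print(quantile_counts)
```

On skewed data such as this, equal-width bins tend to pile most rows into the first interval, while quantile bins spread the rows more evenly.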

max.comb.values

Optional Argument.
Specifies the maximum number of combined values for frequency or histogram analysis.
Default Value: 10000
Types: integer

max.unique.char.values

Optional Argument.
Specifies the maximum number of unique character values for unrestricted frequency analysis.
Default Value: 100
Types: integer

max.unique.num.values

Optional Argument.
Specifies the maximum number of unique date or numeric values for frequency analysis.
Default Value: 20
Types: integer

min.comb.rows

Optional Argument.
Specifies the minimum number of rows required before frequency or histogram combining is attempted.
Default Value: 25000
Types: integer

restrict.freq

Optional Argument.
Specifies whether to use restricted frequency processing, in which only prominent values are included in the results.
Default Value: TRUE
Types: logical

restrict.threshold

Optional Argument.
Specifies the minimum percentage of rows a value must occur in, for inclusion in results.
Default Value: 1
Types: integer
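
As a rough picture of what a percentage threshold does, the following base R sketch keeps only the values that occur in at least a given percentage of rows. It runs locally and is not the Vantage implementation; whether the real comparison is inclusive (>=) is an assumption, and the data and the names values, counts, pct, and prominent are hypothetical.

```r
# Illustrative local data: 200 rows with one rare value.
values <- c(rep("A", 150), rep("B", 49), "C")

# Threshold expressed as a percentage of rows (mirrors the default of 1).
restrict.threshold <- 1

# Count each distinct value and convert the counts to percentages.
counts <- table(values)
pct <- 100 * counts / length(values)

# Keep only the values that meet the threshold ("prominent" values).
prominent <- counts[pct >= restrict.threshold]
print(prominent)
```

Here "C" appears in 0.5 percent of the rows, so it falls below the threshold and is dropped.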

statistical.method

Optional Argument.
Specifies the method for calculating the statistics.
Permitted Values: 'population', 'sample'
Default Value: 'population'
Types: character
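
The usual distinction between the two methods is the divisor used for variance-based statistics: n for 'population' and n - 1 for 'sample'. The base R sketch below shows this for the standard deviation. It runs locally and only assumes that the function follows this conventional definition; the data and variable names are illustrative.

```r
# Illustrative local data.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample standard deviation: base R's sd() divides by n - 1.
sample_sd <- sd(x)

# Population standard deviation: divide the sum of squares by n instead.
population_sd <- sqrt(sum((x - mean(x))^2) / n)

print(c(population = population_sd, sample = sample_sd))
```

The population value is always the smaller of the two, and the gap shrinks as the number of rows grows.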

stats.options

Optional Argument.
Specifies the basic statistics to be calculated for the Statistics analysis.
Permitted Values:

  • all

  • count (cnt)

  • minimum (min)

  • maximum (max)

  • mean

  • standarddeviation (std)

  • skewness (skew)

  • kurtosis (kurt)

  • standarderror (ste)

  • coefficientofvariance (cv)

  • variance (var)

  • sum

  • uncorrectedsumofsquares (uss)

  • correctedsumofsquares (css)

Types: character OR vector of Strings (character)
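
For readers unfamiliar with the less common statistics in this list, here is a base R sketch using their conventional textbook definitions. These formulas are assumptions about what the names mean, not a statement of how the Vantage Analytic Library computes them; in particular, some tools report the coefficient of variation as a percentage, and the sample standard deviation is used here. Skewness and kurtosis are left out because their definitions vary between tools.

```r
# Illustrative local data.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Uncorrected sum of squares: squares of the raw values.
uss <- sum(x^2)

# Corrected sum of squares: squares of the deviations from the mean.
css <- sum((x - mean(x))^2)

# Standard error of the mean (using the sample standard deviation).
ste <- sd(x) / sqrt(n)

# Coefficient of variation as a ratio (multiply by 100 for a percentage).
cv <- sd(x) / mean(x)

print(c(uss = uss, css = css, ste = ste, cv = cv))
```

The two sums of squares are linked by css = uss - n * mean^2, which is a quick way to sanity-check results.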

distinct

Optional Argument.
Specifies whether to count the unique values in each selected column. The count is produced when this argument is set to TRUE.
Default Value: FALSE
Types: logical

filter

Optional Argument.
Specifies the clause to filter rows selected for data exploration.
For example: filter = "cust_id > 0"
Types: character

Examples


# Notes:
#   1. To execute Vantage Analytic Library functions, set option 
#      'val.install.location' to the database name where Vantage analytic 
#      library functions are installed.
#   2. Datasets used in these examples can be loaded using Vantage Analytic 
#      Library installer.

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

# Get remote data source connection.
con <- td_get_context()$connection

# Create an object of class "tbl_teradata".
cust <- tbl(con, "customer_analysis")
print(cust)

# Example 1: Shows data exploration with default values.
obj <- td_explore_valib(data=cust)

# Print the frequency results.
print(obj$frequency.output)

# Print the histogram results.
print(obj$histogram.output) 

# Print the statistics results. 
print(obj$statistics.output) 

# Print the values results. 
print(obj$values.output)
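
The optional arguments can be combined in a single call. The sketch below wraps such a call in a helper so the block can be sourced without a live connection. The helper name explore_subset and the column names 'income' and 'age' are hypothetical; substitute columns that exist in your table. Only arguments documented in the 'Other Arguments' section are used, and the package providing td_explore_valib is assumed to be loaded before the helper is called.

```r
# Sketch: explore selected columns with non-default options.
# 'income' and 'age' are assumed column names, not confirmed by this page.
explore_subset <- function(data) {
  td_explore_valib(
    data = data,
    columns = c("income", "age"),        # analyze only these columns
    bins = 5L,                           # five histogram bins
    bin.style = "quantiles",             # quantile bins instead of 'bins'
    statistical.method = "sample",       # sample rather than population
    stats.options = c("mean", "minimum", "maximum", "standarddeviation"),
    distinct = TRUE,                     # also count unique values
    filter = "cust_id > 0"               # filter clause from the argument text
  )
}

# With a Vantage connection and the 'cust' object created above:
# obj <- explore_subset(cust)
# print(obj$statistics.output)
# print(obj$histogram.output)
```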