td_pca_predict_valib - Teradata Package for R Function Reference | 17.00

Teradata® Package for R Function Reference

Product
Teradata Package for R
Release Number
17.00
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B700-4007-090K
Language
English (United States)

Description

The function generates PCA scores using the model created by td_pca_valib(). Scoring expresses each principal component as a linear combination of the input columns. The result tbl_teradata contains one or more index (key) columns and one PCA score column per component.

When the PCA analysis was based on a correlation matrix, the scoring input data is normalized by subtracting the mean and dividing by the standard deviation. If multiple factor models were built by means of one or more group by columns, the resulting score tbl_teradata includes these columns, and the input data in each group is scored accordingly.
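The normalization and linear-combination steps described above can be illustrated locally in base R. This is only a sketch of the underlying arithmetic, not the Vantage Analytic Library implementation: the built-in iris dataset stands in for the scoring input, the score column names ("pc1", "pc2", ...) are made up for illustration, and details such as the sign of each component or the exact standard deviation formula may differ in the database.

```r
# Base R sketch; the built-in iris data stands in for the scoring input.
x <- as.matrix(iris[, 1:4])

# Correlation-based PCA: the loadings are the eigenvectors of the
# correlation matrix of the input columns.
eig <- eigen(cor(x))
loadings <- eig$vectors

# Normalize: subtract each column mean and divide by its standard deviation.
z <- scale(x, center = TRUE, scale = TRUE)

# Each score column is a linear combination of the normalized input columns.
scores <- z %*% loadings
colnames(scores) <- paste0("pc", seq_len(ncol(scores)))  # hypothetical names

head(scores, 3)
```

In this sketch the variance of each score column equals the corresponding eigenvalue of the correlation matrix, which is a quick way to check that the normalization and the loadings are consistent.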

Usage

td_pca_predict_valib(model, data, ...)

Arguments

model

Required Argument.
Specifies the input containing the PCA model to use in scoring. This must be the "result" tbl_teradata generated by td_pca_valib() or a tbl_teradata created on a table generated by the 'factor' function from the Vantage Analytic Library.
Types: tbl_teradata

data

Required Argument.
Specifies the input data containing the columns to be scored.
Types: tbl_teradata

...

Specifies other arguments supported by the function as described in the 'Other Arguments' section.

Value

Function returns an object of class "td_pca_predict_valib", which is a named list containing an object of class "tbl_teradata". The named list member can be referenced directly with the "$" operator using the name: result.

Other Arguments

index.columns

Optional Argument.
Specifies one or more different columns for the primary index of the result tbl_teradata. By default, the primary index columns of the result tbl_teradata are the primary index columns of the input tbl_teradata "data". The columns specified in this argument must form a unique key for the result tbl_teradata. Otherwise, there can be more than one score for a given observation.
Types: character OR vector of Strings (character)
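
Because the index columns must form a unique key, it can help to check candidate columns before scoring. The following is a minimal base R sketch on a local data.frame; the helper is_unique_key() and the sample data are hypothetical and not part of the Teradata Package for R. For a large tbl_teradata, an equivalent check would normally be run in the database rather than on collected data.

```r
# Hypothetical helper (not part of tdplyr): TRUE when the given columns
# identify each row of a local data.frame uniquely.
is_unique_key <- function(df, key_cols) {
  anyDuplicated(df[, key_cols, drop = FALSE]) == 0
}

# Small made-up sample standing in for the scoring input.
customers <- data.frame(
  cust_id = c(1, 2, 3, 3),
  region  = c("N", "S", "E", "W"),
  age     = c(34, 51, 28, 45)
)

# cust_id alone repeats the value 3, so it is not a unique key here.
is_unique_key(customers, "cust_id")

# Adding region makes every (cust_id, region) pair distinct.
is_unique_key(customers, c("cust_id", "region"))
```

With this sample, "cust_id" alone fails the check while the pair of columns passes, so only the pair would be a safe choice of index columns.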

accumulate

Optional Argument.
Specifies one or more columns from the "data" tbl_teradata that can be passed to the result output tbl_teradata.
Types: character OR vector of Strings (character)

Examples

# Notes:
#   1. To execute Vantage Analytic Library functions, set option 'val.install.location' to
#      the database name where Vantage analytic library functions are installed.
#   2. Datasets used in these examples can be loaded using Vantage Analytic Library installer.

# Load tdplyr (td_* functions) and dplyr (tbl()).
library(tdplyr)
library(dplyr)

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

# Get remote data source connection (assumes a context already exists,
# for example one created with td_create_context()).
con <- td_get_context()$connection

# Create an object of class "tbl_teradata".
df <- tbl(con, "customer")
print(df)

# Run PCA on columns "age", "years_with_bank", "nbr_children" and "income".
pca_obj <- td_pca_valib(data=df,
                        columns=c("age", "years_with_bank", "nbr_children", "income"))
                        
# Get PCA scores using the model generated above.
obj <- td_pca_predict_valib(data=df,
                            model=pca_obj$result,
                            index.columns="cust_id",
                            accumulate=c("age", "years_with_bank", "nbr_children"))

# Print the results.
print(obj$result)

# Score using S3 predict function and the model generated above.
obj <- predict(pca_obj, 
               data=df,
               index.columns="cust_id",
               accumulate=c("age", "years_with_bank", "nbr_children"))

# Print the results.
print(obj$result)