Teradata Package for R Function Reference | 17.20 - PCA Predict - Teradata Package for R

Teradata® Package for R Function Reference

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for R
Release Number: 17.20
Published: March 2024
Language: English (United States)
Last Update: 2024-05-03
dita:id: TeradataR_FxRef_Enterprise_1720
Product Category: Teradata Vantage
Factor Analysis Scoring

Description

The function generates PCA scores using the model created by td_pca_valib(). The scoring process expresses each component as a linear combination of the input columns. The result output tbl_teradata contains one or more index (key) columns and PCA score columns, one for each component.

When the PCA analysis was based on a correlation matrix, the scoring input data is normalized by subtracting the mean and dividing by the standard deviation. If multiple models were built by means of one or more group by columns, the resulting score tbl_teradata includes these columns and the grouped input rows are scored accordingly.
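
The normalization and linear-combination steps described above can be illustrated locally in base R. This is only a sketch of the general idea, not the code the Vantage Analytic Library runs: the data values are made up, the names x, z, weights and scores are hypothetical, and the library's exact scaling and sign conventions may differ.

```r
# Toy data standing in for three "customer" columns (values are made up).
x <- data.frame(
  age             = c(25, 40, 33, 58, 47),
  income          = c(30000, 52000, 41000, 90000, 66000),
  years_with_bank = c(1, 8, 4, 20, 12)
)

# Normalize: subtract each column mean, then divide by its standard deviation.
z <- scale(x, center = TRUE, scale = TRUE)

# PCA on the correlation matrix: the eigenvectors serve as component weights.
eig <- eigen(cor(x))
weights <- eig$vectors
rownames(weights) <- colnames(x)

# Each score column is a linear combination of the normalized input columns.
scores <- z %*% weights
colnames(scores) <- paste0("pc", seq_len(ncol(scores)))

print(round(weights, 3))
print(round(scores, 3))
```

In this sketch the variance of each score column equals the corresponding eigenvalue of the correlation matrix, which is a quick way to sanity-check the arithmetic.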

Usage

td_pca_predict_valib(model, data, ...)

Arguments

model

Required Argument.
Specifies the input containing the PCA model to use in scoring. This must be the "result" tbl_teradata generated by td_pca_valib() or a tbl_teradata created on a table generated by the 'factor' function from the Vantage Analytic Library.
Types: tbl_teradata

data

Required Argument.
Specifies the input data containing the columns to be scored.
Types: tbl_teradata

...

Specifies other arguments supported by the function as described in the 'Other Arguments' section.

Value

Function returns an object of class "td_pca_predict_valib", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result.

Other Arguments

index.columns

Optional Argument.
Specifies one or more different columns for the primary index of the result output tbl_teradata. By default, the primary index columns of the result output tbl_teradata are the primary index columns of the input tbl_teradata "data". The columns specified in this argument must form a unique key for the result output tbl_teradata. Otherwise, there may be more than one score for a given observation.
Types: character OR vector of Strings (character)
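
Because the columns passed as index.columns must form a unique key, it can help to check a candidate key before scoring. The following is a minimal base R sketch on a local data frame. The customers data and the is_unique_key() helper are hypothetical and not part of the Teradata Package for R; the same check against a tbl_teradata would have to be expressed differently.

```r
# Hypothetical local sample of the scoring input (values are made up).
customers <- data.frame(
  cust_id = c(1, 2, 3, 3),
  branch  = c("A", "A", "B", "C"),
  age     = c(25, 40, 33, 58)
)

# Hypothetical helper: TRUE when the given columns identify every row uniquely.
is_unique_key <- function(df, cols) {
  anyDuplicated(df[, cols, drop = FALSE]) == 0
}

is_unique_key(customers, "cust_id")               # FALSE: cust_id 3 appears twice
is_unique_key(customers, c("cust_id", "branch"))  # TRUE
```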

accumulate

Optional Argument.
Specifies one or more columns from the "data" tbl_teradata to pass through unchanged to the result output tbl_teradata.
Types: character OR vector of Strings (character)

Examples


# Notes:
#   1. To execute Vantage Analytic Library functions, set option 'val.install.location' to
#      the database name where Vantage Analytic Library functions are installed.
#   2. Datasets used in these examples can be loaded using the Vantage Analytic Library installer.

# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")

# Get remote data source connection.
con <- td_get_context()$connection

# Create an object of class "tbl_teradata".
df <- tbl(con, "customer")
print(df)

# Run PCA() on columns "age", "income", "years_with_bank" and "nbr_children".
pca_obj <- td_pca_valib(data=df,
                        columns=c("age", "years_with_bank", "nbr_children", "income"))
                        
# Get PCA scores using the model generated above.
obj <- td_pca_predict_valib(data=df,
                            model=pca_obj$result,
                            index.columns="cust_id",
                            accumulate=c("age", "years_with_bank", "nbr_children"))

# Print the results.
print(obj$result)

# Score using S3 predict function and the model generated above.
obj <- predict(pca_obj, 
               data=df,
               index.columns="cust_id",
               accumulate=c("age", "years_with_bank", "nbr_children"))

# Print the results.
print(obj$result)