Description
The function generates PCA scores using the model created by td_pca_valib(). The
scoring process expresses each component as a linear combination of the input columns.
The result output tbl_teradata contains one or more index (key) columns and PCA score
columns, one for each component.
When the PCA analysis was based on a correlation matrix, the scoring input data is normalized
by subtracting the mean and dividing by the standard deviation. If multiple factor models
were built by means of one or more group by columns, the resulting score tbl_teradata
includes these columns and scores the grouped input columns accordingly.
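The normalization and linear-combination steps described above can be sketched locally in base R. This is an illustration only, not the Vantage Analytic Library implementation: the matrix x and its values are made up, and eigen(), scale() and prcomp() stand in for the in-database computation.

```r
# Illustrative only: base R on a small made-up matrix, not the Vantage implementation.
x <- cbind(age = c(25, 32, 47, 51, 38),
           income = c(30, 42, 80, 65, 50),
           years_with_bank = c(1, 4, 10, 12, 6))

# PCA on the correlation matrix: the loadings are its eigenvectors.
loadings <- eigen(cor(x), symmetric = TRUE)$vectors

# Normalize each column: subtract the mean, divide by the standard deviation.
z <- scale(x, center = TRUE, scale = TRUE)

# Each score column is a linear combination of the normalized input columns.
scores <- z %*% loadings
colnames(scores) <- paste0("pc", seq_len(ncol(scores)))

# Cross-check against prcomp(); component signs are arbitrary, so compare magnitudes.
ref <- prcomp(x, center = TRUE, scale. = TRUE)$x
stopifnot(isTRUE(all.equal(abs(unname(scores)), abs(unname(ref)))))
```

Each row of scores holds one observation's value on every component, which mirrors the one-score-column-per-component layout of the result tbl_teradata.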
Usage
td_pca_predict_valib(model, data, ...)
Arguments
model |
Required Argument. Specifies the PCA model generated by td_pca_valib(). |
data |
Required Argument. Specifies the input tbl_teradata to be scored. |
... |
Specifies other arguments supported by the function as described in the 'Other Arguments' section. |
Value
Function returns an object of class "td_pca_predict_valib", which is a named list containing an object of class "tbl_teradata".
The named list member can be referenced directly with the "$" operator using the name: result.
Other Arguments
index.columns
Optional Argument.
Specifies one or more different columns for the primary index of
the result output tbl_teradata. By default, the primary index
columns of the result output tbl_teradata are the primary index
columns of the input tbl_teradata "data". In addition, the columns
specified in this argument need to form a unique key for the result
output tbl_teradata. Otherwise, there is more than one score for
a given observation.
Types: character OR vector of Strings (character)
accumulate
Optional Argument.
Specifies one or more columns from the "data" tbl_teradata that can
be passed to the result output tbl_teradata.
Types: character OR vector of Strings (character)
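The unique-key requirement on "index.columns" can be checked before scoring. The sketch below runs on a local data.frame with made-up values; local_df and is_unique_key() are hypothetical names used for illustration and are not part of tdplyr.

```r
# Hypothetical local sample; in practice the check would run against the input data.
local_df <- data.frame(cust_id = c(1, 2, 3, 3),
                       region  = c("a", "a", "b", "c"))

# TRUE when the given columns identify every row uniquely.
is_unique_key <- function(d, cols) {
  anyDuplicated(d[, cols, drop = FALSE]) == 0
}

is_unique_key(local_df, "cust_id")               # FALSE: cust_id 3 appears twice
is_unique_key(local_df, c("cust_id", "region"))  # TRUE: the pair is unique
```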
Examples
# Notes:
# 1. To execute Vantage Analytic Library functions, set option 'val.install.location' to
# the database name where Vantage analytic library functions are installed.
# 2. Datasets used in these examples can be loaded using Vantage Analytic Library installer.
# Set the option 'val.install.location'.
options(val.install.location = "SYSLIB")
# Get remote data source connection.
con <- td_get_context()$connection
# Create an object of class "tbl_teradata".
df <- tbl(con, "customer")
print(df)
# Run td_pca_valib() on columns "age", "years_with_bank", "nbr_children" and "income".
pca_obj <- td_pca_valib(data=df,
                        columns=c("age", "years_with_bank", "nbr_children", "income"))
# Get PCA scores using the model generated above.
obj <- td_pca_predict_valib(data=df,
                            model=pca_obj$result,
                            index.columns="cust_id",
                            accumulate=c("age", "years_with_bank", "nbr_children"))
# Print the results.
print(obj$result)
# Score using the S3 predict function and the model generated above.
obj <- predict(pca_obj,
               data=df,
               index.columns="cust_id",
               accumulate=c("age", "years_with_bank", "nbr_children"))
# Print the results.
print(obj$result)