Teradata® Package for R Function Reference

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for R
Release Number: 17.20
Published: March 2024
Last Edition: 2024-05-03
Product Category: Teradata Vantage

KMeansPredict

Description

The td_kmeans_predict_sqle() function uses the cluster centroids in the output of the td_kmeans_sqle() function to assign each input data point to its nearest cluster centroid.
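To make the assignment step concrete, the following base R sketch assigns each row of a small matrix to its nearest centroid. It is an illustration only, not the SQL Engine implementation: the helper name `assign_to_nearest_centroid`, the toy values, and the use of Euclidean distance are all assumptions made for this example.

```r
# Hypothetical helper (not part of tdplyr): assign each row of `points`
# to the nearest row of `centroids`, assuming Euclidean distance.
assign_to_nearest_centroid <- function(points, centroids) {
  points <- as.matrix(points)
  centroids <- as.matrix(centroids)
  # Squared distance from every point (rows) to every centroid (columns).
  sq_dist <- sapply(seq_len(nrow(centroids)), function(k) {
    rowSums(sweep(points, 2, centroids[k, ], "-")^2)
  })
  sq_dist <- matrix(sq_dist, nrow = nrow(points))
  data.frame(
    cluster = max.col(-sq_dist, ties.method = "first"),  # index of the closest centroid
    distance = sqrt(apply(sq_dist, 1, min))              # distance to that centroid
  )
}

# Toy data: two centroids and three points (values invented for illustration).
centroids <- rbind(c(0, 0), c(10, 10))
points <- rbind(c(1, 1), c(9, 8), c(0, 2))
assigned <- assign_to_nearest_centroid(points, centroids)
print(assigned)
```

In this toy case the first and third points fall in cluster 1 and the second point falls in cluster 2.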

Notes:

  • This function requires the UTF8 client character set for UNICODE data.

  • This function does not support Pass Through Characters (PTCs).

  • For information about PTCs, see Teradata Vantage™ - Analytics Database International Character Set Support.

  • This function does not support KanjiSJIS or Graphic data types.

Usage

  td_kmeans_predict_sqle (
      data = NULL,
      object = NULL,
      accumulate = NULL,
      output.distance = FALSE,
      ...
  )

Arguments

data

Required Argument.
Specifies the input tbl_teradata.
Types: tbl_teradata

object

Required Argument.
Specifies the model: either the tbl_teradata generated by the
td_kmeans_sqle() function or an instance of td_kmeans_sqle.
Types: tbl_teradata or instance of td_kmeans_sqle

accumulate

Optional Argument.
Specifies the name(s) of input tbl_teradata column(s) to copy to the
output. By default, the function copies no input columns to the output.
Types: character OR vector of Strings (character)

output.distance

Optional Argument.
Specifies whether to return the distance between
each data point and the nearest cluster.
Default Value: FALSE
Types: logical
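
A call that requests the distance might look like the sketch below. It uses only the arguments documented above. The wrapper name `predict_with_distance` and its `keep` parameter are hypothetical, and the code assumes that tdplyr is attached, a Vantage connection context exists, and `model` comes from td_kmeans_sqle(). The name of the distance column in the output is not specified here, so inspect the result rather than relying on a particular column name.

```r
# Hypothetical wrapper (not part of tdplyr): run the prediction and request
# the distance between each data point and its nearest cluster.
# Assumes tdplyr is attached and a connection context already exists.
predict_with_distance <- function(data, model, keep = NULL) {
  td_kmeans_predict_sqle(
    data = data,              # input tbl_teradata
    object = model,           # output of td_kmeans_sqle()
    accumulate = keep,        # optional input columns to copy to the output
    output.distance = TRUE    # the default is FALSE
  )
}

# Usage, with objects like those in the Examples section:
# out <- predict_with_distance(computers_train1, KMeans_out, keep = "ram")
# print(out$result)
```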

...

Specifies the generic keyword arguments that SQLE functions accept.
The generic keyword arguments are:

persist:
Optional Argument.
Specifies whether to persist the results of the function in a table.
When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table.
When set to TRUE, results are stored in a volatile table; otherwise, they are not.
Default Value: FALSE
Types: logical
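
As a rough sketch, the two flags can be forwarded through "..." as shown below. The wrapper name `predict_with_storage` and its `storage` parameter are hypothetical. The sketch never sets both flags to TRUE at once, because how they combine is not described here, and it assumes tdplyr is attached with an active connection context.

```r
# Hypothetical wrapper (not part of tdplyr): choose where the prediction
# output is stored by setting at most one of "persist" and "volatile".
predict_with_storage <- function(data, model,
                                 storage = c("temporary", "persist", "volatile")) {
  storage <- match.arg(storage)
  td_kmeans_predict_sqle(
    data = data,
    object = model,
    persist = (storage == "persist"),    # TRUE: results are kept in a table
    volatile = (storage == "volatile")   # TRUE: results go to a volatile table
  )
}

# Usage, with objects like those in the Examples section:
# out <- predict_with_storage(computers_train1, KMeans_out, storage = "persist")
```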

The function allows the user to partition, hash, order, or local order the input data. These generic arguments are available for each argument that accepts a tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.hash.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.order.column" accepts character or vector of character (Strings)

  • "local.order.<input.data.arg.name>" accepts logical

Note:
tdplyr supports these generic arguments only if the underlying SQL Engine function supports them; otherwise, an exception is raised.
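
The naming patterns listed above can be spelled out for a concrete input argument with a few lines of base R. The helper `generic_arg_names` is hypothetical and only builds the names; whether the SQL Engine function behind td_kmeans_predict_sqle() accepts each of them is not stated here.

```r
# Hypothetical helper (not part of tdplyr): build the generic argument names
# for one input argument, following the patterns in the list above.
generic_arg_names <- function(input_arg) {
  c(
    partition = paste0(input_arg, ".partition.column"),
    hash = paste0(input_arg, ".hash.column"),
    order = paste0(input_arg, ".order.column"),
    local_order = paste0("local.order.", input_arg)
  )
}

# For the "data" argument of td_kmeans_predict_sqle():
data_args <- generic_arg_names("data")
print(data_args)
```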

Value

The function returns an object of class "td_kmeans_predict_sqle", which is a named list containing an object of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator using the following name:

  1. result
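
The access pattern can be tried without a Vantage connection by using a stand-in object of the same shape. In the sketch below, `mock_out` is hypothetical: a real return value holds a tbl_teradata, whereas a data.frame with invented columns and values is used here so that the code runs locally.

```r
# Stand-in with the same shape as the documented return value: a named list
# of class "td_kmeans_predict_sqle" with one member called "result".
# The data.frame, its columns, and its values are invented for illustration.
mock_out <- structure(
  list(result = data.frame(id = c(1L, 2L), cluster = c(0L, 1L))),
  class = "td_kmeans_predict_sqle"
)

# List the member names that can be referenced with "$".
print(names(mock_out))

# Reference the member directly.
predictions <- mock_out$result
print(predictions)
```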

Examples

  
    
    # Attach the Teradata Package for R (tdplyr).
    library(tdplyr)
    
    # Get the current context/connection.
    con <- td_get_context()$connection
    
    # Load the example data.
    loadExampleData("kmeans_example", "computers_train1")
    
    # Create tbl_teradata object.
    computers_train1 <- tbl(con, "computers_train1")
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
    # Grouping a set of observations into 2 clusters in which
    # each observation belongs to the cluster with the nearest mean.
    KMeans_out <- td_kmeans_sqle(
                    id.column="id",
                    target.columns=c('price', 'speed'),
                    data=computers_train1,
                    num.clusters=2)
    
    # Print the result tbl_teradata objects.
    print(KMeans_out$result)
    print(KMeans_out$model.data)
    
    # Example 1 : Assign the input data points to the cluster centroid
    #             using the model generated by the td_kmeans_sqle() function.
    #             Note that the tbl_teradata representing the model
    #             is passed as input to "object".
    KMeansPredict_out <- td_kmeans_predict_sqle(object=KMeans_out$result,
                                                data=computers_train1)
    
    # Print the result tbl_teradata objects.
    print(KMeansPredict_out$result)
    
    # Example 2 : Assign the input data points to the cluster centroid
    #             using the model generated by the td_kmeans_sqle() function.
    #             Note that the model is passed as an instance of
    #             td_kmeans_sqle to "object".
    KMeansPredict_out_1 <- td_kmeans_predict_sqle(
                            data=computers_train1,
                            object=KMeans_out,
                            accumulate="ram",
                            output.distance=FALSE)
    
    # Print the result tbl_teradata objects.
    print(KMeansPredict_out_1$result)
    
    # Alternatively, use the S3 predict() function to run the prediction on
    # the output of the td_kmeans_sqle() function.
    
    KMeansPredict_out_1 <- predict(
                             KMeans_out,
                             data=computers_train1,
                             accumulate="ram",
                             output.distance=FALSE)
    
    # Print the result tbl_teradata objects.
    print(KMeansPredict_out_1$result)