Teradata Package for R Function Reference | 17.00 - KNN

KNN

Description

The KNN function uses training data objects to map test data objects to categories. The function is optimized for both small and large training sets. The function supports user-defined distance metrics and distance-weighted voting.

Usage

  td_knn_mle (
      train = NULL,
      test = NULL,
      k = NULL,
      response.column = NULL,
      id.column = NULL,
      distance.features = NULL,
      voting.weight = 0,
      customized.distance = NULL,
      force.mapreduce = FALSE,
      parblock.size = NULL,
      partition.key = NULL,
      accumulate = NULL,
      output.prob = FALSE,
      train.sequence.column = NULL,
      test.sequence.column = NULL
  )

Arguments

`train`	Required Argument. Specifies the name of the tbl_teradata that contains the training data. Each row represents a classified data object.
`test`	Required Argument. Specifies the name of the tbl_teradata that contains the test data to be classified by the `td_knn_mle` function. Each row represents a test data object.
`k`	Required Argument. Specifies the number of nearest neighbors to use for classifying the test data. Types: integer
`response.column`	Required Argument. Specifies the name of the training tbl_teradata column that contains the class label or classification of the classified data objects. Types: character
`id.column`	Required Argument. Specifies the name of the testing tbl_teradata column that uniquely identifies a data object. Types: character
`distance.features`	Required Argument. Specifies the names of the training tbl_teradata columns that the function uses to compute the distance between a test object and the training objects. The test tbl_teradata must also have these columns. Types: character OR vector of Strings (character)
`voting.weight`	Optional Argument. Specifies the voting weight of the distance between a test object and the training objects. The value of "voting.weight" must be a non-negative integer. The function calculates the distance-weighted vote, w, with this equation: w = 1/POWER(distance, voting.weight), where distance is the distance between the test object and the training object. With the default value of 0, every neighbor votes with equal weight (w = 1). Default Value: 0 Types: numeric
`customized.distance`	Optional Argument. Specifies the distance function. The first value of the argument is the name of the JAR file that contains the distance metric class. The second value is the distance metric class defined in the JAR file. For details on how to install a JAR file, see the Teradata Vantage user guide. The default distance function is Euclidean distance. Types: character OR vector of characters
`force.mapreduce`	Optional Argument. Specifies whether to partition the training data. The default, FALSE, causes the `td_knn_mle` function to load all training data into memory and use only the row function. If you specify TRUE, the `td_knn_mle` function partitions the training data and uses the map and reduce functions. Default Value: FALSE Types: logical
`parblock.size`	Optional Argument. Specifies the partition block size to use when "force.mapreduce" is TRUE. The recommended value depends on the training data size and the number of vworkers. For example, if your training data size is 10 billion and you have 10 vworkers, the recommended "parblock.size" is 1/n billion, where n is an integer that corresponds to the memory of your vworker nodes. Omitting this argument or specifying an inappropriate value for "parblock.size" can degrade performance. Types: integer
`partition.key`	Optional Argument. Specifies the name of the training tbl_teradata column that partitions data in parallel model. The default value is the first column of the "distance.features" argument. Types: character
`accumulate`	Optional Argument. Specifies the names of the test tbl_teradata columns to copy to the output tbl_teradata. Note: This argument is supported only when tdplyr is connected to Vantage 1.1 or later. Types: character OR vector of Strings (character)
`output.prob`	Optional Argument. Specifies whether to display the output probability for the predicted category. Note: This argument is supported only when tdplyr is connected to Vantage 1.1 or later. Default Value: FALSE Types: logical
`train.sequence.column`	Optional Argument. Specifies the column(s) that uniquely identify each row of the input argument "train". Use this argument to obtain deterministic results from functions whose results otherwise vary from run to run. Types: character OR vector of Strings (character)
`test.sequence.column`	Optional Argument. Specifies the column(s) that uniquely identify each row of the input argument "test". Use this argument to obtain deterministic results from functions whose results otherwise vary from run to run. Types: character OR vector of Strings (character)
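
To make the roles of "k", "distance.features" and "voting.weight" concrete, the following is a minimal local sketch in base R of distance-weighted nearest-neighbor voting with Euclidean distance, using the equation w = 1/POWER(distance, voting.weight) given above. It does not call Vantage and is not the implementation behind `td_knn_mle`; the helper name `knn_vote_sketch`, its arguments, and the toy data are hypothetical and exist only for illustration. How the real function breaks ties or handles zero distances is not covered here.

```r
# Illustrative only: knn_vote_sketch() is a hypothetical helper, not part of tdplyr.
# train_x:  numeric matrix, one row per training object (the distance features)
# train_y:  character vector of class labels (the response column)
# test_row: numeric vector holding the same features for one test object
knn_vote_sketch <- function(train_x, train_y, test_row, k, voting_weight = 0) {
  # Euclidean distance from the test object to every training object.
  diffs <- sweep(train_x, 2, test_row)
  distance <- sqrt(rowSums(diffs^2))

  # Indices of the k nearest training objects.
  nearest <- order(distance)[seq_len(min(k, length(distance)))]

  # w = 1 / distance^voting_weight; with voting_weight = 0 every vote weighs 1.
  w <- 1 / distance[nearest]^voting_weight

  # Sum the weights per class and return the class with the largest total.
  votes <- tapply(w, train_y[nearest], sum)
  names(votes)[which.max(votes)]
}

# Toy data (made up): two "low" objects near the test point, three "high" far away.
train_x <- matrix(c(1, 1,
                    1, 2,
                    5, 5,
                    6, 5,
                    6, 6),
                  ncol = 2, byrow = TRUE)
train_y <- c("low", "low", "high", "high", "high")
test_row <- c(2, 2)

# Plain majority vote over all 5 neighbors: 3 "high" votes beat 2 "low" votes.
unweighted <- knn_vote_sketch(train_x, train_y, test_row, k = 5, voting_weight = 0)

# Distance-weighted vote: the two close "low" neighbors outweigh the three far ones.
weighted <- knn_vote_sketch(train_x, train_y, test_row, k = 5, voting_weight = 1)

print(c(unweighted = unweighted, weighted = weighted))
```

In this toy case the unweighted vote picks "high" and the weighted vote picks "low", which shows why a non-zero "voting.weight" can change the predicted category when "k" is large relative to the size of the nearest cluster.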

Value

The function returns an object of class "td_knn_mle", which is a named list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator using the following names:

output.table
output

Examples

  
    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Load example data.
    loadExampleData("knn_example", "computers_train1_clustered", "computers_test1")

    # Both the "computers_train1_clustered" tbl_teradata and the "computers_test1"
    # tbl_teradata contain five attributes of personal computers: price, speed,
    # hard disk size, RAM, and screen size.
    computers_train1_clustered <- tbl(con, "computers_train1_clustered")
    computers_test1 <- tbl(con, "computers_test1")

    # Example 1: Map the test computer data to their respective categories.
    td_knn_out <- td_knn_mle(train = computers_train1_clustered,
                             test = computers_test1,
                             k = 50,
                             response.column = "computer_category",
                             id.column = "id",
                             distance.features = c("price","speed","hd","ram","screen"),
                             voting.weight = 1
                             )