1.0 - 8.00 - KNN Arguments - Teradata Vantage

Teradata® Vantage Machine Learning Engine Analytic Function Reference

Teradata Vantage
Release Date: May 2019
Content Type: Programming Reference
Language: English (United States)
[Optional] Specify the name of the output table.
Default behavior: The function displays the output to the screen.
Specify the number of nearest neighbors to use for classifying the test data.
Specify the name of the training table column that contains the class label or classification of the classified data objects.
Specify the name of the testing table column that uniquely identifies a data object.
Specify the names of the training table columns that the function uses to compute the distance between a test object and the training objects. The test table must also have these columns.
[Optional] Specify the voting weight, which controls how strongly the distance between a test object and a training object affects that training object's vote. The voting_weight must be a nonnegative integer.
The function calculates the distance-weighted vote, w, of each neighbor with this equation:

w = 1/POWER(distance, voting_weight)

Where distance is the distance between the test object and the training object.
Default: 0 (Because POWER(distance, 0) is 1, every neighbor's vote has the same weight.)
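The effect of the voting weight can be illustrated outside the ML Engine with a minimal Python sketch. It applies the equation above to a list of (label, distance) pairs; the function names, the sample data, and the handling of a zero distance are assumptions made for illustration, not documented behavior of the KNN function.

```python
from collections import defaultdict
import math


def vote_weight(distance, voting_weight=0):
    """Return w = 1 / POWER(distance, voting_weight)."""
    if voting_weight == 0:
        return 1.0  # default: every neighbor counts equally
    if distance == 0:
        # Assumption: the source does not define this case; treat an exact
        # match as an infinitely strong vote.
        return math.inf
    return 1.0 / (distance ** voting_weight)


def weighted_vote(neighbors, voting_weight=0):
    """Pick the label with the largest total weight.

    neighbors: (label, distance) pairs for the k nearest training objects.
    """
    totals = defaultdict(float)
    for label, distance in neighbors:
        totals[label] += vote_weight(distance, voting_weight)
    return max(totals, key=totals.get)


# Hypothetical neighbors: two distant "a" objects and one close "b" object.
neighbors = [("a", 4.0), ("a", 5.0), ("b", 1.0)]
majority = weighted_vote(neighbors, voting_weight=0)  # plain majority vote
closest = weighted_vote(neighbors, voting_weight=2)   # distance-weighted vote
```

With a voting weight of 0 the two "a" neighbors outvote "b"; with a voting weight of 2 the single close "b" neighbor outweighs them, because its weight is 1 against a combined 1/16 + 1/25 for "a".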
[Optional] Specify the distance function. The parameter jar is the name of the JAR file that contains the distance metric class. The parameter distance_class is the distance metric class defined in the jar file. This JAR file must be installed on the ML Engine.
Default: Euclidean distance
The ML Engine does not support the creation of new customized distance classes. However, it does support existing JAR files. For installation instructions, see Teradata Vantage™ User Guide, B700-4002.
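For reference, the default Euclidean distance over the distance feature columns can be sketched in Python as follows. This is an illustration of the metric only, not the ML Engine implementation; the row representation (dictionaries keyed by column name), the column names, and the helper names are hypothetical.

```python
import math


def euclidean_distance(test_row, train_row, distance_features):
    """Euclidean distance computed over the named feature columns only."""
    return math.sqrt(
        sum((test_row[col] - train_row[col]) ** 2 for col in distance_features)
    )


def k_nearest(test_row, training_rows, distance_features, k):
    """Return the k training rows closest to test_row."""
    ranked = sorted(
        training_rows,
        key=lambda row: euclidean_distance(test_row, row, distance_features),
    )
    return ranked[:k]


# Hypothetical data: both tables share the feature columns "x" and "y".
training_rows = [
    {"x": 3.0, "y": 4.0, "label": "a"},
    {"x": 1.0, "y": 0.0, "label": "b"},
    {"x": 6.0, "y": 8.0, "label": "a"},
]
test_row = {"x": 0.0, "y": 0.0}

nearest = k_nearest(test_row, training_rows, ["x", "y"], k=2)
labels = [row["label"] for row in nearest]
```

The sketch also shows why the test table must contain the same feature columns as the training table: the distance is computed column by column.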
[Optional] Specify whether to partition the training data. If you specify 'true', the KNN function partitions the training data and uses the map-and-reduce function.
Default: 'false' (The function loads all training data into memory and uses only the row function.)
[Optional] Specify the partition block size to use with ForceMapReduce ('true'). Specifying an optimal value for this argument may improve performance. The optimal value depends on the size of the training data and the vworker configuration.
Because rows in a partition are processed together, a higher value improves performance, but the maximum value is limited by the memory of the vworker.
For example, if the training data set has 1024 rows, specifying PartitionBlockSize('16') partitions the input data into 64 (1024/16) partitions of 16 rows each. Similarly, PartitionBlockSize('128') creates 8 (1024/128) partitions of 128 rows each. The partitions are distributed evenly across the available vworkers.
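The partition arithmetic in the example above can be checked with a short Python sketch. The function names and the vworker count are hypothetical, and two behaviors are assumptions not stated in the source: rounding up when the row count is not an exact multiple of the block size, and spreading leftover partitions one per vworker.

```python
import math


def partition_count(num_rows, partition_block_size):
    """Number of partitions for a training set of num_rows rows."""
    if partition_block_size <= 0:
        raise ValueError("partition_block_size must be positive")
    # Assumption: a final, smaller partition holds any leftover rows.
    return math.ceil(num_rows / partition_block_size)


def partitions_per_vworker(num_partitions, num_vworkers):
    """Spread partitions as evenly as possible across vworkers."""
    base, extra = divmod(num_partitions, num_vworkers)
    # Assumption: the first `extra` vworkers each take one additional partition.
    return [base + 1 if i < extra else base for i in range(num_vworkers)]


small_blocks = partition_count(1024, 16)    # 64 partitions of 16 rows
large_blocks = partition_count(1024, 128)   # 8 partitions of 128 rows
layout = partitions_per_vworker(small_blocks, 4)  # hypothetical 4 vworkers
```

A larger block size yields fewer, larger partitions, which is the trade-off the argument description refers to: more rows processed together per partition, at the cost of more vworker memory per partition.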