- OutputTable
- [Optional] Specify the name of the output table.
- K
- Specify the number of nearest neighbors to use for classifying the test data.
- ResponseColumn
- Specify the name of the training table column that contains the class label (classification) of each training data object.
- IDColumn
- Specify the name of the testing table column that uniquely identifies a data object.
- DistanceFeatures
- Specify the names of the training table columns that the function uses to compute the distance between a test object and the training objects. The test table must also have these columns.
- VotingWeight
- [Optional] Specify the voting weight, which controls how strongly the distance between a test object and a training object affects that training object's vote. The voting_weight must be a nonnegative integer.
- CustomizedDistance
- [Optional] Specify a custom distance function. The jar parameter is the name of the JAR file that contains the distance metric class, and the distance_class parameter is the name of the distance metric class defined in that JAR file. The JAR file must be installed on the ML Engine.
- ForceMapReduce
- [Optional] Specify whether to partition the training data. If you specify 'true', the KNN function partitions the training data and uses the map-and-reduce function.
- PartitionBlockSize
- [Optional] Specify the partition block size to use with ForceMapReduce('true'). Choosing a good value for this argument may improve performance. The optimal value depends on the size of the training data and the vworker configuration. Because the rows in a partition are processed together, a higher value generally improves performance, but the maximum value is limited by the memory of the vworker. For example, if the training data set has 1024 rows, PartitionBlockSize('16') splits the input data into 64 (1024/16) partitions of 16 rows each. Similarly, PartitionBlockSize('128') creates 8 (1024/128) partitions of 128 rows each. The partitions are distributed evenly across the available vworkers.
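
To make the roles of K, DistanceFeatures, ResponseColumn, VotingWeight, and PartitionBlockSize more concrete, the following is a minimal Python sketch, not the ML Engine implementation. The function names `knn_classify` and `partition_count` are hypothetical. The sketch assumes Euclidean distance, assumes that each neighbor's vote is weighted as 1 / distance^voting_weight (so a value of 0 gives every neighbor an equal vote), and assumes that a row count that does not divide evenly by the block size produces one extra, smaller partition.

```python
import math
from collections import defaultdict


def knn_classify(train_rows, test_row, distance_features, response_column,
                 k, voting_weight=0):
    """Classify one test row by a weighted vote of its k nearest training rows."""
    scored = []
    for row in train_rows:
        # Euclidean distance over the DistanceFeatures columns (an assumption;
        # the real function may use a different or customized metric).
        dist = math.sqrt(sum((row[f] - test_row[f]) ** 2
                             for f in distance_features))
        scored.append((dist, row[response_column]))
    scored.sort(key=lambda pair: pair[0])

    votes = defaultdict(float)
    for dist, label in scored[:k]:
        # Assumed weighting: 1 / distance**voting_weight.
        if voting_weight == 0:
            weight = 1.0          # plain majority vote
        elif dist == 0:
            weight = math.inf     # an exact match dominates the vote
        else:
            weight = 1.0 / dist ** voting_weight
        votes[label] += weight
    return max(votes, key=votes.get)


def partition_count(num_rows, block_size):
    """Number of partitions for a given block size (rounding up is assumed)."""
    return math.ceil(num_rows / block_size)


# One nearby "a" and two more distant "b" training rows.
train = [
    {"x": 1.0, "label": "a"},
    {"x": 3.0, "label": "b"},
    {"x": -3.0, "label": "b"},
]
test = {"id": 1, "x": 0.0}

unweighted = knn_classify(train, test, ["x"], "label", k=3, voting_weight=0)
weighted = knn_classify(train, test, ["x"], "label", k=3, voting_weight=2)
print(unweighted, weighted)
print(partition_count(1024, 16), partition_count(1024, 128))
```

With equal votes the two "b" rows outvote the single "a" row; with a voting weight of 2 the closer "a" row wins, which is the effect a distance-based voting weight is meant to have. The two `partition_count` calls reproduce the 64 and 8 partitions from the PartitionBlockSize example.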