KNN Arguments - Aster Analytics

Teradata Aster® Analytics Foundation User Guide Update 2

Product

Aster Analytics

Release Number

7.00.02

Published

September 2017

Language

English (United States)

Last Update

2018-04-17

TrainingTable

Specifies the name of the table that contains the training data. Each row represents a classified data object.

TestTable

Specifies the name of the table that contains the test data to be classified by the kNN algorithm. Each row represents a test data object.

K

Specifies the number of nearest neighbors to use for classifying the test data.

ResponseColumn

Specifies the name of the training table column that contains the class label or classification of the classified data objects.

IDColumn

Specifies the name of the test table column that uniquely identifies a data object.

DistanceFeatures

Specifies the names of the training table columns that the function uses to compute the distance between a test object and the training objects. The test table must also have these columns.

VotingWeight

[Optional] Specifies the voting weight, which is the exponent applied to the distance between a test object and a training object when the function weights that training object's vote. The voting_weight must be a nonnegative integer. Default: 0.

The function calculates distance-weighted voting, w, with this equation:

w = 1/POWER(distance, voting_weight)

where distance is the distance between the test object and the training object. With the default voting_weight of 0, w is 1 for every neighbor, so all votes count equally.
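To illustrate the equation, the following Python sketch tallies distance-weighted votes for a few neighbors. It is not the Aster implementation: the function name weighted_vote, the sample neighbors, and the treatment of a zero distance are assumptions made only for this example.

```python
from collections import defaultdict


def weighted_vote(neighbors, voting_weight=0):
    """Return the winning label among (distance, label) pairs.

    Each neighbor contributes w = 1 / distance ** voting_weight.
    """
    totals = defaultdict(float)
    for distance, label in neighbors:
        if voting_weight == 0:
            w = 1.0  # exponent 0: plain majority vote
        elif distance == 0:
            w = float("inf")  # assumption: an exact match dominates the vote
        else:
            w = 1.0 / distance ** voting_weight
        totals[label] += w
    return max(totals, key=totals.get)


# Hypothetical neighbors: (distance to the test object, class label)
neighbors = [(0.5, "A"), (2.0, "B"), (4.0, "B")]
unweighted = weighted_vote(neighbors, voting_weight=0)
weighted = weighted_vote(neighbors, voting_weight=1)
```

In this made-up data, label B wins the unweighted vote two to one, while a voting_weight of 1 lets the single close neighbor with label A outweigh the two distant ones.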

OutputTable

[Optional] Specifies the name of the output table. Default behavior: The function displays the output to the console.

CustomizedDistance

[Optional] Specifies the distance function. The parameter jar is the name of the JAR file that contains the distance metric class. The parameter distance_class is the distance metric class defined in the jar file. This JAR file must be installed on the Aster Database. Default: Euclidean distance.
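For reference, the default Euclidean distance over the DistanceFeatures columns can be sketched in Python as follows. This is only an illustration of the metric, not the function's code: rows are modeled as dictionaries, and the column names x, y, id, and label are hypothetical.

```python
import math


def euclidean_distance(test_row, train_row, distance_features):
    """Euclidean distance computed over the named feature columns only."""
    return math.sqrt(
        sum((test_row[col] - train_row[col]) ** 2 for col in distance_features)
    )


# Hypothetical rows; both tables carry the same feature columns.
features = ["x", "y"]
test_row = {"id": 1, "x": 0.0, "y": 0.0}
train_row = {"x": 3.0, "y": 4.0, "label": "A"}
d = euclidean_distance(test_row, train_row, features)
```

Columns that are not listed as distance features, such as the identifier and the class label here, do not enter the calculation.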

ForceMapreduce

[Optional] Specifies whether to partition the training data. Default: 'false', which causes the KNN function to load all training data into memory and use only the row function. If you specify 'true', the KNN function partitions the training data and uses the map-and-reduce function.
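The guide does not describe the internals of the partitioned mode. As a rough illustration of how a nearest-neighbor search can be split into map and reduce steps in general, the Python sketch below finds the k nearest candidates within each block and then merges them. The function names, the one-dimensional data, and the two-block split are all assumptions.

```python
import heapq


def map_block(test_point, block, k):
    """Map step: k nearest (distance, label) candidates within one block."""
    scored = ((abs(test_point - x), label) for x, label in block)
    return heapq.nsmallest(k, scored)


def reduce_candidates(candidate_lists, k):
    """Reduce step: merge per-block candidates into the global k nearest."""
    merged = [c for candidates in candidate_lists for c in candidates]
    return heapq.nsmallest(k, merged)


# Hypothetical one-dimensional training data split into two blocks
blocks = [
    [(1.0, "A"), (9.0, "B")],
    [(2.0, "A"), (3.5, "B")],
]
k = 2
per_block = [map_block(2.5, block, k) for block in blocks]
nearest = reduce_candidates(per_block, k)
```

Because each block keeps at most k candidates, no single step needs the whole training set in memory, which is the trade-off this argument controls.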

PartitionBlockSize

[Optional] Specifies the partition block size, in rows, to use with ForceMapreduce('true'). Specifying an optimal value for this argument may improve performance. The optimal value depends on the size of the training data and the vworker configuration. Because rows in a partition are processed together, a higher value improves performance, but the maximum value is limited by the memory of the vworker. For example, if the training data set has 1024 rows, specifying PartitionBlockSize('16') divides the input data into 64 (1024/16) partitions of 16 rows each. Similarly, PartitionBlockSize('128') creates 8 (1024/128) partitions of 128 rows each. The partitions are distributed evenly across the available vworkers.
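The arithmetic in the example above can be checked with a short Python sketch. The helper name partition_count is hypothetical, and rounding up when the row count is not an exact multiple of the block size is an assumption, since the guide covers only evenly divisible cases.

```python
import math


def partition_count(num_rows, partition_block_size):
    """Number of partitions needed to hold num_rows rows.

    Assumption: a final, smaller partition holds any leftover rows.
    """
    return math.ceil(num_rows / partition_block_size)


small_blocks = partition_count(1024, 16)   # 1024 / 16 = 64 partitions
large_blocks = partition_count(1024, 128)  # 1024 / 128 = 8 partitions
```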