Arguments - Aster Analytics

Teradata Aster Analytics Foundation User Guide

Product: Aster Analytics
Release Number: 6.21
Published: November 2016
Language: English (United States)
Last Update: 2018-04-14
Document ID: B700-1021
Product Category: Software
Argument (Category): Description
TrainingTable (Required): Specifies the name of the table that contains the training data. Each row represents a classified data object.
TestTable (Required): Specifies the name of the table that contains the test data to be classified by the KNN algorithm. Each row represents a test data object.
K (Required): Specifies the number of nearest neighbors to use to classify each test data object.
ResponseColumn (Required): Specifies the name of the training table column that contains the class label (classification) of each classified data object.
IDColumn (Required): Specifies the name of the test table column that uniquely identifies each test data object.
DistanceFeatures (Required): Specifies the names of the training table columns that the function uses to compute the distance between a test object and the training objects. The test table must also contain these columns.
VotingWeight (Optional): Specifies voting_weight, the exponent that the function applies to the distance between a test object and each training object when weighting that training object's vote. The value must be a nonnegative integer. The default value is 0.

The function calculates the distance-weighted vote, w, of each nearest neighbor with this equation:

w = 1/POWER(distance, voting_weight)

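Here, distance is the distance between the test object and the training object. With the default voting_weight of 0, w = 1 for every neighbor, so each of the K nearest neighbors gets an equal vote. Larger values of voting_weight give closer neighbors proportionally more weight.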
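To illustrate how K, the default Euclidean distance, and voting_weight interact, the following Python sketch classifies one test object against an in-memory training set. It is not the Aster implementation: the names `euclidean`, `vote_weight`, and `knn_classify` are hypothetical, ties between classes go to whichever label is seen first, and a zero distance with a positive voting_weight is treated as an infinite weight, a case this page does not specify.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance (the default distance function) between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vote_weight(distance, voting_weight):
    """w = 1 / distance**voting_weight, as in the VotingWeight equation."""
    if voting_weight == 0:
        return 1.0  # unweighted vote: every neighbor counts equally
    if distance == 0:
        return math.inf  # assumption: an exact match dominates the vote
    return 1.0 / distance ** voting_weight


def knn_classify(training, test_features, k, voting_weight=0):
    """Classify one test object.

    training is a hypothetical in-memory stand-in for TrainingTable:
    a list of (feature_tuple, label) pairs, where the features play the
    role of DistanceFeatures and the label the role of ResponseColumn.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not isinstance(voting_weight, int) or voting_weight < 0:
        raise ValueError("voting_weight must be a nonnegative integer")
    # Rank training objects by distance to the test object; keep the K nearest.
    nearest = sorted(
        ((euclidean(features, test_features), label) for features, label in training),
        key=lambda pair: pair[0],
    )[:k]
    # Sum the weighted votes per class label.
    votes = defaultdict(float)
    for distance, label in nearest:
        votes[label] += vote_weight(distance, voting_weight)
    # The label with the highest total weight wins; ties go to the first label seen.
    return max(votes, key=votes.get)


training = [((0.0, 0.0), "b"), ((3.0, 0.0), "a"), ((0.0, 3.0), "a")]
print(knn_classify(training, (0.1, 0.0), k=3, voting_weight=0))  # a
print(knn_classify(training, (0.1, 0.0), k=3, voting_weight=1))  # b
```

In the usage example, with voting_weight 0 the two farther "a" neighbors outvote the single close "b" neighbor 2 to 1. With voting_weight 1, the close neighbor's weight (1/0.1, about 10) exceeds the combined weight of the two "a" neighbors (about 0.68).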

OutputTable (Optional): Specifies the name of the output table. By default, the function displays the output on the console.
CustomizedDistance (Optional): Specifies a custom distance function with two parameters: jar, the name of the JAR file that contains the distance metric class, and distance_class, the name of the distance metric class defined in that JAR file. The KNN function installs the JAR file on the Aster Database server. By default, the function uses Euclidean distance.
ForceMapreduce (Optional): Specifies whether to partition the training data. The default value, 'false', causes the KNN function to load all training data into memory and use only the row function. If you specify 'true', the KNN function partitions the training data and uses the map and reduce functions.
PartitionBlockSize (Optional): Specifies the partition block size to use when ForceMapreduce is 'true'. An optimal value for this argument can improve performance; it depends on the size of the training data and the vworker configuration. Because the rows in a partition are processed together, a higher value generally improves performance, but the maximum value is limited by the memory of the vworker. For example, if the training data set has 1024 rows, PartitionBlockSize('16') divides the input data into 64 (1024/16) partitions of 16 rows each, and PartitionBlockSize('128') divides it into 8 (1024/128) partitions of 128 rows each. The partitions are distributed evenly across the available vworkers.
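The partition arithmetic in the PartitionBlockSize example can be checked with a small Python sketch. The helper names are hypothetical. Rounding up when the row count is not a multiple of the block size, and the round-robin style "even" split across vworkers, are illustrative assumptions rather than documented Aster behavior.

```python
def partition_count(num_rows: int, block_size: int) -> int:
    """Number of partitions of block_size rows needed to hold num_rows rows."""
    if num_rows < 0 or block_size <= 0:
        raise ValueError("num_rows must be >= 0 and block_size must be > 0")
    # Integer ceiling division; assumption: a final, smaller partition
    # holds any leftover rows when num_rows is not a multiple of block_size.
    return (num_rows + block_size - 1) // block_size


def partitions_per_vworker(num_partitions: int, num_vworkers: int) -> list[int]:
    """Hypothetical even split: each vworker gets the same count, give or take one."""
    if num_vworkers <= 0:
        raise ValueError("num_vworkers must be > 0")
    base, extra = divmod(num_partitions, num_vworkers)
    # The first `extra` vworkers each take one additional partition.
    return [base + 1 if i < extra else base for i in range(num_vworkers)]


print(partition_count(1024, 16))   # 64
print(partition_count(1024, 128))  # 8
print(partitions_per_vworker(64, 6))  # [11, 11, 11, 11, 10, 10]
```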