Teradata Package for Python Function Reference | 17.10 - KNN - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product: Teradata Package for Python
Release Number: 17.10
Published: April 2022
Language: English (United States)
Last Update: 2022-08-19
Product Category: Teradata Vantage

teradataml.analytics.mle.KNN = class KNN(builtins.object)

Methods defined here:

__init__(self, train=None, test=None, k=None, response_column=None, id_column=None, distance_features=None, voting_weight=0.0, customized_distance=None, force_mapreduce=False, parblock_size=None, partition_key=None, accumulate=None, output_prob=False, train_sequence_column=None, test_sequence_column=None, test_block_size=None, output_responses=None)

DESCRIPTION:
    The KNN function uses training data objects to map test data objects
    to categories. The function is optimized for both small and large
    training sets. The function supports user-defined distance metrics
    and distance-weighted voting.

PARAMETERS:
    train:
        Required Argument.
        Specifies the name of the teradataml DataFrame that contains the
        training data. Each row represents a classified data object.

    test:
        Required Argument.
        Specifies the name of the teradataml DataFrame that contains the
        test data to be classified by the KNN algorithm. Each row
        represents a test data object.

    k:
        Required Argument.
        Specifies the number of nearest neighbors to use for classifying
        the test data.
        Types: int

    response_column:
        Required Argument.
        Specifies the name of the training teradataml DataFrame column
        that contains the class label or classification of the
        classified data objects.
        Types: str

    id_column:
        Required Argument.
        Specifies the name of the testing teradataml DataFrame column
        that uniquely identifies a data object.
        Types: str

    distance_features:
        Required Argument.
        Specifies the names of the training teradataml DataFrame columns
        that the function uses to compute the distance between a test
        object and the training objects. The test teradataml DataFrame
        must also have these columns.
        Types: str OR list of Strings (str)

    voting_weight:
        Optional Argument.
        Specifies the voting weight of the distance between a test
        object and the training objects. The voting_weight must be
        nonnegative. The function calculates the distance-weighted
        vote, w, with this equation:
            w = 1/POWER(distance, voting_weight)
        where distance is the distance between the test object and the
        training object.
        Default Value: 0.0
        Types: float

    customized_distance:
        Optional Argument.
        This argument is currently not supported.

    force_mapreduce:
        Optional Argument.
        Specifies whether to partition the training data. The default,
        False, causes the KNN function to load all training data into
        memory and use only the row function. If you specify True, the
        KNN function partitions the training data and uses the
        map-and-reduce function.
        Default Value: False
        Types: bool

    parblock_size:
        Optional Argument.
        Specifies the partition block size to use when "force_mapreduce"
        is True. The recommended value depends on the training data size
        and the number of vworkers. For example, if your training data
        size is 10 billion and you have 10 vworkers, the recommended
        "parblock_size" is 1/n billion, where n is an integer that
        corresponds to your vworker nodes' memory. Omitting this
        argument or specifying an inappropriate "parblock_size" can
        degrade performance.
        Types: int

    partition_key:
        Optional Argument.
        Specifies the name of the training teradataml DataFrame column
        that partitions the data in the parallel model. The default
        value is the first column of "distance_features".
        Note:
            "partition_key" argument support is only available when
            teradataml is connected to Vantage 1.0 Maintenance Update 2
            or later.
        Types: str

    accumulate:
        Optional Argument.
        Specifies the names of the test teradataml DataFrame columns to
        copy to the output teradataml DataFrame.
        Note:
            "accumulate" argument support is only available when
            teradataml is connected to Vantage 1.1 or later.
        Types: str OR list of Strings (str)

    output_prob:
        Optional Argument.
        Specifies whether to display the output probability for the
        predicted category.
        Note:
            "output_prob" argument support is only available when
            teradataml is connected to Vantage 1.1 or later.
        Default Value: False
        Types: bool

    train_sequence_column:
        Optional Argument. Required if "partition_key" is specified.
        Specifies the list of column(s) that uniquely identifies each
        row of the input argument "train". The argument is used to
        ensure deterministic results for functions that produce results
        that vary from run to run.
        Types: str OR list of Strings (str)

    test_sequence_column:
        Optional Argument. Required if "partition_key" is specified.
        Specifies the list of column(s) that uniquely identifies each
        row of the input argument "test". The argument is used to
        ensure deterministic results for functions that produce results
        that vary from run to run.
        Types: str OR list of Strings (str)

    test_block_size:
        Optional when "force_mapreduce" is True, disallowed otherwise.
        Specifies the partition block size of the testing data to use
        when "force_mapreduce" is set to True. If this argument is
        omitted, the function estimates the value automatically.
        Specifying an inappropriate "test_block_size" can degrade
        performance.
        Note:
            "test_block_size" argument support is only available when
            teradataml is connected to Vantage 1.3.
        Types: int

    output_responses:
        Optional when "output_prob" is True, disallowed otherwise.
        Specifies the responses (class labels found in
        "response_column") for which to output probabilities. If you
        set "output_prob" to True and omit "output_responses", the
        function adds the column prob to the output teradataml
        DataFrame. If you set "output_prob" to True and specify
        "output_responses", the function adds the specified response
        columns to the output teradataml DataFrame.
        Note:
            "output_responses" argument support is only available when
            teradataml is connected to Vantage 1.3.
        Types: str OR list of Strings (str)

RETURNS:
    Instance of KNN.
    Output teradataml DataFrames can be accessed using attribute
    references, such as KNNObj.<attribute_name>.
    Output teradataml DataFrame attribute names are:
        1. output_table
        2. output

RAISES:
    TeradataMlException

EXAMPLES:
    # Import the names used below. An active connection to Vantage is assumed.
    from teradataml import DataFrame, load_example_data
    from teradataml.analytics.mle import KNN

    # Load the data to run the example.
    load_example_data("knn", ["computers_train1_clustered", "computers_test1"])

    # Create teradataml DataFrame objects.
    # The "computers_train1_clustered" and "computers_test1" remote tables
    # contain five attributes of personal computers: price, speed, hard disk
    # size, RAM, and screen size.
    computers_train1_clustered = DataFrame.from_table("computers_train1_clustered")
    computers_test1 = DataFrame.from_table("computers_test1")

    # Example 1 - Map the test computer data to their respective categories.
    knn_out = KNN(train=computers_train1_clustered,
                  test=computers_test1,
                  k=50,
                  response_column="computer_category",
                  id_column="id",
                  distance_features=["price", "speed", "hd", "ram", "screen"],
                  voting_weight=1.0
                  )

    # Print the result DataFrame.
    print(knn_out)
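
To make the "voting_weight" equation concrete, the following is a minimal local sketch of distance-weighted voting in plain Python. It does not call teradataml or Vantage. The function name weighted_knn_vote, the use of Euclidean distance, the small epsilon guard against zero distances, and the returned vote share are all assumptions of this illustration; the reference above does not state which distance metric or tie handling the KNN function uses.

```python
import math
from collections import defaultdict


def weighted_knn_vote(train_rows, test_point, k, voting_weight=0.0):
    """Classify one test point by distance-weighted voting (illustrative only).

    train_rows is a list of (feature_tuple, label) pairs.
    Returns (winning_label, share_of_total_vote).
    """
    # Distance from the test point to every training object
    # (Euclidean distance is an assumption of this sketch).
    scored = [(math.dist(features, test_point), label)
              for features, label in train_rows]
    # Keep the k closest training objects.
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]

    votes = defaultdict(float)
    for distance, label in nearest:
        # w = 1 / POWER(distance, voting_weight).
        # The epsilon guard avoids division by zero; it is a choice made here,
        # not documented behavior.
        votes[label] += 1.0 / (max(distance, 1e-12) ** voting_weight)

    winner = max(votes, key=votes.get)
    return winner, votes[winner] / sum(votes.values())


# One close neighbor labeled "near" and two distant neighbors labeled "far".
sample_train = [((1.0, 0.0), "near"), ((4.0, 0.0), "far"), ((0.0, 4.0), "far")]

# voting_weight=0.0 gives every neighbor weight 1, so the majority label wins.
print(weighted_knn_vote(sample_train, (0.0, 0.0), k=3, voting_weight=0.0)[0])  # far
# voting_weight=1.0 weights votes by 1/distance, so the close neighbor wins.
print(weighted_knn_vote(sample_train, (0.0, 0.0), k=3, voting_weight=1.0)[0])  # near
```

With "voting_weight" left at its default of 0.0, every neighbor contributes a weight of 1 and the result is an ordinary majority vote; larger values make nearby training objects count for more.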

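The "force_mapreduce" and "parblock_size" arguments describe partitioning the training data. As a conceptual illustration only, the sketch below shows the general map-and-reduce pattern for nearest-neighbor search in plain Python: each block yields its own k nearest candidates (map), and the candidates are merged into the global k nearest (reduce). The names local_top_k, partitioned_top_k and block_size are hypothetical, Euclidean distance is assumed, and nothing here describes how Vantage actually executes the function.

```python
import heapq
import math


def local_top_k(block, test_point, k):
    """Map step: the k nearest (distance, label) pairs within one block."""
    scored = ((math.dist(features, test_point), label)
              for features, label in block)
    return heapq.nsmallest(k, scored)


def partitioned_top_k(train_rows, test_point, k, block_size):
    """Reduce step: merge per-block candidates into the global k nearest."""
    # Split the training rows into consecutive blocks of at most block_size rows.
    blocks = [train_rows[i:i + block_size]
              for i in range(0, len(train_rows), block_size)]
    candidates = []
    for block in blocks:
        candidates.extend(local_top_k(block, test_point, k))
    # The global k nearest are always among the per-block k nearest.
    return heapq.nsmallest(k, candidates)


# Ten training points on a line, labeled by parity of their position.
sample_rows = [((float(i), 0.0), "even" if i % 2 == 0 else "odd")
               for i in range(10)]
print(partitioned_top_k(sample_rows, (0.0, 0.0), k=3, block_size=4))
```

Because every block keeps its own k best candidates, the merged result is the same for any block size; in this sketch the block size only changes how much data each map step handles at once.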
__repr__(self): Returns the string representation for a KNN class instance.

get_build_time(self): Returns the build time of the algorithm in seconds. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

get_prediction_type(self): Returns the prediction type of the algorithm. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

get_target_column(self): Returns the target column of the algorithm. When the model object is created using retrieve_model(), the value returned is the one saved in the Model Catalog.

show_query(self): Returns the underlying SQL query. When the model object is created using retrieve_model(), None is returned.
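
The accessor methods listed above can be gathered into a single record for logging or comparison. The helper below is a small sketch: summarize_model is a hypothetical name, and it relies only on the four methods documented here, so it should work with a KNN instance such as the knn_out object from the constructor example.

```python
def summarize_model(model):
    """Collect the metadata exposed by the documented accessor methods."""
    return {
        "build_time_seconds": model.get_build_time(),
        "prediction_type": model.get_prediction_type(),
        "target_column": model.get_target_column(),
        # show_query() returns None when the model came from retrieve_model().
        "query": model.show_query(),
    }
```

A caller would typically check whether the "query" entry is None before printing or logging it, since retrieved models do not carry the underlying SQL.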