Teradata Python Package Function Reference - KMeans

teradataml.analytics.mle.KMeans = class KMeans(builtins.object)

Methods defined here:

__init__(self, data=None, centers=None, iter_max=10, initial_seeds=None, seed=None, unpack_columns=False, centroids_table=None, threshold=0.0395, data_sequence_column=None, centroids_table_sequence_column=None)
    DESCRIPTION:
        The KMeans function takes a data set and outputs the centroids of its
        clusters and, optionally, the clusters themselves.

    PARAMETERS:
        data:
            Required Argument.
            Specifies the input teradataml DataFrame containing the list of
            features by which to cluster the data.

        centers:
            Optional Argument.
            Specifies the number of clusters to generate from the data.
            Note: With centers, the function uses a nondeterministic
                  algorithm and supports up to 1543 dimensions.
            Types: int

        iter_max:
            Optional Argument.
            Specifies the maximum number of iterations that the algorithm
            runs before quitting if the convergence threshold has not been
            met.
            Default Value: 10
            Types: int

        initial_seeds:
            Optional Argument.
            Specifies the initial seed means as strings of
            underscore-delimited float values. For example, this clause
            initializes eight clusters in eight-dimensional space:
                Means("50_50_50_50_50_50_50_50",
                      "150_150_150_150_150_150_150_150",
                      "250_250_250_250_250_250_250_250",
                      "350_350_350_350_350_350_350_350",
                      "450_450_450_450_450_450_450_450",
                      "550_550_550_550_550_550_550_550",
                      "650_650_650_650_650_650_650_650",
                      "750_750_750_750_750_750_750_750")
            The dimensionality of the means must match the dimensionality of
            the data (that is, each mean must have n numbers in it, where n
            is the number of input columns minus one). By default, the
            algorithm chooses the initial seed means randomly.
            Note: With initial_seeds, the function uses a deterministic
                  algorithm and supports up to 1596 dimensions.
            Types: str OR list of Strings (str)

        seed:
            Optional Argument.
            Sets a random seed for the algorithm.
            Types: int

        unpack_columns:
            Optional Argument.
            Specifies whether the means for each centroid appear unpacked
            (that is, in separate columns) in the output DataFrame
            clusters_centroids. By default, the function concatenates the
            means for the centroids and outputs the result in a single
            VARCHAR column.
            Default Value: False
            Types: bool

        centroids_table:
            Optional Argument.
            Specifies the teradataml DataFrame that contains the initial
            seed means for the clusters. The schema of the centroids
            teradataml DataFrame depends on the value of the unpack_columns
            argument.
            Note: With centroids_table, the function uses a deterministic
                  algorithm and supports up to 1596 dimensions.

        threshold:
            Optional Argument.
            Specifies the convergence threshold. When the centroids move by
            less than this amount, the algorithm has converged.
            Default Value: 0.0395
            Types: float

        data_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each
            row of the input argument "data". The argument is used to ensure
            deterministic results for functions which produce results that
            vary from run to run.
            Types: str OR list of Strings (str)

        centroids_table_sequence_column:
            Optional Argument.
            Specifies the list of column(s) that uniquely identifies each
            row of the input argument "centroids_table". The argument is
            used to ensure deterministic results for functions which produce
            results that vary from run to run.
            Types: str OR list of Strings (str)

    RETURNS:
        Instance of KMeans.
        Output teradataml DataFrames can be accessed using attribute
        references, such as KMeansObj.<attribute_name>.
        Output teradataml DataFrame attribute names are:
            1. clusters_centroids
            2. clustered_output
            3. output

    RAISES:
        TeradataMlException

    EXAMPLES:
        # Imports needed by the example. An active connection to Vantage
        # (for example, one opened with create_context) is assumed.
        from teradataml import DataFrame, load_example_data
        from teradataml.analytics.mle import KMeans

        # Load the data to run the example.
        load_example_data("KMeans", "computers_train1")

        # Create teradataml DataFrame.
        computers_train1 = DataFrame.from_table("computers_train1")

        # Example 1 - Cluster computers_train1 into eight clusters, starting
        # from explicit initial seed means.
        kmeans_out = KMeans(data=computers_train1,
                            initial_seeds=['2249_51_408_8_14',
                                           '2165_51_398_7_14.6',
                                           '2182_51_404_7_14.6',
                                           '2204_55_372_7.19_14.6',
                                           '2419_44_222_6.6_14.3',
                                           '2394_44.3_277_7.3_14.5',
                                           '2326_43.6_301_7.11_14.3',
                                           '2288_44_325_7_14.4'],
                            centers=8,
                            threshold=0.0395,
                            iter_max=10,
                            unpack_columns=False,
                            seed=10,
                            data_sequence_column='id'
                            )

        # Print the result DataFrames.
        print(kmeans_out.clusters_centroids)
        print(kmeans_out.clustered_output)
        print(kmeans_out.output)
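
The initial_seeds argument expects each seed mean as one underscore-delimited string, and every mean must have the same number of values. When the seeds start out as numeric lists, a small helper can build and check those strings. The following is a minimal sketch in plain Python; format_seed and format_seeds are hypothetical helper names used for illustration and are not part of teradataml.

```python
from typing import Iterable, List, Optional, Sequence


def format_seed(mean: Sequence[float]) -> str:
    """Join one seed mean into an underscore-delimited string.

    Hypothetical helper, not part of teradataml.
    """
    parts = []
    for value in mean:
        number = float(value)
        # Write whole numbers without a trailing ".0" (408.0 -> "408").
        parts.append(str(int(number)) if number.is_integer() else repr(number))
    return "_".join(parts)


def format_seeds(means: Iterable[Sequence[float]],
                 n_features: Optional[int] = None) -> List[str]:
    """Format several seed means and check they share one dimensionality.

    n_features is the expected number of values per mean; when omitted,
    the length of the first mean is used.
    """
    rows = [list(mean) for mean in means]
    if not rows:
        raise ValueError("at least one seed mean is required")
    expected = len(rows[0]) if n_features is None else n_features
    for index, row in enumerate(rows):
        if len(row) != expected:
            raise ValueError(
                f"seed {index} has {len(row)} values, expected {expected}"
            )
    return [format_seed(row) for row in rows]


seeds = format_seeds([[2249, 51, 408, 8, 14], [2165, 51, 398, 7, 14.6]])
print(seeds)  # ['2249_51_408_8_14', '2165_51_398_7_14.6']
```

The resulting list has the shape that the Types entry for initial_seeds describes (a list of strings), so it could be passed as initial_seeds=seeds.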

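The iter_max and threshold arguments together decide when the algorithm stops: it quits after iter_max iterations, or earlier once the centroids move by less than threshold. The sketch below illustrates that stopping rule with a plain-Python k-means loop. It is not the implementation used by the KMeans function; the Euclidean distance, the use of the largest centroid movement as the convergence measure, and the name kmeans_sketch are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def kmeans_sketch(points: Sequence[Sequence[float]],
                  seeds: Sequence[Sequence[float]],
                  iter_max: int = 10,
                  threshold: float = 0.0395) -> Tuple[List[List[float]], int, bool]:
    """Illustrative k-means loop (assumption: Euclidean distance).

    Returns (centroids, iterations_run, converged).
    """
    centroids = [list(seed) for seed in seeds]
    converged = False
    iterations = 0
    for iterations in range(1, iter_max + 1):
        # Assignment step: accumulate each point into its nearest centroid.
        sums = [[0.0] * len(c) for c in centroids]
        counts = [0] * len(centroids)
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(point, centroids[k]))
            counts[nearest] += 1
            for j, value in enumerate(point):
                sums[nearest][j] += value
        # Update step: an empty cluster keeps its previous centroid.
        new_centroids = [
            [s / counts[k] for s in sums[k]] if counts[k] else centroids[k]
            for k in range(len(centroids))
        ]
        # Assumed convergence measure: the largest centroid movement.
        shift = max(math.dist(old, new)
                    for old, new in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < threshold:
            converged = True
            break
    return centroids, iterations, converged


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centroids, iterations, converged = kmeans_sketch(points, [(1, 1), (9, 1)])
print(centroids, iterations, converged)
```

In this sketch, lowering iter_max can end the loop before the movement drops below threshold, in which case converged is False and the centroids are simply the latest estimate.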
__repr__(self)
    Returns the string representation for a KMeans class instance.