Methods defined here:
- __init__(self, data=None, centers=None, iter_max=10, initial_seeds=None, seed=None, unpack_columns=False, centroids_table=None, threshold=0.0395, data_sequence_column=None, centroids_table_sequence_column=None)
- DESCRIPTION:
The KMeans function takes a data set and outputs the centroids of its
clusters and, optionally, the clusters themselves.
PARAMETERS:
data:
Required Argument.
Specifies the input teradataml DataFrame containing the features
used to cluster the data.
centers:
Optional Argument.
Specifies the number of clusters to generate from the data.
Note: With centers, the function uses a nondeterministic
algorithm and supports up to 1543 dimensions.
Types: int
iter_max:
Optional Argument.
Specifies the maximum number of iterations that the algorithm runs
before quitting if the convergence threshold has not been met.
Default Value: 10
Types: int
initial_seeds:
Optional Argument.
Specifies the initial seed means as strings of underscore-delimited
float values. For example, this value initializes eight clusters in
eight-dimensional space: ["50_50_50_50_50_50_50_50",
"150_150_150_150_150_150_150_150", "250_250_250_250_250_250_250_250",
"350_350_350_350_350_350_350_350", "450_450_450_450_450_450_450_450",
"550_550_550_550_550_550_550_550", "650_650_650_650_650_650_650_650",
"750_750_750_750_750_750_750_750"]. The dimensionality of the means
must match the dimensionality of the data (that is, each mean must
contain n numbers, where n is the number of input columns minus
one). By default, the algorithm chooses the initial seed means
randomly.
Note: With initial_seeds, the function uses a deterministic
algorithm and supports up to 1596 dimensions.
Types: str OR list of Strings (str)
seed:
Optional Argument.
Sets a random seed for the algorithm.
Types: int
unpack_columns:
Optional Argument.
Specifies whether the means for each centroid appear unpacked (that
is, in separate columns) in the output DataFrame clusters_centroids.
By default, the function concatenates the means for the centroids
and outputs the result in a single VARCHAR column.
Default Value: False
Types: bool
centroids_table:
Optional Argument.
Specifies the teradataml DataFrame that contains the initial seed
means for the clusters. The schema of the centroids teradataml
DataFrame depends on the value of the unpack_columns argument.
Note: With centroids_table, the function uses a deterministic
algorithm and supports up to 1596 dimensions.
threshold:
Optional Argument.
Specifies the convergence threshold. When the centroids move by less
than this amount, the algorithm has converged.
Default Value: 0.0395
Types: float
data_sequence_column:
Optional Argument.
Specifies the column(s) that uniquely identify each row of
the input argument "data". This argument ensures deterministic
results for functions whose results would otherwise vary
from run to run.
Types: str OR list of Strings (str)
centroids_table_sequence_column:
Optional Argument.
Specifies the column(s) that uniquely identify each row of
the input argument "centroids_table". This argument ensures
deterministic results for functions whose results would otherwise
vary from run to run.
Types: str OR list of Strings (str)
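How iter_max and threshold interact can be illustrated with a small plain-Python sketch of a k-means loop. This is not the function's actual implementation: the name kmeans_sketch is hypothetical, and the sketch assumes that "movement" means the largest Euclidean shift of any centroid between iterations, which this documentation does not specify.

```python
import math
from typing import List, Sequence, Tuple


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_sketch(
    points: Sequence[Sequence[float]],
    seeds: Sequence[Sequence[float]],
    iter_max: int = 10,
    threshold: float = 0.0395,
) -> Tuple[List[List[float]], int, bool]:
    """Return (centroids, iterations_run, converged).

    Illustrative only: stops when the largest centroid shift falls
    below `threshold` (an assumed convergence measure), or after
    `iter_max` iterations, whichever comes first.
    """
    centroids = [list(map(float, seed)) for seed in seeds]
    dim = len(centroids[0])
    for iteration in range(1, iter_max + 1):
        # Assignment step: accumulate each point into its nearest centroid.
        sums = [[0.0] * dim for _ in centroids]
        counts = [0] * len(centroids)
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda k: _distance(point, centroids[k]),
            )
            counts[nearest] += 1
            for d, value in enumerate(point):
                sums[nearest][d] += value
        # Update step: move each centroid to the mean of its points.
        # A centroid with no assigned points stays where it is.
        updated = [
            [s / counts[k] for s in sums[k]] if counts[k] else centroids[k]
            for k in range(len(centroids))
        ]
        shift = max(_distance(old, new) for old, new in zip(centroids, updated))
        centroids = updated
        if shift < threshold:
            return centroids, iteration, True
    return centroids, iter_max, False


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans_sketch(data, seeds=[(0, 0), (10, 10)]))
```

In this sketch a larger threshold ends the loop sooner with coarser centroids, while iter_max only caps the number of passes when the threshold is never reached.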
RETURNS:
Instance of KMeans.
Output teradataml DataFrames can be accessed using attribute
references, such as KMeansObj.<attribute_name>.
Output teradataml DataFrame attribute names are:
1. clusters_centroids
2. clustered_output
3. output
RAISES:
TeradataMlException
EXAMPLES:
# Load the data to run the example.
load_example_data("KMeans","computers_train1")
# Create teradataml DataFrame.
computers_train1 = DataFrame.from_table("computers_train1")
# Example 1 - Run KMeans with eight explicit initial seed means.
kmeans_out = KMeans(data=computers_train1,
                    initial_seeds=['2249_51_408_8_14',
                                   '2165_51_398_7_14.6',
                                   '2182_51_404_7_14.6',
                                   '2204_55_372_7.19_14.6',
                                   '2419_44_222_6.6_14.3',
                                   '2394_44.3_277_7.3_14.5',
                                   '2326_43.6_301_7.11_14.3',
                                   '2288_44_325_7_14.4'],
                    centers=8,
                    threshold=0.0395,
                    iter_max=10,
                    unpack_columns=False,
                    seed=10,
                    data_sequence_column='id'
                    )
# Print the result DataFrames.
print(kmeans_out.clusters_centroids)
print(kmeans_out.clustered_output)
print(kmeans_out.output)
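The underscore-delimited strings passed to initial_seeds in the example above can also be generated from numeric values rather than typed by hand. The following is a minimal sketch; build_initial_seeds and _format_number are hypothetical helpers, not part of teradataml, and n_features is assumed to be the number of input columns minus one, as described for initial_seeds.

```python
import math
from typing import Iterable, List


def _format_number(value: float) -> str:
    """Render a number without a trailing '.0' (51.0 -> '51', 14.6 -> '14.6')."""
    number = float(value)
    if not math.isfinite(number):
        raise ValueError(f"seed values must be finite, got {value!r}")
    if number.is_integer():
        return str(int(number))
    return repr(number)


def build_initial_seeds(means: Iterable[Iterable[float]], n_features: int) -> List[str]:
    """Convert numeric seed means into underscore-delimited strings.

    Raises ValueError if any mean does not have exactly n_features values.
    """
    seeds = []
    for index, mean in enumerate(means):
        values = list(mean)
        if len(values) != n_features:
            raise ValueError(
                f"mean {index} has {len(values)} values, expected {n_features}"
            )
        seeds.append("_".join(_format_number(v) for v in values))
    return seeds


if __name__ == "__main__":
    # The first two seed means from Example 1, given as numbers.
    print(build_initial_seeds(
        [[2249, 51, 408, 8, 14], [2165, 51, 398, 7, 14.6]],
        n_features=5,
    ))
```

The resulting list could then be passed as the initial_seeds argument. Checking the length of each mean up front catches a dimensionality mismatch before the function is called.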
- __repr__(self)
- Returns the string representation for a KMeans class instance.
- get_build_time(self)
- Returns the build time of the algorithm in seconds.
When the model object is created using retrieve_model(), the returned value is
the one saved in the Model Catalog.
- get_prediction_type(self)
- Returns the prediction type of the algorithm.
When the model object is created using retrieve_model(), the returned value is
the one saved in the Model Catalog.
- get_target_column(self)
- Returns the target column of the algorithm.
When the model object is created using retrieve_model(), the returned value is
the one saved in the Model Catalog.
- show_query(self)
- Returns the underlying SQL query.
When the model object is created using retrieve_model(), None is returned.