| |
Methods defined here:
- __init__(self, vertices=None, model_key=None, sequence_key=None, observed_key=None, hidden_states_num=None, max_iter_num=10, epsilon=None, skip_column=None, init_methods=None, init_params=None, vertices_sequence_column=None, vertices_partition_column=None, vertices_order_column=None)
- DESCRIPTION:
The HMMUnsupervised function runs on the SQL-GR framework. The
function can produce multiple HMM models simultaneously, where each
model is learned from a set of sequences and each sequence
represents a vertex.
PARAMETERS:
vertices:
Required Argument.
Specifies the teradataml DataFrame containing the vertex data.
vertices_partition_column:
Required Argument.
Specifies Partition By columns for vertices.
Values to this argument can be provided as a list, if multiple columns
are used for partition.
Note:
1. This argument must contain the name of the column specified in
'sequence_key' argument.
2. This argument should contain the name of the column specified in
'model_key', if 'model_key' argument is used, and it must be
the first column followed by the name of the column specified
in 'sequence_key'.
Types: str OR list of Strings (str)
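The two notes above amount to an ordering rule that can be checked before the function is called. The following is a minimal sketch only; check_partition_columns is a hypothetical helper and is not part of teradataml.

```python
def check_partition_columns(partition_cols, sequence_key, model_key=None):
    """Hypothetical helper (not part of teradataml).

    Checks the two notes for 'vertices_partition_column':
      1. the sequence_key column must be present;
      2. if model_key is used, it must come first, followed by sequence_key.
    Returns the partition columns as a list.
    """
    # Accept either a single column name or a list of names.
    cols = [partition_cols] if isinstance(partition_cols, str) else list(partition_cols)

    if sequence_key not in cols:
        raise ValueError("partition columns must include the sequence_key column")

    if model_key is not None and cols[:2] != [model_key, sequence_key]:
        raise ValueError("model_key must be first, followed by sequence_key")

    return cols
```

For the loan prediction example, check_partition_columns(["model_id", "seq_id"], "seq_id", "model_id") passes, while the reversed order raises ValueError.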
vertices_order_column:
Required Argument.
Specifies Order By columns for vertices.
Values to this argument can be provided as a list, if multiple columns
are used for ordering.
Note: This argument must include the name of the column that contains
the time-ordered sequence.
Types: str OR list of Strings (str)
model_key:
Optional Argument.
Specifies the name of the column that contains the model attribute.
The values in the column can be integers or strings.
Note: The 'vertices_partition_column' argument should contain the name
of the column specified in this argument.
Types: str
sequence_key:
Required Argument.
Specifies the name of the column that contains the sequence attribute. The
sequence_key must be a sequence attribute in the
vertices_partition_column. A sequence (value in this column) must contain more
than two observation symbols. Each sequence represents a vertex.
Types: str
observed_key:
Required Argument.
Specifies the name of the column that contains the observed symbols. The
function scans the input teradataml DataFrame to find all possible
observed symbols.
Note: Observed symbols are case-sensitive.
Types: str
hidden_states_num:
Required Argument.
Specifies the number of hidden states.
Note: The number of hidden states can influence model quality and
performance, so choose the number appropriately.
Types: int
max_iter_num:
Optional Argument.
Specifies the maximum number of iterations that the training process runs
before the function completes.
Default Value: 10
Types: int
epsilon:
Optional Argument.
Specifies the threshold value for determining the convergence of HMM training.
If the parameter value difference is less than the threshold, the
training process converges. There is no default value. If you do not
specify this argument, only max_iter_num determines when the training
process stops.
Types: float
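The interaction between max_iter_num and epsilon can be pictured as a simple stopping rule. This is an illustrative sketch only: should_stop is a hypothetical function, and param_delta stands in for the "parameter value difference", whose exact definition is not given here.

```python
def should_stop(iteration, param_delta, max_iter_num=10, epsilon=None):
    """Hypothetical stopping rule (not the function's actual implementation).

    iteration   -- number of training iterations completed so far
    param_delta -- assumed scalar measure of the parameter value difference
    """
    # With epsilon set, training stops early once the change falls below it.
    if epsilon is not None and param_delta < epsilon:
        return True
    # Otherwise only the iteration limit ends training.
    return iteration >= max_iter_num
```

Without epsilon, a tiny param_delta does not end training early; only reaching max_iter_num does.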
skip_column:
Optional Argument.
Specifies the name of the column whose values determine whether the function
skips the row. The function skips the row if the value is "true",
"yes", "y", or "1". The function does not skip the row if the value
is "false", "f", "no", "n", "0", or NULL.
Types: str
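The skip rule can be written out as a small lookup. This is a sketch under assumptions: row_is_skipped is a hypothetical helper, values are compared exactly as listed above, and any other value raises an error because its handling is not documented here.

```python
# Values listed above as causing the row to be skipped.
SKIP_VALUES = {"true", "yes", "y", "1"}
# Values listed above as keeping the row (NULL is handled separately).
KEEP_VALUES = {"false", "f", "no", "n", "0"}


def row_is_skipped(value):
    """Hypothetical helper mirroring the documented skip_column rule."""
    if value is None:  # NULL: the row is not skipped
        return False
    text = str(value)
    if text in SKIP_VALUES:
        return True
    if text in KEEP_VALUES:
        return False
    # Behavior for other values is not documented, so refuse to guess.
    raise ValueError(f"unrecognized skip_column value: {value!r}")
```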
init_methods:
Optional Argument.
Specifies the method that the function uses to generate the initial parameters
for the initial state probabilities, state transition probabilities,
and emission probabilities. Permitted values:
• random (default): The initial parameters are based on a uniform
distribution.
• flat: The probabilities are equal. Each cell holds the same
probability in the matrix or vector.
• input: The function takes the initial parameters from the
init_params argument.
The names of these methods are case-insensitive.
The seed number is meaningful only when the specified method is random.
Types: str OR list of Strings (str)
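To make the difference between "flat" and "random" concrete, the sketch below builds one probability row each way. It is an illustration, not the function's implementation: flat_row and random_row are hypothetical names, and normalizing uniform draws so the row sums to 1.0 is an assumption about how a "uniform distribution" initialization might look.

```python
import random


def flat_row(n):
    """Every cell holds the same probability, as described for 'flat'."""
    return [1.0 / n] * n


def random_row(n, seed=None):
    """Assumed form of 'random': uniform draws normalized to sum to 1.0.

    The seed only matters here, which matches the note that the seed
    number is meaningful only for the random method.
    """
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]
```

With hidden_states_num = 3, flat_row(3) gives three equal probabilities, and random_row(3, seed=7) returns the same row on every call with that seed.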
init_params:
Optional Argument.
When init_methods has the value "input", this argument specifies the
initial parameters for the models. The first parameter specifies the
initial state probabilities, the second parameter specifies the state
transition probabilities, and the third parameter specifies the
emission probabilities.
For example, if the hidden_states_num argument specifies three hidden
states and there are two observed symbols ("yes" and "no"), then the
init_params values are:
• init_state_probability_vector (the initial state probabilities):
"0.3333333333 0.3333333333 0.3333333333",
• state_transition_probability_matrix (the state transition probabilities):
"0.3333333333 0.3333333333 0.3333333333; 0.3333333333
0.3333333333 0.3333333333; 0.3333333333 0.3333333333 0.3333333333",
• observation_emission_probability_matrix (the emission probabilities):
"no:0.25 yes:0.75; no:0.35 yes:0.65; no:0.45 yes:0.55"
The probabilities in each row of the initial state probabilities, state
transition probabilities, and emission probabilities parameters must
sum to 1.0 after rounding. The observed symbols are case-sensitive. The
number of states and the number of observed symbols must be
consistent with the hidden_states_num argument and the observed
symbols in the input table; otherwise, the function displays error
messages.
Types: str OR list of Strings (str)
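The three init_params strings follow a regular layout, so they can be assembled from plain Python values. The sketch below is one way to do that: build_init_params is a hypothetical helper (not part of teradataml), and the 1e-6 tolerance on row sums is an assumption standing in for the rounding requirement.

```python
def _check_row(probs, tol=1e-6):
    # Assumed tolerance; the text only requires the row sum to round to 1.0.
    if abs(sum(probs) - 1.0) > tol:
        raise ValueError(f"row does not sum to 1.0: {probs}")


def build_init_params(initial, transition, emission):
    """Hypothetical helper that formats the three init_params strings.

    initial    -- list of initial state probabilities
    transition -- list of rows (lists) of state transition probabilities
    emission   -- list of dicts mapping observed symbol -> probability
    """
    _check_row(initial)
    for row in transition:
        _check_row(row)
    for row in emission:
        _check_row(list(row.values()))

    # Values are separated by spaces and matrix rows by "; ".
    vec = " ".join(f"{p:.10f}" for p in initial)
    trans = "; ".join(" ".join(f"{p:.10f}" for p in row) for row in transition)
    emis = "; ".join(
        " ".join(f"{sym}:{p}" for sym, p in row.items()) for row in emission
    )
    return [vec, trans, emis]


third = 1.0 / 3.0
params = build_init_params(
    [third] * 3,
    [[third] * 3] * 3,
    [{"no": 0.25, "yes": 0.75}, {"no": 0.35, "yes": 0.65}, {"no": 0.45, "yes": 0.55}],
)
```

The resulting list reproduces the three example strings shown above, in the order initial state, state transition, emission.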
vertices_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "vertices". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
RETURNS:
Instance of HMMUnsupervised.
Output teradataml DataFrames can be accessed using attribute
references, such as HMMUnsupervisedObj.<attribute_name>.
Output teradataml DataFrame attribute names are:
1. output_initialstate_table
2. output_statetransition_table
3. output_emission_table
4. output
RAISES:
TeradataMlException
EXAMPLES:
# Load example data.
load_example_data("hmmunsupervised", "loan_prediction")
# Create teradataml DataFrame objects.
loan_prediction = DataFrame.from_table("loan_prediction")
# Example 1 - Build an HMM Unsupervised model on the loan prediction dataset
HMMUnsupervised_out = HMMUnsupervised(vertices = loan_prediction,
vertices_partition_column = ["model_id", "seq_id"],
vertices_order_column = ["seq_vertex_id"],
model_key = "model_id",
sequence_key = "seq_id",
observed_key = "observed_id",
hidden_states_num = 3,
init_methods = ["random"]
)
# Print the results for each output teradataml DataFrame.
print(HMMUnsupervised_out.output_initialstate_table)
print(HMMUnsupervised_out.output_statetransition_table)
print(HMMUnsupervised_out.output_emission_table)
print(HMMUnsupervised_out.output)
- __repr__(self)
- Returns the string representation for an HMMUnsupervised class instance.
- get_build_time(self)
- Function to return the build time of the algorithm in seconds.
When the model object is created using retrieve_model(), the value returned is
as saved in the Model Catalog.
- get_prediction_type(self)
- Function to return the Prediction type of the algorithm.
When the model object is created using retrieve_model(), the value returned is
as saved in the Model Catalog.
- get_target_column(self)
- Function to return the Target Column of the algorithm.
When the model object is created using retrieve_model(), the value returned is
as saved in the Model Catalog.
- show_query(self)
- Function to return the underlying SQL query.
When the model object is created using retrieve_model(), None is returned.
|