HMMUnsupervisedLearner
Description
The HMMUnsupervisedLearner function is available on the SQL-Graph
platform. The function can produce multiple HMM models
simultaneously, where each model is learned from a set of sequences
and where each sequence represents a vertex.
Usage
td_hmm_unsupervised_mle (
vertices = NULL,
model.key = NULL,
sequence.key = NULL,
observed.key = NULL,
hidden.states.num = NULL,
max.iter.num = 10,
epsilon = NULL,
skip.column = NULL,
init.methods = NULL,
init.params = NULL,
vertices.sequence.column = NULL,
vertices.partition.column = NULL,
vertices.order.column = NULL
)
Arguments
vertices |
Required Argument.
Specifies a tbl_teradata that contains the input vertex information.
|
vertices.partition.column |
Required Argument.
Specifies the Partition By columns for "vertices".
Values for this argument can be provided as a vector if multiple
columns are used for partitioning.
Note:
1. This argument must contain the name of the column
specified in "sequence.key" argument.
2. If the "model.key" argument is used, this argument should also
contain the name of the column specified in "model.key". That
column must be the first column, followed by the name of the
column specified in "sequence.key".
Types: character OR vector of Strings (character)
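The two notes above can be checked before calling the function. The following is a minimal sketch in base R; check_partition_columns is a hypothetical helper written for illustration and is not part of tdplyr.

```r
# Hypothetical helper (not part of tdplyr): checks the ordering rules
# described in the notes for "vertices.partition.column".
check_partition_columns <- function(partition.column, sequence.key,
                                    model.key = NULL) {
  # Note 1: the sequence key column must be one of the partition columns.
  if (!sequence.key %in% partition.column) {
    stop("'sequence.key' column is missing from the partition columns")
  }
  # Note 2: when a model key is used, it comes first and is followed
  # directly by the sequence key.
  if (!is.null(model.key)) {
    expected <- c(model.key, sequence.key)
    if (length(partition.column) < 2 ||
        !identical(partition.column[1:2], expected)) {
      stop("partition columns must start with 'model.key', then 'sequence.key'")
    }
  }
  invisible(TRUE)
}

ok <- check_partition_columns(c("model_id", "seq_id"),
                              sequence.key = "seq_id",
                              model.key = "model_id")
```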
|
vertices.order.column |
Required Argument.
Specifies the Order By columns for "vertices".
Values for this argument can be provided as a vector if multiple
columns are used for ordering.
Note: One of the columns named in this argument must be the column
that contains the time-ordered sequence.
Types: character OR vector of Strings (character)
|
model.key |
Optional Argument.
Specifies the name of the column that contains the model attribute.
The values in the column can be integers or strings.
Note: The "vertices.partition.column" argument should contain the name
of the column specified in this argument.
Types: character
|
sequence.key |
Required Argument.
Specifies the name of the column that contains the sequence
attribute. It must match one of the columns specified in the
"vertices.partition.column" argument. A sequence (value in this column)
must contain more than two observation symbols. Each sequence represents
a vertex.
Types: character
|
observed.key |
Required Argument.
Specifies the name of the column that contains the observed symbols.
The function scans the input tbl_teradata to find all possible
observed symbols.
Note: Observed symbols are case-sensitive.
Types: character
|
hidden.states.num |
Required Argument.
Specifies the number of hidden states.
Note: The number of hidden states can influence model quality and performance,
so choose the number appropriately.
Types: integer
|
max.iter.num |
Optional Argument.
Specifies the number of iterations that the training process runs
before the function completes.
Default Value: 10
Types: integer
|
epsilon |
Optional Argument.
Specifies the threshold value for determining the convergence of HMM
training. If the parameter value difference is less than the
threshold, the training process converges. There is no default value.
If you do not specify epsilon, the "max.iter.num" argument determines when
the training process stops.
Types: numeric
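The interaction between "epsilon" and "max.iter.num" can be pictured with a small base R sketch. This illustrates only the stopping rule, not the function's actual training code. The helper run_until_converged, the toy update step, and the use of the largest absolute change as the "parameter value difference" are all assumptions made for illustration.

```r
# Hypothetical sketch of the stopping rule only. The real HMM update step
# is replaced by an arbitrary 'update' function supplied by the caller.
run_until_converged <- function(params, update, max.iter.num = 10,
                                epsilon = NULL) {
  for (iter in seq_len(max.iter.num)) {
    new_params <- update(params)
    # Assumption: the difference is the largest absolute parameter change.
    delta <- max(abs(new_params - params))
    params <- new_params
    # The threshold is checked only when epsilon was supplied.
    if (!is.null(epsilon) && delta < epsilon) {
      return(list(params = params, iterations = iter, converged = TRUE))
    }
  }
  # Without epsilon (or if the threshold is never met), the iteration
  # limit ends the loop.
  list(params = params, iterations = max.iter.num, converged = FALSE)
}

halve <- function(p) p / 2  # toy update step
with_eps <- run_until_converged(1, halve, max.iter.num = 10, epsilon = 0.1)
no_eps   <- run_until_converged(1, halve, max.iter.num = 10)
```

In this toy run, with_eps stops after 4 iterations because the change drops below 0.1, while no_eps runs all 10 iterations.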
|
skip.column |
Optional Argument.
Specifies the name of the column whose values determine whether the
function skips the row. The function skips the row if the value is
"true", "yes", "y", or "1". The function does not skip the row if
the value is "false", "f", "no", "n", "0", or NULL.
Types: character
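The skip rule above can be written out as a small base R sketch. skip_flag is a hypothetical helper, not part of tdplyr. It assumes that a database NULL appears as NA in R and that values are matched exactly as listed. Values that appear in neither list are returned as NA because their handling is not described here.

```r
# Hypothetical helper (not part of tdplyr): mirrors the documented lists
# of values that cause a row to be skipped or kept.
skip_flag <- function(values) {
  values <- as.character(values)
  skip_values <- c("true", "yes", "y", "1")
  keep_values <- c("false", "f", "no", "n", "0")
  flag <- rep(NA, length(values))      # NA = behavior not documented
  flag[values %in% skip_values] <- TRUE
  # Assumption: NULL in the database column arrives as NA in R.
  flag[values %in% keep_values | is.na(values)] <- FALSE
  flag
}

flags <- skip_flag(c("yes", "0", NA, "maybe"))
```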
|
init.methods |
Optional Argument.
Specifies the method that the function uses to generate the initial
parameters for the initial state probabilities, state transition
probabilities, and emission probabilities. The possibilities are:
random (default): The initial parameters are based on a uniform
distribution.
flat: The probabilities are equal. Each cell holds the same
probability in the matrix or vector.
input: The function takes the initial parameters from the
"init.params" argument.
The names of these methods are case-insensitive. The seed number is
meaningful only when the specified method is random. To specify a
seed with the random method, use the form: c('random','25').
Types: character OR vector of characters
|
init.params |
Optional Argument.
When the "init.methods" argument has the value "input", this argument specifies
the initial parameters for the models. The first parameter specifies the
initial state probabilities, the second parameter specifies the state
transition probabilities, and the third parameter specifies the
emission probabilities. For example, if the "hidden.states.num"
argument specifies three (M) hidden states and the input contains two (N)
observed symbols ("yes" and "no"), then the "init.params" values are:
init_state_probability_vector (the initial state probabilities):
Vector of size M. For example: "0.3333333333 0.3333333333 0.3333333333"
state_transition_probability_matrix (the state transition
probabilities): Matrix of dimensions M x M. For example:
"0.3333333333 0.3333333333 0.3333333333; 0.3333333333
0.3333333333 0.3333333333; 0.3333333333 0.3333333333 0.3333333333"
observation_emission_probability_matrix (the emission probabilities):
Matrix of dimensions M x N. For example: "no:0.25 yes:0.75; no:0.35 yes:0.65; no:0.45 yes:0.55"
For the above example, the correct way to specify "init.params" is as follows:
c("0.3333333333 0.3333333333 0.3333333333",
"0.3333333333 0.3333333333 0.3333333333; 0.3333333333 0.3333333333 0.3333333333; 0.3333333333 0.3333333333 0.3333333333",
"no:0.25 yes:0.75; no:0.35 yes:0.65; no:0.45 yes:0.55")
The sum of the probabilities in each row of the initial state probabilities,
state transition probabilities, and emission probabilities parameters must
round to 1.0. The observed symbols are case-sensitive. The
number of states and the number of observed symbols must be
consistent with the "hidden.states.num" argument and the observed
symbols in the input tbl_teradata; otherwise, the function displays error
messages.
Types: character OR vector of characters
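Writing these strings by hand is error-prone, so the following base R sketch builds them from numeric objects and checks the row sums first. make_init_params is a hypothetical helper, not part of tdplyr. The tolerance of 1e-6 and the choice of 10 decimal places are assumptions, and whether the function accepts trailing zeros such as "0.2500000000" has not been verified here.

```r
# Hypothetical helper (not part of tdplyr): turns a probability vector and
# two matrices into the three strings expected by "init.params".
make_init_params <- function(initial, transition, emission, tol = 1e-6) {
  initial <- matrix(initial, nrow = 1)
  stopifnot(
    ncol(initial) == nrow(transition),      # M initial probabilities
    nrow(transition) == ncol(transition),   # M x M transition matrix
    nrow(emission) == nrow(transition),     # M x N emission matrix
    !is.null(colnames(emission))            # columns named by symbol
  )
  # Assumption: "rounds to 1.0" is approximated by a small tolerance.
  for (m in list(initial, transition, emission)) {
    if (any(abs(rowSums(m) - 1) > tol)) stop("each row must sum to 1")
  }
  fmt <- function(x) formatC(x, format = "f", digits = 10)
  # Values in a row are separated by spaces, rows by "; ".
  rows <- function(m) {
    paste(apply(m, 1, function(r) paste(fmt(r), collapse = " ")),
          collapse = "; ")
  }
  # Emission entries are written as symbol:probability.
  emission_rows <- paste(
    apply(emission, 1, function(r) {
      paste(paste0(colnames(emission), ":", fmt(r)), collapse = " ")
    }),
    collapse = "; "
  )
  c(rows(initial), rows(transition), emission_rows)
}

emission <- matrix(c(0.25, 0.75, 0.35, 0.65, 0.45, 0.55),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(NULL, c("no", "yes")))
params <- make_init_params(rep(1 / 3, 3), matrix(1 / 3, 3, 3), emission)
```

The resulting character vector has three elements, in the order initial state, state transition, and emission probabilities, and is intended to be passed as "init.params" together with init.methods = "input".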
|
vertices.sequence.column |
Optional Argument.
Specifies the vector of column(s) that uniquely identifies each row
of the input argument "vertices". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: character OR vector of Strings (character)
|
Value
Function returns an object of class "td_hmm_unsupervised_mle" which is a
named list containing objects of class "tbl_teradata".
Named list members can be referenced directly with the "$" operator
using the following names:
output.initialstate.table
output.statetransition.table
output.emission.table
output
Examples
# Get the current context/connection
con <- td_get_context()$connection
# Load example data.
loadExampleData("hmmunsupervised_example", "loan_prediction")
# Create object(s) of class "tbl_teradata".
loan_prediction <- tbl(con, "loan_prediction")
# Example 1 - Run the td_hmm_unsupervised_mle() function on the loan prediction dataset.
td_hmm_unsupervised_out <- td_hmm_unsupervised_mle(vertices = loan_prediction,
                                                   vertices.partition.column = c("model_id", "seq_id"),
                                                   vertices.order.column = c("seq_vertex_id"),
                                                   model.key = "model_id",
                                                   sequence.key = "seq_id",
                                                   observed.key = "observed_id",
                                                   hidden.states.num = 3,
                                                   init.methods = c("random", "25")
                                                   )