Methods defined here:
- __init__(self, formula=None, data=None, maxnum_categorical=20, tree_type=None, ntree=None, tree_size=None, nodesize=1, variance=0.0, max_depth=12, mtry=None, mtry_seed=None, seed=None, outofbag=False, display_num_processed_rows=False, categorical_encoding='graycode', data_sequence_column=None)
- DESCRIPTION:
The DecisionForest function uses a training data set to generate a
predictive model. You can input the model to the DecisionForestPredict
function, which uses it to make predictions.
PARAMETERS:
formula:
Required Argument.
Specifies the model to be fitted, as a formula string. Only the basic
"col1 ~ col2 + col3 + ..." form is supported, and all variables must
come from the same virtual data frame object. The response should be a
column of type real, numeric, integer or boolean.
Types: str
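As a minimal sketch of the supported formula form, the string can be
assembled from a response column and a list of predictor columns. The helper
build_formula below is hypothetical and not part of teradataml; the column
names are borrowed from the examples in this document.

```python
def build_formula(response, predictors):
    """Build a basic 'response ~ col1 + col2 + ...' formula string.

    Hypothetical helper for illustration; not a teradataml function.
    """
    if not predictors:
        raise ValueError("at least one predictor column is required")
    return "{} ~ {}".format(response, " + ".join(predictors))


formula = build_formula("homestyle", ["bedrooms", "lotsize", "gashw"])
print(formula)  # homestyle ~ bedrooms + lotsize + gashw
```

The resulting string can then be passed as the formula argument.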
data:
Required Argument.
Specifies the teradataml DataFrame containing the input data set.
maxnum_categorical:
Optional Argument.
Specifies the maximum number of distinct values for a single
categorical variable. The maxnum_categorical must be a positive int. A
maxnum_categorical greater than 20 is not recommended.
Default Value: 20
Types: int
tree_type:
Optional Argument.
Specifies whether the analysis is a regression (continuous response
variable) or a multiclass classification (predicting a result from a
set of classes). The default value is "regression" if the response
variable is numeric and "classification" if the response variable is
non-numeric.
Types: str
ntree:
Optional Argument.
Specifies the number of trees to grow in the forest model. When
specified, the number of trees must be greater than or equal to the
number of vworkers. When not specified, the function builds the
minimum number of trees that provides the input dataset with full
coverage.
Types: int
tree_size:
Optional Argument.
Specifies the number of rows that each tree uses as its input data
set. If not specified, the function builds a tree using either the
number of rows on a vworker or the number of rows that fits into the
vworker's memory, whichever is less.
Types: int
nodesize:
Optional Argument.
Specifies a decision tree stopping criterion, the minimum size of any
node within each decision tree.
Default Value: 1
Types: int
variance:
Optional Argument.
Specifies a decision tree stopping criterion. If the variance within
any node dips below this value, the algorithm stops looking for splits
in the branch.
Default Value: 0.0
Types: float
max_depth:
Optional Argument.
Specifies a decision tree stopping criterion. If the tree reaches a
depth past this value, the algorithm stops looking for splits.
Decision trees can grow to (2^(max_depth+1) - 1) nodes. This stopping
criterion has the greatest effect on the performance of the function.
Default Value: 12
Types: int
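To give a feel for how quickly the node bound for max_depth grows, the sketch
below evaluates it, reading the bound as 2 raised to the power (max_depth+1),
minus 1, which is the node count of a full binary tree with the root at depth
0. The function name max_tree_nodes is hypothetical and used only for
illustration.

```python
def max_tree_nodes(max_depth):
    """Upper bound on the node count of a binary tree of the given depth.

    Assumes the root sits at depth 0, so a full tree has
    2 ** (max_depth + 1) - 1 nodes.
    """
    if max_depth < 0:
        raise ValueError("max_depth must be non-negative")
    return 2 ** (max_depth + 1) - 1


# Depths used in the examples of this document.
for depth in (6, 12):
    print(depth, max_tree_nodes(depth))
# 6 127
# 12 8191
```

Each extra level roughly doubles the bound, which is consistent with
max_depth having a large effect on run time.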
mtry:
Optional Argument.
Specifies the number of variables to randomly sample from each
input value. For example, if mtry is 3, then the function randomly
samples 3 variables from each input at each split. The mtry must be an
int.
Types: int
mtry_seed:
Optional Argument.
Specifies an int value to use in determining the random seed for mtry.
Types: int
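The idea behind mtry and mtry_seed can be sketched with Python's standard
random module: at each split, only a random subset of mtry predictor columns
is considered, and fixing the seed makes that choice repeatable. This only
illustrates the general random forest technique and is not how Vantage
implements the sampling; sample_split_candidates is a hypothetical name.

```python
import random


def sample_split_candidates(predictors, mtry, mtry_seed=None):
    """Pick `mtry` distinct predictor columns to consider at one split.

    Illustration only; not the Vantage implementation.
    """
    if not 0 < mtry <= len(predictors):
        raise ValueError("mtry must be between 1 and the number of predictors")
    rng = random.Random(mtry_seed)  # a private generator seeded by mtry_seed
    return rng.sample(predictors, mtry)


columns = ["bedrooms", "lotsize", "gashw", "driveway", "stories", "price"]
first = sample_split_candidates(columns, mtry=3, mtry_seed=100)
second = sample_split_candidates(columns, mtry=3, mtry_seed=100)
print(first == second)  # True: the same seed gives the same subset
```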
seed:
Optional Argument.
Specifies an int value to use in determining the seed for the random
number generator. If you specify this value, you can specify the same
value in future calls to this function and the function will build
the same tree.
Types: int
outofbag:
Optional Argument.
Specifies whether to output the out-of-bag estimate of error rate.
Default Value: False
Types: bool
display_num_processed_rows:
Optional Argument.
Specifies whether to display the number of processed rows of the input
table.
Default Value: False
Types: bool
categorical_encoding:
Optional Argument.
Specifies which encoding method is used for categorical variables.
Note: categorical_encoding argument support is only available
when teradataml is connected to Vantage 1.1 or later.
Default Value: "graycode"
Permitted Values: graycode, hashing
Types: str
data_sequence_column:
Optional Argument.
Specifies the list of column(s) that uniquely identifies each row of
the input argument "data". The argument is used to ensure
deterministic results for functions which produce results that vary
from run to run.
Types: str OR list of Strings (str)
RETURNS:
Instance of DecisionForest.
Output teradataml DataFrames can be accessed using attribute
references, such as DecisionForestObj.<attribute_name>.
Output teradataml DataFrame attribute names are:
1. predictive_model
2. monitor_table
3. output
RAISES:
TeradataMlException
EXAMPLES:
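# Load the data to run the example.
# Assumes a Vantage connection has already been created with create_context().
from teradataml import DataFrame, DecisionForest, load_example_data
load_example_data("decisionforest", ["housing_train", "boston"])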
# Create teradataml DataFrame.
housing_train = DataFrame.from_table("housing_train")
boston = DataFrame.from_table("boston")
# Example 1 - Classification forest with explicit stopping criteria and fixed seeds.
decision_forest_out1 = DecisionForest(formula = "homestyle ~ bedrooms + lotsize + gashw + driveway + stories + recroom + price + garagepl + bathrms + fullbase + airco + prefarea",
data = housing_train,
tree_type = "classification",
ntree = 50,
nodesize = 1,
variance = 0.0,
max_depth = 12,
mtry = 3,
mtry_seed = 100,
seed = 100)
# Print all output dataframes.
print(decision_forest_out1.output)
print(decision_forest_out1.predictive_model)
print(decision_forest_out1.monitor_table)
# Example 2 - Classification forest with the out-of-bag error estimate enabled.
decision_forest_out2 = DecisionForest(formula = "homestyle ~ bedrooms + lotsize + gashw + driveway + stories + recroom + price + garagepl + bathrms + fullbase + airco + prefarea",
data = housing_train,
tree_type = "classification",
ntree = 50,
nodesize = 2,
max_depth = 12,
mtry = 3,
outofbag = True)
# Print all output dataframes.
print(decision_forest_out2.output)
print(decision_forest_out2.predictive_model)
print(decision_forest_out2.monitor_table)
# Example 3 - Regression forest with the out-of-bag error estimate enabled.
decision_forest_out3 = DecisionForest(formula = "medv ~ indus + ptratio + lstat + black + tax + dis + zn + rad + nox + chas + rm + crim + age",
data = boston,
tree_type = "regression",
ntree = 50,
nodesize = 2,
max_depth = 6,
outofbag = True)
# Print all output dataframes.
print(decision_forest_out3.output)
print(decision_forest_out3.predictive_model)
print(decision_forest_out3.monitor_table)
- __repr__(self)
- Returns the string representation for a DecisionForest class instance.