Teradata® Package for Python Function Reference on VantageCloud Lake

An Interface Object for scikit-learn

Deployment: VantageCloud
Edition: Lake
Product: Teradata Package for Python
Release Number: 20.00.00.04
Published: March 2025
Locale: en-US
Last Edition: 2025-04-11
DITA ID: TeradataPython_FxRef_Lake_2000
Product Category: Teradata Vantage

 
teradataml.opensource.td_sklearn = <teradataml.opensource._class.Sklearn object>
    DESCRIPTION:
    Interface object for accessing the exposed classes and functions of the
    scikit-learn open-source package. All of these classes and functions can be
    run, and their attributes accessed, through objects created with the
    "td_sklearn" interface object.
    Refer to the Teradata Python Package User Guide for more information about
    OpenML and the exposed interface objects.
 
PARAMETERS:
    None
 
RETURNS:
    None
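
The interface-object idea described above can be pictured as a thin proxy that forwards attribute lookups to a wrapped package. The following is only a sketch of that general pattern, not teradataml's implementation: the class name OpenSourceInterface and the variable td_math are hypothetical, and the standard-library math module stands in for scikit-learn so the snippet runs without any extra packages.

```python
import importlib


class OpenSourceInterface:
    """Hypothetical stand-in for an interface object such as td_sklearn."""

    def __init__(self, module_name):
        # Name of the wrapped package; it is imported on attribute access.
        self._module_name = module_name

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, so every
        # name not defined on this object is forwarded to the wrapped module.
        module = importlib.import_module(self._module_name)
        return getattr(module, name)


# Assumption for illustration: "math" plays the role of the wrapped package.
td_math = OpenSourceInterface("math")
print(td_math.sqrt(16.0))  # 4.0
```

A real interface object would also need to route computation to the database, which this sketch does not attempt to show.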
 
EXAMPLES:
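    # Load example data.
    # Assumes an active Vantage connection (for example, via create_context()).
    >>> from teradataml import DataFrame, load_example_data
    >>> load_example_data("openml", ["test_classification", "test_prediction"])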
    >>> df = DataFrame("test_classification")
    >>> df.head(3)
                   col2      col3      col4  label
    col1
    -2.560430  0.402232 -1.100742 -2.959588      0
    -3.587546  0.291819 -1.850169 -4.331055      0
    -3.697436  1.576888 -0.461220 -3.598652      0
 
    >>> df_test = DataFrame("test_prediction")
    >>> df_test.head(3)
                   col2      col3      col4
    col1
    -2.560430  0.402232 -1.100742 -2.959588
    -3.587546  0.291819 -1.850169 -4.331055
    -3.697436  1.576888 -0.461220 -3.598652
 
 
    # Get the feature and label data.
    >>> df_x_clasif = df.select(df.columns[:-1])
    >>> df_y_clasif = df.select(df.columns[-1])
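
The two select() calls above rely on ordinary Python list slicing over the column names. The snippet below shows only that slicing, using a plain list in place of df.columns. The column order is an assumption based on the example table, with the label column last.

```python
# Assumed column order, matching the example table (label is last).
columns = ["col1", "col2", "col3", "col4", "label"]

# Everything except the last name is treated as a feature column.
feature_columns = columns[:-1]

# The last name is treated as the label column.
label_column = columns[-1]

print(feature_columns)  # ['col1', 'col2', 'col3', 'col4']
print(label_column)     # label
```

This split only works when the label is the final column. For tables laid out differently, naming the columns explicitly is safer.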
 
    >>> from teradataml import td_sklearn
    >>> dt_cl = td_sklearn.DecisionTreeClassifier(random_state=0)
    >>> dt_cl
    DecisionTreeClassifier(random_state=0)
 
    # Set the parameters.
    >>> dt_cl.set_params(random_state=2, max_features="sqrt")
    DecisionTreeClassifier(max_features='sqrt', random_state=2)
 
    # Get the parameters.
    >>> dt_cl.get_params()
    {'ccp_alpha': 0.0,
     'class_weight': None,
     'criterion': 'gini',
     'max_depth': None,
     'max_features': 'sqrt',
     'max_leaf_nodes': None,
     'min_impurity_decrease': 0.0,
     'min_impurity_split': None,
     'min_samples_leaf': 1,
     'min_samples_split': 2,
     'min_weight_fraction_leaf': 0.0,
     'random_state': 2,
     'splitter': 'best'}
 
    # Train the model using fit().
    >>> dt_cl.fit(df_x_clasif, df_y_clasif)
    DecisionTreeClassifier(max_features='sqrt', random_state=2)
 
    # Perform prediction.
    >>> dt_cl.predict(df_test)
           col1      col2      col3      col4  decisiontreeclassifier_predict_1
    0  1.105026 -1.949894 -1.537164  0.073171                                 1
    1  1.878349  0.577289  1.795746  2.762539                                 1
    2 -1.130582 -0.020296 -0.710234 -1.440991                                 0
    3 -1.243781  0.280821 -0.437933 -1.379770                                 0
    4 -0.509793  0.492659  0.248207 -0.309591                                 1
    5 -0.345538 -2.296723 -2.811807 -1.993113                                 0
    6  0.709217 -1.481740 -1.247431 -0.109140                                 0
    7 -1.621842  1.713381  0.955084 -0.885921                                 1
    8  2.425481 -0.549892  0.851440  2.689135                                 1
    9  1.780375 -1.749949 -0.900142  1.061262                                 0
 
    # Perform scoring.
    >>> dt_cl.score(df_x_clasif, df_y_clasif)
       score
    0    1.0
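
In scikit-learn, score() on a classifier is documented as the mean accuracy of the predictions. Assuming the td_sklearn wrapper reports the same quantity, the value can be sketched in plain Python as below. The accuracy helper and the label lists are made up for illustration and are not taken from the example tables.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


# Hypothetical labels, for illustration only.
y_true = [0, 0, 1, 1]
print(accuracy(y_true, [0, 0, 1, 1]))  # 1.0
print(accuracy(y_true, [0, 1, 1, 1]))  # 0.75
```

The score of 1.0 above was computed on the same data used for fit(), so it measures fit to the training set and not performance on unseen data.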
 
    # Access a few attributes.
    >>> dt_cl.classes_
    array([0., 1.])
 
    >>> dt_cl.feature_importances_
    array([0.06945187, 0.02      , 0.67786339, 0.23268474])
 
    >>> dt_cl.max_features_
    2
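
The attribute values above can be cross-checked with a little arithmetic. scikit-learn documents max_features="sqrt" as sqrt(n_features) and describes feature importances as normalized. The sketch below assumes four feature columns (col1 to col4) and that the fractional result is truncated to an integer with a floor of 1. The rounding rule is an assumption, not something stated in this reference.

```python
import math

# Assumption: four feature columns (col1..col4), as in the example data.
n_features = 4

# max_features="sqrt" is documented as sqrt(n_features); truncating to an
# integer with a minimum of 1 is an assumed rounding rule.
max_features = max(1, int(math.sqrt(n_features)))
print(max_features)  # 2

# Values copied from the feature_importances_ output above.
importances = [0.06945187, 0.02, 0.67786339, 0.23268474]

# Normalized importances should add up to 1, up to rounding in the printout.
print(math.isclose(sum(importances), 1.0, abs_tol=1e-6))  # True
```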