Run scikit-learn Functions using X and y as Arguments

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Last edited: 2025-01-23
Product Category: Teradata Vantage

If you are familiar with scikit-learn, you can pass the data arguments X, y, and groups to td_sklearn functions the same way you pass them in scikit-learn.
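
In scikit-learn, the groups argument typically appears next to X and y in group-aware cross-validation splitters. The following minimal sketch is plain scikit-learn with NumPy inputs, not td_sklearn, and the toy data is hypothetical. It uses GroupKFold to show that samples sharing a group label never end up in both the training and test indices of a split.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical toy data: 8 samples, 2 features, binary labels
X = np.arange(16).reshape(8, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])
# One group label per sample; four groups of two samples each
groups = np.array([1, 1, 2, 2, 3, 3, 4, 4])

gkf = GroupKFold(n_splits=4)
# X, y, and groups are passed together, as positional data arguments
splits = list(gkf.split(X, y, groups))

for train_idx, test_idx in splits:
    # Prints an empty set: no group is shared between train and test
    print(set(groups[train_idx]) & set(groups[test_idx]))
```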

There is one minor difference in usage:
  • In scikit-learn, these arguments can be pandas DataFrames, NumPy arrays, lists of lists, and so on.
  • With td_sklearn, these arguments must be teradataml DataFrames, all created from the same parent teradataml DataFrame using the select() API.
    If only the X argument is passed, it does not need to be derived using the select() API.
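
To illustrate the scikit-learn side of this difference, the following sketch fits the same LinearSVC on one dataset held in three container types and checks that the predictions agree. It uses only scikit-learn, NumPy, and pandas. The column names mirror the teradataml example in this section but are otherwise arbitrary. With td_sklearn, none of these containers would be used; the inputs would be teradataml DataFrames derived with select().

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X_np, y_np = make_classification(n_features=4, random_state=0)

# The same data in three container types that scikit-learn accepts
inputs = {
    "numpy": (X_np, y_np),
    "pandas": (
        pd.DataFrame(X_np, columns=["col1", "col2", "col3", "col4"]),
        pd.Series(y_np, name="label"),
    ),
    "lists": (X_np.tolist(), y_np.tolist()),
}

predictions = {}
for name, (X, y) in inputs.items():
    clf = LinearSVC(random_state=0, tol=1e-5).fit(X, y)
    # Predict on the first five rows, passed in the same container type
    predictions[name] = clf.predict(X[:5])

for name, pred in predictions.items():
    print(name, pred)
```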

scikit-learn Example

  • Generate data.
    # X : {array-like, sparse matrix} of shape (n_samples, n_features)
    # y : array-like of shape (n_samples,)
    from sklearn.datasets import make_classification
    X, y = make_classification(n_features=4, random_state=0)
  • Instantiate a scikit-learn LinearSVC object.
    from sklearn.svm import LinearSVC
    clf = LinearSVC(random_state=0, tol=1e-5)
    clf
    LinearSVC(random_state=0, tol=1e-05)
  • Train the model.
    clf.fit(X=X, y=y)
    LinearSVC(random_state=0, tol=1e-05)
  • Generate predictions on test data.
    clf.predict([[0, 0, 0, 0]])
    array([1])
  • Access attributes.
    clf.intercept_
    array([0.55058172])
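
For a binary problem, the intercept_ attribute accessed in the last step, together with coef_, fully determines a LinearSVC's predictions. The following sketch, assuming scikit-learn and NumPy are installed, refits the same estimator on the same data and recomputes decision_function() and predict() by hand from these two attributes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Same data and estimator settings as the scikit-learn example
X, y = make_classification(n_features=4, random_state=0)
clf = LinearSVC(random_state=0, tol=1e-5).fit(X, y)

# Binary problem: coef_ has shape (1, n_features), intercept_ has shape (1,)
print(clf.coef_.shape, clf.intercept_.shape)  # (1, 4) (1,)

# Decision score per sample: X . w + b, flattened to shape (n_samples,)
scores = (X @ clf.coef_.T + clf.intercept_).ravel()
print(np.allclose(scores, clf.decision_function(X)))  # True

# Positive scores map to classes_[1]; all other scores map to classes_[0]
manual_pred = clf.classes_[(scores > 0).astype(int)]
print(np.array_equal(manual_pred, clf.predict(X)))  # True
```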

teradataml Open-Source Machine Learning Functions Example

  • Load data from the test_classification table.
    from teradataml import DataFrame
    df_train = DataFrame("test_classification")
    df_train
                   col1                    col2                  col3                   col4    label
    -1.1305820619922704     -0.0202959251414216   -0.7102336334648424    -1.4409910829920618        0
    -0.2869200001717422     -0.7169529842687833   -0.9865850877151031     -0.848214734984639        0
    -2.5604297516143286      0.4022323367243113   -1.1007419820939435    -2.9595882598466674        0
     0.4223414406917685     -2.0391144030275625   -2.053215806414584     -0.8491230457662061        0
     0.7216694959200303     -1.1215566442946217   -0.8318398647044646     0.1507420965953343        0
    -0.9861325665504175      1.7105310292848412    1.3382818041204743    -0.0853410902974293        1
    -0.5097927128625588      0.4926589443964751    0.2482067293662461    -0.3095907315896897        1
     0.1833246820582146      -0.774610353732039    -0.766054694735782    -0.2936686329125327        0
    -0.4032571038523639      2.0061840569850093    2.0275124771199318     0.8508919440196763        1
    -0.0715602561938739      0.2295539000122874    0.21654344712218576    0.0652739792167357        1
    
    feature_columns = ["col1", "col2", "col3", "col4"]
    label_columns = "label"
    # Input teradataml DataFrames must be created using select() on the same parent DataFrame.
    df_x_clasif = df_train.select(feature_columns)
    df_y_clasif = df_train.select(label_columns)
  • Create an instance of the scikit-learn LinearSVC object through td_sklearn.
    from teradataml import td_sklearn as osml
    linear_svc = osml.LinearSVC(loss="hinge", tol=0.01)
    linear_svc
    LinearSVC(loss='hinge', tol=0.01)
  • Train the model.
    linear_svc.fit(X=df_x_clasif, y=df_y_clasif)
    LinearSVC(loss='hinge', tol=0.01)
  • Get predictions on the feature DataFrame.
    Unlike the previous scikit-learn example, which returns only an array of predicted labels, teradataml OpenSourceML returns a teradataml DataFrame that contains both the feature columns and the predicted labels.
    linear_svc.predict(df_x_clasif)
                  col1                  col2                 col3                 col4   linearsvc_predict_1
      1.23195055037206     -1.53949525926716    -0.99510531686895    0.511600970144431                   0.0
      1.26780439921386     -1.80170792990881    -1.27034986297172    0.379112827728592                   0.0
    -0.869536951900537      1.99896877100815     1.73590334857413    0.257374908024379                   1.0
      1.43370121321312     -1.75423983622451    -1.11573423222268    0.620716743476382                   0.0
     -1.05286597780779    -0.641515112432539    -1.36672011108273    -1.76399738946526                   0.0
    -0.345538051487565     -2.29672333669221    -2.81180710379968     -1.9931134219738                   0.0
      -1.2573206891836     -2.14861012008993    -3.19826339415065    -3.04373306805433                   0.0
    -0.205721671526727      1.75895320535307     1.86752027575658    0.932664558487293                   1.0
     -3.58754622394712      0.29181935785016    -1.85016852734401    -4.33105451025007                   0.0
     -2.52159550020822      2.47822554412282     1.27458363813847    -1.50328319686837                   1.0
    
  • Access attributes.
    linear_svc.intercept_
    array([0.55058172])
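
For a like-for-like comparison with the teradataml prediction output, note that scikit-learn's predict() returns only a NumPy array of labels. The following sketch, assuming scikit-learn and pandas are installed, attaches that array to a pandas feature table to produce a similar features-plus-prediction layout. The column name linearsvc_predict_1 is copied from the teradataml output purely for illustration; scikit-learn does not create it.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_features=4, random_state=0)
features = pd.DataFrame(X, columns=["col1", "col2", "col3", "col4"])
clf = LinearSVC(random_state=0, tol=1e-5).fit(features, y)

# scikit-learn returns only a one-dimensional array of predicted labels
labels = clf.predict(features)

# Attach the labels as a new column to mimic the teradataml output layout
result = features.assign(linearsvc_predict_1=labels)
print(result.head())
```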