Using training APIs for training
To run the training APIs train and cv, create a Dataset object and pass it as input to these functions. The following subsections cover both single-model and multi-model training and highlight the differences between training with native LightGBM and training with td_lightgbm.
For both single-model and multi-model training, you must first create the required teradataml DataFrames. Create these DataFrames from the same parent teradataml DataFrame using the select() API.
# Load the example dataset.
>>> load_example_data("openml", ["multi_model_classification", "multi_model_regression"])

## Classification data.
# Create DataFrames.
>>> df_train_classif = DataFrame('multi_model_classification')
>>> df_train_classif
       col1          col2          col3          col4  label  group_column  partition_column_1  partition_column_2
-2.86976401   -1.45650077  -0.096175758  -1.831834743      0            11                   1                  11
-2.69651383  -1.244829178  -0.110505514  -1.670524168      0             9                   0                  10
0.939206169  -0.618203487   0.209642108   0.150728526      0            11                   0                  10
1.413963995   0.329342266   0.110572078   0.743405742      0            10                   0                  11
0.599510805   1.298906549  -0.141761534   0.790378168      1             8                   1                  10
-1.17273888    0.05006965  -0.144305532  -0.484090357      1            12                   1                  11
1.087219107  -1.006162386   0.289957882   0.055393647      0             9                   1                  10
0.669401117   0.965276402  -0.079356706   0.683697322      1             8                   1                  10
1.441799942  -0.007863088    0.16867607   0.617164007      0             9                   1                  10
1.619580063   0.479070866   0.110079835   0.893252734      1             8                   0                  10

# Create required classification DataFrames.
>>> df_x_classif = df_train_classif.select(["col1", "col2", "col3", "col4"])
>>> df_y_classif = df_train_classif.select("label")

## Regression data.
# Create DataFrames.
>>> df_train_reg = DataFrame('multi_model_regression')
>>> df_train_reg.head(4)
       col1          col2          col3          col4  label  group_column  partition_column_1  partition_column_2
-2.86976401   -1.45650077  -0.096175758  -1.831834743   -141            11                   1                  11
-2.69651383  -1.244829178  -0.110505514  -1.670524168    -89             9                   0                  10
0.939206169  -0.618203487   0.209642108   0.150728526   -137            11                   0                  10
1.413963995   0.329342266   0.110572078   0.743405742      8            10                   0                  11

# Create required regression DataFrames.
>>> df_x_reg = df_train_reg.select(["col1", "col2", "col3", "col4"])
>>> df_y_reg = df_train_reg.select("label")
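The workflow described above (build a Dataset, then pass it to train and cv) can be sketched as a small helper. This is only an illustrative sketch: run_train_and_cv is a hypothetical name, not part of teradataml or LightGBM, and it assumes the module passed in exposes Dataset, train, and cv with the params, train_set, and num_boost_round keywords, as native lightgbm does and as td_lightgbm is expected to mirror.

```python
def run_train_and_cv(lgb_module, features, label, params=None, num_boost_round=30):
    """Hypothetical helper: create a Dataset, then call train and cv.

    lgb_module is assumed to be either the native `lightgbm` package or
    `teradataml.td_lightgbm`; both are expected to expose Dataset, train, and cv.
    """
    # Copy the parameters so the caller's dict is never modified.
    params = {} if params is None else dict(params)

    # Build a separate Dataset for each call so that neither call depends on
    # state left behind by the other.
    train_set = lgb_module.Dataset(features, label)
    booster = lgb_module.train(
        params=params, train_set=train_set, num_boost_round=num_boost_round
    )

    cv_set = lgb_module.Dataset(features, label)
    cv_result = lgb_module.cv(
        params=params, train_set=cv_set, num_boost_round=num_boost_round
    )
    return booster, cv_result
```

With an active Vantage connection, a call such as run_train_and_cv(td_lightgbm, df_x_classif, df_y_classif) would use the classification DataFrames created above. Treat the exact return types as unverified here, since they depend on the module that is passed in.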
See the following model examples: