Key Feature Additions and Changes - Teradata Package for Python 20.00

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Product Category: Teradata Vantage

The following table lists the key feature additions and changes in the Teradata Package for Python, teradataml.

Date Release Description
December 2024 20.00.00.03
  • teradataml no longer supports setting the auth_token using set_config_params(). Use set_auth_token() to set the token.
  • New Features/Functionality
    • teradataml: DataFrame
      • New Function

        alias() - Creates a DataFrame with alias name.

      • New Property

        db_object_name - Gets the name of the underlying database object on which the DataFrame is created.

    • teradataml: GeoDataFrame
      • New Function

        alias() - Creates a GeoDataFrame with alias name.

    • teradataml: DataFrameColumn a.k.a. ColumnExpression
      • Arithmetic Functions
        • DataFrameColumn.isnan() - Evaluates expression to determine if the floating-point argument is a NaN (Not-a-Number) value.
        • DataFrameColumn.isinf() - Evaluates expression to determine if the floating-point argument is an infinite number.
        • DataFrameColumn.isfinite() - Evaluates expression to determine if it is a finite floating value.
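The three checks above partition floating-point values the same way the Python standard library does. The following is a minimal plain-Python sketch of those semantics using the `math` module; it does not call teradataml, and the helper name `classify` is hypothetical.

```python
import math


def classify(value: float) -> dict:
    """Return the three floating-point checks for a single value."""
    return {
        "isnan": math.isnan(value),        # True only for NaN
        "isinf": math.isinf(value),        # True for +inf and -inf
        "isfinite": math.isfinite(value),  # True when neither NaN nor infinite
    }


samples = [1.5, float("nan"), float("inf"), float("-inf")]
for sample in samples:
    print(repr(sample), classify(sample))
```

For any value exactly one of "NaN", "infinite", or "finite" holds, so `isfinite` is the negation of `isnan or isinf`.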
    • FeatureStore - handles feature management within the environment
      • FeatureStore Components
        • Feature - Represents a feature which is used in ML Modeling.
        • Entity - Represents the columns that uniquely identify the data used in ML Modeling.
        • DataSource - Represents the source of Data.
        • FeatureGroup - Collection of Feature, Entity and DataSource.
          • Methods
            • apply() - Adds Feature, Entity, DataSource to a FeatureGroup.
            • from_DataFrame() - Creates a FeatureGroup from teradataml DataFrame.
            • from_query() - Creates a FeatureGroup using a SQL query.
            • remove() - Removes Feature, Entity, or DataSource from a FeatureGroup.
            • reset_labels() - Removes the labels that were assigned to the FeatureGroup using set_labels().
            • set_labels() - Sets the Features as labels for a FeatureGroup.
          • Properties
            • features - Get the features of a FeatureGroup.
            • labels - Get the labels of a FeatureGroup.
      • FeatureStore
        • Methods
          • apply() - Adds Feature, Entity, DataSource, FeatureGroup to FeatureStore.
          • archive_data_source() - Archives a specified DataSource from a FeatureStore.
          • archive_entity() - Archives a specified Entity from a FeatureStore.
          • archive_feature() - Archives a specified Feature from a FeatureStore.
          • archive_feature_group() - Archives a specified FeatureGroup from a FeatureStore. This method also archives the underlying Feature, Entity, and DataSource.
          • delete_data_source() - Deletes an archived DataSource.
          • delete_entity() - Deletes an archived Entity.
          • delete_feature() - Deletes an archived Feature.
          • delete_feature_group() - Deletes an archived FeatureGroup.
          • get_data_source() - Get the DataSources associated with FeatureStore.
          • get_dataset() - Get the teradataml DataFrame based on Features, Entities and DataSource from FeatureGroup.
          • get_entity() - Get the Entity associated with FeatureStore.
          • get_feature() - Get the Feature associated with FeatureStore.
          • get_feature_group() - Get the FeatureGroup associated with FeatureStore.
          • list_data_sources() - List DataSources.
          • list_entities() - List Entities.
          • list_feature_groups() - List FeatureGroups.
          • list_features() - List Features.
          • list_repos() - List available repos which are configured for FeatureStore.
          • repair() - Repairs the underlying FeatureStore schema on database.
          • set_features_active() - Marks the Features as active.
          • set_features_inactive() - Marks the Features as inactive.
          • setup() - Set up the FeatureStore for a repo.
        • Properties
          • repo - Property for FeatureStore repo.
          • grant - Property to Grant access on FeatureStore to user.
          • revoke - Property to Revoke access on FeatureStore from user.
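The archive_*() and delete_*() methods above describe a two-step lifecycle in which only archived objects can be deleted. The following is a toy, in-memory sketch of that lifecycle. `ToyRegistry` and all of its method names are hypothetical stand-ins, not the teradataml FeatureStore API, and raising an error on deleting a non-archived object is an assumption made for illustration.

```python
class ToyRegistry:
    """Hypothetical in-memory stand-in for an archive-then-delete lifecycle."""

    def __init__(self):
        self.active = {}    # objects currently in use
        self.archived = {}  # objects retired but not yet deleted

    def apply(self, name, definition):
        """Register an object as active."""
        self.active[name] = definition

    def archive(self, name):
        """Move an active object to the archive."""
        self.archived[name] = self.active.pop(name)

    def delete(self, name):
        """Delete an object, which must already be archived (assumed rule)."""
        if name not in self.archived:
            raise ValueError(f"{name!r} must be archived before it can be deleted")
        del self.archived[name]

    def list_active(self):
        """Return the names of active objects in sorted order."""
        return sorted(self.active)


registry = ToyRegistry()
registry.apply("total_sales", {"column": "sales_amt"})
registry.apply("store_id", {"column": "store"})
registry.archive("total_sales")  # step 1: archive
registry.delete("total_sales")   # step 2: delete the archived object
print(registry.list_active())
```

Separating archive from delete gives a recoverable intermediate state before an object is removed permanently.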
    • teradataml: Table Operator Functions
      • Image2Matrix() - Converts an image into a matrix.
    • teradataml: Analytic Functions
      • Analytics Database Functions
        • CFilter()
        • NaiveBayes()
        • TDNaiveBayesPredict()
        • Shap()
        • SMOTE()
      • teradataml: Unbounded Array Framework (UAF) Functions:
        • CopyArt()
    • General Functions
      • File Management Functions
        • list_files() - List the installed files in Database.
    • OpensourceML: LightGBM

      teradataml adds support for the LightGBM package through the OpensourceML (OpenML) feature.

      The following functionality is added:

      • td_lightgbm - Interface object to run lightgbm functions and classes through Analytics Database.

        Example usage:

        from teradataml import td_lightgbm, DataFrame
         
        df_train = DataFrame("multi_model_classification")
         
        feature_columns = ["col1", "col2", "col3", "col4"]
        label_columns = ["label"]
        part_columns = ["partition_column_1", "partition_column_2"]
         
        df_x = df_train.select(feature_columns)
        df_y = df_train.select(label_columns)
         
        # Dataset creation.
        # Single model case.
        obj_s = td_lightgbm.Dataset(df_x, df_y, silent=True, free_raw_data=False)
         
        # Multi model case.
        obj_m = td_lightgbm.Dataset(df_x, df_y, free_raw_data=False, partition_columns=part_columns)
        obj_m_v = td_lightgbm.Dataset(df_x, df_y, free_raw_data=False, partition_columns=part_columns)
         
        # Model training.
        # Single model case.
        opt = td_lightgbm.train(params={}, train_set = obj_s, num_boost_round=30)
         
        opt.predict(data=df_x, num_iteration=20, pred_contrib=True)
         
        # Multi model case.
        # `rec` is the dictionary that record_evaluation() fills with evaluation results.
        rec = {}
        opt = td_lightgbm.train(params={}, train_set = obj_m, num_boost_round=30,
                                callbacks=[td_lightgbm.record_evaluation(rec)],
                                valid_sets=[obj_m_v, obj_m_v])
         
        # Passing `label` argument to get it returned in output DataFrame.
        opt.predict(data=df_x, label=df_y, num_iteration=20)
      • Added support for accessing scikit-learn APIs using the exposed interface object td_lightgbm.
    • teradataml: Functions
      • register() - Registers a user defined function (UDF).
      • call_udf() - Calls a registered user defined function (UDF) and returns ColumnExpression.
      • list_udfs() - List all the UDFs registered using 'register()' function.
      • deregister() - Deregisters a user defined function (UDF).
    • teradataml: Options
      • Configuration Options

        table_operator - Specifies the name of table operator.

  • Updates
    • General functions

      set_auth_token() - Added base_url parameter, which accepts the CCP URL.

      'ues_url' will be deprecated in a future release; specify 'base_url' instead.

    • teradataml: DataFrame function
      • join()

        Now supports compound ColumnExpression having more than one binary operator in the on argument.

        Now supports ColumnExpression containing FunctionExpression(s) in the on argument.

        Self-join now expects an aliased DataFrame in the other argument.

    • teradataml: GeoDataFrame function
      • join()

        Now supports compound ColumnExpression having more than one binary operator in the on argument.

        Now supports ColumnExpression containing FunctionExpressions in the on argument.

        Self-join now expects an aliased DataFrame in the other argument.

    • teradataml: Unbounded Array Framework (UAF) Functions
      • SAX() - Default value added for window_size and output_frequency.
      • DickeyFuller()

        Supports TDAnalyticResult as input.

        Default value added for max_lags.

        Removed parameter drift_trend_formula.

        Updated permitted values for algorithm.

    • teradataml: AutoML
      • AutoML, AutoRegressor, and AutoClassifier

        Now supports DECIMAL datatype as input.

    • teradataml: Analytics Database Analytic Functions
      • TextParser()

        Argument name covert_to_lowercase changed to convert_to_lowercase.

  • Bug Fixes:
    • db_list_tables() now returns correct results when '%' is used.
October 2024 20.00.00.02
  • New Features/Functionality
    • teradataml: Analytics Database Analytic Functions
      • New Analytics Database Analytic Functions:
        • TFIDF()
        • Unpivoting()
        • Pivoting()
      • New Unbounded Array Framework (UAF) Functions:
        • AutoArima()
        • DWT()
        • DWT2D()
        • FilterFactory1d()
        • IDWT()
        • IDWT2D()
        • IQR()
        • Matrix2Image()
        • SAX()
        • WindowDFFT()
    • teradataml: Functions
      • udf() - Creates a user defined function (UDF) and returns ColumnExpression.
      • materialize() - Persists dataframe into database for current session.
      • create_temp_view() - Creates a temporary view for session on the DataFrame.
    • teradataml: DataFrame
      • New function set_session_param() is added to set the database session parameters.
      • New function unset_session_param() is added to unset database session parameters.
    • teradataml: DataFrameColumn a.k.a. ColumnExpression
      • Date Time Functions
        • DataFrameColumn.to_timestamp() - Converts string or integer value to a TIMESTAMP data type or TIMESTAMP WITH TIME ZONE data type.
        • DataFrameColumn.extract() - Extracts date component to a numeric value.
        • DataFrameColumn.to_interval() - Converts a numeric value or string value into INTERVAL_DAY_TO_SECOND or INTERVAL_YEAR_TO_MONTH value.
      • String Functions
        • DataFrameColumn.parse_url() - Extracts a part from a URL.
      • Arithmetic Functions
        • DataFrameColumn.log() - Returns the logarithm value of the column with respect to 'base'.
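For readers unfamiliar with these operations, the sketch below shows rough plain-Python counterparts built only on the standard library: parsing a string into a timestamp, extracting a date component, extracting parts of a URL, and taking a logarithm with an explicit base. It illustrates the concepts only. It does not call teradataml, and the sample values and variable names are made up for illustration.

```python
import math
from datetime import datetime
from urllib.parse import urlsplit

# String -> timestamp, then extract one date component as a number.
ts = datetime.strptime("2024-10-15 08:30:00", "%Y-%m-%d %H:%M:%S")
month = ts.month

# Extract individual parts from a URL.
parts = urlsplit("https://example.com/docs/index.html?lang=en")
host, path, query = parts.hostname, parts.path, parts.query

# Logarithm of a value with respect to an explicit base.
log_value = math.log(8, 2)

print(month, host, path, query, log_value)
```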
    • teradataml: AutoML
      • New Methods added for AutoML(), AutoRegressor(), and AutoClassifier():
        • evaluate() - Evaluates the data using the best model or a model of the user's choice from the leaderboard.
        • load() - Loads the saved model from database.
        • deploy() - Saves the trained model inside database.
        • remove_saved_model() - Removes the saved model in database.
        • model_hyperparameters() - Returns the hyperparameters of fitted or loaded models.
  • Updates
    • teradataml: AutoML
      • AutoML(), AutoRegressor()
        • New performance metrics added for task type regression: "MAPE", "MPE", "ME", "EV", "MPD", and "MGD".
      • AutoML(), AutoRegressor() and AutoClassifier()
        • New arguments added: volatile, persist.
        • predict() - Data input is now mandatory for generating predictions. Default model evaluation is now removed.
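The regression metric abbreviations listed above are not spelled out in this table. The sketch below reads four of them as mean absolute percentage error (MAPE), mean percentage error (MPE), max error (ME), and explained variance (EV); that reading is an assumption based on common regression-evaluation naming, and the formulas are the textbook definitions, not code taken from teradataml. MPD and MGD (commonly mean Poisson deviance and mean Gamma deviance) are omitted for brevity.

```python
from statistics import fmean, pvariance


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * fmean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def mpe(y_true, y_pred):
    """Mean percentage error, in percent; the sign shows the bias direction."""
    return 100.0 * fmean((t - p) / t for t, p in zip(y_true, y_pred))


def max_error(y_true, y_pred):
    """Largest absolute residual."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true), using population variance."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / pvariance(y_true)


# Made-up sample data for illustration.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
for metric in (mape, mpe, max_error, explained_variance):
    print(metric.__name__, metric(y_true, y_pred))
```

MAPE and MPE divide by the true value, so they are undefined when any true value is zero.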
    • teradataml: Options
      • set_config_params()
        • Following arguments will be deprecated in the future: ues_url and auth_token
    • DataFrameColumn.cast(): Accepts 2 new arguments format and timezone.
    • DataFrame.assign(): Accepts ColumnExpressions returned by udf().
  • teradataml DataFrame
    • to_pandas() - Returns the pandas DataFrame with Decimal column types as float instead of object. To keep the data type as object, set the argument coerce_float to False.
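One reason to keep Decimal values instead of floats is exactness. The plain-Python sketch below, which does not use teradataml or pandas, shows that decimal arithmetic stays exact where binary floating point does not; the values are made up for illustration.

```python
from decimal import Decimal

# Decimal arithmetic is exact for decimal fractions.
exact = Decimal("0.1") + Decimal("0.2")

# The same values converted to float pick up binary rounding error.
as_float = float(Decimal("0.1")) + float(Decimal("0.2"))

print(exact == Decimal("0.3"))  # exact comparison holds
print(as_float == 0.3)          # float comparison does not hold
```

Floats are usually more convenient and faster for analytics, so the trade-off depends on whether exact decimal values (for example, monetary amounts) matter downstream.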
  • Database Utility
    • list_td_reserved_keywords() - Accepts a list of strings as argument.
  • Updates to existing UAF Functions:
    • ACF() - round_results parameter removed as it was used for internal testing.
    • BreuschGodfrey() - Added default value 0.05 for parameter significance_level.
    • GoldfeldQuandt()
      • Removed parameters weights and formula.
      • Replaced parameter orig_regr_paramcnt with const_term.
      • Changed description for parameter algorithm. Refer to the function documentation for details.
      • Note: These changes break backward compatibility.
    • HoltWintersForecaster() - Default value of parameter seasonal_periods removed.
    • IDFFT2() - Removed parameter output_fmt_row_major as it is used for internal testing.
    • Resample() - Added parameter output_fmt_index_style.
  • Bug Fixes
    • KNN predict() function can now predict on test data that does not contain the target column.
    • Metrics functions are supported on the Lake system.
    • The following OpensourceML functions from different sklearn modules are fixed.
      • sklearn.ensemble:
        • ExtraTreesClassifier - apply()
        • ExtraTreesRegressor - apply()
        • RandomForestClassifier - apply()
        • RandomForestRegressor - apply()
      • sklearn.impute:
        • SimpleImputer - transform(), fit_transform(), inverse_transform()
        • MissingIndicator - transform(), fit_transform()
      • sklearn.kernel_approximation:
        • Nystroem - transform(), fit_transform()
        • PolynomialCountSketch - transform(), fit_transform()
        • RBFSampler - transform(), fit_transform()
      • sklearn.neighbors:
        • KNeighborsTransformer - transform(), fit_transform()
        • RadiusNeighborsTransformer - transform(), fit_transform()
      • sklearn.preprocessing:
        • KernelCenterer - transform()
        • OneHotEncoder - transform(), inverse_transform()
    • OpensourceML returns teradataml objects for model attributes and functions instead of sklearn objects, so that the user can perform further operations such as score() and predict() on the returned objects.
    • AutoML predict() function now generates correct ROC-AUC value for positive class.
    • deploy() method of Script and Apply classes retries model deployment if there are intermittent network issues.
August 2024 20.00.00.01
  • teradataml no longer supports Python versions less than 3.8.
  • Added new feature - Personal Access Token (PAT) support in teradataml

    set_auth_token() - teradataml now supports authentication via PAT in addition to OAuth 2.0 Device Authorization Grant (formerly known as the Device Flow).

    It accepts the UES URL, Personal Access Token (PAT), and private key file generated from the VantageCloud Lake Console, plus the optional arguments username and expiration_time (in seconds).

  • Updated Analytics Database analytic functions:
    • ANOVA() - New arguments added: group_name_column, group_value_name, group_names, num_groups for data containing group values and group names.
    • FTest() - New arguments added: sample_name_column, sample_name_value, first_sample_name, second_sample_name.
    • GLM()
      • Supports stepwise regression and accepts new arguments stepwise_direction, max_steps_num, and initial_stepwise_columns.
      • New arguments added: attribute_data, parameter_data, iteration_mode, and partition_column.
    • GetFutileColumns() - Arguments category_summary_column and threshold_value are now optional.
    • KMeans() - New argument added: initialcentroids_method.
    • NonLinearCombineFit() - Argument result_column is now optional.
    • ROC() - Argument positive_class is now optional.
    • SVMPredict() - New argument added: model_type.
    • ScaleFit()
      • New arguments added: ignoreinvalid_locationscale, unused_attributes, attribute_name_column, attribute_value_column.
      • Arguments attribute_name_column, attribute_value_column, and target_attributes are supported for sparse input.
      • Arguments attribute_data, parameter_data, and partition_column are supported for partitioning.
    • ScaleTransform() - New arguments added to support sparse input: attribute_name_column and attribute_value_column.
    • TDGLMPredict() - New arguments added: family and partition_column.
    • XGBoost() - New argument base_score is added for initial prediction value for all data points.
    • XGBoostPredict() - New argument detailed is added for detailed information of each prediction.
    • ZTest() - New arguments added: sample_name_column, sample_value_column, first_sample_name, and second_sample_name.
  • teradataml: AutoML
    • AutoML(), AutoRegressor(), and AutoClassifier() - New argument max_models is added as an early stopping criterion to limit the maximum number of models to be trained.
  • teradataml: DataFrame functions
    • DataFrame.agg() - Accepts ColumnExpressions and list of ColumnExpressions as arguments.
  • teradataml: General Functions
    • Data Transfer Utility - fastload() updates
      • Improved error and warning table handling with the following new arguments:
        • err_staging_db
        • err_tbl_name
        • warn_tbl_name
        • err_tbl_1_suffix
        • err_tbl_2_suffix
    • Change in behavior of save_errors argument. When save_errors is set to True, error information is available in two persistent tables, ERR_1 and ERR_2. When save_errors is set to False, error information is available in a single pandas DataFrame.
    • Garbage collector location is now configurable. You can set configure.local_storage to a desired location.
  • Updates:
    • UAF functions now work if the database name has special characters.
    • OpensourceML can now read and process NULL/nan values.
    • Boolean values output will now be returned as VARBYTE column with 0 or 1 values in OpensourceML.
    • Fixed bug for Apply's deploy().
    • Fixed an issue with volatile table creation: the table is now created in the correct database, i.e., the user's spool space, regardless of the temp database specified.
    • ColumnTransformer function now processes its arguments in the order they are passed.
March 2024 20.00.00.00
  • Added new feature - teradataml open-source machine learning functions (teradataml OpenSourceML) that dynamically exposes open-source packages through Teradata Vantage. It provides an interface object through which exposed classes and functions of open-source packages can be accessed with the same syntax and arguments.
  • Added new feature - AutoML that automates the process of building, training, and validating machine learning models. It involves automation of various aspects of the machine learning workflow, such as feature exploration, feature engineering, data preparation, model training and evaluation for given dataset.
  • Added new deploy() method to deploy models generated by running a script: in the database when connected to VantageCloud Enterprise (as part of the Script table operator), or in the user environment when connected to VantageCloud Lake (as part of the Apply table operator).
  • Added new DataFrame manipulation functions cube(), rollup(), replace().
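In SQL, ROLLUP and CUBE differ in which grouping sets they aggregate over. Assuming cube() and rollup() follow those standard semantics (the table does not state this), the plain-Python sketch below enumerates the grouping sets each one produces for a list of columns. The helper names and column names are hypothetical.

```python
from itertools import combinations


def rollup_sets(columns):
    """Hierarchical prefixes: (a, b), (a,), () for columns [a, b]."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """Every subset of the columns, from the full set down to the grand total."""
    return [
        combo
        for n in range(len(columns), -1, -1)
        for combo in combinations(columns, n)
    ]


columns = ["region", "product"]
print(rollup_sets(columns))
print(cube_sets(columns))
```

With n columns, a rollup yields n + 1 grouping sets while a cube yields 2**n, so cube output grows much faster as columns are added.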
  • Added eight categories of new DataFrame Column functions:
    • Bit Byte Manipulation Functions
    • Comparison Functions
    • Date Time Functions
    • Hyperbolic Functions
    • Regular Arithmetic Functions
    • Regular Expression Functions
    • String Functions
    • Trigonometric Functions
  • Removed functionalities that have been deprecated:
    • Machine Learning Engine functions
    • Model Cataloging feature
    • Sandbox feature that supports testing script in both Script table operator and Apply table operator.
February 2024 17.20.00.07
  Updated Open Analytics Framework APIs to support VantageCloud Lake use of Anaconda for building conda environments to run Python analytic workloads on Open Analytics Framework:
  • Updated create_env() with new argument conda_env to specify whether the environment to be created is a conda environment or not.
  • Output of list environment APIs has a new column "conda" to show whether the environment is a conda environment or not.
  • Updated set_auth_token() to address the Open Analytics login issue with teradataml 17.20.00.05 and 17.20.00.06.
  • Updated list_user_envs() with new argument conda_env to specify whether to filter the conda environments when listing user environments.
January 2024 17.20.00.06
  • New teradataml DataFrame Column functions:
    • 19 new Bit Byte Manipulation Functions
    • 4 new Regular Expression Functions
    • 2 new Display Functions
  • New and updated Open Analytics Framework APIs:
    • Updated create_env() so user can create one or more user environments using the new argument template by providing specifications in template json file.
    • New UserEnv Class property models, and methods install_model() and uninstall_model() to list, install and uninstall models in user environment.
    • New UserEnv Class method snapshot() to take snapshot of user environment.
  • New BYOM function DataRobotPredict() to score the data in Vantage using the model trained externally in DataRobot and stored in Vantage.
  • Updated DataFrame functions:
    • DataFrame.describe() method to accept argument statistics to specify the aggregate operation to perform.
    • DataFrame.sort() method to accept ColumnExpression, and enable sorting.
    • DataFrame.sample() method to support column stratification.
  • Updated general function view_log() to download the APPLY query logs.
  • Updated Analytics Database analytic functions so arguments which accept floating numbers will accept integers.
  • Updated DataFrame.plot() function to ignore the null values while plotting data.
October 2023 17.20.00.05
  • New hyperparameter tuning feature to determine the optimal set of hyperparameters for the given dataset and learning model.
    • GridSearch algorithm covers all possible parameter values to identify optimal hyperparameters.
    • RandomSearch algorithm performs random sampling on hyperparameter space to identify optimal hyperparameters.
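The difference between the two search strategies above can be sketched in plain Python: grid search enumerates every combination in the hyperparameter space, while random search draws a fixed number of samples from it. This is a conceptual sketch only. The helper names, the sample space, and the choice to sample with replacement are assumptions, and nothing here calls the teradataml GridSearch or RandomSearch classes.

```python
import itertools
import random

# Hypothetical hyperparameter space for illustration.
space = {"learning_rate": [0.01, 0.1, 0.3], "max_depth": [3, 5]}


def grid_candidates(space):
    """Enumerate every combination of parameter values."""
    keys = list(space)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*(space[key] for key in keys))
    ]


def random_candidates(space, n_iter, seed=0):
    """Draw n_iter combinations at random (with replacement)."""
    rng = random.Random(seed)
    return [
        {key: rng.choice(values) for key, values in space.items()}
        for _ in range(n_iter)
    ]


grid = grid_candidates(space)
sampled = random_candidates(space, n_iter=4)
print(len(grid), "grid candidates;", len(sampled), "random candidates")
```

Grid search cost grows with the product of the value counts, whereas random search cost is fixed by the number of iterations, at the price of possibly missing the best combination.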
  • New plotting feature to visualize analytic results.
  • New teradataml DataFrame functions:
    • DataFrame.plot() to generate plots on teradataml DataFrame.
    • DataFrame.itertuples() to iterate over teradataml DataFrame rows as namedtuples or list.
  • New teradataml GeoDataFrame function GeoDataFrame.plot() to generate plots on teradataml GeoDataFrame.
  • New BYOM function DataikuPredict() to score the data in Vantage using the model trained externally in Dataiku UI and stored in Vantage.
  • New teradataml DataFrame Column functions:
    • Regular Arithmetic Functions
    • Trigonometric Functions
    • Hyperbolic Functions
    • String Functions
  • New general function async_run_status() to check the status of asynchronous runs using unique run ids.
  • New teradataml configuration option configure.indb_install_location to specify the installation location of in-database Python package.
  • Updated Open Analytics Framework APIs:
    • set_auth_token() no longer accepts username and password. Instead, the function opens a browser session in which the user authenticates.
    • User environments, files and libraries related APIs updated to support R environment.
  • Updated Unbounded Array Framework (UAF) function ArimaEstimate() to support the CSS algorithm via the algorithm argument.