Teradata® Package for R Function Reference

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for R
Release Number: 17.20
Published: March 2024
Language: English (United States)
Last Update: 2024-05-03
dita:id: TeradataR_FxRef_Enterprise_1720
Product Category: Teradata Vantage

Teradata R Package - tdplyr 17.20.00.00
Teradata Vantage Client R Analytic library
Copyright ©2018 - 2024, Teradata. All Rights Reserved.

General product information is available on the Teradata Documentation website, https://docs.teradata.com/
- Teradata R Package User Guide – B700-4005
- Teradata R Function Reference – B700-4007

Minimum System Requirements:


Teradata Vantage:
- Vantage 1.0 - Maintenance Update 2 or later versions
- Advanced SQL (was NewSQL) Engine 16.20 Feature Update 1 or later versions
- Teradata Machine Learning Engine 08.00.03.00 or later versions

Supported Drivers:
- Teradata SQL Driver 17.20.0.26 (Recommended)
- Teradata ODBC Driver 16.20

Operating Systems: (64-bit only)
- Windows 7
- MacOS OSX 10.9
- Ubuntu 16
- RHEL 7
- SLES/OpenSUSE 12

Release Notes:


tdplyr 17.20.00.00

  • New Features/Functions:
  • Support added for the following Analytics Database Analytic Functions:

    • MODEL SCORING functions:

      • td_glm_predict_per_segment()
      • td_kmeans_predict()
      • td_one_class_svm_predict()
      • td_svm_predict()
      • td_glm_predict()
      • td_xg_boost_predict()
    • FEATURE ENGINEERING TRANSFORM functions:

      • td_antiselect()
      • td_bincode_fit()
      • td_bincode_transform()
      • td_column_transformer()
      • td_fit()
      • td_non_linear_combine_fit()
      • td_non_linear_combine_transform()
      • td_one_hot_encoding_fit()
      • td_one_hot_encoding_transform()
      • td_ordinal_encoding_fit()
      • td_ordinal_encoding_transform()
      • td_polynomial_features_fit()
      • td_polynomial_features_transform()
      • td_random_projection_fit()
      • td_random_projection_min_components()
      • td_random_projection_transform()
      • td_row_normalize_fit()
      • td_row_normalize_transform()
      • td_scale_fit()
      • td_scale_transform()
      • td_target_encoding_fit()
      • td_target_encoding_transform()
      • td_transform()
    • PATH AND PATTERN ANALYSIS functions:

      • td_attribution()
      • td_npath()
      • td_sessionize()
    • SCORING WITH MACHINE LEARNING ENGINE MODELS functions:

      • td_decision_forest_predict_mle_sqle()
      • td_decision_tree_predict_mle_sqle()
      • td_glm_predict_mle_sqle()
      • td_naivebayes_predict_mle_sqle()
      • td_svm_sparse_predict_mle_sqle()
    • DATA EXPLORATION functions:

      • td_categorical_summary()
      • td_column_summary()
      • td_get_rows_with_missing_values()
      • td_histogram()
      • td_moving_average()
      • td_qq_norm()
      • td_univariate_statistics()
      • td_which_max()
      • td_which_min()
    • TEXT ANALYTIC functions:

      • td_naivebayes_textclassifier_trainer()
      • td_naivebayes_textclassifier_predict()
      • td_ngramsplitter()
      • td_sentiment_extractor()
      • td_text_parser()
      • td_word_embeddings()
    • DATA CLEANING functions:

      • td_convert_to()
      • td_get_futile_columns()
      • td_get_rows_without_missing_values()
      • td_outlier_filter_fit()
      • td_outlier_filter_transform()
      • td_pack()
      • td_simple_impute_fit()
      • td_simple_impute_transform()
      • td_string_similarity()
      • td_unpack()
    • HYPOTHESIS TESTING functions:

      • td_anova()
      • td_chi_sq()
      • td_ftest()
      • td_ztest()
    • MODEL EVALUATION functions:

      • td_classification_evaluator()
      • td_regression_evaluator()
      • td_roc()
      • td_silhouette()
      • td_train_test_split()
    • MODEL TRAINING functions:

      • td_decision_forest()
      • td_glm()
      • td_glm_per_segment()
      • td_kmeans()
      • td_knn()
      • td_one_class_svm()
      • td_svm()
      • td_vector_distance()
      • td_xg_boost()
    • FEATURE ENGINEERING UTILITY functions:

      • td_fill_row_id()
      • td_num_apply()
      • td_round_columns()
      • td_str_apply()
    • TABLE OPERATOR Functions:

      • td_read_nos()
      • td_write_nos()
    • BYOM Functions:

      • td_dataiku_predict()
      • td_h2o_predict()
      • td_onnx_predict()
      • td_pmml_predict()
      • td_data_robot_predict()
  • display_analytic_functions() - API displays all the available Analytic functions based on Vantage version and categorizes them by function type.

  • The following analytic functions, which accept MLE models as input, have been renamed:

    • td_decision_forest_predict_sqle() --> td_decision_forest_predict_mle_sqle()
    • td_decision_tree_predict_sqle() --> td_decision_tree_predict_mle_sqle()
    • td_glm_predict_sqle() --> td_glm_predict_mle_sqle()
    • td_naivebayes_textclassifier_predict_sqle() --> td_naivebayes_textclassifier_predict_mle_sqle()
    • td_naivebayes_predict_sqle() --> td_naivebayes_predict_mle_sqle()
    • td_svm_sparse_predict_sqle() --> td_svm_sparse_predict_mle_sqle()
  • The following functions do not accept models generated by MLE; they now accept models generated by the Analytics Database Analytic Functions exposed by SQLE.

    • Note: A function in this list is unavailable if the connected Vantage version does not expose it.
      • td_decision_forest_predict()
      • td_glm_predict()
      • td_naivebayes_predict()
      • td_svm_sparse_predict()
  • The following functions exist in both SQLE and MLE. They now resolve to their SQLE equivalents, depending on the Vantage version of the connected system. To use the corresponding MLE function, add the "_mle" suffix to the function name.

    • td_decision_forest()
    • td_glm()
    • td_histogram()
    • td_kmeans()
    • td_knn()
    • td_text_parser()
    • td_naivebayes_textclassifier_predict()
    • td_vector_distance()
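
For scripts written against earlier releases, the renames listed above all follow one rule: the trailing "_sqle" becomes "_mle_sqle". The following is a minimal base R sketch of a lookup that applies those renames to a vector of function names. The helper rename_mle_sqle() and the lookup vector are hypothetical illustrations and are not part of tdplyr.

```r
# Hypothetical helper (not part of tdplyr): map the pre-17.20 names of the
# functions that accept MLE models to their 17.20.00.00 names.
renamed <- c(
  td_decision_forest_predict_sqle =
    "td_decision_forest_predict_mle_sqle",
  td_decision_tree_predict_sqle =
    "td_decision_tree_predict_mle_sqle",
  td_glm_predict_sqle =
    "td_glm_predict_mle_sqle",
  td_naivebayes_textclassifier_predict_sqle =
    "td_naivebayes_textclassifier_predict_mle_sqle",
  td_naivebayes_predict_sqle =
    "td_naivebayes_predict_mle_sqle",
  td_svm_sparse_predict_sqle =
    "td_svm_sparse_predict_mle_sqle"
)

rename_mle_sqle <- function(fn_names) {
  # Replace only the names that appear in the lookup; leave the rest as-is.
  hit <- fn_names %in% names(renamed)
  fn_names[hit] <- unname(renamed[fn_names[hit]])
  fn_names
}

updated <- rename_mle_sqle(c("td_glm_predict_sqle", "td_kmeans"))
print(updated)
```

Names that are not in the lookup, such as td_kmeans, pass through unchanged, so the helper can be run over every function name found in a script.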

tdplyr 17.00.00.04

  • Important Notification:
    • tdplyr is now compatible with R 4.3.0.
    • Minimum required teradatasql version is 17.20.0.26.
  • Bug Fixes:
    • VAL functions now raise an appropriate error on failure, for example when the database has no room or the user has insufficient permissions.
    • The cumsum() and row_number() functions now work with tdplyr 17.00.00.04.

tdplyr 17.00.00.03

  • Important Notification:
    • tdplyr is now compatible with the latest version of dbplyr, 2.3.2.
    • Minimum required teradatasql version is 17.20.0.19.
  • New Features/Functions:
    • attach_attributes(): Users can now attach attributes to the 'tbl' object using attach_attributes().
      The attributes attached are:
      • databaseName - The name of the database in which the table exists.
      • sourceDefinition - Name of the input source, i.e., table name or SQL query.
      • sourceType - Type of the input source, i.e., table or query.
      • object - Full name of the table when the source type is table, else NULL.
      • baseQuery - Base query for the tbl.
      • columnDataType - Data types of the columns.
      • columnNames - Names of the columns.
  • Bug Fixes:
    • td_fastload() now uploads a single-column data frame.

tdplyr 17.00.00.02

  • Important Notification:
    For tdplyr, minimum teradatasql version required is 17.10.0.10.
  • New Features/Functions
    • Data Export: to_csv support is added:
      • td_to_csv()
  • Updates/Improvements:
    • Newly added tdplyr options:
      • tdplyr has added one new option that lets the user point to the database where BYOM is installed.
      • Users can call functions such as td_pmml_predict() while connected to a different database.
      • The name of the option is:
        • byom.install.location - Specifies the database where BYOM is installed.
    • td_fastexport()
      • The function now allows the user to export data to a CSV file.
      • Users can also set field.separator and field.quote.char to specify the separator and quote character used when writing the CSV file.
    • With td_fastexport() and td_to_csv(), exporting data to a CSV file performs better than exporting to a data.frame.
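
As a minimal sketch of the byom.install.location option described above: the code assumes that tdplyr reads the option through R's standard options() mechanism, and "mldb" is a placeholder database name, not a value taken from this release. No Vantage connection is needed to run it, because it only sets and restores the option.

```r
# Assumption: tdplyr picks up byom.install.location via base R options().
# "mldb" is a placeholder for the database where BYOM is installed.
old <- options(byom.install.location = "mldb")

# Confirm the value that BYOM functions such as td_pmml_predict() would see.
current <- getOption("byom.install.location")
print(current)

# ... BYOM scoring calls would go here, while connected to another database ...

# Restore the previous setting when done.
options(old)
```

Saving the return value of options() and restoring it afterwards keeps the setting from leaking into unrelated code in the same R session.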

tdplyr 17.00.00.01

  • Important Notification:
    For tdplyr, minimum dbplyr version required is 2.0. tdplyr has been updated to support dbplyr version 2.0 or later.
  • New Features/Functions:
    • Bring Your Own Models (BYOM):
      • td_pmml_predict()
    • Data Export: FastExport support is added
      • td_fastexport()
    • Vantage Analytic Library (VAL) Functions:
      • Association Rules
        • td_association_valib()
      • Decision Tree
        • td_decision_tree_valib()
        • td_decision_tree_evaluator_valib()
        • td_decision_tree_predict_valib()
      • Descriptive Statistics (HTML File Available with Introduction)
        • td_adaptive_histogram_valib()
        • td_explore_valib()
        • td_frequency_valib()
        • td_histogram_valib()
        • td_overlap_valib()
        • td_statistics_valib()
        • td_text_analyzer_valib()
        • td_values_valib()
      • Factor Analysis
        • td_pca_valib()
        • td_pca_evaluator_valib()
        • td_pca_predict_valib()
      • Fast KMeans Clustering
        • td_kmeans_valib()
        • td_kmeans_predict_valib()
      • Linear Regression
        • td_lin_reg_valib()
        • td_lin_reg_evaluator_valib()
        • td_lin_reg_predict_valib()
      • Logistic Regression
        • td_log_reg_valib()
        • td_log_reg_evaluator_valib()
        • td_log_reg_predict_valib()
      • Matrix Building
        • td_matrix_valib()
      • Statistical Tests (HTML File Available with Introduction)
        • td_binomial_test_valib()
        • td_chi_square_test_valib()
        • td_ks_test_valib()
        • td_parametric_test_valib()
        • td_rank_test_valib()
      • Reports
        • td_xml_to_html_report_valib()
      • Variable Transformation
        • td_transform_valib()
        • Transformation Techniques to use with 'Transform'
          • tdBinning()
          • tdDerive()
          • tdFillNa()
          • tdLabelEncoder()
          • tdMinMaxScalar()
          • tdOneHotEncoder()
          • tdRetain()
          • tdSigmoid()
          • tdZScore()
      • Supporting Utilities/Functionality
        • S3 Generic: print() - Allows the user to print the output of any VAL function.
        • View Underlying SQL:
          • show_query() - Allows the user to print the underlying SQL used for function execution, using the output object.
          • Option print.val.query allows the user to print the SQL while the VAL function is running. This can be used for debugging purposes.
    • Sandbox Container Utility Functions
      • td_cleanup_sandbox_env()
      • td_copy_files_from_container()
      • td_setup_sandbox_env()
  • Updates/Improvements
    • Newly added tdplyr options:
      • tdplyr has added two new options that provide flexibility to the user to control which database must be used to create tables and views, generated by the tdplyr APIs, which are garbage collected at the end of the session. Following are the two options:
        • temp.table.database.name - Specifies the database to create tables into.
        • temp.view.database.name - Specifies the database to create views into.
          These options allow the user to set the output database without re-creating the context.
    • td_test_script()
      • The function now also allows testing on the local client, i.e., outside the sandbox environment, in addition to the existing mode of sandbox testing.
      • Function td_setup_test_env() is now deprecated. Use the newly added sandbox container utility function td_setup_sandbox_env() instead.

tdplyr 17.00.00.00

  • New Features/Functions
    • Model Cataloging - Functionality to catalog model metadata and related information in the Model Catalog.
      • td_save_model() - Save a tdplyr analytic function model.
      • td_retrieve_model() - Retrieve a saved model.
      • td_list_models() - List accessible models.
      • td_describe_model() - List the details of a model.
      • td_delete_model() - Remove a model from Model Catalog.
      • td_publish_model() - Share a model.
    • Script - An interface to run user scripts in Advanced SQL Engine using Teradata's Script Table Operator (STO).
      • Script() - Creates and initializes an object of "ScriptTableOperator" class.
      • The following functions enable the user to run user scripts locally in a containerized environment:
        • td_setup_test_env() - Set up the test environment.
        • td_test_script() - Test user script in containerized environment.
        • td_set_data() - Update/set test data parameters.
      • The following functions enable the user to run user scripts in Advanced SQL Engine:
        • td_execute_script() - Execute user script in Vantage.
        • td_set_data() - Update/set test data parameters.
    • Vantage File Management Functions
      • td_install_file() - Install or replace file in Vantage.
      • td_remove_file() - Remove an installed file from Vantage.
    • Time series aggregate functions - Added new SQL translations to support the following time series aggregate functions. Please refer to the vignette time_series_aggregates for the usage of all time series aggregate functions.
      • ts.percentile() - Calculate the nth percentile value of a column.
      • ts.delta_t() - Calculate the time difference (i.e., DELTA_T) between starting and ending events.
      • ts.kurtosis() - Calculate the kurtosis value of a column.
      • ts.skew() - Measure the skewness of the distribution of a column.
      • ts.sum() - Calculate the sum of values in the column.
      • ts.sd() - Calculate the sample standard deviation of a column.
      • ts.sdp() - Calculate the population standard deviation of a column.
      • ts.var() - Calculate the sample variance of a column.
      • ts.varp() - Calculate the population variance of a column.
      • ts.min() - Calculate the minimum value of a column.
      • ts.max() - Calculate the maximum value of a column.
      • ts.mean() - Calculate the average value of a column.
      • ts.n() - Calculate the count of qualified rows in a column.
      • ts.describe() - Calculate the statistics of a column.
    • Regular aggregate functions - Added new SQL translations to support the following regular aggregate functions. Please refer to the vignette regular_aggregates for the usage of all regular aggregate functions.
      • kurtosis() - Calculate the kurtosis value of a column.
      • skew() - Measure the skewness of the distribution of a column.
      • n() - Calculate the count of qualified rows in a column. This dbplyr SQL aggregate function is overridden to take a column name, with "*" as the default value, and to build a SQL expression that returns the count as type bit64::integer64.
      • n_distinct() - This SQL aggregate function is updated to return the count as type bit64::integer64.
    • Window aggregate functions - Added new SQL translations to support the following window aggregate function. Please refer to the vignette sql-translation for the usage of all window aggregate functions.
      • n() - Calculate the count of qualified rows in a column. This dbplyr SQL aggregate function is overridden to take a column name, with "*" as the default value, and to build a SQL expression that returns the count as type bit64::integer64.
  • Updates/Improvements:
    • Support added for Stored Password Protection feature. Please refer to the documentation and examples of td_create_context() function.
    • Support added in td_create_context() to connect to Teradata Vantage using JWT logon mechanism.
    • Unlike earlier releases, DBI::dbConnect(tdplyr::NativeDriver(), ...) now creates a context. To create a connection using Teradata SQL Driver for R, please refer to the driver documentation.
    • Updated the way td_set_context() works based on the type of the connection object.
    • The td_nrow() function returns the count of the rows in tbl_teradata as bit64::integer64 type.
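
The move to bit64::integer64 for row counts matters because base R integers are 32-bit and cannot represent counts above 2147483647. The sketch below illustrates the difference with the bit64 package alone; it does not call tdplyr or connect to Vantage, and the variable names are illustrative.

```r
# Requires the bit64 package, which provides the integer64 class.
library(bit64)

# Base R integers are 32-bit, so this is the largest count they can hold.
limit <- .Machine$integer.max

# integer64 holds the next value without overflow.
big_count <- as.integer64(limit) + 1L
print(is.integer64(big_count))
print(as.character(big_count))

# Convert to double only when downstream code needs an ordinary numeric.
as_double <- as.numeric(big_count)
print(as_double)
```

Code that compares or combines a count returned by td_nrow() with ordinary integers may need an explicit conversion such as as.numeric(), depending on what the downstream functions accept.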

tdplyr 16.20.00.06

  • Compatible with Vantage 1.1.1. The following ML Engine functions have new and/or updated arguments to support the Vantage version:
    • td_adaboost_predict_mle()
    • td_decision_forest_predict_mle()
    • td_glm_predict_mle()
    • td_lda_mle()
    • td_naivebayes_predict_mle()
    • td_naivebayes_textclassifier_predict_mle()
    • td_svm_dense_predict_mle()
    • td_svm_sparse_mle()
    • td_svm_sparse_predict_mle()
    • td_xgboost_predict_mle()
  • Added support to use generic predict function for td_adaboost_predict_mle(). Now td_adaboost_predict_mle() can be called using predict(<td_adaboost_mle_obj>, ...).

tdplyr 16.20.00.05

  • Improvements
    • Improved performance when using output of analytic functions as input to dplyr verbs or copy_to().
    • Support added to load time series Primary Time Index (PTI) tables using copy_to().
    • Added more examples and updated documentation for some APIs.
  • New Features/Functions
    • tdplyr, in conjunction with Teradata SQL Driver for R, supports integration with the RStudio Connections Pane for exploring the data source once a connection is established with Vantage. For more information, please refer to the RStudio Connections Pane documentation.
    • td_fastload() - Enable high-performance loading of large amounts of data into a table in Teradata Vantage.
    • td_sample() - Sample proportion of data.
    • Time series functions
      • group_by_time() - Group the tbl_teradata based on time.
      • Time series aggregate functions - Added new SQL translations to perform aggregate operations on time series objects of class "tbl_teradata" grouped by time. Please refer to the vignette time_series_aggregates for usage.
        • ts.bottom() - Calculate the smallest n values in a column.
        • ts.top() - Calculate the largest n values in a column.
        • ts.first() - Calculate the oldest value, determined by timecode, in a column.
        • ts.last() - Calculate the newest value, determined by timecode, in a column.
        • ts.mad() - Calculate the Median Absolute Deviation of the values in a column.
        • ts.median() - Calculate the median of the values in a column.
        • ts.mode() - Calculate the mode of the values in a column.
    • summarise() - Summarise the tbl_teradata using both regular and time series aggregate operations.
    • Added a new SQL translation function, as.Date(), which typecasts a character data type column to a SQL DATE data type column. SQL translations for two stringr character functions, str_detect() and str_replace_all(), are also added. Please refer to the vignette sql-translation for usage.
    • Added 10 new ML Engine analytic functions.
      • td_adaboost_mle()
      • td_adaboost_predict_mle()
      • td_glml1l2_mle()
      • td_glml1l2_predict_mle()
      • td_pos_tagger_mle()
      • td_text_chunker_mle()
      • td_text_classifier_evaluator_mle()
      • td_text_classifier_mle()
      • td_text_classifier_trainer_mle()
      • td_text_morph_mle()

tdplyr 16.20.00.04

  • Added 4 new SQL Engine analytic functions, which will work only with Vantage 1.1.
    • td_antiselect_sqle()
    • td_pack_sqle()
    • td_string_similarity_sqle()
    • td_unpack_sqle()

tdplyr 16.20.00.03

  • Support for Teradata SQL Driver for R has been added. Teradata recommends using Teradata SQL Driver for R with tdplyr.
  • Use of the ODBC Driver with tdplyr will be deprecated. A warning message is shown when the ODBC driver is used to connect to Teradata Vantage.
  • td_create_context() (connect to Teradata Vantage) now supports different logon mechanisms with Teradata SQL Driver. Logon mechanisms supported are: TD2, LDAP, TDNEGO, KRB5 (for Kerberos).
  • The following ML Engine functions have new arguments, which work only with Vantage 1.1 or later releases.
    • td_decision_forest_mle()
    • td_decision_tree_mle()
    • td_knn_mle()
    • td_varmax_mle()
  • Simplified procedure to install tdplyr package with its dependent packages including Teradata SQL driver for R.

tdplyr 16.20.00.02

  • 45 new ML Engine analytic functions added. The package now has 105 analytic functions (9 SQL Engine and 96 ML Engine functions).
  • Enhanced the copy_to() function with an "append" option.
  • Updated analytic function names to indicate the execution engine. Analytic function names now end with the suffix "_sqle" or "_mle".

tdplyr 16.20.00.01

  • tdplyr 16.20.00.01 is the first release version. Please refer to the Teradata R Package User Guide for a list of Limitations and Usage Considerations.