Teradata R Package - tdplyr 17.20.00.00
Teradata Vantage Client R Analytic library
Copyright ©2018 - 2024, Teradata. All Rights Reserved.
General product information is available in the Teradata Documentation website, https://docs.teradata.com/
- Teradata R Package User Guide – B700-4005
- Teradata R Function Reference – B700-4007
Minimum System Requirements:
Teradata Vantage:
- Vantage 1.0 - Maintenance Update 2 or later versions
- Advanced SQL (was NewSQL) Engine 16.20 Feature Update 1 or later versions
- Teradata Machine Learning Engine 08.00.03.00 or later versions
Supported Drivers:
- Teradata SQL Driver 17.20.0.26 (Recommended)
- Teradata ODBC Driver 16.20
Operating Systems: (64-bit only)
- Windows 7
- MacOS OSX 10.9
- Ubuntu 16
- RHEL 7
- SLES/OpenSUSE 12
Release Notes:
tdplyr 17.20.00.00
- New Features/Functions:
-
Support added for following Analytics Database Analytic Functions
-
MODEL SCORING functions:
td_glm_predict_per_segment()
td_kmeans_predict()
td_one_class_svm_predict()
td_svm_predict()
td_glm_predict()
td_xg_boost_predict()
-
FEATURE ENGINEERING TRANSFORM functions:
td_antiselect()
td_bincode_fit()
td_bincode_transform()
td_column_transformer()
td_fit()
td_non_linear_combine_fit()
td_non_linear_combine_transform()
td_one_hot_encoding_fit()
td_one_hot_encoding_transform()
td_ordinal_encoding_fit()
td_ordinal_encoding_transform()
td_polynomial_features_fit()
td_polynomial_features_transform()
td_random_projection_fit()
td_random_projection_min_components()
td_random_projection_transform()
td_row_normalize_fit()
td_row_normalize_transform()
td_scale_fit()
td_scale_transform()
td_target_encoding_fit()
td_target_encoding_transform()
td_transform()
-
PATH AND PATTERN ANALYSIS functions:
td_attribution()
td_npath()
td_sessionize()
-
SCORING WITH MACHINE LEARNING ENGINE MODELS functions:
td_decision_forest_predict_mle_sqle()
td_decision_tree_predict_mle_sqle()
td_glm_predict_mle_sqle()
td_naivebayes_predict_mle_sqle()
td_svm_sparse_predict_mle_sqle()
-
DATA EXPLORATION functions:
td_categorical_summary()
td_column_summary()
td_get_rows_with_missing_values()
td_histogram()
td_moving_average()
td_qq_norm()
td_univariate_statistics()
td_which_max()
td_which_min()
-
TEXT ANALYTIC functions:
td_naivebayes_textclassifier_trainer()
td_naivebayes_textclassifier_predict()
td_ngramsplitter()
td_sentiment_extractor()
td_text_parser()
td_word_embeddings()
-
DATA CLEANING functions:
td_convert_to()
td_get_futile_columns()
td_get_rows_without_missing_values()
td_outlier_filter_fit()
td_outlier_filter_transform()
td_pack()
td_simple_impute_fit()
td_simple_impute_transform()
td_string_similarity()
td_unpack()
-
HYPOTHESIS TESTING functions:
td_anova()
td_chi_sq()
td_ftest()
td_ztest()
-
MODEL EVALUATION functions:
td_classification_evaluator()
td_regression_evaluator()
td_roc()
td_silhouette()
td_train_test_split()
-
MODEL TRAINING functions:
td_decision_forest()
td_glm()
td_glm_per_segment()
td_kmeans()
td_knn()
td_one_class_svm()
td_svm()
td_vector_distance()
td_xg_boost()
-
FEATURE ENGINEERING UTILITY functions:
td_fill_row_id()
td_num_apply()
td_round_columns()
td_str_apply()
-
TABLE OPERATOR Functions:
td_read_nos()
td_write_nos()
-
BYOM Functions:
td_dataiku_predict()
td_h2o_predict()
td_onnx_predict()
td_pmml_predict()
td_data_robot_predict()
-
-
display_analytic_functions()
- API displays all the available Analytic functions based on Vantage version and categorizes them by function type. -
Following analytic functions that accept MLE models as input are renamed:
td_decision_forest_predict_sqle()
-->td_decision_forest_predict_mle_sqle()
td_decision_tree_predict_sqle()
-->td_decision_tree_predict_mle_sqle()
td_glm_predict_sqle()
-->td_glm_predict_mle_sqle()
td_naivebayes_textclassifier_predict_sqle()
-->td_naivebayes_textclassifier_predict_mle_sqle()
td_naivebayes_predict_sqle()
-->td_naivebayes_predict_mle_sqle()
td_svm_sparse_predict_sqle()
-->td_svm_sparse_predict_mle_sqle()
-
Following functions do not accept models generated by MLE, but they now accept the model generated by the Analytics Database Analytic Functions exposed by SQLE.
- Note -
If the Vantage version does not expose the function,
the following functions won't be available.
td_decision_forest_predict()
td_glm_predict()
td_naivebayes_predict()
td_svm_sparse_predict()
- Note -
If the Vantage version does not expose the function,
the following functions won't be available.
-
The following functions overlap between SQLE and MLE. They will now point to their SQLE equivalent based on the vantage version they are connected to. In order to use the corresponding MLE function, we should add the "_mle" suffix.
td_decision_forest()
td_glm()
td_histogram()
td_kmeans()
td_knn()
td_text_parser()
td_naivebayes_textclassifier_predict()
td_text_parser()
td_vector_distance()
tdplyr 17.00.00.04
- Important Notification:
- tdplyr is now compatible with
R 4.3.0
. - Minimum teradatasql version required is 17.20.0.26 or later.
- tdplyr is now compatible with
- Bug Fixes:
- VAL functions now raise appropriate error in case of failures. For example, if there is no room in database or insufficient permissions.
cumsum()
androw_number()
functions work with tdplyr 17.00.00.04.
tdplyr 17.00.00.03
- Important Notification:
- tdplyr is now compatible with latest version of
dbplyr 2.3.2
. - Minimum teradatasql version required is 17.20.0.19 or later.
- tdplyr is now compatible with latest version of
- New Features/Functions:
attach_attributes()
: User can now attach attributes to the 'tbl' object using attach_atrributes().
The attributes attached are:- databaseName - The name of the database in which the table exists.
- sourceDefinition - Name of the source of input i.e table name or sql query.
- sourceType - Type of input source i.e table or query.
- object - Full name of the table when source type is table, else NULL.
- baseQuery - Base Query for the tbl.
- columnDataType - DataTypes of the columns.
- columnNames - Names of the columns.
- Bug Fixes:
td_fastload()
now uploads single column dataframe.
tdplyr 17.00.00.02
- Important Notification:
For tdplyr, minimum teradatasql version required is 17.10.0.10. - New Features/Functions
- Data Export: to_csv support is added:
td_to_csv()
- Data Export: to_csv support is added:
- Updates/Improvements:
- Newly added tdplyr options:
- tdplyr has added one new option that provide flexibility to the user to point to the database where byom is intalled.
- User can use functions like td_pmml_predict() while being connected to another database.
- Following is the name of the option:
byom.install.location
- Specifies the database where byom is installed.
td_fastexport()
- Function allows the user to export the data to a CSV file.
- User can also use field.separator and field.quote.char to specify the separator and quote character to be used for writing the CSV file.
- Performance of exporting data to CSV, instead of data.frame is better, while using td_fastexport and td_to_csv.
- Newly added tdplyr options:
tdplyr 17.00.00.01
- Important Notification:
For tdplyr, minimum dbplyr version required is 2.0. tdplyr has been updated to support dbplyr version 2.0 or later. - New Features/Functions:
- Bring Your Own Models (BYOM):
td_pmml_predict()
- Data Export: FastExport support is added
td_fastexport()
- Vantage Analytic Library (VAL) Functions:
- Association Rules
td_association_valib()
- Decision Tree
td_decision_tree_valib()
td_decision_tree_evaluator_valib()
td_decision_tree_predict_valib()
- Descriptive Statistics (HTML File Available with Introduction)
td_adaptive_histogram_valib()
td_explore_valib()
td_frequency_valib()
td_histogram_valib()
td_overlap_valib()
td_statistics_valib()
td_text_analyzer_valib()
td_values_valib()
- Factor Analysis
td_pca_valib()
td_pca_evaluator_valib()
td_pca_predict_valib()
- Fast KMeans Clustering
td_kmeans_valib()
td_kmeans_predict_valib()
- Linear Regression
td_lin_reg_valib()
td_lin_reg_evaluator_valib()
td_lin_reg_predict_valib()
- Logistic Regression
td_log_reg_valib()
td_log_reg_evaluator_valib()
td_log_reg_predict_valib()
- Matrix Building
td_matrix_valib()
- Statistical Tests (HTML File Available with Introduction)
td_binomial_test_valib()
td_chi_square_test_valib()
td_ks_test_valib()
td_parametric_test_valib()
td_rank_test_valib()
- Reports
td_xml_to_html_report_valib()
- Variable Transformation
td_transform_valib()
- Transformation Techniques to use with 'Transform'
tdBinning()
tdDerive()
tdFillNa()
tdLabelEncoder()
tdMinMaxScalar()
tdOneHotEncoder()
tdRetain()
tdSigmoid()
tdZScore()
- Supporting Utilies/Functionality
- S3 Generic:
print()
- Allows user to print the output of any of the VAL function. - View Underlying SQL:
show_query()
- Allows user to print the underlying SQL used for function execution using the output object.- Option
print.val.query
allows user to print the SQL while the VAL function is running. This can be used for debugging purpose.
- S3 Generic:
- Association Rules
- Sandbox Container Utility Functions
td_cleanup_sandbox_env()
td_copy_files_from_container()
td_setup_sandbox_env()
- Bring Your Own Models (BYOM):
- Updates/Improvements
- Newly added tdplyr options:
- tdplyr has added two new options that provide flexibility to the user
to control which database must be used to create tables and views,
generated by the tdplyr APIs, which are garbage collected at the end
of the session.
Following are the two options:
temp.table.database.name
- Specifies the database to create tables into.temp.view.database.name
- Specifies the database to create views into.
These options allow user to set the output database without re-creating context.
- tdplyr has added two new options that provide flexibility to the user
to control which database must be used to create tables and views,
generated by the tdplyr APIs, which are garbage collected at the end
of the session.
Following are the two options:
td_test_script()
- Function now also allows testing on local client, i.e., outside of sandbox environment, in addition to existing node of sandbox testing.
- Function
td_setup_test_env()
is deprecated now. Use newly added sandbox container utility functiontd_setup_sandbox_env()
instead.
- Newly added tdplyr options:
tdplyr 17.00.00.00
- New Features/Functions
- Model Cataloging - Functionality to catalog model metadata and related information in the Model Catalog.
td_save_model()
- Save a tdplyr analytic function model.td_retrieve_model()
- Retrieve a saved model.td_list_models()
- List accessible models.td_describe_model()
- List the details of a model.td_delete_model()
- Remove a model from Model Catalog.td_publish_model()
- Share a model.
- Script - An interface to run user scripts in Advanced SQL Engine using Teradata's Script Table Operator (STO).
Script()
- Creates and initializes an object of "ScriptTableOperator" class.- The following functions enable the user to run user scripts locally in a containerized enviroment:
td_setup_test_env()
- Setup up the test environment.td_test_script()
- Test user script in containerized environment.td_set_data()
- Update/set test data parameters.
- The following functions enable the user to run user scripts in Advanced SQL Engine:
td_execute_script()
- Execute user script in Vantage.td_set_data()
- Update/set test data parameters.
- Vantage File Management Functions
td_install_file()
- Install or replace file in Vantage.td_remove_file()
- Remove an installed file from Vantage.
- Time series aggregate functions - Added new SQL translations to support the following time series aggregate functions. Please refer the vignette
time_series_aggregates
for the usage of all time series aggregate functions.ts.perentile()
- Calculate the nth percentile value of a column.ts.delta_t()
- Calculate the time difference (i.e, DELTA_T) between starting and ending events.ts.kurtosis()
- Calculate the kurtosis value of a column.ts.skew()
- Measure the skewness of the distribution of a column.ts.sum()
- Calculate the sum of values in the column.ts.sd()
- Calculate the sample standard deviation of a column.ts.sdp()
- Calculate the population standard deviation of a column.ts.var()
- Calculate the sample variance of a column.ts.varp()
- Calculate the population variance of a column.ts.min()
- Calculate the minimum value of a column.ts.max()
- Calculate the maximum value of a column.ts.mean()
- Calculate the average value of a column.ts.n()
- Calculate the count of qualified rows in a column.ts.describe()
- Calculate the statistics of a column.
- Regular aggregate functions - Added new SQL translations to support the following regular aggregate functions. Please refer the vignette
regular_aggregates
for the usage of all regular aggregate functions.kurtosis()
- Calculate the kurtosis value of a column.skew()
- Measure the skewness of the distribution of a column.n()
- Calculate the count of qualified rows in a column. This dbplyr SQL aggregate function is overwritten to take a column name, with default value as "*" and builds a SQL expression to return count value of typebit64::integer64
.n_distinct()
- This SQL aggregate function is updated to return count value of typebit64::integer64
.
- Window aggregate functions - Added new SQL translations to support the following window aggregate function. Please refer the vignette
sql-translation
for the usage of all window aggregate functions.n()
- Calculate the count of qualified rows in a column. This dbplyr SQL aggregate function is overwritten to take a column name, with default value as "*" and builds a SQL expression to return count value of typebit64::integer64
.
- Model Cataloging - Functionality to catalog model metadata and related information in the Model Catalog.
- Updates/Improvements:
- Support added for Stored Password Protection feature. Please refer to the documentation and examples of
td_create_context()
function. - Support added in
td_create_context()
to connect to Teradata Vantage using JWT logon mechanism. - Using
DBI::dbConnect(tdplyr::NativeDriver(), ...)
creates context unlike earlier releases. To create connection using Teradata SQL Driver for R, please refer the documentation here. - Updated the way
td_set_context()
works based on the type of the connection object. - The
td_nrow()
function returns the count of the rows in tbl_teradata asbit64::integer64
type.
- Support added for Stored Password Protection feature. Please refer to the documentation and examples of
tdplyr 16.20.00.06
- Compatible with Vantage 1.1.1.\
The following ML Engine functions have new and/or updated arguments to support the Vantage version:
td_adaboost_predict_mle()
td_decision_forest_predict_mle()
td_glm_predict_mle()
td_lda_mle()
td_naivebayes_predict_mle()
td_naivebayes_textclassifier_predict_mle()
td_svm_dense_predict_mle()
td_svm_sparse_mle()
td_svm_sparse_predict_mle()
td_xgboost_predict_mle()
- Added support to use generic predict function for
td_adaboost_predict_mle()
. Nowtd_adaboost_predict_mle()
can be called usingpredict(<td_adaboost_mle_obj>, ...)
.
tdplyr 16.20.00.05
- Improvements
- Improved performance when using output of analytic functions as input to dplyr verbs or
copy_to()
. - Support added to load time series Primary Time Index (PTI) tables using
copy_to()
. - Added more examples and updated documentation for some APIs.
- Improved performance when using output of analytic functions as input to dplyr verbs or
- New Features/Functions
- tdplyr in conjunction with Teradata SQL Driver for R supports integration with RStudio Connections Pane for exploring the data source once connection is established with Vantage. Please refer the link for more information about RStudio Connection Pane.
td_fastload()
- Enable high performance data loading of large amount of data into a table in Teradata Vantage.td_sample()
- Sample proportion of data.- Time series functions
group_by_time()
- Group the tbl_teradata based on time.- Time series aggregate functions - Added new SQL translations to perform aggregate operations on time series objects of class "tbl_teradata" grouped by time. Please refer the vignette
time_series_aggregates
for usage.ts.bottom()
- Calculate the smallest n values in a column.ts.top()
- Calculate the largest n values in a column.ts.first()
- Calculate the oldest value, determined by timecode, in a column.ts.last()
- Calculate the newest value, determined by timecode, in a column.ts.mad()
- Calculate the Median Absolute Deviation of the values in a column.ts.median()
- Calculate the median of the values in a column.ts.mode()
- Calculate the mode of the values in a column.
summarise()
- Summarise the tbl_teradata using both regular and time series aggregate operations.- Added a new SQL translation function
as.Date()
which typecasts character data type column to SQL DATE data type column. SQL translations for two new character stringr functions -str_detect()
andstr_replace_all()
are also added. Please refer the vignettesql-translation
for usage. - Added 10 new ML Engine analytic functions.
td_adaboost_mle()
td_adaboost_predict_mle()
td_glml1l2_mle()
td_glml1l2_predict_mle()
td_pos_tagger_mle()
td_text_chunker_mle()
td_text_classifier_evaluator_mle()
td_text_classifier_mle()
td_text_classifier_trainer_mle()
td_text_morph_mle()
tdplyr 16.20.00.04
- Added 4 new SQL Engine analytic functions, which will work only with Vantage 1.1.
td_antiselect_sqle()
td_pack_sqle()
td_string_similarity_sqle()
td_unpack_sqle()
tdplyr 16.20.00.03
- Support for Teradata SQL driver for R has been added. Teradata recommends to use Teradata SQL driver for R with tdplyr.
- ODBC Driver with tdplyr will be deprecated. Warning message will be shown when ODBC driver is used to connect to Teradata Vantage.
- Support added to
td_create_context()
(connect to Teradata Vantage) to support different logon mechanisms with Teradata SQL Driver. Logon mechanisms supported are: TD2, LDAP, TDNEGO, KRB5(for Kerberos). - Below ML Engine functions have new arguments added and the new arguments will work only with Vantage 1.1 or later releases.
td_decision_forest_mle()
td_decision_tree_mle()
td_knn_mle()
td_varmax_mle()
- Simplified procedure to install tdplyr package with its dependent packages including Teradata SQL driver for R.
tdplyr 16.20.00.02
- 45 new MLEngine Analytic functions added. The package now has 105 analytic functions(9 SQLEngine and 96 MLEngine functions)
- Enhanced copy_to function to have "append" option.
- Updated analytic function names to contain information about the execution engine. Analytic function names will have "_sqle" or "_mle" extension in the function names.
tdplyr 16.20.00.01
- tdplyr 16.20.00.01 is the first release version. Please refer to the Teradata R Package User Guide for a list of Limitations and Usage Considerations.