Introduction to tdplyr Package

Teradata® Package for R User Guide

Product: Teradata Package for R
Release Number: 17.00
Published: July 2021
Language: English (United States)
Last Update: 2023-08-08
dita:id
B700-4005
Product Category
Teradata Vantage

The tdplyr package runs on the client system and is designed for data management, exploration, and execution of analytic functions.

The current version of the tdplyr package includes over 100 functions, organized into these functional areas:

  • Utility and database management functions
  • Data exploration and preparation functions
  • Analytic functions in ML Engine and Analytics Database:

    These functions support high-speed analytics processing required to operationalize analytics and automate data partitioning and parallel processing in Vantage.

tdplyr and dplyr

For R users familiar with the dplyr package, the tdplyr package complements the dplyr and dbplyr packages and enables enhanced interaction with Vantage. The tdplyr package follows dplyr syntax. Currently, many of the dplyr verbs (methods) work out of the box against Vantage and perform the corresponding tasks.

The tdplyr package adopts the dplyr verbs and extends their compatibility with Vantage, providing advanced, platform-specific functionality that is unavailable in the dplyr package. For example:
  • The dplyr modulo operator %% is incorrectly translated to the % operator when the back end is a database in Vantage. The tdplyr package restores the correct behavior.

The result is a nearly dplyr-compatible, seamless experience for R users who want to perform analytics on Vantage from their preferred R client.

dplyr verbs can be used with the tdplyr package to access and manipulate data in the Analytics Database. The look and feel is very similar to working with a regular data frame in R, except that a tibble, rather than a data frame, represents the data. A remote tibble represents a table, view, or query in the Analytics Database. The tdplyr and dplyr functions that access or manipulate a remote tibble are translated to equivalent SQL, which is executed in the Analytics Database or ML Engine through a Teradata SQL Driver for R connection. Only a subset of the data is retrieved from Vantage unless you explicitly request all of it.
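
The translation of dplyr verbs into SQL can be illustrated without a Vantage system. The following is a minimal sketch, not the tdplyr workflow itself: it assumes the dplyr and dbplyr packages are installed and uses dbplyr's simulated Teradata back end in place of a real Teradata SQL Driver for R connection. The orders table and its columns are hypothetical, and the SQL that tdplyr generates against a live connection may differ.

```r
# Sketch only: dbplyr's simulated Teradata back end stands in for a real
# Vantage connection, so no database is contacted.
library(dplyr)
library(dbplyr)

# Hypothetical table; lazy_frame() builds a lazy tibble that behaves like
# a remote tibble for the purpose of SQL generation.
orders <- lazy_frame(
  order_id = c(1L, 2L, 3L),
  region   = c("east", "west", "east"),
  amount   = c(120.5, 80.0, 42.25),
  con      = simulate_teradata()
)

# The dplyr verbs are recorded, not executed; nothing is computed in R here.
east_totals <- orders %>%
  filter(region == "east") %>%
  group_by(region) %>%
  summarise(total = sum(amount, na.rm = TRUE))

# Render the SQL that would be sent to the database.
sql_text <- as.character(sql_render(east_totals))
cat(sql_text, "\n")
```

With a live connection, the lazy frame would be replaced by a remote tibble created with tbl() on that connection, and collect() would be the explicit request that pulls the full result into R.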
