Introduction to the tdplyr Package (Teradata R Package 17.00)

Teradata® R Package User Guide

prodname: Teradata R Package
vrm_release: 17.00
created_date: November 2020
category: User Guide
featnum: B700-4005-090K

The tdplyr package runs on the client system and is designed for data management, exploration, and execution of analytic functions.

The current version of the tdplyr package includes over 100 functions, organized into these functional areas:

  • Utility and database management functions
  • Data exploration and preparation functions
  • Analytic functions in ML Engine and Advanced SQL Engine:

    These functions support high-speed analytics processing required to operationalize analytics and automate data partitioning and parallel processing in Vantage.

tdplyr and dplyr

For R users familiar with the dplyr R package, the tdplyr package complements the dplyr and dbplyr packages and enables enhanced interaction with Vantage. The tdplyr package follows the syntax of the dplyr package. Currently, many of the dplyr verbs (methods) work out of the box against Vantage and perform the corresponding tasks there.

The tdplyr package adopts the dplyr verbs and extends their compatibility with Vantage, providing advanced and platform-specific functionality that is unavailable in the dplyr package. For example:
  • The dplyr modulo operator %% is incorrectly translated into the % operator when the back end is a Vantage database. The tdplyr package restores the correct behavior.
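One way to see how an R operator is turned into SQL is to ask dbplyr for the translation directly. The sketch below is illustrative only: it assumes the dbplyr package is installed and uses dbplyr's simulated generic DBI connection, not a tdplyr connection to Vantage, so it shows the generic translation rather than the corrected tdplyr behavior. The column name `x` is hypothetical.

```r
library(dbplyr)

# simulate_dbi() is a stand-in connection provided by dbplyr;
# no database is contacted. The column name `x` is hypothetical.
generic_sql <- translate_sql(x %% 2L, con = simulate_dbi())

# Print the SQL fragment that dbplyr generates for the modulo expression.
print(generic_sql)
```

Running the same expression through a remote tibble created from a tdplyr connection and inspecting the generated query is a reasonable way to compare the two translations.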

The result is a seamless, nearly dplyr-compatible experience for R users who want to perform analytics on Vantage from their preferred R client.

dplyr verbs can be used with the tdplyr package to access and manipulate data in the Advanced SQL Engine. The look and feel is very similar to that of interacting with a regular data frame in R. Instead of a data frame, a tibble is used to represent data. A remote tibble represents a table, view, or query in the Advanced SQL Engine. The tdplyr and dplyr functions that access or manipulate a remote tibble are translated to equivalent SQL to be executed in the Advanced SQL Engine or ML Engine through a Teradata SQL Driver for R connection. Only a subset of the data is retrieved from Vantage unless you explicitly request all of it.
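The verb-to-SQL translation described above can be sketched without a Vantage system. This minimal example assumes the dplyr and dbplyr packages are installed and uses dbplyr's simulated Teradata backend in place of a real Teradata SQL Driver for R connection. The table layout (`region`, `amount`) is hypothetical, and the SQL that tdplyr itself generates may differ.

```r
library(dplyr)
library(dbplyr)

# Hypothetical table layout. lazy_frame() builds a lazy tibble that
# records operations but never contacts a database.
sales <- lazy_frame(
  region = character(),
  amount = numeric(),
  con = simulate_teradata()
)

# The verbs are recorded as a query; no rows are fetched here.
big_sales <- sales %>%
  filter(amount > 100) %>%
  select(region, amount)

# Render the SQL that would be sent to the database for this pipeline.
query_sql <- sql_render(big_sales)
print(query_sql)
```

With a live connection, the same pipeline would stay in the database until results are printed or requested. In dplyr, `collect()` is the verb that explicitly pulls the full result into local memory.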
