Introduction to tdplyr Package

Teradata® R Package User Guide

Product
Teradata R Package
Release Number
16.20
Published
February 2020
Language
English (United States)
Last Update
2022-05-02

The tdplyr package runs on the client system and is designed for data management, exploration, and execution of analytic functions.

The current version of the tdplyr package includes over 100 functions, organized into these functional areas:

  • Utility and database management functions
  • Data exploration and preparation functions
  • Analytic functions in ML Engine and Advanced SQL Engine:

    These functions support high-speed analytics processing required to operationalize analytics and automate data partitioning and parallel processing in Vantage.

tdplyr and dplyr

For R users familiar with the dplyr R package, the tdplyr package complements the dplyr and dbplyr packages and enables enhanced interaction with Vantage. The tdplyr package follows the syntax of the dplyr package. Currently, you can use many of the dplyr verbs (methods) out of the box to connect to Vantage and perform the corresponding tasks.

The tdplyr package adopts the dplyr verbs and extends their compatibility with Vantage, providing advanced, platform-specific functionality that is unavailable in the dplyr package. For example:
  • The dplyr modulo operator %% is incorrectly translated into the % operator when the back end is a Vantage database. The tdplyr package restores the correct behavior.
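The translation of %% can be inspected without a live Vantage connection. The following is a minimal sketch, assuming the dbplyr package is installed. It uses dbplyr's simulated Teradata back end rather than tdplyr itself, so the output reflects whatever the installed dbplyr version generates and not necessarily the tdplyr behavior described above. The column name x is hypothetical.

```r
library(dbplyr)

# Simulated Teradata connection supplied by dbplyr; no database is contacted.
con <- simulate_teradata()

# Translate the R expression x %% 2L into SQL for that back end.
# `x` is a hypothetical integer column name used only for illustration.
modulo_sql <- translate_sql(x %% 2L, con = con)

# Print the generated SQL fragment for inspection.
print(modulo_sql)
```

Teradata SQL writes modulo with the MOD operator, so a % in the printed fragment would indicate the kind of mistranslation mentioned in the bullet above.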

The result is a seamless, nearly dplyr-compatible experience for R users who want to perform analytics on Vantage from their preferred R client.

You can use dplyr verbs with the tdplyr package to access and manipulate data in the Advanced SQL Engine. The look and feel is very similar to working with a regular data frame in R, except that a tibble, rather than a data frame, represents the data. A remote tibble represents a table, view, or query in the Advanced SQL Engine. The tdplyr and dplyr functions that access or manipulate a remote tibble are translated to equivalent SQL, which is executed in the Advanced SQL Engine or ML Engine through a Teradata SQL Driver for R connection. Only a subset of the data is retrieved from Vantage unless you explicitly request all of it.
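The translation from dplyr verbs to SQL can be sketched without a Vantage system. The example below is an illustration only: it assumes the dplyr and dbplyr packages are installed, uses dbplyr's simulated Teradata back end in place of a real Teradata SQL Driver for R connection, and invents the columns region and amount (dbplyr names the placeholder table df by default).

```r
library(dplyr)
library(dbplyr)

# Build a lazy (remote-style) tibble against a simulated Teradata back end.
# The columns `region` and `amount` are hypothetical.
sales <- lazy_frame(
  region = character(),
  amount = numeric(),
  con = simulate_teradata()
)

# The verbs are recorded, not executed; no data is retrieved here.
q <- sales %>%
  filter(amount > 100) %>%
  group_by(region) %>%
  summarise(total = sum(amount, na.rm = TRUE))

# Display the SQL that would be sent to the database.
show_query(q)
```

With a live connection, printing q would fetch only the first few rows for display, whereas collect(q) is the dplyr verb that explicitly retrieves the full result into a local tibble.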
