td_fastload() | Creating tables in Vantage | Teradata Package for R - td_fastload() - Teradata Package for R

Teradata® Package for R User Guide

Product
Teradata Package for R
Release Number
17.00
Published
July 2021
Language
English (United States)
Last Update
2023-08-08
dita:mapPath
yih1585763700215.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
B700-4005
Product Category
Teradata Vantage

The td_fastload() API writes records from an R data.frame to Vantage using FastLoad, and can be used to quickly load large amounts of data in an empty table on Vantage.

Teradata recommends to use this API when number of rows in the R data.frame is greater than 100,000, to have better performance. To insert lesser rows, use copy_to() for optimized performance. The data is loaded in batches.

  • The td_fastload() API cannot load duplicate rows of the data.frame even if the table is a MULTISET table.
  • If there are any incorrect rows due to constraint violations, data type conversion errors, and so on, FastLoad protocol ignores those rows and inserts all valid rows.
  • Rows in the data.frame that failed to get inserted are categorized into errors and warnings by FastLoad protocol and these errors and warnings are stored into respective error and warning tables by FastLoad API.
  • FastLoad does not support all Teradata SQL data types.

    For example, data.frame having BLOB and CLOB data type columns cannot be loaded.

  • If the save.errors argument is set to TRUE, the errors and warnings information are persisted and names of error and warning tables are returned once the FastLoad operation is completed.
FastLoad returns a named list containing the following attributes:
  • errors.dataframe: It is an R data.frame containing error messages thrown by FastLoad protocol.

    It has an empty data.frame if there are no errors.

  • warning.dataframe: It is an R data.frame containing warning message thrown by FastLoad Protocol.

    It has an empty data.frame if there are no errors.

  • errors.table: It shows the name of the table containing errors.

    It is NULL, if argument save.errors is set to FALSE.

  • warnings.table: It shows the name of the table containing warnings.

    It is NULL, if argument save.errors is set to FALSE.

See the FastLoad section of https://github.com/Teradata/r-driver for more information about FastLoad protocol through teradatasql driver.

Example Prerequisites

  • Install the "nycflights13" package on your R client, if it is not already installed.
    > install.packages('nycflights13', repos='https://cloud.r-project.org', quiet = TRUE)
  • Load the library.
    > library(nycflights13)
  • Create a data.frame.
    flights_df <- as.data.frame(nycflights13::flights)

Example 1: Save an R data.frame with default signature

> td_fastload(con, flights_df, table.name = "flights_fl")
[tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 336776 row(s) in 3 batches with each batch having at least 112258 rows.
Fastload: Processed 112258 rows in batch 1.
Fastload: Processed 112258 rows in batch 2.
Fastload: Processed 112260 rows in batch 3.
$errors.dataframe
data frame with 0 columns and 0 rows
 
$warnings.dataframe
data frame with 0 columns and 0 rows
 
$errors.table
NULL
 
$warnings.table
NULL

Example 2: Save an R data.frame to overwrite existing "flights_fl" table

> td_fastload(con, flights_df, table.name = "flights_fl", overwrite = TRUE)
Warning message:
In td_fastload(con, flights_df, table.name = 'flights_fl', overwrite = TRUE):
[tdplyr - (TDR_W1011)] Setting 'overwrite = TRUE' will drop existing table 'flights_fl' and recreate it with new schema.
[tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 336776 row(s) in 3 batches with each batch having at least 112258 rows.
Fastload: Processed 112258 rows in batch 1.
Fastload: Processed 112258 rows in batch 2.
Fastload: Processed 112260 rows in batch 3.
$errors.dataframe
data frame with 0 columns and 0 rows
 
$warnings.dataframe
data frame with 0 columns and 0 rows
 
$errors.table
NULL
 
$warnings.table
NULL

Example 3: Append rows in an R data.frame to an already available table

> df1 <- head(flights_df, n = 200000)

> df2 <- tail(flights_df, n = 136776)
> fl_list1 <- td_fastload(con, df1, table.name = "flights_fl_append")
[tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 200000 row(s) in 2 batches with each batch having at least 1e+05 rows.
Fastload: Processed 1e+05 rows in batch 1.
Fastload: Processed 1e+05 rows in batch 2.

> fl_list2 <- td_fastload(con, df2, table.name = "flights_fl_append", append = TRUE)
Warning message:
In td_fastload(con, df2, table.name = 'flights_fl_append', append = TRUE):
[tdplyr - (TDR_W1010)] Table 'flights_fl_append' is a MULTISET table. Duplicate rows will not be loaded with Fastload.
[tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 136776 row(s) in 1 batches with each batch having at least 136776 rows.
Fastload: Processed 136776 rows in batch 1.

Example 4: Save an R data.frame specifying column types as unnamed character vector

> td_fastload(con, flights_df, table.name = "flights_fl_coltypes", types = c("INTEGER", "INTEGER", "INTEGER", "INTEGER", "INTEGER", "FLOAT", "INTEGER", "INTEGER", "FLOAT", "VARCHAR(50)", "INTEGER", "VARCHAR(100)", "VARCHAR(50)", "VARCHAR(50)","FLOAT","FLOAT","FLOAT", "FLOAT","TIMESTAMP(6)"))
[tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 336776 row(s) in 3 batches with each batch having at least 112258 rows.
Fastload: Processed 112258 rows in batch 1.
Fastload: Processed 112258 rows in batch 2.
Fastload: Processed 112260 rows in batch 3.
$errors.dataframe
data frame with 0 columns and 0 rows
 
$warnings.dataframe
data frame with 0 columns and 0 rows
 
$errors.table
NULL
 
$warnings.table
NULL

Example 5: Save an R data.frame containing duplicate rows, with save.errors set to FALSE

The errors and warnings are returned in the data.frames when the save.errors argument is set to FALSE.
  • Create a data.frame containing duplicate rows.
    > df <- data.frame("col1" = c(1,2,1,3,4,4), "col2" = c(6,7,6,8,9,9))
    > df
     col1 col2
    1   1   6
    2   2   7
    3   1   6
    4   3   8
    5   4   9
    6   4   9
  • Save an R data.frame containing duplicate rows.
    > fl_list <- td_fastload(con, df = df, table.name="fastload_duplicates")
    [tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 6 row(s) in 1 batches with each batch having at least 6 rows.
    Fastload: Processed 6 rows in batch 1.
    > fl_list
    $errors.dataframe
    data frame with 0 columns and 0 rows
     
    $warnings.dataframe
          batch_num
    1 batch_summary
                                                                                                                                                                                                                       warning_message
    1 [Version 17.0.0.2] [Session 16351] [Teradata SQL Driver] [Warning 518] Found 2 duplicate or faulty row(s) while ending FastLoad of database table "alice"."fastload_duplicates": expected a row count of 6, got a row count of 4
     
    $errors.table
    NULL
     
    $warnings.table
    NULL
    > df_f <- tbl(con, "fastload_duplicates")
    > df_f
    # Source:   table<fastload_duplicates> [?? x 2]
    # Database: [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2] [TDAPUSER@<hostname>/TDAPUSERDB]
       col1  col2
      <dbl> <dbl>
    1     3     8
    2     2     7
    3     4     9
    4     1     6

Example 6: Save an R data.frame containing duplicate rows, with save.errors set to TRUE

The errors and warnings tables are returned along with errors and warnings data.frames when save.errors is set to TRUE.
> fl_list <- td_fastload(con, df = df, table.name="fastload_duplicates", overwrite = TRUE, save.errors = TRUE)
Warning message:
In td_fastload(con, df = df, table.name = 'fastload_duplicates', overwrite = TRUE, save.errors = TRUE):
[tdplyr - (TDR_W1011)] Setting 'overwrite = TRUE' will drop existing table 'fastload_duplicates' and recreate it with new schema.
[tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 6 row(s) in 1 batches with each batch having at least 6 rows.
Fastload: Processed 6 rows in batch 1.
[tdplyr - (TDR_E1001)] Persisted all 1 fastload warning(s) to table 'td_fl_fastload_duplicates_warn_81'...!
> fl_list
$errors.dataframe
data frame with 0 columns and 0 rows
 
$warnings.dataframe
      batch_num
1 batch_summary
                                                                                                                                                                                                                   warning_message
1 [Version 17.0.0.2] [Session 16361] [Teradata SQL Driver] [Warning 518] Found 2 duplicate or faulty row(s) while ending FastLoad of database table "alice"."fastload_duplicates": expected a row count of 6, got a row count of 4
 
$errors.table
NULL
 
$warnings.table
[1] "td_fl_fastload_duplicates_warn_81"
> tbl(con, "td_fl_fastload_duplicates_warn_81")
# Source:   table<td_fl_fastload_duplicates_warn_81> [?? x 2]
# Database: [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2] [TDAPUSER@<hostname>/TDAPUSERDB]
  batch_num     warning_message                                                                             
  <chr>         <chr>                                                                                       
1 batch_summary "[Version 17.0.0.2] [Session 16361] [Teradata SQL Driver] [Warning 518] Found 2 duplicate or~