The td_fastload() API writes records from an R data.frame to Vantage using FastLoad, and can be used to quickly load large amounts of data in an empty table on Vantage.
Teradata recommends to use this API when number of rows in the R data.frame is greater than 100,000, to have better performance. To insert lesser rows, use copy_to() for optimized performance. The data is loaded in batches.
- The td_fastload() API cannot load duplicate rows of the data.frame even if the table is a MULTISET table.
- If there are any incorrect rows due to constraint violations, data type conversion errors, and so on, FastLoad protocol ignores those rows and inserts all valid rows.
- Rows in the data.frame that failed to get inserted are categorized into errors and warnings by FastLoad protocol and these errors and warnings are stored into respective error and warning tables by FastLoad API.
- FastLoad does not support all Teradata SQL data types.
For example, data.frame having BLOB and CLOB data type columns cannot be loaded.
- If the save.errors argument is set to TRUE, the errors and warnings information are persisted and names of error and warning tables are returned once the FastLoad operation is completed.
- errors.dataframe: It is an R data.frame containing error messages thrown by FastLoad protocol.
It has an empty data.frame if there are no errors.
- warning.dataframe: It is an R data.frame containing warning message thrown by FastLoad Protocol.
It has an empty data.frame if there are no errors.
- errors.table: It shows the name of the table containing errors.
It is NULL, if argument save.errors is set to FALSE.
- warnings.table: It shows the name of the table containing warnings.
It is NULL, if argument save.errors is set to FALSE.
See the FastLoad section of https://github.com/Teradata/r-driver for more information about FastLoad protocol through teradatasql driver.
Example Prerequisites
- Install the "nycflights13" package on your R client, if it is not already installed.
> install.packages('nycflights13', repos='https://cloud.r-project.org', quiet = TRUE)
- Load the library.
> library(nycflights13)
- Create a data.frame.
flights_df <- as.data.frame(nycflights13::flights)
Example 1: Save an R data.frame with default signature
> td_fastload(con, flights_df, table.name = "flights_fl") [tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 336776 row(s) in 3 batches with each batch having at least 112258 rows. Fastload: Processed 112258 rows in batch 1. Fastload: Processed 112258 rows in batch 2. Fastload: Processed 112260 rows in batch 3. $errors.dataframe data frame with 0 columns and 0 rows $warnings.dataframe data frame with 0 columns and 0 rows $errors.table NULL $warnings.table NULL
Example 2: Save an R data.frame to overwrite existing "flights_fl" table
> td_fastload(con, flights_df, table.name = "flights_fl", overwrite = TRUE) Warning message: In td_fastload(con, flights_df, table.name = 'flights_fl', overwrite = TRUE): [tdplyr - (TDR_W1011)] Setting 'overwrite = TRUE' will drop existing table 'flights_fl' and recreate it with new schema. [tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 336776 row(s) in 3 batches with each batch having at least 112258 rows. Fastload: Processed 112258 rows in batch 1. Fastload: Processed 112258 rows in batch 2. Fastload: Processed 112260 rows in batch 3. $errors.dataframe data frame with 0 columns and 0 rows $warnings.dataframe data frame with 0 columns and 0 rows $errors.table NULL $warnings.table NULL
Example 3: Append rows in an R data.frame to an already available table
> df1 <- head(flights_df, n = 200000) > df2 <- tail(flights_df, n = 136776)
> fl_list1 <- td_fastload(con, df1, table.name = "flights_fl_append") [tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 200000 row(s) in 2 batches with each batch having at least 1e+05 rows. Fastload: Processed 1e+05 rows in batch 1. Fastload: Processed 1e+05 rows in batch 2. > fl_list2 <- td_fastload(con, df2, table.name = "flights_fl_append", append = TRUE) Warning message: In td_fastload(con, df2, table.name = 'flights_fl_append', append = TRUE): [tdplyr - (TDR_W1010)] Table 'flights_fl_append' is a MULTISET table. Duplicate rows will not be loaded with Fastload. [tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 136776 row(s) in 1 batches with each batch having at least 136776 rows. Fastload: Processed 136776 rows in batch 1.
Example 4: Save an R data.frame specifying column types as unnamed character vector
> td_fastload(con, flights_df, table.name = "flights_fl_coltypes", types = c("INTEGER", "INTEGER", "INTEGER", "INTEGER", "INTEGER", "FLOAT", "INTEGER", "INTEGER", "FLOAT", "VARCHAR(50)", "INTEGER", "VARCHAR(100)", "VARCHAR(50)", "VARCHAR(50)","FLOAT","FLOAT","FLOAT", "FLOAT","TIMESTAMP(6)")) [tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 336776 row(s) in 3 batches with each batch having at least 112258 rows. Fastload: Processed 112258 rows in batch 1. Fastload: Processed 112258 rows in batch 2. Fastload: Processed 112260 rows in batch 3. $errors.dataframe data frame with 0 columns and 0 rows $warnings.dataframe data frame with 0 columns and 0 rows $errors.table NULL $warnings.table NULL
Example 5: Save an R data.frame containing duplicate rows, with save.errors set to FALSE
- Create a data.frame containing duplicate rows.
> df <- data.frame("col1" = c(1,2,1,3,4,4), "col2" = c(6,7,6,8,9,9)) > df col1 col2 1 1 6 2 2 7 3 1 6 4 3 8 5 4 9 6 4 9
- Save an R data.frame containing duplicate rows.
> fl_list <- td_fastload(con, df = df, table.name="fastload_duplicates") [tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 6 row(s) in 1 batches with each batch having at least 6 rows. Fastload: Processed 6 rows in batch 1.
> fl_list $errors.dataframe data frame with 0 columns and 0 rows $warnings.dataframe batch_num 1 batch_summary warning_message 1 [Version 17.0.0.2] [Session 16351] [Teradata SQL Driver] [Warning 518] Found 2 duplicate or faulty row(s) while ending FastLoad of database table "alice"."fastload_duplicates": expected a row count of 6, got a row count of 4 $errors.table NULL $warnings.table NULL
> df_f <- tbl(con, "fastload_duplicates") > df_f # Source: table<fastload_duplicates> [?? x 2] # Database: [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2] [TDAPUSER@<hostname>/TDAPUSERDB] col1 col2 <dbl> <dbl> 1 3 8 2 2 7 3 4 9 4 1 6
Example 6: Save an R data.frame containing duplicate rows, with save.errors set to TRUE
> fl_list <- td_fastload(con, df = df, table.name="fastload_duplicates", overwrite = TRUE, save.errors = TRUE) Warning message: In td_fastload(con, df = df, table.name = 'fastload_duplicates', overwrite = TRUE, save.errors = TRUE): [tdplyr - (TDR_W1011)] Setting 'overwrite = TRUE' will drop existing table 'fastload_duplicates' and recreate it with new schema. [tdplyr - (TDR_P3400)] td_fastload() will insert the data containing 6 row(s) in 1 batches with each batch having at least 6 rows. Fastload: Processed 6 rows in batch 1. [tdplyr - (TDR_E1001)] Persisted all 1 fastload warning(s) to table 'td_fl_fastload_duplicates_warn_81'...!
> fl_list $errors.dataframe data frame with 0 columns and 0 rows $warnings.dataframe batch_num 1 batch_summary warning_message 1 [Version 17.0.0.2] [Session 16361] [Teradata SQL Driver] [Warning 518] Found 2 duplicate or faulty row(s) while ending FastLoad of database table "alice"."fastload_duplicates": expected a row count of 6, got a row count of 4 $errors.table NULL $warnings.table [1] "td_fl_fastload_duplicates_warn_81"
> tbl(con, "td_fl_fastload_duplicates_warn_81") # Source: table<td_fl_fastload_duplicates_warn_81> [?? x 2] # Database: [Teradata 16.20.50.01] [Teradata Native Driver 17.0.0.2] [TDAPUSER@<hostname>/TDAPUSERDB] batch_num warning_message <chr> <chr> 1 batch_summary "[Version 17.0.0.2] [Session 16361] [Teradata SQL Driver] [Warning 518] Found 2 duplicate or~