Teradata R Package Function Reference | 17.00 - 17.00 - td_fastload - Teradata R Package

Teradata® R Package Function Reference

prodname
Teradata R Package
vrm_release
17.00
created_date
September 2020
category
Programming Reference
featnum
B700-4007-090K

Description

This function loads data from a local dataframe into a remote data source using the fastload protocol. A MULTISET table will be created if it is not present in the Teradata SQL Engine.

Usage Notes:

  1. td_fastload function cannot be used with ODBC driver.

  2. td_fastload function cannot load duplicate rows of the dataframe even if the table is a MULTISET table.

  3. Only use td_fastload function to load many rows (at least 100,000 rows) so that the row-loading performance exceeds the overhead of opening additional connections. To insert small amount of data, please use copy_to for better performance.

  4. td_fastload function does not support all Teradata SQL data types. For example, dataframe having BLOB and CLOB data type columns cannot be loaded.

  5. If there are any incorrect rows, fastload protocol ignores those rows and inserts all valid rows. Failed rows are categorised into errors and warnings by fastload protocol and these errors and warnings are returned by the td_fastload function.

  6. For additional information about fastload protocol through teradatasql driver, please refer the FASTLOAD section of terdatasql driver documentation.

Usage

td_fastload(
  conn,
  df = NULL,
  table.name = NULL,
  overwrite = FALSE,
  types = NULL,
  append = FALSE,
  save.errors = FALSE,
  batch.size = NULL
)

Arguments

conn

Required Argument.
Specifies the remote data source connection.

df

Required Argument.
Specifies the dataframe which contains the data to be inserted.
Types : data.frame

table.name

Required Argument.
Specifies the name of the destination remote table.
Types : character

overwrite

Optional Argument.
Specifies whether to overwrite an existing table or not. If TRUE, will overwrite an existing table with name 'table.name'. If FALSE, will throw an error if table with 'table.name' already exists.
Default : FALSE
Types : logical
Note : The existing table will be dropped and new table schema will replace the existing table.

types

Optional Argument.
Specifies the data types for the columns in remote table. This argument accepts both named and unnamed character vectors. If it is NULL, the function will use data types of the columns present in R dataframe.
Suppose the dataframe with columns 'col1' and 'col2' is to be loaded into the remote data source.

  1. When an unnamed character vector is specified, the data types of all the columns must be specified.
    Example:
    If types = c("FLOAT", "VARCHAR(10)"), then the table is created with 2 columns, 'col1' and 'col2' with data types 'FLOAT' and 'VARCHAR(10)' respectively.

  2. If a named character vector is specified, then data types for only a subset of columns can be specified. If the names of the column data types do not match with any in the dataframe, then an error will be raised.
    Examples:

    1. If types = c("col1" = "FLOAT", col2 = "VARCHAR(10)"), then the table is created with 2 columns, 'col1' and 'col2' with data types 'FLOAT' and 'VARCHAR(10)' respectively.

    2. If types = c("col1" = "FLOAT"), then the table is created with 2 columns - 'col1' with data type 'FLOAT' and 'col2' with data type of the column in R dataframe.

Default : NULL
Types : Named or unnamed vector of characters

append

Optional Argument.
Specifies if the dataframe needs to be appended to a table.
Default : FALSE
Types : logical

save.errors

Optional Argument.
Specifies whether to persist the errors and warnings information in Teradata SQL Engine or not. If this argument is set to TRUE, the error and warnings information are presisted and names of error and warning tables are returned by td_fastload function. Otherwise, the function return NULL for the names of the tables.
Default : FALSE
Types : logical
Note : Persisting of the error and warning messages have performance impact on td_fastload in terms of time, based on the number of error and warning messages. However, the errors and warnings are returned as R dataframes by td_fastload which can then be persisted, if needed.

batch.size

Optional Argument.
Specifies the number of rows to be inserted per batch using fastload. For better performance, Teradata recommend the batch size to be at least 100,000 rows. batch.size takes positive integers.
If this argument is NULL, there are two cases based on the number of rows (say N) in the dataframe 'df' as explained below:

  1. If N is greater than 100,000, the rows are divided into batches of equal size with each batch having at least 100,000 rows (except the last batch which might have more rows).

  2. If N is less than 100,000, the rows are inserted in one batch after notifying the user that insertion happens with degradation of performance.

If this argument is specified, the rows are inserted in batches, each batch with size given in this argument, irrespective of the recommended batch size. Last batch will have rows less than the batch size specified, if the number of rows is not in integral multiples of the argument batch.size.
Default : NULL
Types : numeric
Note : This argument is especially useful when there are memory constraints in client machines, in which case user can use smaller batch size.
Examples:

  1. If batch.size = 170,000, the rows are inserted in the batches of size given in this argument.

  2. If batch.size = NULL and number of rows is 52,300, the rows are inserted in one batch with 52,300 rows.

  3. If batch.size = NULL and number of rows is 200,500, the rows are inserted in two batches, each inserting 100,250 rows.

  4. If batch.size = NULL and number of rows is 300,521, there will be three batches, of which two batches inserting 100,173 rows and one batch inserting 100,175 rows.

Value

A named list containing attributes:

  1. errors.dataframe : Has the error messages thrown by fastload protocol. It has an empty dataframe if there are no error messages.

  2. warnings.dataframe : Has the warning messages thrown by fastload protocol. It has an empty dataframe if there are no warning messages.

  3. errors.table : Has the table name of error messages. This is NULL if the argument save.errors is FALSE.

  4. warnings.table : Has the table name of warning messages. This is NULL if the argument save.errors is FALSE.

See Also

copy_to

Examples

# Get remote data source connection
con <- td_get_context()$connection

# Install the "nycflights13" package on your R client, if it is not already
# installed and load the library.
install.packages('nycflights13', repos='https://cloud.r-project.org', quiet = TRUE)
library(nycflights13)

# Create a dataframe.
flights_df <- as.data.frame(nycflights13::flights)

# Example 1: Using td_fastload to load data from dataframe.
fl_list <- td_fastload(con, flights_df, table.name = "flights_fl")

# Prints the loaded table.
tbl(con, "flights_fl")

# Example 2: Using td_fastload to overwrite existing "flights_fl" table.
fl_list <- td_fastload(con, flights_df, table.name = "flights_fl", 
                       overwrite = TRUE)

# Prints the loaded table.
tbl(con, "flights_fl")

# Example 3: Using td_fastload to append dataframe to an already available 
# table.
df1 <- head(flights_df, n = 200000)
df2 <- tail(flights_df, n = 136776)
fl_list1 <- td_fastload(con, df1, table.name = "flights_fl_append")
fl_list2 <- td_fastload(con, df2, table.name = "flights_fl_append", 
                        append = TRUE)

# Prints the loaded table.
tbl(con, "flights_fl_append")

# Example 4: Using td_fastload specifying column types as unnamed character 
#vector.
fl_list <- td_fastload(con, flights_df, table.name = "flights_fl_coltypes", 
           types = c("INTEGER", "INTEGER", "INTEGER", "INTEGER", "INTEGER", 
           "FLOAT", "INTEGER", "INTEGER", "FLOAT", "VARCHAR(50)", "INTEGER", 
           "VARCHAR(100)", "VARCHAR(50)", "VARCHAR(50)","FLOAT","FLOAT","FLOAT",
           "FLOAT","TIMESTAMP(6)"))

# Prints the loaded table.
tbl(con, "flights_fl_coltypes")

# Example 5: Using td_fastload specifying column types as named character 
#vector.
fl_list <- td_fastload(con, flights_df, table.name = "flights_fl_coltypes1", 
            types = c(tailnum = "varchar(100)", dest = "varchar(50)"))

# Prints the loaded table.
tbl(con, "flights_fl_coltypes1")

# Read erroneous data using td_fastload into R dataframe.
file_name <- system.file("extdata", "LargeErreneousData.csv", package="tdplyr")
df_err <- read.csv(file_name, header = TRUE,  stringsAsFactors = FALSE)

# Example 6: Using td_fastload to insert the data with duplicate rows and
# invalid columns. Note that errors and warnings are not persisted.

fl_list <- td_fastload(con, df_err, table.name = "invalid_rows")

# Prints errors dataframe.
fl_list$errors.dataframe

# Prints warnings dataframe.
fl_list$warnings.dataframe

# Prints the loaded table.
tbl(con, "invalid_rows")

# Example 7: Using td_fastload to insert the data with duplicate rows and
# invalid type columns and save the errors and warnings information.

fl_list <- td_fastload(con, df_err, table.name = "invalid_saved_rows",
                       save.errors = TRUE)

# Prints errors table; will be NULL, if there are no errors.
fl_list$errors.table

# Prints warnings table; will be NULL, if there are no warnings.
fl_list$warnings.table 

# Prints the loaded table.
tbl(con, "invalid_saved_rows")