Teradata R Package Function Reference | 17.00 - 17.00 - copy_to - Teradata R Package

Teradata® R Package Function Reference

prodname
Teradata R Package
vrm_release
17.00
created_date
September 2020
category
Programming Reference
featnum
B700-4007-090K

Description

This function uploads a local dataframe or copies data from a tbl_teradata into a remote data source by creating the table definition. It can also be used to upload/copy data into time series Primary Time Index (PTI) tables.
Note :

  1. When the connection is set up using Teradata SQL Driver for R, the function inserts data in batches of 16384 rows. To insert large number of rows, please refer to td_fastload function.

  2. When the connection is set up using ODBC Driver,the function inserts data in batches of 1024 rows. Please refer to ODBC Driver page for more information.

Usage

## S3 method for class 'Teradata'
copy_to(
  dest,
  df,
  name = deparse(substitute(df)),
  overwrite = FALSE,
  types = NULL,
  table.type = "PI",
  primary.index = NULL,
  row.names = FALSE,
  temporary = FALSE,
  analyze = FALSE,
  append = FALSE,
  pti.name = NULL,
  timecode.column.info = NULL,
  timezero.date = NULL,
  timebucket.duration = NULL,
  pti.columns = NULL,
  sequence.column = NULL,
  seq.max = NULL,
  ...
)

Arguments

dest

Required Argument.
Specifies the remote data source connection.

df

Required Argument.
Specifies the local dataframe or tbl_teradata which contains the data to be inserted.
Types : R dataframe OR tbl_teradata

name

Optional Argument.
Specifies the name of new remote table.
Default : Name of the dataframe.
Types : character

overwrite

Optional Argument.
Specifies whether to overwrite existing table or not.

  1. If 'TRUE', copy_to will overwrite an existing table with name 'name'. The existing table will be dropped and new table schema will replace the existing table.

  2. If 'FALSE', copy_to will throw an error if table already exists.

Default : FALSE
Types : logical

types

Optional Argument.
Specifies the SQL data types for the columns in remote table. This argument accepts both named and unnamed character vectors. If it is NULL, the function uses data types of the columns present in R dataframe to map to correct SQL data type.
While creating PTI table, the data types provided for the timecode and sequence columns are ignored as the columns are auto-generated during PTI table creation. This is same for both named and unnamed types. However, for unnamed types, all the data types must be specified like that for non-PTI tables.
Suppose a dataframe with columns 'col1' and 'col2' is to be loaded into the remote data source.

  1. When an unnamed character vector is specified, the data types of all the columns must be specified.
    Example:
    If types = c("FLOAT", "VARCHAR(10)"), then the table is created with 2 columns, 'col1' and 'col2' with data types 'FLOAT' and 'VARCHAR(10)' respectively.

  2. If a named character vector is specified, then data types for only a subset of columns can be specified. If the names of the column data types do not match with any in the dataframe, then an error will be raised.
    Examples:

    1. If types = c("col1" = "FLOAT", col2 = "VARCHAR(10)"), then the table is created with 2 columns, 'col1' and 'col2' with data types 'FLOAT' and 'VARCHAR(10)' respectively.

    2. If types = c("col1" = "FLOAT"), then the table is created with 2 columns - 'col1' with data type 'FLOAT' and 'col2' with data type of the column in R dataframe.

Default : NULL
Types : Named OR unnamed vector of characters

table.type

Optional Argument.
Specifies the type of remote table.
Permitted Values : "PI" (Primary Index), "NOPI" (No Primary Index)
Default : "PI"
Types : character

primary.index

Optional Argument.
Specifies the name of the primary index column. You can provide multiple columns as a vector. When primary.index is not specified and table.type is "PI", the behavior is specified by the "PrimaryIndexDefault" field in DBS Control.
Default : NULL
Types : character OR vector of characters

row.names

Optional Argument.
Specifies whether to insert row names in remote table or not.
If dataframe contains row_names column, regardless of row.names option, row names will be inserted in remote table. If TRUE and dataframe doesn't contain row_names column, row names are converted to a column named "row_names". If FALSE, row names are ignored.
Default : FALSE
Types : logical
Note : This argument is ignored when PTI tables are created.

temporary

Optional Argument.
Specifies whether to create temporary(Teradata volatile) remote table or not. Temporary tables are always created in the login user space, regardless of the current default database setting. If FALSE, a permanent table is created, that is available across all sessions. If TRUE, a temporary table is created that is not available in any other sessions.
To use a temporary table, a tbl_teradata for the table should be created using the in_schema() function.
Example: tbl(con, in_schema("username", "temporary table name")
Default : FALSE
Types : logical

analyze

Optional Argument.
Calls ANALYZE(collect statistics) on the target table after the creation.
If table.type is NOPI, it will collect statistics on first column of table, else i.e. for PI table, it will collect statistics on primary index column of the table.
To run analyze on multiple columns, directly call db_analyze function and pass multiple columns as character vector. This option is not supported and hence ignored when creating table from tbl_teradata.
Default : FALSE
Types : logical

append

Optional Argument.
Specifies if the dataframe needs to be appended to a table.
When appending data in 'df' to a PTI table, the argument 'timecode.column.info' is required. The SQL data type in the argument 'timecode.column.info' is ignored if the 'append' is set to TRUE. If the PTI table is sequenced, the argument 'sequence.column' is also required.
Default : FALSE
Types : logical

pti.name

Optional Argument.
Used when 'df' is to be saved as a PTI table.
Secifies the name of the Primary Time Index (PTI) when the table to be created is a PTI table.
Default : NULL
Types : character

timecode.column.info

Optional Argument.
Required when the table to be created is PTI table.
Specfies the column name (in the dataframe or tbl_teradata) and SQL data type of the column which contains timecode information as a named vector. This timecode column is TD_TIMECODE column in the table created. The SQL data type must be either of DATE, TIMESTAMP(n) or TIMESTAMP(n) WITH TIME ZONE where, n is the decimal precision of the fractional seconds in the timestamp. For more information on these data types, see Teradata Vantage Data Types and Literals.
Default : NULL
Types : Named vector of column name and SQL type.
Example : timecode.column.info = c("col1" = "TIMESTAMP(3) WITH TIME ZONE") creates the column TD_TIMECODE of type TIMESTAMP(3) WITH TIME ZONE.
Note :

  1. This argument should be a named vector of length one.

  2. copy_to() creates a new column named TD_TIMECODE in the PTI table and inserts/appends the data in the column specified in this argument into TD_TIMECODE column of remote table.

timezero.date

Optional Argument.
Used when 'df' is to be saved as a PTI table.
Specifies the earliest time series data that the PTI table will accept. In other words, it is the date that precedes the earliest date in the time series data. If it is not specified, Advanced SQL Engine creates table with timezero as DATE '1970-01-01'.
Default : NULL
Types : Date in character format 'YYYY-MM-DD'
Note : Although this argument is optional, Teradata recommends to specify a date near to the earliest timestamp in the time series data.

timebucket.duration

Optional Argument.
Required if pti.columns is not specified.
Used when 'df' is to be saved as a PTI table.
Specifies the duration that serves to break up the time continuum in the time series data into discrete groups or buckets.
This argument can be specified using any of the units of time as shown below:

------------------------------------------------------- ---------------------------------------------------------------------
Time Unit Formal Form Example Shorthand Equivalents
------------------------------------------------------- ---------------------------------------------------------------------
Calendar Years CAL_YEARS(N) Ncy, Ncyear, Ncyears
Calendar Months CAL_MONTHS(N) Ncm, Ncmonth, Ncmonths
Calendar Days CAL_DAYS(N) Ncd, Ncday, Ncdays
Weeks WEEKS(N) Nw, Nweek, Nweeks
Days DAYS(N) Nd, Nday, Ndays
Hours HOURS(N) Nh, Nhr, Nhrs, Nhour, Nhours
Minutes MINUTES(N) Nm, Nmins, Nminute, Nminutes
Seconds SECONDS(N) Ns, Nsec, Nsecs, Nsecond, Nseconds
Milliseconds MILLISECONDS(N) Nms, Nmsec, Nmsecs, Nmillisecond, Nmilliseconds
Microseconds MICROSECONDS(N) Nus, Nusec, Nusecs, Nmicrosecond, Nmicroseconds
------------------------------------------------------- ---------------------------------------------------------------------

Where, N is a 16-bit positive integer with a maximum value of 32767.
Default : NULL
Types: str
Examples:

  1. MINUTES(23) which is equal to 23 minutes

  2. CAL_MONTHS(5) which is equal to 5 calendar months

pti.columns

Optional Arugment.
Required if timebucket.duration is not specified.
Used when 'df' is to be saved as a PTI table.
Specifies one or more PTI table column names. These columns are used, together with the PTI, to determine how rows are distributed among the AMPs.
Default : NULL
Types : character OR vector of characters

sequence.column

Optional Argument.
Used when 'df' is to be saved as a PTI table.
Specifies the name of the column which contains the unique sequence numbers that differentiate the readings with the same timestamp. If this argument is provided, a SEQUENCED PTI table is created. Otherwise, a NONSEQUENCED PTI table is created. If the table is SEQUENCED PTI table, there can be multiple readings with the same timestamp from the same sensor. For nonsequenced data, there is one sensor reading per timestamp.
Default : NULL
Types : character
Note :

  1. copy_to() creates a new column named TD_SEQNO in the PTI table and inserts the data in the column specified in this argument into TD_SEQ column of remote table.

  2. Advanced SQL Engine orders the rows in a sequenced table based on the timestamp, and within a single timestamp value, orders the rows according to the sequence number.

seq.max

Optional Argument.
Used only when the argument 'sequence.column' is specified and 'df' must be saved as a PTI table.
Specifies the maximum number of data rows that can have the same timestamp. It takes a positive integer from 1 to 2147483647. If the argument is NULL, Advanced SQL Engine creates the table with seq.max as 20000.
Default : NULL
Types : numeric

...

Other parameters passed to methods.

Value

A 'tbl' object with a multiset PTI/non-PTI table present in the remote database.

PTI Usage

  1. PTI table creation is only supported with Teradata SQL Driver for R.

  2. When the argument timecode.column.info is specified, copy_to attempts to create a PTI table. This argument is not required when a non-PTI table is to be created. If this argument is specified, primary.index argument is ignored.

  3. The arguments pti.name, timezero.date, timebucket.duration, pti.columns, sequence.column, seq.max are not required or used when the table to be created is non-PTI table. These arguments are ignored if specified without the argument timecode.column.info.

  4. Choose PTI related arguments that will identify each row to result in the most even distribution of time series data among the AMPs in the map used by the table. Include columns in the argument pti.columns that are frequently queried to speed data access for those queries.

  5. For more information on PTI tables, please refer to the documentation Teradata Vantage Time Series Tables and Operations.

See Also

td_fastload, collect function in the dplyr package, which downloads the data from a remote tbl into a local tbl.

Examples

# Get remote data source connection.
con <- td_get_context()$connection

# Examples 1 through 16 demonstrates how to create non-PTI table.

# Create a dataframe.
df <- data.frame(column1 = c(1,2,3), column2 = c('teradata','vantage','release'),
                 column3 = c(TRUE,FALSE,TRUE))

# Example 1: Create table from R dataframe "df" and collect statistics.
copy_to(con, df, name = "test_table", analyze = TRUE, overwrite = TRUE)

# Example 2: Append dataframe to an already created table.
copy_to(con, df, name = "test_table", append = TRUE)

# Example 3: Append to a non-existent table. A new table will be created.
copy_to(con, df, name = "test_table_append", append = TRUE)

# Example 4: Overwrite existing table "test_table" table.
copy_to(con, df, name = "test_table", overwrite = TRUE)

# Example 5: Create table in a different database using dbplyr::in_schema.
copy_to(con, df, name = dbplyr::in_schema(td_get_context()$temp.database, "table_in_schema"))

# Example 6: Create temporary(Teradata volatile) table.
copy_to(con, df, name = "table_temp", temporary = TRUE)

# Example 7: Creating PI table.
copy_to(con, df, name = "table_PI", table.type = "PI", primary.index = c("column1","column2"))

# Example 8: Create NOPI table.
copy_to(con, df, name = "table_NOPI", table.type = "NOPI")

# Example 9: Create table with row_names of dataframe.
copy_to(con, df, name = "table_row_names", row.names = TRUE)

# Example 10: Create table by specifying column types as unnamed character vector.
copy_to(con, df, name = "table_coltypes", types = c("integer","varchar(10)","integer"))

# Example 11: Create table by specifying column types as named character vector.
copy_to(con, df, name = "table_coltypes1", types = c(column1="integer", column3="byteint"))

# Get the tbl_teradata whose data is to be copied to new table.
test_table <- tbl(con,"test_table")

# Example 12: Create table from tbl_teradata.
copy_to(con, test_table, name = "table_tbl")

# Example 13: Create NOPI table from tbl_teradata.
copy_to(con, test_table, name = "table_tbl_nopi", table.type = "NOPI")

# Example 14: Create PI table from tbl_teradata - primary.index uses the first column 
# by default.
copy_to(con, test_table, name = "table_tbl_pi", table.type = "PI")

# Example 15: Create temporary(Teradata volatile) table from tbl_teradata.
copy_to(con, test_table, name = "table_tbl_temp", temporary = TRUE)

# Example 16: Create table using "%>%" operator.
# This creates a temporay table with a randomly generated name, and the table will be dropped
# when the current session ends.
test_table_temp <- test_table %>% copy_to(con,df = .)

## Working on PTI tables.

loadExampleData("sessionize_example", "sessionize_table")
df_tbl <- tbl(con, "sessionize_table")
df1 <- as.data.frame(df_tbl)

# Example 17 : Using R dataframe to create a non-PTI table. Note that the argument 
# 'timecode.column.info' is not used here. Other PTI related arguments are ignored.
copy_to(con, df1, name = "df_to_nonpti_table", timebucket.duration = "2h", 
        timezero.date = "2008-01-01", sequence.column = "adid", seq.max = 200)

# Example 18 : Using R dataframe to create a SEQUENCED PTI table with column 'clicktime' 
# (TIMESTAMP type) having timecode information and column 'adid' having sequence information. 
# Without seq.max, largest value that sequence column can hold is 20000 (by default).
copy_to(con, df1, name = "df_to_pti_table", overwrite = TRUE, timebucket.duration = "2h", 
        timecode.column.info = c("clicktime" = "TIMESTAMP(3)"), timezero.date = "2008-01-01", 
        sequence.column = "adid")

# Example 19 : Using R dataframe to create a NONSEQUENCED TEMPORARY PTI table with column 
# 'clicktime' (TIMESTAMP type) having timecode information. By providing 'types' argument, the
# column 'partition_id' is created with BIGINT datatype. The argument 'sequence.column' should
# not be provided for creating non-sequenced PTI tables.
copy_to(con, df1, name = "df_to_temp_pti_table", timebucket.duration = "2h", temporary = TRUE,
        timecode.column.info = c("clicktime" = "TIMESTAMP(6)"), timezero.date = "2008-01-01",
        types = c("partition_id" = "BIGINT"))

# Append to the same table using types argument.
copy_to(con, df1, name = "df_to_temp_pti_table", types = c("partition_id" = "BIGINT"), 
timecode.column.info = c("clicktime" = "TIMESTAMP(6)"), append = TRUE)


# Example 20 : Using non-PTI tbl_teradata to create a SEQUENCED PTI table with column 
# 'clicktime' (TIMESTAMP type) having timecode information and with sequence column taking
# largest value of 1000.
copy_to(con, df_tbl, name = "dftbl_to_seq_pti_table", timebucket.duration = "2h", 
        timecode.column.info = c("clicktime" = "TIMESTAMP(6)"), sequence.column = "adid", 
        seq.max = 1000)

# Example 21 : Create a non-PTI table using PTI tbl_teradata. Note that there are no PTI
# related arguments specified.

# Load sequenced PTI table, if not exists.
loadExampleData("time_series_example", "ocean_buoys_seq")

# Create tbl_teradata from PTI table.
df2 <- tbl(con, "ocean_buoys_seq")

# Copies data of PTI table to non-PTI table.
copy_to(con, df2, "pti_to_nonpti_table")

# Example 22 : Create a non-sequenced PTI table from sequenced PTI table with modified 
# time series specifications. The source PTI table is sequenced, has time zero of 
# DATE '2012-01-01' and timebucket duration of HOURS(1). The target PTI table is non-sequenced
# (the column of TD_SEQNO of source table is no longer generated column), has time zero of 
# DATE '2010-01-01' and timebucket duration of HOURS(2). Note that the TD_TIMECODE column's
# timestamp precision cannot be increased/decreased, i.e., the SQL type of TD_TIMECODE column 
# cannotbe modified. When the PTI related arguments of target table are same as that of source  
# PTI table, then target PTI table is exact same copy as that of source PTI table.
copy_to(con, df2, "pti_to_modified_pti_table", timebucket.duration = "2h", 
        timecode.column.info = c("TD_TIMECODE" = "TIMESTAMP(6)"), timezero.date = "2010-01-01")