
Teradata® Package for R Function Reference

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for R
Release Number: 17.20
Published: March 2024
Product Category: Teradata Vantage

WriteNOS

Description

The td_write_nos_sqle() function writes the input tbl_teradata to an external object store such as Amazon S3, Azure Blob Storage, or Google Cloud Storage.
You must have the EXECUTE FUNCTION privilege on TD_SYSFNLIB.WRITE_NOS.

Usage

  td_write_nos_sqle (
      data = NULL,
      location = NULL,
      authorization = NULL,
      stored.as = "PARQUET",
      naming = "RANGE",
      header = TRUE,
      row.format = NULL,
      manifest.file = NULL,
      manifest.only = FALSE,
      overwrite = FALSE,
      include.ordering = NULL,
      include.hashby = NULL,
      max.object.size = "16MB",
      compression = NULL,
      ...
  )

Arguments

data

Required Argument.
Specifies the input tbl_teradata.
Types: tbl_teradata

location

Optional Argument.
Specifies the location value, which is a Uniform Resource Identifier (URI) pointing to a location in the external object storage system.
A URI identifying the external storage system in the format: "/connector/endpoint/bucket_or_container/prefix"
The "location" string cannot exceed 2048 characters.
Types: character

authorization

Optional Argument.
Specifies the authorization for accessing the external storage.
Following are the options for using "authorization":

  • Passing access ID and key as a named list: authorization = list('Access_ID' = 'access_id', 'Access_Key' = 'secret_key')

  • Passing access ID and key as a string in JSON format: authorization = '{"Access_ID":"access_id", "Access_Key":"secret_key"}'

  • Using an authorization object: on any platform, the authorization object can be specified as authorization = "[DatabaseName.]AuthorizationObjectName".
    The EXECUTE privilege on AuthorizationObjectName is needed.

The following table shows the supported USER (identification) and PASSWORD (secret_key) credentials for accessing external storage, as used in the CREATE AUTHORIZATION SQL command:

---------------------------------------------------------------------------
System/Scheme                         USER                  PASSWORD
---------------------------------------------------------------------------
AWS                                   Access Key ID         Access Key Secret
Azure / Shared Key                    Storage Account Name  Storage Account Key
Azure Shared Access Signature (SAS)   Storage Account Name  Account SAS Token
Google Cloud (S3 interop mode)        Access Key ID         Access Key Secret
Google Cloud (native)                 Client Email          Private Key
Public access object stores           '<empty string>'      '<empty string>'
---------------------------------------------------------------------------
For public access object stores, enclose the empty string in single straight quotes: USER '' and PASSWORD ''.

When accessing GCS, Analytics Database uses either the S3-compatible connector or the native Google connector, depending on the user credentials.
Notes:

  • If using AWS IAM credentials, "authorization" can be omitted.

  • If the S3 user account requires the use of physical or virtual security, a session token can be used with Access_ID and Access_Key in this syntax:
    list("Access_ID" = "access_id", "Access_Key" = "secret_key", "Session_Token" = "session_token")
    The session token can be obtained using the AWS CLI, for example: "aws sts get-session-token"

Types: character or list
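
For illustration, a minimal sketch of the three credential forms described above; all values shown (access_id, secret_key, session_token, MyDatabase.MyAuthObj) are placeholders, not real credentials or objects:

    # Named list form, including an optional temporary session token (placeholder values).
    auth_list <- list("Access_ID" = "access_id",
                      "Access_Key" = "secret_key",
                      "Session_Token" = "session_token")

    # Equivalent JSON string form.
    auth_json <- '{"Access_ID":"access_id", "Access_Key":"secret_key"}'

    # Authorization object form; MyDatabase.MyAuthObj is a hypothetical authorization object.
    auth_obj <- "MyDatabase.MyAuthObj"

Any one of these values can then be passed as the "authorization" argument of td_write_nos_sqle().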

stored.as

Optional Argument.
Specifies the formatting style of the external data.
PARQUET means the external data is formatted as Parquet.
Permitted Values:

  • PARQUET

  • CSV

Default Value: "PARQUET"
Types: character

naming

Optional Argument.
Specifies how the objects containing the rows of data are named
in the external storage:
Permitted Values:

  • DISCRETE - Discrete naming uses the ordering column values as part of the object names in external storage. For example, if "data.order.column" is list("dateColumn", "intColumn"), the discrete-form names of the objects written to external storage include the values of those columns as part of the object name, similar to this:
    "S3/ceph-s3.teradata.com/xz186000/2019-03-01/13/object_33_0_1.parquet"
    Here '2019-03-01' is the value of the first ordering column, "dateColumn", and '13' is the value of the second ordering column, "intColumn".
    All rows stored in this external Parquet-formatted object contain those two values.

  • RANGE - Range naming includes in the object name the range of values included in the partition for each ordering column.
    For example, using the same "data.order.column" as above, the object names would look similar to this: "S3/ceph-s3.teradata.com/xz186000/2019-01-01/2019-03-02/9/10000/object_33_0_1.parquet"
    In this external Parquet-formatted object:

    • '2019-01-01' and '2019-03-02' are the minimum and maximum values of the first ordering column, dateColumn.

    • '9' and '10000' are the minimum and maximum values of the second ordering column, intColumn.

Default Value: "RANGE"
Types: character
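
As an illustration of the naming styles, a minimal sketch assuming "my_data" is a tbl_teradata with columns "dateColumn" and "intColumn", and that the bucket path and credentials are placeholders:

    # Discrete naming: object names embed the ordering column values.
    obj <- td_write_nos_sqle(
            data = my_data,
            location = '/S3/s3.amazonaws.com/Your-Bucket/location/',
            authorization = list("Access_ID" = "YOUR-ID", "Access_Key" = "YOUR-KEY"),
            naming = "DISCRETE",
            data.order.column = c("dateColumn", "intColumn"),
            include.ordering = TRUE,
            stored.as = 'PARQUET')

With naming = "RANGE" (the default), the same call would instead embed the minimum and maximum ordering values in each object name.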

header

Optional Argument.
Specifies whether the first record contains the column names.
Default Value: TRUE
Types: logical

row.format

Optional Argument.
Specifies the encoding format for the rows in the file type specified in "stored.as".
Notes:

  • For CSV data, encoding format is: {"field_delimiter": "fd_value", "record_delimiter": "\n", "character_set": "cs_value"}

  • For Parquet data, encoding format is: {"character_set": "cs_value"}

The following describes the fields of the encoding format:

  • field_delimiter:
    Specifies the field delimiter. Default field delimiter is "," (comma). User can also specify a custom field delimiter, such as tab "\t". The key name and "fd_value" are case-sensitive.

  • record_delimiter:
    Specifies the record delimiter, which is the line feed character, "\n". The key name and "\n" are case-sensitive.

  • character_set:
    Specifies the field character set "UTF8" or "LATIN". The key name is case-sensitive, but "cs_value" is not.


Types: character or list
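
As a sketch of the list form of "row.format" described above (the delimiter and character set values are illustrative):

    # CSV encoding: tab-delimited fields, line-feed record delimiter, UTF8 character set.
    csv_row_format <- list("field_delimiter" = "\t",
                           "record_delimiter" = "\n",
                           "character_set" = "UTF8")

    # Parquet encoding only takes a character set.
    parquet_row_format <- list("character_set" = "UTF8")

Either list can then be passed as "row.format" together with the matching "stored.as" value.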

manifest.file

Optional Argument.
Specifies the fully qualified path and file name where the manifest file is written.
Use the format "/connector/endpoint/bucket_or_container/prefix/manifest_file_name".
For example: "/S3/ceph-s3.teradata.com/xz186000/manifest/manifest.json".
If "manifest.file" argument is not included, no manifest file is written.
Types: character

manifest.only

Optional Argument.
Specifies whether to write only a manifest file in external storage. No actual data objects are written to external storage when "manifest.only" is set to TRUE, and the "manifest.file" argument must also be used to create the manifest file.
Use this option to create a new manifest file when a td_write_nos_sqle() call fails due to a database abort or restart, or when network connectivity issues interrupt and stop a td_write_nos_sqle() call before all data has been written to external storage.
The manifest is created from the tbl_teradata that is input to td_write_nos_sqle(). The input must be a tbl_teradata of storage object names and sizes, with one row per object.
Note:

  • The input to td_write_nos_sqle() with "manifest.only" can itself incorporate td_read_nos_sqle(), similar to this, which uses function mappings for td_write_nos_sqle() and td_read_nos_sqle():
    read_nos_obj <- td_read_nos(return.type = "NOSREAD_KEYS")
    td_write_nos(data = read_nos_obj$result, manifest.file = manifest_file, manifest.only = TRUE)

  • Function calls like above can be used if a td_write_nos_sqle() fails before it can create a manifest file. The new manifest file created using td_read_nos_sqle() reflects all data objects currently in the external storage location, and can aid in determining which data objects resulted from the incomplete td_write_nos_sqle(). For more information, see Teradata Vantage - Native Object Store Getting Started Guide.

Default Value: FALSE
Types: logical

overwrite

Optional Argument.
Specifies whether an existing manifest file in external storage is overwritten with a new manifest file that has the same name.
When set to FALSE, td_write_nos_sqle() returns an error if a manifest file exists in external storage that is named identically to the value of "manifest.file".
Note:

  • This argument must be used with "manifest.only" set to TRUE.

Default Value: FALSE
Types: logical

include.ordering

Optional Argument.
Specifies whether column(s) specified in argument "data_order_column" and their values are written to external storage.
Types: logical

include.hashby

Optional Argument.
Specifies whether column(s) specified in argument "data_hash_column" and their values are written to external storage.
Types: logical

max.object.size

Optional Argument.
Specifies the maximum output object size in megabytes, where maximum object size can range between 4 and 16. The default is the value of the DefaultRowGroupSize field in DBS Control. For more information on DBS Control, see Teradata Vantage - Database Utilities.
Default Value: "16MB"
Types: character

compression

Optional Argument.
Specifies the compression algorithm used to compress the objects written to external storage.
Note:

  • For Parquet files, compression occurs inside parts of the Parquet file rather than on the entire file, so the file extension on external objects remains '.parquet'.

Permitted Values: "GZIP", "SNAPPY"
Types: character

...

Specifies the generic keyword arguments SQLE functions accept.
Below are the generic keyword arguments:

persist:
Optional Argument.
Specifies whether to persist the results of the function in a table or not.
When set to TRUE, results are persisted in a table; otherwise, results are garbage collected at the end of the session.
Default Value: FALSE
Types: logical

volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table or not.
When set to TRUE, results are stored in a volatile table; otherwise they are not.
Default Value: FALSE
Types: logical

The function allows the user to partition, hash, order, or local order the input data. These generic arguments are available for each argument that accepts tbl_teradata as input and can be accessed as:

  • "<input.data.arg.name>.partition.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.hash.column" accepts character or vector of character (Strings)

  • "<input.data.arg.name>.order.column" accepts character or vector of character (Strings)

  • "local.order.<input.data.arg.name>" accepts logical

Note:

  • These generic arguments are supported by tdplyr if the underlying Analytics Database function supports them; otherwise an exception is raised.
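
As a sketch of how these generic arguments combine with the function-specific ones, assuming the "titanic_data" tbl_teradata from the Examples section below and placeholder credentials and bucket path:

    # Partition and order by 'sex', and persist the result table beyond the session.
    obj <- td_write_nos_sqle(
            data = titanic_data,
            location = '/S3/s3.amazonaws.com/Your-Bucket/location/',
            authorization = list("Access_ID" = "YOUR-ID", "Access_Key" = "YOUR-KEY"),
            data.partition.column = "sex",
            data.order.column = "sex",
            stored.as = 'PARQUET',
            persist = TRUE)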

Value

Function returns an object of class "td_write_nos_sqle" which is a named list containing object of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator using the name(s): result

Examples

  
    
    # Get the current context/connection
    con <- td_get_context()$connection
    
    # Check the list of available analytic functions.
    display_analytic_functions()
    
    # Load the example data.
    loadExampleData("tdplyr_example", "titanic")
    
    # Create tbl_teradata object.
    titanic_data <- tbl(con, "titanic")
    
    # Example 1: Writing the tbl_teradata to an AWS S3 location.
    obj <-  td_write_nos_sqle(
              data=titanic_data,
              location='/S3/s3.amazonaws.com/Your-Bucket/location/',
              authorization=list("Access_ID"= "YOUR-ID", "Access_Key"= "YOUR-KEY"),
              stored.as='PARQUET')
    
    # Print the result.
    print(obj$result)
    
    # Example 2: Write tbl_teradata to external storage with partition and order by
    #            column 'sex'.
    obj <- td_write_nos_sqle(
            data=titanic_data,
            location='/S3/s3.amazonaws.com/Your-Bucket/location/',
            authorization=list("Access_ID"= "YOUR-ID", "Access_Key"= "YOUR-KEY"), 
            data.partition.columns="sex",
            data.order.columns="sex",
            stored.as='PARQUET')
    
    # Print the result.
    print(obj$result)
    
    # Example 3: Write tbl_teradata to external storage with hashing and order by
    #            column 'sex'.
    obj <- td_write_nos_sqle(
            data=titanic_data,
            location='/S3/s3.amazonaws.com/Your-Bucket/location/',
            authorization=list("Access_ID"= "YOUR-ID", "Access_Key"= "YOUR-KEY"),
            data.hash.columns="sex",
            data.order.columns="sex",
            local.order.data=TRUE,
            include.hashing=TRUE,
            stored.as='PARQUET')
    
    # Print the result.
    print(obj$result)
    
    # Example 4: Write tbl_teradata to external storage with max object size as 4MB.
    obj <- td_write_nos_sqle(
            data=titanic_data,
            location='/S3/s3.amazonaws.com/Your-Bucket/location/',
            authorization=list("Access_ID"= "YOUR-ID", "Access_Key"= "YOUR-KEY"), 
            include.ordering=TRUE,
            max.object.size='4MB',
            compression='GZIP',
            stored.as='PARQUET')
    
    # Print the result.
    print(obj$result)
    
    # Example 5: Write tbl_teradata of manifest table into a manifest file.
    obj_names <- c('/S3/s3.amazonaws.com/YOUR-STORAGE-ACCOUNT/20180701/ManifestFile/object_33_0_1.parquet',
                 '/S3/s3.amazonaws.com/YOUR-STORAGE-ACCOUNT/20180701/ManifestFile/object_33_6_1.parquet',
                 '/S3/s3.amazonaws.com/YOUR-STORAGE-ACCOUNT/20180701/ManifestFile/object_33_1_1.parquet')
    obj_size <- c(2803, 2733, 3009)
    manifest_df <- data.frame('ObjectName'=obj_names, 'ObjectSize'= obj_size)
    copy_to(con, manifest_df, "manifest_df")
    
    manifest_df <- tbl(con, "manifest_df")
    obj <- td_write_nos_sqle(data=manifest_df,
            location='YOUR-STORAGE-ACCOUNT/20180701/ManifestFile2/',
            authorization=list("Access_ID"= "YOUR-ID", "Access_Key"= "YOUR-KEY"),
            manifest.file='YOUR-STORAGE-ACCOUNT/20180701/ManifestFile2/manifest2.json',
            manifest.only=TRUE,
            stored.as='PARQUET')
    
    # Print the result.
    print(obj$result)
    
    # Example 7: Write tbl_teradata to external object store in CSV format with a header
    #            and field delimiter as '\t' (tab).
    obj <- td_write_nos_sqle(
            location='/S3/s3.amazonaws.com/Your-Bucket/location/',
            authorization=list("Access_ID"= "YOUR-ID", "Access_Key"= "YOUR-KEY"),
            stored.as="CSV",
            header=TRUE,
            row.format=list("field_delimiter"="\t", "record_delimiter"="\n"),
            data=titanic_data)
    
    # Print the result.
    print(obj$result)
    
    # Note: 
    #   Before proceeding, verify with your database administrator that you have the 
    #   correct privileges, an authorization object, and a function mapping (for WRITE_NOS 
    #   function).
    
    # If function mapping for WRITE_NOS Analytic database function is created 
    # as 'WRITE_NOS_FM' and location and authorization object are mapped,
    # then set function mapping with tdplyr options as below.
    
    # Example 8: Writing the tbl_teradata using function mapping.
    options(write.nos.function.mapping = "WRITE_NOS_FM")
    
    obj <-  td_write_nos_sqle(data=titanic_data, stored.as='PARQUET')
    
    # Print the result.
    print(obj$result)