WriteNOS
Description
The td_write_nos_sqle()
function writes the input tbl_teradata to external object storage,
such as Amazon S3, Azure Blob Storage, or Google Cloud Storage.
You must have the EXECUTE FUNCTION privilege on TD_SYSFNLIB.WRITE_NOS.
Usage
td_write_nos_sqle (
data = NULL,
location = NULL,
authorization = NULL,
stored.as = "PARQUET",
naming = "RANGE",
manifest.file = NULL,
manifest.only = FALSE,
overwrite = FALSE,
include.ordering = NULL,
include.hashby = NULL,
max.object.size = "16MB",
compression = NULL,
...
)
Arguments
data |
Required Argument.
Specifies the input tbl_teradata.
Types: tbl_teradata
|
location |
Optional Argument.
Specifies the location, a Uniform Resource Identifier (URI) pointing to
the location in the external object storage system, in the format:
"/connector/endpoint/bucket_or_container/prefix"
The "location" string cannot exceed 2048 characters.
Types: character
|
authorization |
Optional Argument.
Specifies the authorization for accessing the external storage.
Following are the options for using "authorization":
Passing access ID and key as a list:
authorization = list('Access_ID' = 'access_id',
'Access_Key' = 'secret_key')
Passing access ID and key as a string in JSON format:
authorization = '{"Access_ID":"access_id", "Access_Key":"secret_key"}'
Using an authorization object:
On any platform, an authorization object can be specified as
authorization = "[DatabaseName.]AuthorizationObjectName"
The EXECUTE privilege on AuthorizationObjectName is needed.
The following table shows the supported credentials for USER (identification)
and PASSWORD (secret_key) used in the CREATE AUTHORIZATION SQL command when
accessing external storage:
------------------------------------|-----------------------|---------------------
System/Scheme                       | USER                  | PASSWORD
------------------------------------|-----------------------|---------------------
AWS                                 | Access Key ID         | Access Key Secret
Azure / Shared Key                  | Storage Account Name  | Storage Account Key
Azure Shared Access Signature (SAS) | Storage Account Name  | Account SAS Token
Google Cloud (S3 interop mode)      | Access Key ID         | Access Key Secret
Google Cloud (native)               | Client Email          | Private Key
Public access object stores         | '<empty string>'      | '<empty string>'
------------------------------------|-----------------------|---------------------
For public access object stores, enclose the empty string in single straight
quotes: USER '' and PASSWORD ''.
When accessing GCS, Analytics Database uses either the S3-compatible connector or
the native Google connector, depending on the user credentials.
Notes:
If using AWS IAM credentials, "authorization" can be omitted.
If the S3 user account requires the use of physical or virtual security,
a session token can be used together with Access_ID and Access_Key in this syntax:
authorization = list("Access_ID" = "access_id",
"Access_Key" = "secret_key",
"Session_Token" = "session_token")
Here, the session token can be obtained using the AWS CLI, for example:
"aws sts get-session-token"
A sketch of creating and referencing an authorization object is shown after
this argument description.
Types: character or list
|
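A minimal sketch of creating an authorization object from R and then
referencing it by name; the object name "MyAuthObj" and the credentials are
placeholders, the exact CREATE AUTHORIZATION form may vary by site, and the
connection and titanic_data come from the Examples section below.
# Sketch only: create an authorization object once (typically a DBA task),
# then reference it by name instead of passing raw keys with every call.
con <- td_get_context()$connection
DBI::dbExecute(con,
  "CREATE AUTHORIZATION MyAuthObj AS DEFINER TRUSTED
   USER 'YOUR-ACCESS-KEY-ID'
   PASSWORD 'YOUR-SECRET-KEY'")
obj <- td_write_nos_sqle(
  data = titanic_data,
  location = '/S3/s3.amazonaws.com/Your-Bucket/location/',
  authorization = "MyAuthObj",   # "[DatabaseName.]AuthorizationObjectName"
  stored.as = 'PARQUET')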
stored.as |
Optional Argument.
Specifies the formatting style of the external data.
PARQUET means the external data is formatted as Parquet. Objects
created in external storage by td_write_nos_sqle are written only in Parquet
format.
Permitted Values:
"PARQUET"
Default Value: "PARQUET"
Types: character
|
naming |
Optional Argument.
Specifies how the objects containing the rows of data are named
in the external storage:
Permitted Values:
"DISCRETE"-
Discrete naming uses the ordering column values as part of the object
names in external storage. For example, if the "data_partition_column"
has "data_order_column = list("dateColumn", "intColumn")" ,
the discrete form name of the objects written to external storage would
include the values for those columns as part of the object name,
which would look similar to this:
"S3/ceph-s3.teradata.com/xz186000/2019-03-01/13/object_33_0_1.parquet"
'2019-03-01' is the value for dateColumn, the first ordering column, and '13' is
the value for the second ordering column, intColumn.
All rows stored in this external Parquet-formatted object contain those
two values.
"RANGE"-
Range naming, the default, includes as part of the object name the
range of values included in the partition for each ordering column.
For example, using the same "data_order_column" as above the object
names would look similar to this:
"S3/ceph-s3.teradata.com/xz186000/2019-01-01/2019-03-02/9/10000/object_33_0_1.parquet"
where '2019-01-01' is the minimum value in that object for the first
ordering column, dateColumn, '2019-03-02' is the maximum value for the
rows stored in this external Parquet-formatted object. Value'9' is the
minimum value for the second ordering column, intColumn, and '10000' is
the maximum value for that column.
Default Value: "RANGE"
Types: character
|
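A minimal sketch of discrete naming, reusing the titanic_data and placeholder
credentials from the Examples section below; the ordering column "sex" is only
an illustration, and the ordering argument name follows the
"<input.data.arg.name>.order.column" pattern described under the "..." argument.
# Sketch only: with naming = "DISCRETE", the values of the ordering column
# become part of the names of the Parquet objects written to external storage.
obj <- td_write_nos_sqle(
  data = titanic_data,
  location = '/S3/s3.amazonaws.com/Your-Bucket/location/',
  authorization = list("Access_ID" = "YOUR-ID", "Access_Key" = "YOUR-KEY"),
  naming = "DISCRETE",
  data.order.column = "sex",
  include.ordering = TRUE,
  stored.as = 'PARQUET')
print(obj$result)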
manifest.file |
Optional Argument.
Specifies the fully qualified path and file name where the manifest
file is written. Use the format
"/connector/endpoint/bucket_or_container/prefix/manifest_file_name".
For example: "/S3/ceph-s3.teradata.com/xz186000/manifest/manifest.json".
If the "manifest.file" argument is not included, no manifest file is written.
Types: character
|
manifest.only |
Optional Argument.
Specifies whether to write only a manifest file in external storage
or not. No actual data objects are written to external storage when
"manifest.only" is set to TRUE. One must also use the "manifest.file"
option to create a manifest file in external storage. Use this option
to create a new manifest file in the event that a td_write_nos_sqle()
fails due to a database abort or restart, or when network connectivity
issues interrupt and stop a td_write_nos_sqle() before all data has
been written to external storage. The manifest is created from the
tbl_teradata that is input to td_write_nos_sqle().
The input must be a tbl_teradata of storage object names and sizes, with
one row per object.
Note:
The input to td_write_nos_sqle() with "manifest.only" can itself
incorporate td_read_nos_sqle(), similar to the following, which uses
function mappings for td_write_nos_sqle() and td_read_nos_sqle():
read_nos_obj <- td_read_nos(return.type="NOSREAD_KEYS")
td_write_nos(data=read_nos_obj$result,
             manifest.file=manifest_file,
             manifest.only=TRUE)
Function calls like the above can be used if a td_write_nos_sqle()
fails before it can create a manifest file. A fuller sketch using the
SQLE function names directly is shown after this argument description.
The new manifest file created using
td_read_nos_sqle() reflects all data objects currently
in the external storage location, and can aid in determining
which data objects resulted from the incomplete
td_write_nos_sqle().
For more information, see Teradata Vantage - Native Object Store
Getting Started Guide.
Default Value: FALSE
Types: logical
|
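A fuller sketch of the recovery flow described in the note above, using the
SQLE function names directly; the location, credentials, and manifest path are
placeholders, and it is assumed that td_read_nos_sqle() accepts "location" and
"authorization" arguments analogous to this function's.
# Sketch only: rebuild a manifest after an interrupted write.
# First list the objects currently at the external location, then write
# only a manifest file describing them (no data objects are rewritten).
read_nos_obj <- td_read_nos_sqle(
  location = '/S3/s3.amazonaws.com/Your-Bucket/location/',
  authorization = list("Access_ID" = "YOUR-ID", "Access_Key" = "YOUR-KEY"),
  return.type = "NOSREAD_KEYS")
obj <- td_write_nos_sqle(
  data = read_nos_obj$result,
  location = '/S3/s3.amazonaws.com/Your-Bucket/location/',
  authorization = list("Access_ID" = "YOUR-ID", "Access_Key" = "YOUR-KEY"),
  manifest.file = '/S3/s3.amazonaws.com/Your-Bucket/location/manifest.json',
  manifest.only = TRUE,
  stored.as = 'PARQUET')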
overwrite |
Optional Argument.
Specifies whether an existing manifest file in external storage is
overwritten with a new manifest file that has the same name.
When set to FALSE, td_write_nos_sqle() returns an error if a
manifest file exists in external storage that is named
identically to the value of "manifest.file".
Note:
"overwrite" must be used with "manifest.only" set to TRUE.
Default Value: FALSE
Types: logical
|
include.ordering |
Optional Argument.
Specifies whether column(s) specified in argument "data.order.column" and
their values are written to external storage.
Types: logical
|
include.hashby |
Optional Argument.
Specifies whether column(s) specified in argument "data.hash.column" and
their values are written to external storage.
Types: logical
|
max.object.size |
Optional Argument.
Specifies the maximum output object size in megabytes, where
maximum object size can range between 4 and 16. The default is the
value of the DefaultRowGroupSize field in DBS Control.
For more information on DBS Control, see Teradata Vantage - Database
Utilities.
Default Value: "16MB"
Types: character
|
compression |
Optional Argument.
Specifies the compression algorithm used to compress the objects
written to external storage.
Permitted Values: "GZIP", "SNAPPY"
Types: character
|
... |
Specifies the generic keyword arguments SQLE functions accept.
Below are the generic keyword arguments:
persist:
Optional Argument.
Specifies whether to persist the results of the function in a table or not.
When set to TRUE, results are persisted in a table; otherwise, results
are garbage collected at the end of the session.
Default Value: FALSE
Types: logical
volatile:
Optional Argument.
Specifies whether to put the results of the function in a volatile table or not.
When set to TRUE, results are stored in a volatile table; otherwise not.
Default Value: FALSE
Types: logical
The function allows the user to partition, hash, order, or local-order the
input data. These generic arguments are available for each argument that
accepts tbl_teradata as input and can be accessed as:
"<input.data.arg.name>.partition.column" accepts character or vector of character (Strings)
"<input.data.arg.name>.hash.column" accepts character or vector of character (Strings)
"<input.data.arg.name>.order.column" accepts character or vector of character (Strings)
"local.order.<input.data.arg.name>" accepts logical
Note:
These generic arguments are supported by tdplyr if the underlying
SQL Engine function supports them; otherwise, an exception is raised.
A sketch of using "persist" is shown after this argument description.
|
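A minimal sketch of the generic "persist" argument, with placeholder location
and credentials as in the Examples section below.
# Sketch only: keep the WriteNOS result (object names and sizes) in a
# permanent table instead of letting it be garbage collected.
obj <- td_write_nos_sqle(
  data = titanic_data,
  location = '/S3/s3.amazonaws.com/Your-Bucket/location/',
  authorization = list("Access_ID" = "YOUR-ID", "Access_Key" = "YOUR-KEY"),
  stored.as = 'PARQUET',
  persist = TRUE)
print(obj$result)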
Value
Function returns an object of class "td_write_nos_sqle"
which is a named list containing object of class "tbl_teradata".
Named list member(s) can be referenced directly with the "$" operator
using the name(s): result
Examples
# Get the current context/connection
con <- td_get_context()$connection
# Check the list of available analytic functions.
display_analytic_functions()
# Load the example data.
loadExampleData("tdplyr_example", "titanic")
# Create tbl_teradata object.
titanic_data <- tbl(con, "titanic")
# Example 1: Writing the tbl_teradata to an AWS S3 location.
obj <- td_write_nos_sqle(
data=titanic_data,
location='/S3/s3.amazonaws.com/Your-Bucket/location/',
authorization=list("Access_ID"= "YOUR-ID", "Access_Key"= "YOUR-KEY"),
stored.as='PARQUET')
# Print the result.
print(obj$result)
# Example 2: Write tbl_teradata to external storage with partition and order by
# column 'sex'.
obj <- td_write_nos_sqle(
data=titanic_data,
location='/S3/s3.amazonaws.com/Your-Bucket/location/',
authorization=list("Access_ID"= "YOUR-ID", "Access_Key"= "YOUR-KEY"),
data.partition.column="sex",
data.order.column="sex",
stored.as='PARQUET')
# Print the result.
print(obj$result)
# Example 3: Write tbl_teradata to external storage with hashing and order by
# column 'sex'.
obj <- td_write_nos_sqle(
data=titanic_data,
location='/S3/s3.amazonaws.com/Your-Bucket/location/',
authorization=list("Access_ID"= "YOUR-ID", "Access_Key"= "YOUR-KEY"),
data.hash.column="sex",
data.order.column="sex",
local.order.data=TRUE,
include.hashby=TRUE,
stored.as='PARQUET')
# Print the result.
print(obj$result)
# Example 4: Write tbl_teradata to external storage with max object size as 4MB.
obj <- td_write_nos_sqle(
data=titanic_data,
location='/S3/s3.amazonaws.com/Your-Bucket/location/',
authorization=list("Access_ID"= "YOUR-ID", "Access_Key"= "YOUR-KEY"),
include.ordering=TRUE,
max.object.size='4MB',
compression='GZIP',
stored.as='PARQUET')
# Print the result.
print(obj$result)
# Example 5: Write tbl_teradata of manifest table into a manifest file.
obj_names <- c('/S3/s3.amazonaws.com/YOUR-STORAGE-ACCOUNT/20180701/ManifestFile/object_33_0_1.parquet',
'/S3/s3.amazonaws.com/YOUR-STORAGE-ACCOUNT/20180701/ManifestFile/object_33_6_1.parquet',
'/S3/s3.amazonaws.com/YOUR-STORAGE-ACCOUNT/20180701/ManifestFile/object_33_1_1.parquet')
obj_size <- c(2803, 2733, 3009)
manifest_df <- data.frame('ObjectName'=obj_names, 'ObjectSize'= obj_size)
copy_to(con, manifest_df, "manifest_df")
manifest_df <- tbl(con, "manifest_df")
obj <- td_write_nos_sqle(
data=manifest_df,
location='/S3/s3.amazonaws.com/YOUR-STORAGE-ACCOUNT/20180701/ManifestFile2/',
authorization=list("Access_ID"= "YOUR-ID", "Access_Key"= "YOUR-KEY"),
manifest.file='/S3/s3.amazonaws.com/YOUR-STORAGE-ACCOUNT/20180701/ManifestFile2/manifest2.json',
manifest.only=TRUE,
stored.as='PARQUET')
# Print the result.
print(obj$result)
# Note:
# Before proceeding, verify with your database administrator that you have the
# correct privileges, an authorization object, and a function mapping (for the
# WRITE_NOS function).
# If a function mapping for the WRITE_NOS analytic database function is created
# as 'WRITE_NOS_FM', with the location and authorization object mapped,
# then set the function mapping with tdplyr options as shown below.
# Example 7: Writing the tbl_teradata using function mapping.
options(write.nos.function.mapping = "WRITE_NOS_FM")
obj <- td_write_nos_sqle(data=titanic_data, stored.as='PARQUET')
# Print the result.
print(obj$result)