WRITE_NOS Table Operator Syntax | SQL Statements | Teradata Vantage 17.10 - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - SQL Operators and User-Defined Functions

Product: Advanced SQL Engine (Teradata Database)
Release Number: 17.10
Release Date: July 2021
Content Type: Administration, Programming Reference
Publication ID: B035-1210-171K
Language: English (United States)
WRITE_NOS (
  ON { [ database_name.]table_name | (subquery) } 
     [  PARTITION BY column_name [ ,...] ORDER BY column_name [ ,...] |
       HASH BY column_name [ ,...] LOCAL ORDER BY column_name [ ,...] |
       LOCAL ORDER BY column_name [ ,...]]
  USING
    LOCATION ('external_storage_path')
    [  AUTHORIZATION ( { [DatabaseName.]AuthorizationObjectName |
       '{"Access_ID":"identification", "Access_Key":"secret_key"}' } ) ]
    STOREDAS ('PARQUET')
    [ NAMING ({ 'DISCRETE' | 'RANGE' }) ]
    [ MANIFESTFILE ('manifest_name') ]
    [ MANIFESTONLY ('TRUE') ]
    [ OVERWRITE ({'TRUE' | 'FALSE' }) ]
    [ INCLUDE_ORDERING ({'TRUE' | 'FALSE' }) ]
    [ INCLUDE_HASHBY ({'TRUE' | 'FALSE' }) ]
    [ MAXOBJECTSIZE ('max_object_size') ]
    [ COMPRESSION ( { 'GZIP' | 'SNAPPY' } ) ]
)
You must type the curly braces shown in '{"Access_ID":"identification", "Access_Key":"secret_key"}'. Those particular curly braces are part of the WRITE_NOS statement, not syntax metacharacters.

Syntax Elements

[ database_name.]table_name
Name of the table from which data will be written to external storage.
subquery
Query that returns data to be written to external storage.
PARTITION BY column_name [ ,...] ORDER BY column_name [ ,...]
HASH BY column_name [ ,...] LOCAL ORDER BY column_name [ ,...]
LOCAL ORDER BY column_name [ ,...]
These clauses control how Vantage sorts and divides the rows of table_name, or the rows returned by subquery, among the AMPs before those rows are written to external storage. They also affect the external storage object names, which reflect the values each stored object contains according to the NAMING option.
  • PARTITION BY invokes WRITE_NOS multiple times on each AMP, once for each partition, and writes at least one file to external storage per partition. If you specify PARTITION BY, the partitioning columns must be included in the ORDER BY list.
  • HASH BY invokes WRITE_NOS only once per AMP, and writes at least one file to external storage for each AMP.
  • Use LOCAL ORDER BY alone if any of the columns that would have been in the PARTITION BY or HASH BY column lists has only a small variety of values, which would hash all the table data to one or just a small number of AMPs, causing data skew. LOCAL ORDER BY minimizes data skew by preventing data from being redistributed based on the partitioning or hashing columns. Instead the data is ordered in place on each AMP before being written to external storage.
You can specify up to 10 columns to partition, hash, or local order by.
You can see how many partitions will be created for a table or subquery before you write to external storage by running a query that groups the rows by the partitioning columns and counts the resulting groups. For example:
SELECT COUNT(*)
FROM (SELECT column1, column2
      FROM table_name
      GROUP BY column1, column2) AS derived_table_name;
The following restrictions apply to columns used for ORDER BY and LOCAL ORDER BY:
  • Data types are limited to BYTEINT, SMALLINT, INTEGER, BIGINT, DATE, and VARCHAR.
  • VARCHAR columns can only contain alphanumeric and these special characters:
     - _ ! * ' ( ) 
  • VARCHAR columns are limited to 128 characters.
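Putting these clauses together, a minimal WRITE_NOS call with PARTITION BY might look like the following sketch. The database, table, column names, and bucket path are hypothetical placeholders; note that the partitioning column txn_date also appears in the ORDER BY list, as required.

```sql
SELECT * FROM WRITE_NOS (
  ON  ( SELECT txn_date, store_id, amount
        FROM sales_db.transactions )
  PARTITION BY txn_date ORDER BY txn_date, store_id
  USING
    LOCATION ('/S3/s3.amazonaws.com/example-bucket/sales/')
    STOREDAS ('PARQUET')
) AS d;
```

Each distinct txn_date value becomes its own partition, so WRITE_NOS writes at least one Parquet object per date per AMP.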
LOCATION
A URI identifying the external storage system in the format
/connector/endpoint/bucket_or_container/prefix
The LOCATION string cannot exceed 2048 characters.
AUTHORIZATION
[Optional] Authorization for accessing external storage.
On any platform, you can specify an authorization object ([DatabaseName.]AuthorizationObjectName). You must have the EXECUTE privilege on AuthorizationObjectName.
On Amazon S3, Azure Blob storage, and Azure Data Lake Storage Gen2, you can specify either an authorization object or a string in JSON format. The string specifies the USER (identification) and PASSWORD (secret_key) for accessing external storage. The following table shows the supported credentials for USER and PASSWORD (used in the CREATE AUTHORIZATION command):
System/Scheme                          USER                   PASSWORD
AWS                                    Access Key ID          Access Key Secret
Azure / Shared Key                     Storage Account Name   Storage Account Key
Azure Shared Access Signature (SAS)    Storage Account Name   Account SAS Token
Google Cloud (S3 interop mode)         Access Key ID          Access Key Secret
Google Cloud (native)                  Client Email           Private Key
On-premises object stores              Access Key ID          Access Key Secret
Public access object stores            <empty string>         <empty string>

For public access object stores, enclose each empty string in single straight quotes: USER '' and PASSWORD ''.

If you use a function mapping to define a wrapper for WRITE_NOS, you can specify the authorization in the function mapping. With function mappings, you can use only [ INVOKER | DEFINER ] TRUSTED, not system-wide authorization.
If an AWS IAM credential provides access, you can omit the AUTHORIZATION clause.
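As an illustration, an inline JSON authorization string for an S3 bucket might be passed as shown below. The access ID, key, table, and bucket path are placeholders, not real credentials.

```sql
SELECT * FROM WRITE_NOS (
  ON  ( SELECT * FROM sales_db.transactions )
  USING
    LOCATION ('/S3/s3.amazonaws.com/example-bucket/sales/')
    AUTHORIZATION ('{"Access_ID":"AKIAEXAMPLE","Access_Key":"exampleSecretKey"}')
    STOREDAS ('PARQUET')
) AS d;
```

In production, an authorization object ([DatabaseName.]AuthorizationObjectName) is generally preferable to embedding credentials in SQL text.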
STOREDAS
Objects created in external storage by WRITE_NOS are written only in Parquet format.
NAMING
This determines how the objects containing the rows of data are named in the external storage:
  • Discrete naming uses the ordering column values as part of the object names in external storage. For example, if the PARTITION BY clause has ORDER BY dateColumn, intColumn, the discrete form name of the objects written to external storage would include the values for those columns as part of the object name, which would look similar to this:
    S3/ceph-s3.teradata.com/xz186000/2019-03-01/13/object_33_0_1.parquet
    2019-03-01 is the value for dateColumn, the first ordering column, and 13 is the value for the second ordering column, intColumn. All rows stored in this external Parquet-formatted object contain those two values.
  • Range naming, the default, includes as part of the object name the range of values included in the partition for each ordering column. For example, using the same ORDER BY as above the object names would look similar to this:
    S3/ceph-s3.teradata.com/xz186000/2019-01-01/2019-03-02/9/10000/object_33_0_1.parquet
    where 2019-01-01 is the minimum value in that object for the first ordering column, dateColumn, 2019-03-02 is the maximum value for the rows stored in this external Parquet-formatted object. Value 9 is the minimum value for the second ordering column, intColumn, and 10000 is the maximum value for that column.
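For instance, to force discrete naming so that each object name embeds the exact ordering-column values rather than a range, add the NAMING option (all names and paths below are hypothetical):

```sql
SELECT * FROM WRITE_NOS (
  ON  ( SELECT txn_date, store_id, amount
        FROM sales_db.transactions )
  PARTITION BY txn_date ORDER BY txn_date, store_id
  USING
    LOCATION ('/S3/s3.amazonaws.com/example-bucket/sales/')
    STOREDAS ('PARQUET')
    NAMING ('DISCRETE')
) AS d;
```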
MANIFESTFILE
Specifies the fully qualified path and file name where the manifest file is written, in the format
/connector/endpoint/bucket_or_container/prefix/manifest_file_name
For example:
/S3/ceph-s3.teradata.com/xz186000/manifest/manifest.json
If you do not include the MANIFESTFILE parameter, no manifest file is written.
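A sketch of requesting a manifest file alongside the data objects follows; the table and both storage paths are placeholders.

```sql
SELECT * FROM WRITE_NOS (
  ON  ( SELECT * FROM sales_db.transactions )
  USING
    LOCATION ('/S3/s3.amazonaws.com/example-bucket/sales/')
    STOREDAS ('PARQUET')
    MANIFESTFILE ('/S3/s3.amazonaws.com/example-bucket/manifest/sales_manifest.json')
) AS d;
```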
MANIFESTONLY
Writes only a manifest file in external storage.
No actual data objects are written to external storage if you use MANIFESTONLY('TRUE').
You must also use the MANIFESTFILE('manifest_name') option to create a manifest file in external storage.
Use this option to create a new manifest file in the event that a WRITE_NOS operation fails due to a database abort or restart, or when network connectivity issues interrupt and stop a WRITE_NOS operation before all data has been written to external storage.
The manifest is created from the table or query result set that is input to WRITE_NOS. The input must be a list of storage object names and sizes, with one row per object.
The input to WRITE_NOS with MANIFESTONLY can itself incorporate READ_NOS, similar to this, which uses function mappings for WRITE_NOS and READ_NOS:
SELECT * FROM WRITE_NOS_fm 
(ON 
  (SELECT Location, ObjectLength 
   FROM READ_NOS_fm 
   (ON 
     (SELECT CAST(NULL AS JSON CHARACTER SET UNICODE) )
     USING
       LOCATION('your_location')
       RETURNTYPE('NOSREAD_KEYS') 
  ) AS d1 )
USING
 MANIFESTFILE('your_location/orders_manifest.json')
 MANIFESTONLY('TRUE')
 OVERWRITE('TRUE')
) AS d;
A query like this can be used if a WRITE_NOS operation fails before it can create a manifest file. The new manifest file created using READ_NOS will reflect all data objects currently in the external storage location, and can aid in determining which data objects resulted from the incomplete WRITE_NOS operation. For more information, see Teradata Vantage™ - Native Object Store Getting Started Guide, B035-1214.
OVERWRITE
Determines whether an existing manifest file in external storage will be overwritten with a new manifest file that has the same name. If FALSE, the default, WRITE_NOS returns an error if a manifest file exists in external storage that is named identically to the value of MANIFESTFILE.
OVERWRITE must be used with MANIFESTONLY('TRUE').
INCLUDE_ORDERING
INCLUDE_HASHBY
Determine whether the ORDER BY columns (INCLUDE_ORDERING) and HASH BY columns (INCLUDE_HASHBY) and their values are written to external storage.
MAXOBJECTSIZE
Specifies the maximum output object size in megabytes, where max_object_size is a number between 4 and 16. The default is the value of the DefaultRowGroupSize field in DBS Control. For more information on DBS Control, see Teradata Vantage™ - Database Utilities, B035-1102.
COMPRESSION
Determines the compression algorithm used to compress the objects written to external storage.
For Parquet files, compression is applied inside parts of the Parquet file rather than to the file as a whole, so the file extension on external objects remains .parquet.
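For example, to compress the written objects with Snappy (table and path are hypothetical placeholders):

```sql
SELECT * FROM WRITE_NOS (
  ON  ( SELECT * FROM sales_db.transactions )
  USING
    LOCATION ('/S3/s3.amazonaws.com/example-bucket/sales/')
    STOREDAS ('PARQUET')
    COMPRESSION ('SNAPPY')
) AS d;
```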

Returns

WRITE_NOS writes the database data to external storage according to the options passed to it, and returns information about the objects it wrote to external storage. The returned values are a table with the following columns:

NodeId
INTEGER value that identifies the SQL Engine node that contains the data written to the external object.
AmpId
INTEGER value that identifies the AMP that contains the data written to the external object.
Sequence
BIGINT value that uniquely identifies the external object to avoid name conflicts in cases where there might otherwise have been multiple objects with the same path and name.
ObjectName
VARCHAR(1024) UNICODE value that is the full object file name in the format:
/connector/endpoint/bucket_or_container/prefix/object_file_name
  • connector is the external storage connector, such as S3.
  • endpoint is the external storage system endpoint.
  • bucket_or_container is the bucket (Amazon S3) or container (Azure Blob storage) name.
  • prefix depends on the value of the NAMING option, discrete or range naming.
  • object_file_name is in the format:
    <Location Path>/[ <Discrete Naming or Range Naming>]/object_<Node_ID>_<AMP_ID>_<Sequence>.parquet
For example:
/S3/ceph-s3.mycompany.com/test-bucket/2020-06-01/object_33_0_1.parquet
ObjectSize
BIGINT value that is the external object size in bytes.
RecordCount
BIGINT value that indicates the number of SQL Engine records in the external object.
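Because WRITE_NOS returns this result set like any other table operator, you can aggregate over it in the same statement, for example to verify how much was written (table and path are illustrative placeholders):

```sql
SELECT COUNT(*)         AS objects_written,
       SUM(ObjectSize)  AS total_bytes,
       SUM(RecordCount) AS total_rows
FROM WRITE_NOS (
  ON  ( SELECT * FROM sales_db.transactions )
  USING
    LOCATION ('/S3/s3.amazonaws.com/example-bucket/sales/')
    STOREDAS ('PARQUET')
) AS d;
```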

Usage Notes

WRITE_NOS does not overwrite data objects in external storage. If a WRITE_NOS operation encounters an existing object with the exact path and name it is trying to write, it stops and returns an error, and it does not create a manifest file, even if the MANIFESTFILE option was specified. The operation may already have written some data objects to the external storage location, and these must be removed manually before the same WRITE_NOS operation is attempted again. For information on how to determine which data objects must be removed from external storage, see Teradata Vantage™ - Native Object Store Getting Started Guide, B035-1214.