WRITE_NOS Syntax Elements - Teradata Vantage

Teradata® VantageCloud Lake

Deployment

VantageCloud

Edition

Lake

Product

Teradata Vantage

Published

January 2023

Language

English (United States)

Last Update

2024-04-03

dita:mapPath

phg1621910019905.ditamap

dita:ditavalPath

pny1626732985837.ditaval

dita:id

phg1621910019905

[ database_name.]table_name

Name of the table from which data is to be written to external object storage.

subquery

Query that returns data to be written to external object storage.

The query can include an expression that generates a destination path name to reference in the CREATE FOREIGN TABLE PARTITION BY clause. See Generate Destination Path Name in Subquery.

PARTITION BY column_name [ ,...] ORDER BY column_name [ ,...]

HASH BY column_name [ ,...] LOCAL ORDER BY column_name [ ,...]

LOCAL ORDER BY column_name [ ,...]

These clauses allow you to control how Vantage sorts and divides the rows of table_name or the rows returned bysubquery among the AMPs before these rows are written to external object storage.

These clauses affect the external storage object names, which reflect the values each stored object contains according to the NAMING clause.

PARTITION BY and HASH BY cause the paths to external storage objects to contain the data values. See PARTITION BY with INCLUDE_ORDERING ('TRUE') and PARTITION BY with INCLUDE_ORDERING ('FALSE').

Clause	Description
PARTITION BY … ORDER BY	Calls WRITE_NOS once per partition per AMP (multiple times per AMP). Writes at least one file to external object storage per partition. Column names after PARTITION BY must match those after ORDER BY.
HASH BY … LOCAL ORDER BY	Calls WRITE_NOS once per AMP. Writes at least one file to external object storage per AMP. Column names after HASH BY must match those after LOCAL ORDER BY.
LOCAL ORDER BY	Orders data on each AMP before writing data to external object storage. Use instead of PARTITION BY or HASH BY if any column in PARTITION BY or HASH BY has a small number distinct values. Otherwise, all table data hashes to one or few AMPs, causing data skew.

You can specify up to 10 columns to partition, hash, or local order by.

You can see the number of partitions to be created for a table or subquery before you write to external object storage by using a query that groups the results by the partitioning columns. For example:

SELECT COUNT(*) 
FROM (SELECT * FROM table 
      GROUP BY column1,column2)
 derived_table_name;

The following restrictions on columns used for ORDER BY and LOCAL ORDER BY:

Data types are limited to BYTEINT, SMALLINT, INTEGER, BIGINT, DATE, and VARCHAR.
VARCHAR columns can only contain alphanumeric and these special characters:
```
 - _ ! * ' ( ) 
```
VARCHAR columns are limited to 128 characters.

LOCATION

A URI identifying the external object storage system in the format

/connector/endpoint/bucket_or_container/prefix

The external_obect_storage_path cannot exceed 2048 characters and cannot include the substring '/./' or '/../'.

AUTHORIZATION

[Optional] Authorization for accessing external object storage.

On any platform, you can specify an authorization object ([DatabaseName.]AuthorizationObjectName). You must have the EXECUTE privilege on AuthorizationObjectName.

On Amazon S3 and Azure Blob Storage and Azure Data Lake Storage Gen2, you can specify either an authorization object or a string in JSON format. The string specifies the USER (identification) and PASSWORD (secret_key) for accessing external object storage. The following table shows the supported credentials for USER and PASSWORD (used in the CREATE AUTHORIZATION command):

System/Scheme	USER/ACCESS_ID	PASSWORD/ACCESS_KEY
AWS	Access Key ID	Access Key Secret
Azure / Shared Key	Storage Account Name	Storage Account Key
Azure Shared Access Signature (SAS)	Storage Account Name	Account SAS Token
Google Cloud (S3 interop mode)	Access Key ID	Access Key Secret
Google Cloud (native)	Client Email	Private Key
On-premises object storage	Access Key ID	Access Key Secret
Public access object storage	empty_string Enclose the empty string in single straight quotation marks: USER ''	empty_string Enclose the empty string in single straight quotation marks: PASSWORD ''

The following are alternatives to using an access key or password to secure S3-compatible external object storage. These are included in an authorization object, which is created by the CREATE AUTHORIZATION command:

Amazon Identity and Access Management (IAM)
AWS Assume Role used to allow existing AWS IAM users and service accounts temporary access to AWS resources in other accounts.

The following are alternatives to using Azure Storage Name and Storage Account Key:

Azure service principal used to assign restricted permissions to applications and services accessing Azure external object storage.

STOREDAS

Determines the format in which WRITE_NOS creates objects in external object storage. For information about external object storage formats, see SQL Data Definition Language.

NAMING

Determines how the objects containing the rows of data are named in the external storage:

DISCRETE
Uses the ORDER BY columns as part of the object names in external storage.
For example, for ORDER BY dateColumn, intColumn, the object names look similar to this:
```
S3/ceph-s3.teradata.com/xz186000/2019-03-01/13/object_33_0_1.parquet
```
2019-03-01 is the value for dateColumn, the first ordering column, and 13 is the value for the second ordering column, intColumn. Every row stored in this external Parquet-formatted object contains those two values.
RANGE
Includes as part of the object name the range of values included in the partition for each ordering column.
For example, for the preceding ORDER BY phrase, the object names look similar to this:
```
S3/ceph-s3.teradata.com/xz186000/2019-01-01/2019-03-02/9/10000/object_33_0_1.parquet
```
In this external Parquet-formatted object:
- 2019-01-01 and 2019-03-02 are the minimum and maximum values for the first ordering column, dateColumn.
- 9 and 10000 are the minimum and maximum values for the second ordering column, intColumn.

Option	Description
DISCRETE
RANGE

HEADER

[Optional] Specify whether the first record is to contain the column names.

For more information about headers, see SQL Data Definition Language.

Default: 'TRUE'

ROWFORMAT

[Optional for CSV data, disallowed otherwise.] NOS generates the ROWFORMAT clause.

encoding_format is:

{"field_delimiter":"fd_value", "record_delimiter":"\n"}

record_delimiter

Specify that the record delimiter is the line feed character, "\n". The key name and "\n" and case-sensitive.

field_delimiter

Specify the field delimiter. The key name and fd_value are case-sensitive.

Default: "," (comma)

MANIFESTFILE

Specifies the fully qualified path and file name where the manifest file is written. Use the format

/connector/end point/bucket_or_container/prefix/manifest_file_name

For example:

/S3/ceph-s3.teradata.com/xz186000/manifest/manifest.json

If you do not include the MANIFESTFILE parameter, no manifest file is written.

MANIFESTONLY

Writes only a manifest file in external object storage.

No actual data objects are written to external object storage if you use MANIFESTONLY('TRUE').

You must also use the MANIFESTFILE('manifest_name') option to create a manifest file in external object storage.

Use this option to create a new manifest file if a WRITE_NOS operation fails due to a database abnormal termination or restart, or when network connectivity issues interrupt and stop a WRITE_NOS operation before all data has been written to external storage.

The manifest is created from the table or query result set that is input to WRITE_NOS. The input must be a list of external storage object names and sizes, with one row per object.

The input to WRITE_NOS with MANIFESTONLY can itself incorporate READ_NOS, similar to this, which uses function mappings for WRITE_NOS and READ_NOS:

SELECT * FROM WRITE_NOS
(ON
   (SELECT Location, ObjectLength
   FROM (
     LOCATION='your_location'
     RETURNTYP='NOSREAD_KEYS'
   ) AS d1 )
USING
 MANIFESTFILE('your_location/orders_manifest.json')
 MANIFESTONLY('TRUE')
 OVERWRITE('TRUE')
) AS d;

A query like this can be used if a WRITE_NOS operation fails before creating a manifest file. The new manifest file created using READ_NOS reflecst all data objects in the external object storage location, and can help determine which data objects resulted from the incomplete WRITE_NOS operation. See Working with Manifest Files.

OVERWRITE

Determines whether an existing manifest file in external storage is to be overwritten with a new manifest file that has the same name. If FALSE, the default, WRITE_NOS returns an error if a manifest file exists in external storage that is named identically to the value of MANIFESTFILE.

OVERWRITE must be used with MANIFESTONLY('TRUE').

INCLUDE_ORDERING

INCLUDE_HASHBY

Determines whether the the ORDER BY and HASH BY columns and their values are written to external object storage.

See PARTITION BY with INCLUDE_ORDERING ('TRUE') and PARTITION BY with INCLUDE_ORDERING ('FALSE').

MAXOBJECTSIZE

Identifies the maximum output object size in megabytes, where max_object_size is a number between 4 and 16. The default is set by the system.

COMPRESSION

Determines the compression algorithm used to compress the objects written to external object storage.

For Parquet files the compression occurs inside parts of the parquet file instead of for the entire file, so the file extension on external objects remains .parquet.