WRITE_NOS Syntax Elements - Analytics Database - Teradata Vantage

SQL Data Manipulation Language

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database, Teradata Vantage
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-04-05
[ database_name.]table_name
Name of the table from which data is to be written to external storage.
subquery
Query that returns data to be written to external storage.
The query can include an expression that generates a destination path name to reference in the CREATE FOREIGN TABLE PARTITION BY clause. See Generate Destination Path Name in Subquery.
PARTITION BY column_name [ ,...] ORDER BY column_name [ ,...]
HASH BY column_name [ ,...] LOCAL ORDER BY column_name [ ,...]
LOCAL ORDER BY column_name [ ,...]
These clauses control how Vantage sorts and divides the rows of table_name or the rows returned by subquery among the AMPs before writing them to external storage.
These clauses affect the external storage object names, which reflect the values each stored object contains according to the NAMING clause.
PARTITION BY and HASH BY cause the paths to external storage objects to contain the data values. See PARTITION BY with INCLUDE_ORDERING ('TRUE') and PARTITION BY with INCLUDE_ORDERING ('FALSE').
Clause	Description

PARTITION BY … ORDER BY
Calls WRITE_NOS once per partition per AMP (multiple times per AMP). Writes at least one file to external storage per partition. Column names after PARTITION BY must match those after ORDER BY.

HASH BY … LOCAL ORDER BY
Calls WRITE_NOS once per AMP. Writes at least one file to external storage per AMP. Column names after HASH BY must match those after LOCAL ORDER BY.

LOCAL ORDER BY
Orders data on each AMP before writing it to external storage. Use LOCAL ORDER BY alone, instead of PARTITION BY or HASH BY, when any candidate partitioning or hashing column has few distinct values; partitioning or hashing on such columns sends all table data to one or a few AMPs, causing data skew.

You can specify up to 10 columns to partition, hash, or local order by.
You can see how many partitions are to be created for a table or subquery before you write to external storage by using a query that groups the results by the partitioning columns. For example:
SELECT COUNT(*) 
FROM (SELECT column1, column2 FROM table 
      GROUP BY column1, column2)
 AS derived_table_name;
The following restrictions apply to columns used for ORDER BY and LOCAL ORDER BY:
  • Data types are limited to BYTEINT, SMALLINT, INTEGER, BIGINT, DATE, and VARCHAR.
  • VARCHAR columns can only contain alphanumeric and these special characters:
     - _ ! * ' ( ) 
  • VARCHAR columns are limited to 128 characters.
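The clauses above can be combined in a WRITE_NOS call similar to the following sketch. The table name sales, the column sale_date, and the bucket path are hypothetical placeholders; adapt them to your environment:

```sql
SELECT * FROM WRITE_NOS (
  ON ( SELECT * FROM sales )                 -- hypothetical source table
  PARTITION BY sale_date ORDER BY sale_date  -- one call per partition per AMP
  USING
    LOCATION('/S3/s3.amazonaws.com/your-bucket/sales/')
    STOREDAS('PARQUET')
) AS d;
```

Because PARTITION BY is used here, at least one object is written per partition, and the sale_date values appear in the external object paths.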
LOCATION
A URI identifying the external storage system in the format
/connector/endpoint/bucket_or_container/prefix
The external_storage_path cannot exceed 2048 characters and cannot include the substring '/./' or '/../'.
AUTHORIZATION
[Optional] Authorization for accessing external storage.
On any platform, you can specify an authorization object ([DatabaseName.]AuthorizationObjectName). You must have the EXECUTE privilege on AuthorizationObjectName.
On Amazon S3, Azure Blob Storage, and Azure Data Lake Storage Gen2, you can specify either an authorization object or a string in JSON format. The string specifies the USER (identification) and PASSWORD (secret_key) for accessing external storage. The following table shows the supported credentials for USER and PASSWORD (used in the CREATE AUTHORIZATION command):
System/Scheme                          USER/ACCESS_ID          PASSWORD/ACCESS_KEY
AWS                                    Access Key ID           Access Key Secret
Azure / Shared Key                     Storage Account Name    Storage Account Key
Azure Shared Access Signature (SAS)    Storage Account Name    Account SAS Token
Google Cloud (S3 interop mode)         Access Key ID           Access Key Secret
Google Cloud (native)                  Client Email            Private Key
On-premises object storage             Access Key ID           Access Key Secret
Public access object storage           empty_string            empty_string

For public-access object storage, enclose each empty string in single straight quotation marks: USER '' and PASSWORD ''.

The following are alternatives to using an access key or password to secure S3-compatible external object storage. These are included in an authorization object, which is created by the CREATE AUTHORIZATION command:
  • Amazon Identity and Access Management (IAM)
  • AWS Assume Role used to allow existing AWS IAM users and service accounts temporary access to AWS resources in other accounts. See "Assume Role Authorization" in Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.
If you use a function mapping to define a wrapper for READ_NOS, you can specify the authorization in the function mapping. With function mappings, you can use only [ INVOKER | DEFINER ] TRUSTED, not system-wide authorization.
If an AWS IAM credential provides access, you can omit the AUTHORIZATION clause.
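As an illustration, an inline JSON credential for Amazon S3 might look like the following sketch. The table name, bucket path, and key values are hypothetical placeholders, and the JSON key names shown (Access_ID, Access_Key) should be verified against your release:

```sql
SELECT * FROM WRITE_NOS (
  ON ( SELECT * FROM orders )   -- hypothetical source table
  USING
    LOCATION('/S3/s3.amazonaws.com/your-bucket/orders/')
    AUTHORIZATION('{"Access_ID":"your-access-key-id",
                    "Access_Key":"your-secret-access-key"}')
    STOREDAS('PARQUET')
) AS d;
```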
STOREDAS
Determines the format in which WRITE_NOS creates objects in external storage. For information about external storage formats, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.
NAMING
Determines how the objects containing the rows of data are named in the external storage:
  • DISCRETE

    Uses the ORDER BY columns as part of the object names in external storage.

    For example, for ORDER BY dateColumn, intColumn, the object names look similar to this:
    S3/ceph-s3.teradata.com/xz186000/2019-03-01/13/object_33_0_1.parquet

    2019-03-01 is the value for dateColumn, the first ordering column, and 13 is the value for the second ordering column, intColumn. Every row stored in this external Parquet-formatted object contains those two values.

  • RANGE

    Includes as part of the object name the range of values included in the partition for each ordering column.

    For example, for the preceding ORDER BY phrase, the object names look similar to this:
    S3/ceph-s3.teradata.com/xz186000/2019-01-01/2019-03-02/9/10000/object_33_0_1.parquet
    In this external Parquet-formatted object:
    • 2019-01-01 and 2019-03-02 are the minimum and maximum values for the first ordering column, dateColumn.
    • 9 and 10000 are the minimum and maximum values for the second ordering column, intColumn.
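For example, a RANGE-named write might look like this sketch (the table, columns, and bucket path are hypothetical):

```sql
SELECT * FROM WRITE_NOS (
  ON ( SELECT * FROM web_clicks )   -- hypothetical source table
  HASH BY click_date LOCAL ORDER BY click_date, click_id
  USING
    LOCATION('/S3/s3.amazonaws.com/your-bucket/clicks/')
    STOREDAS('PARQUET')
    NAMING('RANGE')   -- object names carry min/max of the ordering columns
) AS d;
```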
ROWFORMAT
[Optional for CSV data; disallowed for other formats.] If you do not specify ROWFORMAT, NOS generates it.
encoding_format is:
{"field_delimiter":"fd_value", "record_delimiter":"\n"}
record_delimiter
Specify that the record delimiter is the line feed character, "\n". The key name and "\n" are case-sensitive.
field_delimiter
Specify the field delimiter. The key name and fd_value are case-sensitive.
Default: "," (comma)
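A CSV write that overrides the default comma delimiter might look like this sketch (the table name and bucket path are hypothetical):

```sql
SELECT * FROM WRITE_NOS (
  ON ( SELECT * FROM inventory )   -- hypothetical source table
  USING
    LOCATION('/S3/s3.amazonaws.com/your-bucket/inventory/')
    STOREDAS('CSV')
    ROWFORMAT('{"field_delimiter":"|", "record_delimiter":"\n"}')
) AS d;
```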
MANIFESTFILE
Specifies the fully qualified path and file name where the manifest file is written. Use the format
/connector/endpoint/bucket_or_container/prefix/manifest_file_name
For example:
/S3/ceph-s3.teradata.com/xz186000/manifest/manifest.json
If you do not include the MANIFESTFILE parameter, no manifest file is written.
MANIFESTONLY
Writes only a manifest file in external storage.
No actual data objects are written to external storage if you use MANIFESTONLY('TRUE').
You must also use the MANIFESTFILE('manifest_name') option to create a manifest file in external storage.
Use this option to create a new manifest file if a WRITE_NOS operation fails due to a database abnormal termination or restart, or when network connectivity issues interrupt and stop a WRITE_NOS operation before all data has been written to external storage.
The manifest is created from the table or query result set that is input to WRITE_NOS. The input must be a list of storage object names and sizes, with one row per object.
The input to WRITE_NOS with MANIFESTONLY can itself incorporate READ_NOS, similar to this, which uses function mappings for WRITE_NOS and READ_NOS:
SELECT * FROM WRITE_NOS
(ON
   (SELECT Location, ObjectLength
    FROM READ_NOS (
      LOCATION='your_location'
      RETURNTYPE='NOSREAD_KEYS'
    ) AS d1 )
USING
 MANIFESTFILE('your_location/orders_manifest.json')
 MANIFESTONLY('TRUE')
 OVERWRITE('TRUE')
) AS d;
You can use such a query if a WRITE_NOS operation fails before it can create a manifest file. The new manifest file created using READ_NOS reflects all data objects in the external storage location, and can help you determine which data objects resulted from the incomplete WRITE_NOS operation. See Teradata Vantage™ - Native Object Store Getting Started Guide, B035-1214.
OVERWRITE
Determines whether an existing manifest file in external storage is to be overwritten with a new manifest file that has the same name. If FALSE, the default, WRITE_NOS returns an error if a manifest file exists in external storage that is named identically to the value of MANIFESTFILE.
OVERWRITE must be used with MANIFESTONLY('TRUE').
INCLUDE_ORDERING
INCLUDE_HASHBY
Determines whether the ORDER BY and HASH BY columns and their values are written to external storage.
See PARTITION BY with INCLUDE_ORDERING ('TRUE') and PARTITION BY with INCLUDE_ORDERING ('FALSE').
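For instance, to partition the output without embedding the ordering-column values in every stored row, a sketch might look like this (names and paths are hypothetical):

```sql
SELECT * FROM WRITE_NOS (
  ON ( SELECT * FROM sales )                 -- hypothetical source table
  PARTITION BY sale_date ORDER BY sale_date
  USING
    LOCATION('/S3/s3.amazonaws.com/your-bucket/sales/')
    STOREDAS('PARQUET')
    INCLUDE_ORDERING('FALSE')  -- omit ORDER BY column values from stored rows
) AS d;
```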
MAXOBJECTSIZE
Specifies the maximum output object size in megabytes, where max_object_size is a number between 4 and 16.
Default: Value of DefaultRowGroupSize field in DBS Control. See Teradata Vantage™ - Database Utilities, B035-1102.
COMPRESSION
Determines the compression algorithm used to compress the objects written to external storage.
For Parquet files, compression occurs inside parts of the Parquet file rather than on the entire file, so the file extension on external objects remains .parquet.
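For example, GZIP compression could be requested as in the following sketch (the table name and bucket path are hypothetical; verify the supported algorithm names for your release):

```sql
SELECT * FROM WRITE_NOS (
  ON ( SELECT * FROM sensor_readings )   -- hypothetical source table
  USING
    LOCATION('/S3/s3.amazonaws.com/your-bucket/sensors/')
    STOREDAS('PARQUET')
    COMPRESSION('GZIP')  -- objects keep the .parquet extension
) AS d;
```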