READ_NOS Table Operator Syntax | SQL Statements | Teradata Vantage - READ_NOS Syntax - Advanced SQL Engine

READ_NOS Table Operator Syntax | SQL Statements | Teradata Vantage - READ_NOS Syntax - Advanced SQL Engine - Teradata Database

SQL Operators and User-Defined Functions

Product

Advanced SQL Engine

Teradata Database

Release Number

17.05

Published

January 2021

Language

English (United States)

Last Update

2023-04-27

dita:mapPath

xwv1596137968859.ditamap

dita:ditavalPath

hoy1596145193032.ditaval

dita:id

B035-1210

lifecycle

Product Category

Teradata Vantage™

READ_NOS (
[ON { table_name | view_name | ( query_expression ) }]
USING (
  LOCATION ( 'external_file_path' ) 
  [  BUFFERSIZE ( 'buffer_size' )  ]
  [  RETURNTYPE ( { 'NOSREAD_RECORD' | 'NOSREAD_KEYS' | 
                    'NOSREAD_RAW' | 'NOSREAD_PARQUET_SCHEMA'  } )  ] 
  [  SAMPLE_PERC ( 'row_sampling_value' ) ]
  [  STOREDAS ( { 'PARQUET' | 'TEXTFILE' } ) ]
  [  FULLSCAN ( { 'TRUE' | 'FALSE' } ) ]
  [  MANIFEST ( { 'TRUE' | 'FALSE' } ) ]
  [  ACCESS_ID ( 'identification' ) ]
  [  ACCESS_KEY ( 'secret_key' ) ]
  [  ROWFORMAT ( 'rowformat_value' ) ]
  [  HEADER ( { 'TRUE' | 'FALSE' } ) ]
)

Syntax Elements

table_name

Name of table.

view_name

Name of view.

query_expression

Expression to specify input to table operator.

LOCATION

You must specify a LOCATION value, which is a Uniform Resource Identifier (URI) pointing to the data in the external object storage system. The LOCATION value includes the following components:

Amazon S3: /connector/bucket.endpoint/[key_prefix]
Azure Blob storage and Azure Data Lake Storage Gen2: /connector/container.endpoint/[key_prefix]
Google Cloud Storage: /connector/endpoint/bucket/[key_prefix]

connector

Identifies the type of external storage system where the data is located.

Teradata requires the storage location to start with the following for all external storage locations:

Amazon S3 storage location must begin with /S3 or /s3
Azure Blob storage location (including Azure Data Lake Storage Gen2 in Blob Interop Mode) must begin with /AZ or /az
Google Cloud Storage location must begin with /GS or /gs

endpoint

A URL that identifies the system-specific entry point for the external object storage system.

bucket (Amazon S3, Google Cloud Storage) or container (Azure Blob storage and Azure Data Lake Storage Gen2)

A container that logically groups stored objects in the external storage system.

key_prefix

Identifies one or more objects in the logical organization of the bucket data. Because it is a key prefix, not an actual directory path, the key prefix may match one or more objects in the external storage. For example, the key prefix '/fabrics/cotton/colors/b/' would match objects: /fabrics/cotton/colors/blue, /fabrics/cotton/colors/brown, and /fabrics/cotton/colors/black. If there were organization levels below those, such as /fabrics/cotton/colors/blue/shirts, the same key prefix would gather those objects too.

Vantage validates only the first file it encounters from the location key prefix.

For example, this LOCATION value might specify all objects on an Amazon cloud storage system for the month of December, 2001:

LOCATION('/S3/YOUR-BUCKET.s3.amazonaws.com/csv/US-Crimes/csv-files/2001/Dec/')

connector	bucket	endpoint	key_prefix
S3	YOUR-BUCKET	s3.amazonaws.com	csv/US-Crimes/csv-files/2001/Dec/

This LOCATION could specify an individual storage object (or file), Day1.csv:

LOCATION('/S3/YOUR-BUCKET.s3.amazonaws.com/csv/US-Crimes/csv-files/2001/Dec/Day1.csv')

connector	bucket	endpoint	key_prefix
S3	YOUR-BUCKET	s3.amazonaws.com	csv/US-Crimes/csv-files/2001/Dec/Day11.csv

This LOCATION specifies an entire container in an Azure external object store (Azure Blob storage or Azure Data Lake Storage Gen2). The container may contain multiple file objects:

LOCATION('/AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/nos-csv-data')

connector	container	endpoint	key_prefix
AZ	YOUR-STORAGE-ACCOUNT	blob.core.windows.net	nos-csv-data

This is an example of a Google Cloud Storage location:

LOCATION('/gs/storage.googleapis.com/YOUR-BUCKET/CSVDATA/RIVERS/rivers.csv')

connector	endpoint	bucket	key_prefix
GS	storage.googleapis.com	YOUR-BUCKET	CSVDATA/RIVERS/rivers.csv

BUFFERSIZE

Size of the network buffer to allocate when retrieving data from the external storage repository.

The default value is 16 MB, which is the maximum value.

NOSREAD_RECORD

Returns one row for each external record along with its metadata. This is the default.

Access external records by specifying one of the following:

Input table and LOCATION and an empty table. For CSV, you can include a schema definition.
Input table with a row for each external file. For CSV, this method does not support a schema definition.

For an empty single-column input table, do the following:

Define an input table with a single column, Payload, with the appropriate data type:
- JSON
- DATASET with a Storage Format of CSV
This column determines the output Payload column return type.

See Example: Using READ_NOS with NOSREAD_RECORD Return Type.
For LOCATION, specify the filepath.

For a multiple-column input table, define an input table with the following columns:

Location VARCHAR(2048) CHARACTER SET UNICODE
ObjectVersionID VARCHAR(1024) CHARACTER SET UNICODE
ObjectTimeStamp TIMESTAMP(6)
OffsetIntoObject BIGINT
ObjectLength BIGINT
Payload JSON or VARCHAR for CSV

This table can be populated using the output of the NOSREAD_KEYS return type. See Example: Using READ_NOS with NOSREAD_RECORD Return Type.

NOSREAD_KEYS

Retrieve the list of files from the path specified in the LOCATION USING clause.

A schema definition is not necessary.

Returns:

Location
ObjectVersionID
ObjectTimeStamp
ObjectLength, size of external file.

NOSREAD_RAW

Retrieves file data from the external storage services, not specific records.

Retrieved data is returned as CLOB/BLOB. You can retrieve a complete file from external storage and save in Teradata CLOB/BLOB format. The maximum amount of data that can be retrieved from the external storage and saved in the Teradata column is 2GB, the Vantage limit for LOBs. The ObjectLength corresponds to the length of CLOB/BLOB column read from the external storage.

This information is provided in the form of a table returned to the READ_NOS table operator.

The Payload column in the input table is only used to determine the datatype of the column in which the returned data is stored.

Define the input table with the following columns:

Location VARCHAR(2048) CHARACTER SET UNICODE
ObjectVersionID VARCHAR(1024) CHARACTER SET UNICODE
ObjectTimeStamp TIMESTAMP(6)
OffsetIntoObject BIGINT
BytesToRead BIGINT
Payload CLOB/BLOB

READ_NOS returns a table with the following columns:

Location VARCHAR(2048) CHARACTER SET UNICODE
ObjectVersionID VARCHAR(1024) CHARACTER SET UNICODE
ObjectTimeStamp TIMESTAMP(6)
OffsetIntoObject BIGINT
ObjectLength BIGINT
Payload CLOB/BLOB, based on input table CLOB/BLOB Column.

NOSREAD_PARQUET_SCHEMA

Returns information about the Parquet data schema. For information about the mapping between Parquet data types and Teradata data types, see Parquet External Files in Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.

SAMPLE_PERC

Specifies the percentage of rows to retrieve from the external storage repository when RETURNTYPE is NOSREAD_RECORD. The valid range of values is from '0.0' to '1.0', where '1.0' represents 100% of the rows.

The default value is 1.0.

STOREDAS

Specifies the formatting style of the external data.

PARQUET means the external data is formatted as Parquet. This is a required parameter for Parquet data.
TEXTFILE means the external data uses a text-based format, such as CSV or JSON.

The default is TEXTFILE.

FULLSCAN

Determines whether READ_NOS scans columns of variable length types (CHAR, VARCHAR, BYTE, VARBYTE, JSON, and BSON) to discover the maximum length.

TRUE means the sizes of variable length data is determined from the Parquet data.
Choosing this value can impact performance because all variable length data type columns in each Parquet file at the location must be scanned to assess the value having the greatest length.
FALSE means variable length field sizes are assigned the Vantage maximum value for the particular data type.

The default is FALSE.

MANIFEST

Specifies whether the LOCATION value points to a manifest file (a file containing a list of files to read) or object name. The object name can include the full path or a partial path. It must identify a single file containing the manifest.

The individual entries within the manifest file must show complete paths.

Below is an example of a manifest file that contains a list of entries to locations in JSON format.

{
  "entries": [
     {"url":"s3://nos-core-us-east-1/UNICODE/JSON/mln-key/data-10/data-8_9_02-10.json"},
     {"url":"s3://nos-core-us-east-1/UNICODE/JSON/mln-key/data-10/data-8_9_02-101.json"},
     {"url":"s3://nos-core-us-east-1/UNICODE/JSON/mln-key/data-10/data-10-01/data-8_9_02-102.json"},
     {"url":"s3://nos-core-us-east-1/UNICODE/JSON/mln-key/data-10/data-10-01/data-8_9_02-103.json"}
  ]
}

ACCESS_ID

Identification to access external storage.

Specifying this option is not necessary if an authentication is defined, for example, in the EXTERNAL SECURITY clause of a function mapping or if IAM roles are defined for S3 repositories. See CREATE FUNCTION MAPPING in Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.

ACCESS_KEY

Password to access external storage.

ROWFORMAT

Specifies the encoding format of the external row, for example:

ROWFORMAT('{"field_delimiter":",", "record_delimiter":"\n", "character_set":"LATIN"}')

Specify ROWFORMAT using JSON format. It can include only the three keys shown above. Key names and values are case-specific, except for the value for "character_set", which can use any combination of letter cases.

The ROWFORMAT character set specification must be compatible with character set of the Payload column.

Do not specify ROWFORMAT for Parquet format data.

For a JSON column, these are the default values:

Payload Character Set	Default Definition
UNICODE	ROWFORMAT('{"record_delimiter":"\n", "character_set":"UTF8"}')
LATIN	ROWFORMAT('{"record_delimiter":"\n", "character_set":"LATIN"}')

For a CSV column, these are the default values:

Payload Character Set	Default Definition
UNICODE	ROWFORMAT('{"character_set":"UTF8"}') This is the default if you do not specify an input table for READ_NOS.
LATIN	ROWFORMAT('{"character_set":"LATIN"}')

Payload Character Set

Default Definition

UNICODE

ROWFORMAT('{"character_set":"UTF8"}')

This is the default if you do not specify an input table for READ_NOS.

LATIN

ROWFORMAT('{"character_set":"LATIN"}')

You can specify the following options:

field_delimiter: The default is "," (comma). You can also specify a custom field delimiter, such as tab "\t".
record_delimiter: New line feed character: "\n". A line feed (\n) is the only acceptable record delimiter.
character_set: "UTF8" or "LATIN"; If you do not specify a ROWFORMAT or payload column, Vantage assumes UTF8 Unicode.

HEADER

Specifies whether the first row of data in an input CSV file is interpreted as column headings for the subsequent rows of data.

Use this parameter only when a CSV input file is not associated with a separate schema object that defines columns for the CSV data.

The value for HEADER can be 'TRUE' or 'FALSE'. The default is 'TRUE'.