17.10 - READ_NOS Syntax Elements - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - SQL Operators and User-Defined Functions

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Release Date
July 2021
Content Type
Administration
Programming Reference
Publication ID
B035-1210-171K
Language
English (United States)
table_name
Name of table.
view_name
Name of view.
query_expression
Expression to specify input to table operator.
LOCATION
You must specify a LOCATION value, which is a Uniform Resource Identifier (URI) pointing to the data in the external object storage system. The LOCATION value includes the following components:
  • Amazon S3: /connector/bucket.endpoint/[key_prefix]
  • Azure Blob storage and Azure Data Lake Storage Gen2: /connector/storage-account.endpoint/container/[key_prefix]
  • Google Cloud Storage (GCS): /connector/endpoint/bucket/[key_prefix]
connector
Identifies the type of external storage system where the data is located.
Teradata requires the storage location to start with the following for all external storage locations:
  • Amazon S3 storage location must begin with /S3 or /s3
  • Azure Blob storage location (including Azure Data Lake Storage Gen2 in Blob Interop Mode) must begin with /AZ or /az.
  • Google Cloud Storage location must begin with /GS or /gs
storage-account
Used by Azure. The Azure storage account contains your Azure storage data objects.
endpoint
A URL that identifies the system-specific entry point for the external object storage system.
bucket (Amazon S3, GCS) or container (Azure Blob storage and Azure Data Lake Storage Gen2)
A container that logically groups stored objects in the external storage system.
key_prefix
Identifies one or more objects in the logical organization of the bucket data. Because it is a key prefix, not an actual directory path, the key prefix may match one or more objects in the external storage. For example, the key prefix '/fabrics/cotton/colors/b/' would match objects: /fabrics/cotton/colors/blue, /fabrics/cotton/colors/brown, and /fabrics/cotton/colors/black. If there were organization levels below those, such as /fabrics/cotton/colors/blue/shirts, the same key prefix would gather those objects too.
Vantage validates only the first file it encounters from the location key prefix.

For example, this LOCATION value might specify all objects on an Amazon cloud storage system for the month of December, 2001:

LOCATION('/S3/YOUR-BUCKET.s3.amazonaws.com/csv/US-Crimes/csv-files/2001/Dec/')
connector bucket endpoint key_prefixes
S3 YOUR-BUCKET s3.amazonaws.com csv/US-Crimes/csv-files/2001/Dec/

This LOCATION could specify an individual storage object (or file), Day1.csv:

LOCATION('/S3/YOUR-BUCKET.s3.amazonaws.com/csv/US-Crimes/csv-files/2001/Dec/Day1.csv')
connector bucket endpoint key_prefixes
S3 YOUR-BUCKET s3.amazonaws.com csv/US-Crimes/csv-files/2001/Dec/Day11.csv

This LOCATION specifies an entire container in an Azure external object store (Azure Blob storage or Azure Data Lake Storage Gen2). The container may contain multiple file objects:

LOCATION('/AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/CONTAINER/nos-csv-data')
connector storage-account endpoint container key_prefixes
AZ YOUR-STORAGE-ACCOUNT blob.core.windows.net CONTAINER nos-csv-data

This is an example of a Google Cloud Storage location:

LOCATION('/gs/storage.googleapis.com/YOUR-BUCKET/CSVDATA/RIVERS/rivers.csv')
connector endpoint bucket key_prefixes
GS storage.googleapis.com YOUR-BUCKET CSVDATA/RIVERS/rivers.csv
AUTHORIZATION
[Optional] Authorization for accessing external storage.
On any platform, you can specify an authorization object ([DatabaseName.]AuthorizationObjectName). You must have the EXECUTE privilege on AuthorizationObjectName.
On Amazon S3 and Azure Blob storage and Azure Data Lake Storage Gen2, you can specify either an authorization object or a string in JSON format. The string specifies the USER (identification) and PASSWORD (secret_key) for accessing external storage. The following table shows the supported credentials for USER and PASSWORD (used in the CREATE AUTHORIZATION command):
System/Scheme USER PASSWORD
AWS Access Key ID Access Key Secret
Azure / Shared Key Storage Account Name Storage Account Key
Azure Shared Access Signature (SAS) Storage Account Name Account SAS Token
Google Cloud (S3 interop mode) Access Key ID Access Key Secret
Google Cloud (native) Client Email Private Key
On-premises object stores Access Key ID Access Key Secret
Public access object stores <empty string>

Enclose the empty string in single straight quotes: USER ''

<empty string>

Enclose the empty string in single straight quotes: PASSWORD ''

If you use a function mapping to define a wrapper for READ_NOS, you can specify the authorization in the function mapping. Note that [ INVOKER | DEFINER ] TRUSTED must be used with function mapping.
If you are using AWS IAM credentials, you can omit the AUTHORIZATION clause.
When accessing GCS, Advanced SQL Engine uses either the S3-compatible connector or the native Google connector, depending on the user credentials.
BUFFERSIZE
Size of the network buffer to allocate when retrieving data from the external storage repository.
The default value is 16 MB, which is the maximum value.
NOSREAD_RECORD
[Default] Returns one row for each external record along with its metadata.
Access external records by specifying one of the following:
  • Input table and LOCATION and an empty table.
  • Input table with a row for each external file.
For an empty single-column input table, do the following:
For a multiple-column input table, define an input table with the following columns:
  • Location VARCHAR(2048) CHARACTER SET UNICODE
  • ObjectVersionID VARCHAR(1024) CHARACTER SET UNICODE
  • ObjectTimeStamp TIMESTAMP(6)
  • OffsetIntoObject BIGINT
  • ObjectLength BIGINT
  • Payload JSON

This table can be populated using the output of the NOSREAD_KEYS return type.

NOSREAD_KEYS
Retrieves the list of files from the path specified in the LOCATION USING clause.
A schema definition is not necessary.
Returns:
  • Location
  • ObjectVersionID
  • ObjectTimeStamp
  • ObjectLength, size of external file.

See Example: Using READ_NOS with NOSREAD_KEYS Return Type.

NOSREAD_PARQUET_SCHEMA
Returns information about the Parquet data schema. If you are using the FULLSCAN option, use NOSREAD_PARQUET_SCHEMA; otherwise you can use NOSREAD_SCHEMA to get information about the Parquet schema. For information about the mapping between Parquet data types and Teradata data types, see Parquet External Files in Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.
NOSREAD_SCHEMA
Returns the name and data type of each column of the file specified by external_file_path. Schema format can be JSON, CSV, or Parquet.
SAMPLE_PERC
Specifies the percentage of rows to retrieve from the external storage repository when RETURNTYPE is NOSREAD_RECORD. The valid range of values is from '0.0' to '1.0', where '1.0' represents 100% of the rows.
The default value is 1.0.
STOREDAS
Specifies the formatting style of the external data.
  • PARQUET means the external data is formatted as Parquet. This is a required parameter for Parquet data.
  • TEXTFILE means the external data uses a text-based format, such as CSV or JSON.
The default is TEXTFILE.
FULLSCAN
Determines whether READ_NOS scans columns of variable length types (CHAR, VARCHAR, BYTE, VARBYTE, JSON, and BSON) to discover the maximum length.
  • TRUE means the sizes of variable length data is determined from the Parquet data.
    Choosing this value can impact performance because all variable length data type columns in each Parquet file at the location must be scanned to assess the value having the greatest length.
  • FALSE means variable length field sizes are assigned the Vantage maximum value for the particular data type.
The default is FALSE.
FULLSCAN is only used with a RETURNTYPE of NOSREAD_PARQUET_SCHEMA.
MANIFEST
Specifies whether the LOCATION value points to a manifest file (a file containing a list of files to read) or object name. The object name can include the full path or a partial path. It must identify a single file containing the manifest.
The individual entries within the manifest file must show complete paths.
Below is an example of a manifest file that contains a list of entries to locations in JSON format.
{
  "entries": [
     {"url":"s3://nos-core-us-east-1/UNICODE/JSON/mln-key/data-10/data-8_9_02-10.json"},
     {"url":"s3://nos-core-us-east-1/UNICODE/JSON/mln-key/data-10/data-8_9_02-101.json"},
     {"url":"s3://nos-core-us-east-1/UNICODE/JSON/mln-key/data-10/data-10-01/data-8_9_02-102.json"},
     {"url":"s3://nos-core-us-east-1/UNICODE/JSON/mln-key/data-10/data-10-01/data-8_9_02-103.json"}
  ]
}
ROWFORMAT
Specifies the encoding format of the external row, for example:
ROWFORMAT('{"field_delimiter":",", "record_delimiter":"\n", "character_set":"LATIN"}')
Specify ROWFORMAT using JSON format. It can include only the three keys shown above. Key names and values are case-specific, except for the value for "character_set", which can use any combination of letter cases.
The ROWFORMAT character set specification must be compatible with character set of the Payload column.
Do not specify ROWFORMAT for Parquet format data.
For a JSON column, these are the default values:
Payload Character Set Default Definition
UNICODE
ROWFORMAT('{"record_delimiter":"\n", "character_set":"UTF8"}')
LATIN
ROWFORMAT('{"record_delimiter":"\n", "character_set":"LATIN"}')
For a CSV column, these are the default values:
Payload Character Set Default Definition
UNICODE

ROWFORMAT('{"character_set":"UTF8"}')

This is the default if you do not specify an input table for READ_NOS.

LATIN ROWFORMAT('{"character_set":"LATIN"}')
You can specify the following options:
field_delimiter
The default is "," (comma). You can also specify a custom field delimiter, such as tab "\t".
record_delimiter
New line feed character: "\n". A line feed (\n) is the only acceptable record delimiter.
character_set
"UTF8" or "LATIN"
If you do not specify a ROWFORMAT or payload column, Vantage assumes UTF8 Unicode.
HEADER
Specifies whether the first row of data in an input CSV file is interpreted as column headings for the subsequent rows of data.
Use this parameter only when a CSV input file is not associated with a separate schema object that defines columns for the CSV data.
The value for HEADER can be 'TRUE' or 'FALSE'. The default is 'TRUE'.