Native Object Store Limitations - Analytics Database - Teradata Vantage

Teradata Vantage™ - Native Object Store Getting Started Guide - 17.20

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Analytics Database
Teradata Vantage
Release Number
17.20
Published
June 2022
Language
English (United States)
Last Update
2024-04-05
dita:mapPath
tsq1628112323282.ditamap
dita:ditavalPath
qkf1628213546010.ditaval
dita:id
jjn1567647976698
Product Category
Teradata Vantage

These Native Object Store limitations are relevant to the CREATE FOREIGN TABLE statement.

CSV and JSON Payload Size

  • The maximum payload size is 16,776,192 bytes for READ_NOS for CSV and JSON data formats.
    • For UNICODE, that is equivalent to 8,388,096 characters.
    • For LATIN, that is equivalent to 16,776,192 characters.

The actual payload size is equivalent to the dataset individual record size.

Parquet Page Size

The maximum Parquet page size supported is for both READ_NOS and WRITE_NOS is 16,776,192 bytes (including LOB and LOB-based UDT columns).

If some of the data in the Parquet record is binary, the maximum number of characters is proportionately reduced.

Foreign Table Limitations

  • The following are not supported on foreign tables:
    • Primary indexes
    • Secondary indexes
    • Join indexes
    • Hash indexes
    • Row or column partitioning (except for Parquet foreign tables, which require column partitioning)
    • DML operations that change table contents (INSERT, UPDATE, DELETE)

Parquet Format Limitations

  • Certain complex data types are not supported, including STRUCT, MAP, LIST, and ENUM.
  • Because support for the STRUCT data type is not available, nested Parquet object stores cannot be processed by Native Object Store.

Bad Values in a Numeric Field

As with any CSV or JSON datasets, a NOS table can become unusable if a newly uploaded Amazon S3 file contains a single bad value in a numeric field. If this happens, you can identify the bad record/field by using TRYCAST. You can first upload new Amazon S3 files to a temporary location in the object storage and use TRYCAST to make sure the numeric data is not corrupted. Then, you can move the data to its permanent location in the object store. See Teradata Vantage™ - SQL Functions, Expressions, and Predicates, B035-1145.

For example:

Create an authorization object, if not already done:

CREATE AUTHORIZATION MyAuthObj
USER 'YOUR-ACCESS-KEY-ID'
PASSWORD 'YOUR-SECRET-ACCESS-KEY;

Create an example table.

CREATE MULTISET FOREIGN TABLE bad_numeric_fields
, EXTERNAL SECURITY MyAuthObj
USING (
     LOCATION  ('/s3/td-usgs-public.s3.amazonaws.com/DATA-BAD/bad_numeric_data')
);

Select:

SELECT payload FROM bad_numeric_fields;

Result:

Payload
----------------------------------------------------------------------------
{ "site_no":"09396100", "datetime":"2018-07-16 00:00", "Flow":"44.7", "GageHeight":"1.50", "Precipitation":"0.00", "GageHeight2":"1.50"}
{ "site_no":"09396100", "datetime":"2018-07-14 00:00", "Flow":"232", "GageHeight":"2.16", "Precipitation":"0.00", "GageHeight2":"2.16"}
{ "site_no":"09396100", "datetime":"2018-07-14 00:14", "Flow":"186", "GageHeight":"2.05", "Precipitation":"", "GageHeight2":"2.05"}
{ "site_no":"09400812", "datetime":"2018-07-12 00:09", "Flow":"", "GageHeight":"jjjj", "Precipitation":"bcde", "BatteryVoltage":"12.5"}
{ "site_no":"09400815", "datetime":"2018-07-12 00:00", "Flow":"0.00", "GageHeight":"-0.01", "Precipitation":"0.00", "BatteryVoltage":""}
{ "site_no":"09400815", "datetime":"2018-07-12 00:07", "Flow":"", "GageHeight":"", "Precipitation":"", "BatteryVoltage":"12.5"}

Select from the table:

SELECT payload.site_no, payload.GageHeight(INT) FROM bad_numeric_fields;

Result: The query returns an error.

Use TRYCAST to figure out which data is bad:

SELECT payload.site_no (CHAR(20)), TRYCAST(payload.GageHeight AS INT) FROM bad_numeric_fields;

Sample result:

Payload.site_no       TRYCAST(Payload.GageHeight)
--------------------  ---------------------------
09396100                                        1
09396100                                        2
09396100                                        2
09400812                                        ?
09400815                                        0
09400815                                        0

The results show there is bad data in the record with site_no 09400812.

WRITE_NOS Limitations

  • The maximum Parquet page size supported for both READ_NOS and WRITE_NOS is 16,776,192 bytes (including LOB and LOB-based UDT columns). WRITE_NOS can create an unlimited number of these files.
  • When you read back Parquet data created by WRITE_NOS, then the data type depends on the version of Vantage, and the Parquet tool. The following table shows the differences:
    Type Teradata Version 17.0 or Generic Parquet Tools Teradata Version 17.10 and later
    CHAR (Unicode) VARCHAR (Unicode) CHAR (Unicode)
    CHAR (Latin) VARCHAR (Unicode) CHAR (Latin)
    CLOB (Latin) CLOB (Unicode) CLOB (Latin)
    INTERVAL BYTE INTERVAL
    NUMBER VARBYTE NUMBER
    PERIOD VARBYTE PERIOD
    VARCHAR (Latin) VARCHAR (Unicode) VARCHAR (Latin)
  • It is the responsibility of the user to clean up object store files on interrupted write operations. Write operations can be interrupted on transaction aborts or system resets, among other reasons.
  • Concurrent WRITE_NOS requests to the same location are likely to cause an error to be returned. If a request gets an error, any objects written by the request are not deleted from external storage and must be manually cleaned up using tools provided by the external object store vendor, such as s3cmd. To avoid this, Teradata recommends specifying a unique location for concurrent requests.
  • An error is reported if you attempt to write an object with same path name in the same location.
  • Manifest files are not cumulative. If you want to add entries to a manifest, you must create a new manifest that includes the original entries plus the ones you want to add.
  • An error is reported if you attempt to write another manifest file in the same location. Use OVERWRITE('TRUE') with MANIFESTONLY('TRUE') keywords to replace a manifest in the same location.