Column Definitions | CREATE FOREIGN TABLE | Teradata Vantage - 17.10 - data_column_definition - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - SQL Data Definition Language Syntax and Examples

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Release Date
July 2021
Content Type
Programming Reference
Publication ID
B035-1144-171K
Language
English (United States)

[Optional for CSV- or Parquet-formatted data, disallowed otherwise.] Specify the definition of one or more data columns in the external file, in the order in which the records appear in the external file.

The location_column and the payload_column or data_column_definition fields are optional, because NOS generates them. For CSV, if NOS is cannot generate the columns, it generates the payload column of DATASET CSV instead.

Each data_column_definition has the format column_name data_type.

CSV Files

Data columns are optional because NOS creates them from the information in the data itself. If you declare the columns and their data types, NOS checks their correctness.

If you specify data columns, these rules apply:
  • The column_name need not match the name of the corresponding column in the header record.
  • The data_type is the Vantage column data type to associate with the corresponding field in the CSV record.
  • For a CSV table, data_type must be one of the following:
    • SMALLINT
    • INTEGER
    • BIGINT
    • DECIMAL (precision, scale) where precision is in the range [1, 38] and scale is in the range [0, 38].
    • BYTEINT
    • NUMBER
    • REAL, FLOAT, or DOUBLE PRECISION
    • BYTE
    • VARBYTE
    • CHAR CHARACTER SET { LATIN | UNICODE }
    • VARCHAR CHARACTER SET { LATIN | UNICODE }
    • CLOB of at most 16 MB
    • BLOB of at most 16 MB
    • DATE
    • INTERVAL
    • TIME
    • TIMESTAMP
    User-defined types (UDTs) and period data types (PDTs) are not supported.
  • data_type can specify these attributes:
    • CASESPECIFIC
    • UPPERCASE
    • FORMAT
    • TITLE
  • data_type cannot specify these attributes:
    • NOT NULL
    • COMPRESS
    • CONSTRAINT
    • REFERENCES
    • CHECK
  • If data_type is a numeric type:
    • You can specify the FORMAT in which to display the data.
    • A field value enclosed in single quotation marks is treated as a string, causing an error and causing Vantage to skip the record.
  • If data_type is a date type:
    • Vantage uses its FORMAT to determine the positions of the day, month, and year.
    • The default FORMAT is 'YY/MM/DD'.
    • You can specify a different FORMAT.
    • If the date field in a CSV record does not have the default FORMAT, you must specify its FORMAT.
    • Any FORMAT you specify must represent the month as MM and the day as DD.
  • If data_type is a character type:
    • A field value can be enclosed in single or double quotation marks.

      To strip the enclosing double quotation marks, use STRIP_ENCLOSING_CHAR. (You cannot strip single quotation marks.)

    • A field value can have leading spaces, trailing spaces, or both.
    • By default, Vantage does not strip exterior space characters (leading or trailing spaces).

      To strip exterior space characters, use either STRIP_EXTERIOR_SPACES or the SQL functions TRIM, LTRIM, and RTRIM.

  • If you specify more columns than the number of fields that some records have, Vantage gives each extra column the value NULL.
  • If you specify fewer columns than the number of fields that some records have, Vantage ignores the extra fields.

Parquet Files

The data_type must be the Vantage data type that corresponds to the Parquet data type in the Parquet schema:

Parquet Data Type Corresponding Vantage Data Type Description
UINT_8 SMALLINT Unsigned integer with range 0 to 255.
UINT_16 INTEGER Unsigned integer with range 0 to 65535.
UINT_32 BIGINT Unsigned integer with range 0 to 4,294,967,295.
UINT_64 DECIMAL(20,0) Unsigned integer with range 0 to 18,446,744,073,709,551,615.
INT_8 BYTEINT Signed integer with range -128 to 127.
INT_16 SMALLINT Signed integer with range -32768 to 32767.
INT_32 INTEGER Signed integer with range -2,147,483,648 to 2,147,483,647.
INT_64 BIGINT Signed integer with range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
INT_96 TIMESTAMP(6) Used to represent timestamp. First 8 bytes has number of nanoseconds and next 4 bytes has number of days since Julian day.
DECIMAL(p,s) DECIMAL(p,s) Decimal where precision is between 1 and 38 (inclusive) and scale is between 0 and 38 (inclusive).
VARBYTE(p+1), where p is the precision of the DECIMAL. Precision and scale is greater than 38.
FLOAT REAL IEEE 32-bit floating point.
DOUBLE REAL IEEE 64-bit floating point.
DATE DATE Number of days from the Unix epoch, 1 January 1970
TIME_MILLIS TIME(3) or TIME(6) Stores time in milliseconds.
TIME_MICROS TIME(6) Stores time in microseconds.
TIMESTAMP_MILLIS TIMESTAMP(3) or TIMESTAMP(6) Stores timestamp in milliseconds.
TIMESTAMP_MICROS TIMESTAMP(6) Stores timestamp in microseconds.
INTERVAL VARCHAR CHARACTER SET UNICODE Stores interval in months, days and milliseconds.
BOOL BYTEINT Boolean (True or False).
STRING VARCHAR or CLOB CHARACTER SET UNICODE Encoded as a UTF8 byte array.
BSON JSON STORAGE FORMAT BSON BSON data.
JSON JSON CHARACTER SET UNICODE JSON data.
STRUCT VARCHAR or CLOB CHARACTER SET UNICODE Group of fixed members.
MAP VARCHAR or CLOB CHARACTER SET UNICODE Maps keys to values.
LIST VARCHAR or CLOB CHARACTER SET UNICODE Contains data that is stored in array.
ENUM VARCHAR or CLOB CHARACTER SET UNICODE Store enumerated values encoded as a UTF8 string.
ARRAY VARCHAR or CLOB CHARACTER SET UNICODE Stored as repeated fields. Can be an array of a single value or multiple values.