[Optional for CSV- or Parquet-formatted data, disallowed otherwise.] Specify the definition of one or more data columns in the external file, in the order in which the records appear in the external file.
The location_column and the payload_column or data_column_definition fields are optional, because NOS generates them. For CSV, if NOS is cannot generate the columns, it generates the payload column of DATASET CSV instead.
Each data_column_definition has the format column_name data_type.
CSV Files
Data columns are optional because NOS creates them from the information in the data itself. If you declare the columns and their data types, NOS checks their correctness.
- The column_name need not match the name of the corresponding column in the header record.
- The data_type is the Vantage column data type to associate with the corresponding field in the CSV record.
- For a CSV table, data_type must be one of the following:
- SMALLINT
- INTEGER
- BIGINT
- DECIMAL (precision, scale) where precision is in the range [1, 38] and scale is in the range [0, 38].
- BYTEINT
- NUMBER
- REAL, FLOAT, or DOUBLE PRECISION
- BYTE
- VARBYTE
- CHAR CHARACTER SET { LATIN | UNICODE }
- VARCHAR CHARACTER SET { LATIN | UNICODE }
- CLOB of at most 16 MB
- BLOB of at most 16 MB
- DATE
- INTERVAL
- TIME
- TIMESTAMP
User-defined types (UDTs) and period data types (PDTs) are not supported. - data_type can specify these attributes:
- CASESPECIFIC
- UPPERCASE
- FORMAT
- TITLE
- data_type cannot specify these attributes:
- NOT NULL
- COMPRESS
- CONSTRAINT
- REFERENCES
- CHECK
- If data_type is a numeric type:
- You can specify the FORMAT in which to display the data.
- A field value enclosed in single quotation marks is treated as a string, causing an error and causing Vantage to skip the record.
- If data_type is a date type:
- Vantage uses its FORMAT to determine the positions of the day, month, and year.
- The default FORMAT is 'YY/MM/DD'.
- You can specify a different FORMAT.
- If the date field in a CSV record does not have the default FORMAT, you must specify its FORMAT.
- Any FORMAT you specify must represent the month as MM and the day as DD.
- If data_type is a character type:
- A field value can be enclosed in single or double quotation marks.
To strip the enclosing double quotation marks, use STRIP_ENCLOSING_CHAR. (You cannot strip single quotation marks.)
- A field value can have leading spaces, trailing spaces, or both.
- By default, Vantage does not strip exterior space characters (leading or trailing spaces).
To strip exterior space characters, use either STRIP_EXTERIOR_SPACES or the SQL functions TRIM, LTRIM, and RTRIM.
- A field value can be enclosed in single or double quotation marks.
- If you specify more columns than the number of fields that some records have, Vantage gives each extra column the value NULL.
- If you specify fewer columns than the number of fields that some records have, Vantage ignores the extra fields.
Parquet Files
The data_type must be the Vantage data type that corresponds to the Parquet data type in the Parquet schema:
Parquet Data Type | Corresponding Vantage Data Type | Description |
---|---|---|
UINT_8 | SMALLINT | Unsigned integer with range 0 to 255. |
UINT_16 | INTEGER | Unsigned integer with range 0 to 65535. |
UINT_32 | BIGINT | Unsigned integer with range 0 to 4,294,967,295. |
UINT_64 | DECIMAL(20,0) | Unsigned integer with range 0 to 18,446,744,073,709,551,615. |
INT_8 | BYTEINT | Signed integer with range -128 to 127. |
INT_16 | SMALLINT | Signed integer with range -32768 to 32767. |
INT_32 | INTEGER | Signed integer with range -2,147,483,648 to 2,147,483,647. |
INT_64 | BIGINT | Signed integer with range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. |
INT_96 | TIMESTAMP(6) | Used to represent timestamp. First 8 bytes has number of nanoseconds and next 4 bytes has number of days since Julian day. |
DECIMAL(p,s) | DECIMAL(p,s) | Decimal where precision is between 1 and 38 (inclusive) and scale is between 0 and 38 (inclusive). |
VARBYTE(p+1), where p is the precision of the DECIMAL. | Precision and scale is greater than 38. | |
FLOAT | REAL | IEEE 32-bit floating point. |
DOUBLE | REAL | IEEE 64-bit floating point. |
DATE | DATE | Number of days from the Unix epoch, 1 January 1970 |
TIME_MILLIS | TIME(3) or TIME(6) | Stores time in milliseconds. |
TIME_MICROS | TIME(6) | Stores time in microseconds. |
TIMESTAMP_MILLIS | TIMESTAMP(3) or TIMESTAMP(6) | Stores timestamp in milliseconds. |
TIMESTAMP_MICROS | TIMESTAMP(6) | Stores timestamp in microseconds. |
INTERVAL | VARCHAR CHARACTER SET UNICODE | Stores interval in months, days and milliseconds. |
BOOL | BYTEINT | Boolean (True or False). |
STRING | VARCHAR or CLOB CHARACTER SET UNICODE | Encoded as a UTF8 byte array. |
BSON | JSON STORAGE FORMAT BSON | BSON data. |
JSON | JSON CHARACTER SET UNICODE | JSON data. |
STRUCT | VARCHAR or CLOB CHARACTER SET UNICODE | Group of fixed members. |
MAP | VARCHAR or CLOB CHARACTER SET UNICODE | Maps keys to values. |
LIST | VARCHAR or CLOB CHARACTER SET UNICODE | Contains data that is stored in array. |
ENUM | VARCHAR or CLOB CHARACTER SET UNICODE | Store enumerated values encoded as a UTF8 string. |
ARRAY | VARCHAR or CLOB CHARACTER SET UNICODE | Stored as repeated fields. Can be an array of a single value or multiple values. |