CSV files usually include a header row as the first line of the file. The header row defines the names for the columns defined by the corresponding fields of data in each subsequent row of the file. By default, commas (,) separate each field in a record. By default, line breaks (carriage return line feeds (CRLF) and line feeds (LF)) separate each record. For example:
First Name,Last Name,City John,Smith,Los Angeles Mary,Jones,New York
- the field_names, which are used as the column names in the foreign table. If the schema definition includes a field_names parameter, Vantage assumes the first record in the CSV file is a data record, rather than a header record.
- the field_delimiter character used to separate fields in the CSV data.
To create a schema for CSV files, use the SQL statement CREATE storage_format SCHEMA.
Schemas use a JSON style syntax for field and delimiter specifications, even if the schema is defined for CSV data.
If the CSV files do not have header rows, but you do not want to manually assign specific column names, such as when there are a very large number of columns, you can allow Vantage to assign column names automatically. Create a schema for the CSV files and include the "field_names":[] option in the schema specification, leaving the contents of the square brackets empty. Vantage automatically generates column names of the form csv_fldn .
Use the foreign table Payload column definition to associate the schema with the CSV files at a particular external storage location when you access the files using CREATE FOREIGN TABLE . For example:
CREATE CSV SCHEMA SodaBrands_Schema AS
'{"field_delimiter":"\t",
"field_names":["Id","productName","brand"]}';
CREATE FOREIGN TABLE SodaBrands_T, EXTERNAL SECURITY DEFINER TRUSTED DefAuth_S3 (LOCATION VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC, PAYLOAD DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV WITH SCHEMA SodaBrands_Schema) USING ( LOCATION ('external_file_path') );
A schema can also indicate the field delimiter character used in the CSV file. Use the "field_delimiter" parameter in cases where the delimiter is not the standard comma character, such as for TSV files, which are tab-delimited. If a USING...ROWFORMAT option also specifies a delimiter character, the schema specification takes precedence.