- All files that match to a key prefix must be of the same data format type (csv, json, or Parquet).
- Related data of different formats must be put into different key prefix location, (for example:
- Amazon S3: /S3/YOUR-BUCKET.s3.amazonaws.com/csv-table1 and /S3/YOUR-BUCKET.s3.amazonaws.com/json-table-1
- Azure Blob storage and Azure Data Lake Storage Gen2: /AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/csv-table1 and /AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/json-table-1
- GCS: /gs/storage.googleapis.com/YOUR-BUCKET/csv-table1 and /gs/storage.googleapis.com/YOUR-BUCKET/json-table-1
- Files that are part of different logical tables must be located at different key prefix locations, for example:
- Amazon S3: /S3/YOUR-BUCKET.s3.amazonaws.com/emp-table and /S3/YOUR-BUCKET.s3.amazonaws.com/dept-table
- Azure Blob storage and Azure Data Lake Storage Gen2: /AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/emp-table and /az/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/dept-table
- GCS: /gs/storage.googleapis.com/YOUR-BUCKET/emp-table and /gs/storage.googleapis.com/YOUR-BUCKET/dept-table
- If the CSV files have different fields of data, but do not have individual file headers, you should group the files with the same fields into different key prefix locations.
- If different kinds of data are included at a single key prefix location, querying that data will be inefficient. For example if you mix department and employee data in a single key prefix location, a query looking for a particular employee would require Vantage to read all the files at that location, including files that contained only department data.
Best practice is to group files containing the same kind of data into a single external storage key prefix location, and use that key prefix location to create a single foreign table, or in a single READ_NOS query. This applies to all kinds of external data, JSON, CSV, and Parquet.
The same logic applies to mixing CSV files that have different headers. A key prefix location may include several of these CSV files, but querying the data would be inefficient due to the disparate field patterns.
For the same reason, do not mix CSV or Parquet files that have different schemas at the same key prefix locations. In these cases, querying the data is not only inefficient, but it also results in many warnings indicating skipped records and files.