LOCATION Key Prefix Best Practices

SQL Data Manipulation Language

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database, Teradata Vantage
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-04-05
  • All files that match a key prefix must have the same data format (CSV, JSON, or Parquet).
  • Related data in different formats must be placed at different key prefix locations, for example:
    • Amazon S3: /S3/YOUR-BUCKET.s3.amazonaws.com/csv-table1 and /S3/YOUR-BUCKET.s3.amazonaws.com/json-table-1
    • Azure Blob storage and Azure Data Lake Storage Gen2: /AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/csv-table1 and /AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/json-table-1
    • GCS: /gs/storage.googleapis.com/YOUR-BUCKET/csv-table1 and /gs/storage.googleapis.com/YOUR-BUCKET/json-table-1
  • Files that are part of different logical tables must be located at different key prefix locations, for example:
    • Amazon S3: /S3/YOUR-BUCKET.s3.amazonaws.com/emp-table and /S3/YOUR-BUCKET.s3.amazonaws.com/dept-table
    • Azure Blob storage and Azure Data Lake Storage Gen2: /AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/emp-table and /AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/dept-table
    • GCS: /gs/storage.googleapis.com/YOUR-BUCKET/emp-table and /gs/storage.googleapis.com/YOUR-BUCKET/dept-table
  • If CSV files contain different fields but do not have individual file headers, group the files that share the same fields under their own key prefix location, separate from files with other fields.
  • If different kinds of data are included at a single key prefix location, querying that data is inefficient. For example, if you mix department and employee data in a single key prefix location, a query looking for a particular employee requires Vantage to read all the files at that location, including files that contain only department data.

    Best practice is to group files containing the same kind of data into a single external storage key prefix location, and to use that key prefix location for a single foreign table or a single READ_NOS query. This applies to all kinds of external data: JSON, CSV, and Parquet.

    The same logic applies to mixing CSV files that have different headers. A key prefix location may include several of these CSV files, but querying the data would be inefficient due to the disparate field patterns.

    For the same reason, do not mix CSV or Parquet files that have different schemas at the same key prefix location. In these cases, querying the data is not only inefficient but also produces many warnings about skipped records and files.
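
The one-format-per-key-prefix rule can be checked before a foreign table is defined. The following is a minimal sketch in Python, not part of Vantage: it assumes you already have a list of object keys from your object store, that the key prefix is the parent path of each object, and that file extensions reliably indicate the format. The function names and the extension mapping are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed mapping from file extension to data format; adjust it to your own
# naming scheme. Compressed names such as "x.csv.gz" are reported as "unknown".
FORMAT_BY_EXTENSION = {".csv": "CSV", ".json": "JSON", ".parquet": "Parquet"}


def formats_by_prefix(object_keys):
    """Map each key prefix (parent path of an object) to the formats found under it."""
    found = defaultdict(set)
    for key in object_keys:
        path = PurePosixPath(key)
        fmt = FORMAT_BY_EXTENSION.get(path.suffix.lower(), "unknown")
        found[str(path.parent)].add(fmt)
    return dict(found)


def mixed_prefixes(object_keys):
    """Return, in sorted order, the key prefixes that hold more than one format."""
    by_prefix = formats_by_prefix(object_keys)
    return sorted(prefix for prefix, fmts in by_prefix.items() if len(fmts) > 1)


# Example keys, made up for illustration.
keys = [
    "emp-table/part-0001.csv",
    "emp-table/part-0002.csv",
    "dept-table/part-0001.csv",
    "dept-table/part-0001.json",
]
print(mixed_prefixes(keys))  # ['dept-table']
```

A prefix reported by mixed_prefixes is a candidate for splitting into separate key prefix locations, one per format. The check looks only at file names, so it does not detect CSV or Parquet files that share a format but differ in schema.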