LOCATION Key Prefix Best Practices

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Locale: en-US
Last edition: 2024-12-11
  • All files that match a key prefix must have the same data format (CSV, JSON, or Parquet).
  • Related data in different formats must be placed in different key prefix locations. For example:
    Key prefix locations by platform and data format:

    Amazon S3
      CSV
        • DNS style: /S3/YOUR-BUCKET.s3.amazonaws.com/csv-table1
        • Path style: /S3/s3.amazonaws.com/YOUR-BUCKET/csv-table-1
        • Simple style: s3://YOUR-BUCKET/csv-table1
      JSON
        • DNS style: /S3/YOUR-BUCKET.s3.amazonaws.com/json-table-1
        • Path style: /S3/s3.amazonaws.com/YOUR-BUCKET/json-table-1
        • Simple style: s3://YOUR-BUCKET/json-table1
      Parquet
        • DNS style: /S3/YOUR-BUCKET.s3.amazonaws.com/parquet-table-1
        • Path style: /S3/s3.amazonaws.com/YOUR-BUCKET/parquet-table-1
        • Simple style: s3://YOUR-BUCKET/parquet-table1

    Azure Blob Storage and Azure Data Lake Storage Gen2
      CSV
        • Original style: /AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/csv-table1
        • Simple style: AZ://YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/csv-table-1
      JSON
        • Original style: /AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/json-table1
        • Simple style: AZ://YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/json-table-1
      Parquet
        • Original style: /AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/parquet-table1
        • Simple style: AZ://YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/parquet-table-1

    Google Cloud Storage
      CSV
        • Original style: /gs/storage.googleapis.com/YOUR-BUCKET/csv-table1
        • Simple style: gs://YOUR-BUCKET/csv-table-1
      JSON
        • Original style: /gs/storage.googleapis.com/YOUR-BUCKET/json-table1
        • Simple style: gs://YOUR-BUCKET/json-table-1
      Parquet
        • Original style: /gs/storage.googleapis.com/YOUR-BUCKET/parquet-table1
        • Simple style: gs://YOUR-BUCKET/parquet-table-1
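The one-format-per-prefix rule can be checked before you point a foreign table at a location. The following is a minimal sketch, not part of any Teradata tooling: it assumes you already have a list of object keys under a prefix (for example, from your cloud provider's listing API) and that file extensions reliably indicate the data format. The names `EXTENSION_FORMATS`, `formats_under_prefix`, and `is_single_format` are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed mapping from file extension to data format; adjust to your naming scheme.
EXTENSION_FORMATS = {".csv": "CSV", ".json": "JSON", ".parquet": "Parquet"}


def formats_under_prefix(keys):
    """Group object keys by the data format implied by their file extension."""
    groups = defaultdict(list)
    for key in keys:
        suffixes = [s.lower() for s in PurePosixPath(key).suffixes]
        # Use the first recognized suffix so that "part.csv.gz" still counts as CSV.
        fmt = next(
            (EXTENSION_FORMATS[s] for s in suffixes if s in EXTENSION_FORMATS),
            "unknown",
        )
        groups[fmt].append(key)
    return dict(groups)


def is_single_format(keys):
    """Return True when every key maps to the same recognized format."""
    groups = formats_under_prefix(keys)
    return len(groups) == 1 and "unknown" not in groups
```

If `is_single_format` returns False, the groups returned by `formats_under_prefix` show which keys would need to move to a separate key prefix location.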
  • Files that are part of different logical tables must be located at different key prefix locations. For example:
    Key prefix locations by platform:

    Amazon S3
      • /S3/YOUR-BUCKET.s3.amazonaws.com/emp-table
      • /S3/YOUR-BUCKET.s3.amazonaws.com/dept-table

    Azure Blob Storage and Azure Data Lake Storage Gen2
      • /AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/emp-table
      • /AZ/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/YOUR-CONTAINER/dept-table

    Google Cloud Storage
      • /gs/storage.googleapis.com/YOUR-BUCKET/emp-table
      • /gs/storage.googleapis.com/YOUR-BUCKET/dept-table
  • For Amazon S3, prefer DNS style: it spreads the load across Amazon's front-end servers. With path style, all connections go to a generic s3.amazonaws.com server, which can become heavily loaded.
    • Example path style: /s3/s3.amazonaws.com/bucketname/datapath
    • Example DNS style: /s3/bucketname.s3.amazonaws.com/datapath
    Amazon plans to deprecate support for the path-style endpoint format.
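If existing scripts still carry path-style locations, the rewrite to DNS style is mechanical. The sketch below is illustrative only: `path_to_dns_style` is a hypothetical helper, and it assumes the generic `s3.amazonaws.com` host shown in the examples above (regional or custom endpoints are not handled).

```python
def path_to_dns_style(location):
    """Rewrite '/s3/s3.amazonaws.com/BUCKET/KEY' as '/s3/BUCKET.s3.amazonaws.com/KEY'."""
    # Expected pieces: ['', 's3', 's3.amazonaws.com', BUCKET, KEY (optional)]
    parts = location.split("/", 4)
    if (
        len(parts) < 4
        or parts[0] != ""
        or parts[1].lower() != "s3"
        or parts[2] != "s3.amazonaws.com"
        or not parts[3]
    ):
        raise ValueError(f"not a path-style S3 location: {location!r}")
    scheme, host, bucket = parts[1], parts[2], parts[3]
    key = parts[4] if len(parts) == 5 else ""
    # Move the bucket name in front of the host; the key is left untouched.
    return f"/{scheme}/{bucket}.{host}/{key}"
```

A location that is already in DNS style raises `ValueError`, so the helper can be run over a mixed list and the failures skipped.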
  • Simplified paths:
    Key prefix locations by platform:

    Amazon S3: s3://bucket/folder
    Example: LOCATION='s3://td-usgs-public/CSVDATA/'

    Azure Blob Storage and Azure Data Lake Storage Gen2: az://storage-account.blob.core.windows.net/container/folder
    Example: LOCATION='az://noscong2dir01.blob.core.windows.net/nos-connector-dbs-test/CSVDATA'

    Google Cloud Storage: gs://bucket/objectName
    Example: LOCATION='gs://nos-connector-dbs-test/CSVDATA/'

    By default, the dbscontrol flag AllowToForceS3pathstyle is set to support the DNS-style format for S3 endpoints. To enable path-style S3 endpoints, contact Teradata Support to change the setting. Path-style formats may be required for S3-compatible ("S3-like") endpoints that are not AWS S3 endpoints.
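To see how a simplified path relates to the longer forms used earlier on this page, the following sketch expands one into the other. The mapping is inferred from the paired examples in this topic, not from Teradata's own resolution logic, and `simple_to_original_style` is a hypothetical name. For S3 it produces the DNS-style form with the generic `s3.amazonaws.com` host.

```python
def simple_to_original_style(location):
    """Expand a simplified path (s3://, az://, gs://) into the longer LOCATION form."""
    scheme, sep, rest = location.partition("://")
    if not sep or not rest:
        raise ValueError(f"not a simplified path: {location!r}")
    scheme = scheme.lower()
    if scheme == "s3":
        # s3://BUCKET/KEY -> /s3/BUCKET.s3.amazonaws.com/KEY (DNS style)
        bucket, _, key = rest.partition("/")
        return f"/s3/{bucket}.s3.amazonaws.com/{key}"
    if scheme == "gs":
        # gs://BUCKET/KEY -> /gs/storage.googleapis.com/BUCKET/KEY
        bucket, _, key = rest.partition("/")
        return f"/gs/storage.googleapis.com/{bucket}/{key}"
    if scheme == "az":
        # The simplified Azure path already contains the full host name.
        return f"/az/{rest}"
    raise ValueError(f"unsupported scheme: {scheme!r}")
```

For example, passing `s3://td-usgs-public/CSVDATA/` yields `/s3/td-usgs-public.s3.amazonaws.com/CSVDATA/`.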

  • Wildcard DNS entries such as *.s3.amazonaws.com can increase security risk, and your IT department may prohibit them.
  • If CSV files contain different fields but have no individual file headers, group the files that share the same fields and put each group in its own key prefix location.
  • If different kinds of data are stored at a single key prefix location, querying that data is inefficient. For example, if you mix department and employee data in a single key prefix location, a query looking for a specific employee requires Vantage to read all files at that location, including files that contain only department data.

    Best practice for external data: Group files with similar data into one external object storage key prefix location. Use that key prefix location to create a single foreign table, or reference it in a single READ_NOS query.

    Do not mix CSV files that have different headers. A key prefix location may include such files, but disparate field patterns make querying the data inefficient.

    Do not mix CSV or Parquet files that have different schemas at the same key prefix location. Querying the data is inefficient and causes warnings that indicate skipped records and files.
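One way to find files that should be split across key prefix locations is to group them by header row. The sketch below is a hypothetical helper, not a Teradata utility. It assumes each CSV file has a header line and that you have the file contents available as text. For headerless files you would need a different signal, such as the field count.

```python
import csv
import io
from collections import defaultdict


def group_by_header(files):
    """Map each distinct CSV header (as a tuple) to the sorted keys that use it.

    `files` is a dict of object key -> CSV text.
    """
    groups = defaultdict(list)
    for key, text in files.items():
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(io.StringIO(text)), [])
        normalized = tuple(name.strip().lower() for name in header)
        groups[normalized].append(key)
    return {header: sorted(keys) for header, keys in groups.items()}


# Usage: employee and department files mixed under one prefix.
mixed = {
    "mixed/emp1.csv": "emp_id,name\n1,Ann\n",
    "mixed/dept1.csv": "dept_id,dept_name\n10,Sales\n",
    "mixed/emp2.csv": "emp_id,name\n2,Bob\n",
}
groups = group_by_header(mixed)
```

Each resulting group is a candidate for its own key prefix location, such as the emp-table and dept-table locations shown earlier.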