Setting Up an Object Store for River Flow Data - Analytics Database - Teradata Vantage

SQL Data Manipulation Language

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database, Teradata Vantage
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-10-04

Many of the examples use a sample river flow data set. USGS Surface-Water Data Sets are provided courtesy of the U.S. Geological Survey.

To run the examples, you can use the river flow data from Teradata-supplied public buckets, or you can set up a small object store for the data set.

The following instructions explain how to set up the river flow data on your own external object store.

Your external object store must be configured to allow Analytics Database access.

When you configure external storage, you set the credentials for your external object store. Those credentials are used in SQL commands. The supported credentials for USER and PASSWORD (used in the CREATE AUTHORIZATION command) and for ACCESS_ID and ACCESS_KEY (used by READ_NOS and WRITE_NOS) correspond to the values shown in the following table:

System/Scheme                         USER / ACCESS_ID        PASSWORD / ACCESS_KEY
AWS                                   Access Key ID           Access Key Secret
Azure / Shared Key                    Storage Account Name    Storage Account Key
Azure Shared Access Signature (SAS)   Storage Account Name    Account SAS Token
Google Cloud (S3 interop mode)        Access Key ID           Access Key Secret
Google Cloud (native)                 Client Email            Private Key
On-premises object storage            Access Key ID           Access Key Secret
Public access object storage          empty_string            empty_string

For public access object storage, enclose each empty string in single straight quotation marks: USER '' and PASSWORD ''.
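
For example, the USER and PASSWORD values from the table are the values you supply when creating an authorization object. The following is a minimal sketch, assuming an Amazon S3 bucket; MyAuthObj and PublicAccessAuth are placeholder names, and the credential values are the placeholders used later in these instructions.

    -- Sketch: store credentials for your own bucket in an authorization object.
    CREATE AUTHORIZATION MyAuthObj AS DEFINER TRUSTED
      USER 'YOUR-ACCESS-KEY-ID'            -- the USER / ACCESS_ID value
      PASSWORD 'YOUR-SECRET-ACCESS-KEY';   -- the PASSWORD / ACCESS_KEY value

    -- Sketch: for public access object storage, supply empty strings.
    CREATE AUTHORIZATION PublicAccessAuth AS DEFINER TRUSTED
      USER ''
      PASSWORD '';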

The following are alternatives to using an access key or password to secure S3-compatible external object storage. They are included in an authorization object, which is created by the CREATE AUTHORIZATION command (see the sketch after this list):
  • Amazon Identity and Access Management (IAM)
  • AWS Assume Role, which allows existing AWS IAM users and service accounts temporary access to AWS resources in other accounts. See "Assume Role Authorization" in Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.
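
As a usage sketch (not the Assume Role syntax itself, which is covered in the manual cited above), an authorization object created with CREATE AUTHORIZATION can be referenced from READ_NOS instead of passing ACCESS_ID and ACCESS_KEY directly. MyAuthObj and the bucket path below are placeholders:

    -- Sketch: reference an authorization object instead of inline credentials.
    SELECT TOP 2 *
    FROM READ_NOS (
      USING
        LOCATION('/S3/YOUR-BUCKET.s3.amazonaws.com/CSVDATA/')
        AUTHORIZATION(MyAuthObj)
    ) AS d;
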
The following table provides further details for setting up credentials on your external object store (a quick access check follows the table):
Platform: Amazon S3 (IAM)
Notes: IAM is an alternative to using an access key and password to secure S3 buckets. To allow Analytics Database access to S3 buckets that use IAM, your S3 bucket policy must be configured with the following Actions for the role that allows access to the bucket:
  • S3:GetObject
  • S3:ListBucket
  • S3:GetBucketLocation

For WRITE_NOS, the policy also needs:

  • S3:PutObject
Other Actions, such as S3:HeadBucket and S3:HeadObject, are also allowed.

Platform: Azure Blob storage and Azure Data Lake Storage Gen2
Notes: A user with access key information has full control over the entire account. Alternatively, SAS can be defined on containers, or on objects within containers, which provides a more fine-grained authentication approach. NOS accepts either type of authentication and does not need to know which type of secret is being supplied.
Only Account SAS tokens are supported. Service SAS tokens generate errors and are rejected.

Platform: Google Cloud Storage
Notes: To allow Analytics Database access, the following permissions are needed:
  • storage.objects.get
  • storage.objects.list
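
One way to confirm that the list and read permissions above are in place is to list the objects in the store with READ_NOS and RETURNTYPE('NOSREAD_KEYS'). This is a sketch; the bucket path and the MyAuthObj authorization object are placeholders:

    -- Sketch: list the objects in the store to verify list/read access.
    SELECT Location
    FROM READ_NOS (
      USING
        LOCATION('/S3/YOUR-BUCKET.s3.amazonaws.com/CSVDATA/')
        AUTHORIZATION(MyAuthObj)
        RETURNTYPE('NOSREAD_KEYS')
    ) AS d;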

See your cloud vendor documentation for instructions on creating an external object store account.

Note: The following steps may require the assistance of your public cloud administrator.

  1. Create an external object store on a Teradata-supported external object store platform. Give your external object store a unique name. In the Teradata-supplied examples, the bucket/container is called td-usgs. Because the bucket/container name must be unique, choose a name other than td-usgs.
  2. On Amazon, generate an access ID and matching secret key for your bucket, or generate an Identity and Access Management (IAM) user credential. On Azure, generate Account SAS tokens (not Service SAS tokens) for your container. On Google Cloud Storage, generate an access ID and matching secret key for your bucket.
  3. Download the sample data from https://downloads.teradata.com/ (look for NOS Download Data) to your client/laptop. The ZIP file contains sample river flow data in CSV, JSON, and Parquet data formats.
  4. Copy the sample data to your bucket or container, being careful to preserve the data directory structure. For example, use a location similar to the following:
    • Amazon S3: /S3/YOUR-BUCKET.s3.amazonaws.com/JSONDATA
    • Azure Blob storage and Azure Data Lake Storage Gen2: /az/YOUR-STORAGE-ACCOUNT.blob.core.windows.net/td-usgs/CSVDATA/
    • Google Cloud Storage: /gs/storage.googleapis.com/YOUR-BUCKET/CSVDATA/
    Note: You can use the Amazon S3 or Azure management consoles or a utility like the AWS CLI to copy the data to your external object store. For Google Cloud Storage, you can use the gsutil tool.
  5. In the example code, replace td-usgs, YOUR-BUCKET, and YOUR-STORAGE-ACCOUNT with the location of your object store.
  6. Replace YOUR-ACCESS-KEY-ID and YOUR-SECRET-ACCESS-KEY with the access values for your external object store, as in the sketch that follows these steps.
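
After the placeholders are replaced, a quick check that the copied data is readable is a READ_NOS query against your store. This is a sketch, assuming the CSV copy of the sample data and an Amazon S3 path; adjust the LOCATION for Azure or Google Cloud Storage:

    -- Sketch: read a few rows of the copied river flow data from your own store.
    SELECT TOP 10 *
    FROM READ_NOS (
      USING
        LOCATION('/S3/YOUR-BUCKET.s3.amazonaws.com/CSVDATA/')
        ACCESS_ID('YOUR-ACCESS-KEY-ID')
        ACCESS_KEY('YOUR-SECRET-ACCESS-KEY')
    ) AS d;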