Initialization String - Access Module

Teradata® Tools and Utilities Access Module Reference

Product
Access Module
Release Number
17.00
Published
November 30, 2020
Language
English (United States)
Last Update
2020-11-18
dita:id
B035-2425
lifecycle
previous
Product Category
Teradata Tools and Utilities

The Teradata Access Module for S3 obtains most of its operating parameters from the initialization string. The initialization string consists of a series of keyword and value pairs separated by blanks. The value of a keyword can be an integer or a string.
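As an illustration of the keyword and value format, the following Python sketch splits an initialization string into a dictionary. It assumes each pair is written as Keyword=Value with no blanks around the equal sign and no blanks inside a value. The helper parse_init_string and the bucket and object names are hypothetical; they are not part of the access module.

```python
def parse_init_string(init: str) -> dict[str, str]:
    """Split 'Keyword=Value Keyword=Value ...' into a dict (hypothetical helper).

    Assumes pairs are separated by blanks, values contain no blanks, and
    there are no blanks around '='. Values are kept as strings; integer
    parameters can be converted by the caller.
    """
    params: dict[str, str] = {}
    for token in init.split():
        keyword, sep, value = token.partition("=")
        if not sep or not keyword:
            raise ValueError(f"expected Keyword=Value, got {token!r}")
        params[keyword] = value
    return params


# Hypothetical bucket and object names.
init = "S3Region=us-west-2 S3Bucket=mybucket S3Prefix=exports/ S3Object=orders.csv"
params = parse_init_string(init)
print(params["S3Bucket"])  # prints: mybucket
```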



where:

Syntax Element Description
S3AccessID (Required) This parameter is effectively the S3 user ID. The value of S3AccessID can be specified in the initialization string parameter S3AccessID, the environment variable AWS_ACCESS_KEY_ID, or the credentials file. If it is specified by the initialization string parameter S3AccessID, the value of the environment variable AWS_ACCESS_KEY_ID and the contents of the credentials file are not considered. If it is set by the environment variable AWS_ACCESS_KEY_ID, the contents of the credentials file are not considered.

Consider your security policy when deciding if you want to store this value in your TPT script or Job Variable (JV) file. Teradata Wallet lookups are supported for this variable.

S3AccessKey (Required) This parameter is effectively the S3 password. The value of S3AccessKey can be specified in the initialization string parameter S3AccessKey, the environment variable AWS_SECRET_ACCESS_KEY, or the credentials file. If it is specified by the initialization string parameter S3AccessKey, the value of the environment variable AWS_SECRET_ACCESS_KEY and the contents of the credentials file are not considered. If it is specified by the environment variable AWS_SECRET_ACCESS_KEY, the contents of the credentials file are not considered.

Consider your security policy when deciding if you want to store this value in your TPT script or Job Variable (JV) file. Teradata Wallet lookups are supported for this variable.
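The lookup order for S3AccessID and S3AccessKey can be sketched in Python as follows. resolve_credential is a hypothetical helper, not part of the access module. The key names aws_access_key_id and aws_secret_access_key follow the standard AWS credentials file format, and Teradata Wallet lookups are not modeled.

```python
import configparser
import os
from pathlib import Path


def resolve_credential(init_params, init_keyword, env_var, file_key,
                       profile="default", config_dir=None, environ=None):
    """Return a credential using the initialization string, environment,
    credentials file precedence (hypothetical sketch).

    1. the initialization string parameter, if present;
    2. otherwise the environment variable, if set;
    3. otherwise the key in the selected profile section of the
       credentials file (None if it is not found there either).
    """
    if init_keyword in init_params:
        return init_params[init_keyword]

    environ = os.environ if environ is None else environ
    if env_var in environ:
        return environ[env_var]

    # $HOME/.aws unless a config directory is supplied (compare S3ConfigDir).
    directory = Path(config_dir) if config_dir else Path.home() / ".aws"
    parser = configparser.ConfigParser()
    parser.read(directory / "credentials")  # a missing file is silently skipped
    if parser.has_option(profile, file_key):
        return parser.get(profile, file_key)
    return None


# The environment variable is used because S3AccessID is absent from the
# initialization string; the credentials file is never read.
access_id = resolve_credential({}, "S3AccessID", "AWS_ACCESS_KEY_ID",
                               "aws_access_key_id",
                               environ={"AWS_ACCESS_KEY_ID": "EXAMPLEID"})
```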

S3AltURI (Optional) Specifies an S3-protocol-compatible storage service that is not hosted by Amazon. The general format is https://<ip addr or domain name>:<port number>. The parameter must include the IP address of the S3-protocol-compatible storage service or a domain name that resolves to that address. The colon and port number are optional. If the port number is not specified, port 443 is used for HTTPS traffic and port 80 is used for HTTP traffic. For brevity, the "https://" portion can be omitted when HTTPS traffic is desired.

The S3Region parameter is still required, because it also has meaning to non-Amazon storage servers.
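A short Python sketch of how the default ports apply to an S3AltURI value. resolve_alt_uri and the host names are hypothetical. Treating an explicit "http://" prefix as the way to request HTTP traffic is an assumption of this sketch; see S3TransportMode for enabling unencrypted traffic.

```python
from urllib.parse import urlsplit


def resolve_alt_uri(alt_uri: str) -> tuple[str, str, int]:
    """Return (scheme, host, port) for an S3AltURI value (hypothetical sketch).

    Assumptions: a value without a scheme means HTTPS, and an explicit
    "http://" scheme means HTTP traffic.
    """
    if "://" not in alt_uri:
        alt_uri = "https://" + alt_uri  # "https://" may be left off
    parts = urlsplit(alt_uri)
    if parts.scheme not in ("https", "http"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("an IP address or domain name is required")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.scheme, parts.hostname, parts.port or default_port


print(resolve_alt_uri("storage.example.com"))     # ('https', 'storage.example.com', 443)
print(resolve_alt_uri("https://10.25.1.7:9000"))  # ('https', '10.25.1.7', 9000)
```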

S3Bucket (Required) Specifies the S3 bucket to be used for load and export operations.

This parameter must be listed in the access module parameters initialization string.

S3BufferCount (Optional) Specifies the number of buffers to be used with the TCP connections specified by S3ConnectionCount.

Twice the number of connections is the minimum recommended value. If not specified, the value of (2*S3ConnectionCount) is used as default.

S3BufferSize (Optional) Specifies the size of the buffers to be used for the TCP connections. The default is 8MB (8388608 bytes). For export operations, the buffer size must be at least 5 megabytes (5MB). For convenience, the following multipliers can be used: k (1000), K (1024), m (1000*1000), and M (1024*1024). For example, the default could be specified as 8M or 8192K.
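The multiplier suffixes can be illustrated with a small Python sketch. parse_size and check_export_buffer_size are hypothetical helpers. Reading "5 megabytes" as 5*1024*1024 bytes is an assumption made here for the export minimum.

```python
# Multipliers accepted by S3BufferSize (and S3MaxObjectSize).
MULTIPLIERS = {"k": 1000, "K": 1024, "m": 1000 * 1000, "M": 1024 * 1024}

# Assumption: "5 megabytes" is read here as 5 * 1024 * 1024 bytes.
MIN_EXPORT_BUFFER_SIZE = 5 * 1024 * 1024


def parse_size(value: str) -> int:
    """Convert '8M', '8192K', '100m', or a plain integer string to bytes."""
    if value and value[-1] in MULTIPLIERS:
        return int(value[:-1]) * MULTIPLIERS[value[-1]]
    return int(value)


def check_export_buffer_size(value: str) -> int:
    """Reject an S3BufferSize below the export minimum (hypothetical check)."""
    size = parse_size(value)
    if size < MIN_EXPORT_BUFFER_SIZE:
        raise ValueError(f"S3BufferSize={value} is below the export minimum")
    return size


print(parse_size("8M"), parse_size("8192K"))  # 8388608 8388608
```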
S3ConfigDir (Optional) Overrides the default location of the config and credentials files, which is $HOME/.aws.

S3ConnectionCount (Optional) Specifies the number of TCP connections to the S3 service.

If not specified, the default of 10 is used.
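The relationship between the two defaults can be expressed in a few lines of Python. connection_settings is hypothetical and takes parameters as a dictionary of strings. The note printed for a low buffer count is this sketch's own behavior, not something the access module is documented to do.

```python
def connection_settings(params: dict) -> tuple[int, int]:
    """Return (S3ConnectionCount, S3BufferCount) with the documented defaults."""
    connections = int(params.get("S3ConnectionCount", 10))
    # The default buffer count is twice the connection count, which is also
    # the minimum recommended value; this sketch flags a lower setting.
    buffers = int(params.get("S3BufferCount", 2 * connections))
    if buffers < 2 * connections:
        print(f"note: S3BufferCount={buffers} is below the recommended "
              f"minimum of {2 * connections}")
    return connections, buffers


print(connection_settings({}))                          # (10, 20)
print(connection_settings({"S3ConnectionCount": "4"}))  # (4, 8)
```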

S3DontSplitRows (Optional) This parameter applies only to export jobs (that is, writing to S3 or S3-compatible storage). If S3SinglePartFile=False (the default), then when an object becomes “full”, the next object is created and the stream of data continues to be written to that object. These objects are named <your path>/F000000, <your path>/F000001, etc., as described elsewhere in this document. The split is made on the byte boundary defined by “object full.” When the objects are read back in with this access module, the pieces are concatenated and processed as if they were a single object.

To read the objects back in with another application, set S3DontSplitRows=True so that each Fxxxxxx object ends on a database row boundary. In addition, if compression is enabled (that is, your pathname ends in “.gz”), a new gzip context is generated for each Fxxxxxx object. This makes the objects independent of one another; they can be read in any order, or simultaneously.
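Why a separate gzip context per object matters can be shown with a Python model. write_parts is hypothetical and splits by row count only to keep the example small; the access module splits by object size. The rows use a made-up delimited format.

```python
import gzip


def write_parts(rows: list[bytes], rows_per_part: int) -> dict[str, bytes]:
    """Model of S3DontSplitRows=True with a ".gz" pathname (hypothetical).

    Each part holds only whole rows and is compressed with its own gzip
    context. The real access module splits by object size, not row count.
    """
    parts: dict[str, bytes] = {}
    for start in range(0, len(rows), rows_per_part):
        name = f"F{start // rows_per_part:06d}"
        parts[name] = gzip.compress(b"".join(rows[start:start + rows_per_part]))
    return parts


rows = [f"{n}|row {n}\n".encode() for n in range(5)]
parts = write_parts(rows, rows_per_part=2)

# Any single part decompresses on its own and ends on a row boundary,
# so parts can be processed in any order or in parallel.
for name in sorted(parts, reverse=True):
    assert gzip.decompress(parts[name]).endswith(b"\n")

# Concatenated in name order, the gzip members decompress to the full stream.
combined = gzip.decompress(b"".join(parts[name] for name in sorted(parts)))
assert combined == b"".join(rows)
```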

S3HttpsProxy (Optional) Specifies the IP address (or domain name) and port number of an HTTPS Proxy that does not require authentication. The format is https://w.x.y.z:m, where w.x.y.z is the IP address of the proxy server and m is the port number of the proxy service. Additional details on this parameter are discussed in the HTTPS Proxy Support section earlier in this document.
S3KmsKeyId (Optional) When writing to an S3 object, if a KMS-managed key is to be used for server-side encryption, set the following:
  • S3Sse=KMS
  • S3KmsKeyId=desired KMS key ID

A KMS key can be managed from the IAM section of the AWS Web Console.

When reading, if the user has permission to use the required key, it will be used automatically. This field is ignored on read jobs.

S3LogAPI (Optional) Creates a debug trace file with additional information for developer use. Because the trace file contains the user's credentials, S3LogAPI should not be used except at the express direction of Teradata.
S3MaxObjectSize (Optional) This parameter applies only when writing to S3. When S3DontSplitRows=True, S3MaxObjectSize can be used to cause the Teradata Access Module for S3 to advance to the next Fxxxxxx object name at a predetermined size. The default is quite large: S3BufferSize*10000 (i.e., approximately 80 gigabytes if the default S3BufferSize is used).
The value of S3MaxObjectSize can be an integer or an integer followed by, without a space, one of the following multipliers:
  • k (1000)
  • K (1024)
  • m (1000*1000)
  • M (1024*1024)
For example, S3MaxObjectSize=100m would cause the Teradata Access Module for S3 to advance to the next Fxxxxxx object when it reached 100,000,000 bytes. The Teradata Access Module for S3 always writes an integral number of TPT buffers, so objects may be closed somewhat before the specified size is reached. In addition, the Teradata Access Module for S3 must write at least one buffer to each object, so very small values of S3MaxObjectSize may result in objects in the 1-8 megabyte range.
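The effect of writing whole buffers can be modeled in Python. plan_objects is hypothetical, and closing an object just before a buffer would take it past the limit is this sketch's reading of "closed somewhat before the specified size is reached".

```python
MB = 1024 * 1024
DEFAULT_BUFFER_SIZE = 8 * MB
# Default S3MaxObjectSize: S3BufferSize * 10000.
DEFAULT_MAX_OBJECT_SIZE = DEFAULT_BUFFER_SIZE * 10_000


def plan_objects(buffer_sizes: list[int], max_object_size: int) -> list[list[int]]:
    """Group buffers into successive Fxxxxxx objects (hypothetical model).

    Only whole buffers are written, so an object is closed before a buffer
    would take it past max_object_size; every object gets at least one
    buffer, even if that buffer alone exceeds the limit.
    """
    objects: list[list[int]] = []
    current: list[int] = []
    current_size = 0
    for size in buffer_sizes:
        if current and current_size + size > max_object_size:
            objects.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        objects.append(current)
    return objects


# 25 full 8 MB buffers with S3MaxObjectSize=100m (100,000,000 bytes):
# 11 buffers (92,274,688 bytes) fit; a 12th would exceed the limit.
print([len(obj) for obj in plan_objects([DEFAULT_BUFFER_SIZE] * 25, 100_000_000)])
# [11, 11, 3]
```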
S3Object (Required) The name of the object to be created in the S3 bucket. The pathname is formed by concatenating S3Prefix and S3Object.

To support large result sets, the preferred mode of operation is to create a series of objects whose names are formed by appending a suffix, generated by this access module, to the specified pathname. For details, see the description of S3SinglePartFile, below.

S3Prefix (Optional) This string, if present, is prepended to the S3Object string to create the pathname within the bucket. As a convenience in writing export or load scripts, this parameter could be used for the portion of the pathname that does not change when a series of tables are exported or loaded.
Within an S3 bucket, there is no need to explicitly create directories. The S3 CLI and GUI segregate the listing of objects based on the presence of “/”, but this is a display convenience only. You do not need to create directories to use a “/” in a pathname, and the presence or location of one or more “/” characters in your pathname can come from either or both the S3Prefix and S3Object strings. In particular, the S3Prefix, if used, does not need to contain or end in a “/”.
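Because the pathname is a plain concatenation of S3Prefix and S3Object, any "/" has to come from one of the two strings. A minimal Python illustration, using a hypothetical object_pathname helper and made-up names:

```python
def object_pathname(s3_object: str, s3_prefix: str = "") -> str:
    """Pathname within the bucket: S3Prefix followed directly by S3Object.

    No "/" is inserted, so any "/" must come from one of the two strings.
    """
    return s3_prefix + s3_object


# Hypothetical names: the same prefix reused for a series of tables.
for table in ("orders.csv", "customers.csv"):
    print(object_pathname(table, s3_prefix="exports/2020-11/"))
# exports/2020-11/orders.csv
# exports/2020-11/customers.csv

print(object_pathname("orders.csv", s3_prefix="nightly_"))  # nightly_orders.csv
```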
S3Profile (Optional) The config and credentials files are separated into sections per AWS documentation. Each section is labeled with a bracketed heading. The S3Profile variable selects a section from each of these two files. There is always a section called [default]. This section is used if S3Profile is not specified or if it is explicitly set to default (S3Profile=default).
S3Region (Required) Specifies the S3 region where the S3 objects are stored. If the S3 region name is us-west-2, then you need to specify S3Region=us-west-2 in the initialization string or alternatively you can specify region = us-west-2 in the config file. This required parameter will be obtained from the config file unless overridden by being specified explicitly in the access module initialization string.
The original AWS S3 region on the US East Coast is often referred to as “US Standard”. This should be specified as us-east-1.
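How S3Region, S3Profile, and S3ConfigDir interact can be sketched in Python. resolve_region is hypothetical. It assumes the config file section heading matches the S3Profile value exactly, as with [default]; note that the AWS CLI itself names non-default sections of the config file [profile name], which this sketch does not handle.

```python
import configparser
from pathlib import Path


def resolve_region(init_params: dict) -> str:
    """Return S3Region from the initialization string, else the config file.

    Assumptions: the config directory is S3ConfigDir or $HOME/.aws, and the
    section heading matches the S3Profile value exactly (as with [default]).
    """
    if "S3Region" in init_params:
        return init_params["S3Region"]  # explicit value overrides the file
    profile = init_params.get("S3Profile", "default")
    config_dir = init_params.get("S3ConfigDir")
    directory = Path(config_dir) if config_dir else Path.home() / ".aws"
    parser = configparser.ConfigParser()
    parser.read(directory / "config")  # a missing file is silently skipped
    if parser.has_option(profile, "region"):
        return parser.get(profile, "region")
    raise ValueError("S3Region must be set in the initialization string "
                     "or in the config file")


print(resolve_region({"S3Region": "us-east-1"}))  # us-east-1
```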
S3Role (Optional) This AWS-specific feature specifies the credentials to be used for connecting to S3. The AWS IAM role specified by this parameter must be an IAM role assigned to the AWS EC2 instance that the job is running on. When S3Role is specified, the S3AccessID and S3AccessKey are not required and are ignored if specified. The role specified must include these permissions:
Resource *:
     S3:ListAllMyBuckets, S3:GetBucketLocation
Resource <bucket> and <bucket>/*:
     S3:ListBucket, S3:GetObject, S3:PutObject,
     S3:DeleteObject
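As one possible rendering of the permissions listed above, the following Python sketch builds an IAM policy document for a hypothetical bucket name using the standard IAM policy JSON elements (Version, Statement, Effect, Action, Resource). role_policy is hypothetical; attaching the policy to the role and the role to the EC2 instance is done through AWS and is not shown.

```python
import json


def role_policy(bucket: str) -> dict:
    """IAM policy granting the permissions listed for S3Role (sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetObject",
                           "s3:PutObject", "s3:DeleteObject"],
                # Both the bucket itself and every object inside it.
                "Resource": [f"arn:aws:s3:::{bucket}",
                             f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


print(json.dumps(role_policy("my-tpt-bucket"), indent=2))
```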
S3SecurityToken (Optional) In support of AWS Multi-Factor Authentication, this is the “session-token” obtained from running the AWS CLI command:

aws sts get-session-token --serial-number arn-of-the-mfa-device --token-code code-from-token
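The command above prints JSON containing a Credentials object with AccessKeyId, SecretAccessKey, SessionToken, and Expiration. A Python sketch that turns that output into initialization string pairs follows; init_fragment_from_sts and the placeholder values are hypothetical, and whether credentials belong in a TPT script or JV file remains subject to your security policy, as noted for S3AccessID and S3AccessKey.

```python
import json


def init_fragment_from_sts(sts_output: str) -> str:
    """Build initialization string pairs from get-session-token output (sketch)."""
    creds = json.loads(sts_output)["Credentials"]
    # The session token is valid only together with the temporary key pair
    # returned alongside it, so all three values are passed on.
    return " ".join([
        f"S3AccessID={creds['AccessKeyId']}",
        f"S3AccessKey={creds['SecretAccessKey']}",
        f"S3SecurityToken={creds['SessionToken']}",
    ])


# Shape of the CLI's JSON output, with placeholder values.
sample = """{"Credentials": {"AccessKeyId": "ASIAEXAMPLE",
  "SecretAccessKey": "secretEXAMPLE", "SessionToken": "tokenEXAMPLE",
  "Expiration": "2020-11-30T12:00:00Z"}}"""
print(init_fragment_from_sts(sample))
```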

S3SinglePartFile (Optional) Determines whether the S3 object being read or written is a single file with the name specified by S3Object or a set of numbered files in an “apparent” directory named by the S3Object. As discussed above, directories do not really exist, but the GUI and CLI segregate files based on “/” making it appear that there is a directory.

When an application creates an object on S3, the object can be created from up to 10,000 “pieces”. The size and SHA hash of each “piece” must be presented when the connection is opened. Because it is not desirable to buffer the file on disk, the size of a “piece” is limited to the size of the buffer. The minimum piece size/buffer size is 5 MB. By default, 8 MB is used as a good trade-off between memory usage and performance. When the pieces are consolidated into an S3 object (a file), the command is limited to 10,000 pieces. That effectively limits the size of a single S3 object (file) to approximately 80 GB unless a larger buffer size is used.

The solution is to provide a mechanism to automatically treat a list of objects (files) with related pathnames as a single file for the export or load operation. The Teradata Access Module for S3 appends a “/”, followed by an “F”, followed by a sequence number. So there is an “apparent” directory created with the name of the requested S3 object. The files in that “apparent” directory are named F000001, F000002, F000003, etc. This solves the problem of being able to handle an arbitrarily large result set from the database. The suffixes of the “parts” are predefined, and cannot be overridden.

If the output exceeds 10,000*buffersize, the S3SinglePartFile parameter must be set to false so a set of numbered files can be created. The default value for S3SinglePartFile is "false".

The S3SinglePartFile parameter may be set to "true" in the following scenarios:
  • For tests using small datasets
  • For production where the customer is absolutely certain the result set to be stored is smaller than 10,000*buffersize

The S3SinglePartFile must be set to true when reading a single S3 object (file) of arbitrary size that isn’t split up into parts (F000001, F000002, etc.).
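The size limit and the numbered part names can be sketched in Python. single_object_limit, single_part_allowed, and part_names are hypothetical helpers. The start index is a parameter because this document's examples begin the sequence at F000000 in one place and F000001 in another; the six-digit width is likewise taken from those examples.

```python
MAX_PARTS = 10_000  # pieces per S3 object


def single_object_limit(buffer_size: int) -> int:
    """Largest output that fits in one S3 object: 10,000 * buffer size."""
    return MAX_PARTS * buffer_size


def single_part_allowed(expected_bytes: int, buffer_size: int) -> bool:
    """S3SinglePartFile=True only works when the output fits in one object."""
    return expected_bytes <= single_object_limit(buffer_size)


def part_names(pathname: str, count: int, start: int = 1) -> list[str]:
    """Names in the "apparent" directory: <pathname>/F followed by 6 digits."""
    return [f"{pathname}/F{n:06d}" for n in range(start, start + count)]


print(single_object_limit(8 * 1024 * 1024))  # 83886080000 bytes, roughly 80 GB
print(part_names("exports/orders.csv", 3))
# ['exports/orders.csv/F000001', 'exports/orders.csv/F000002', 'exports/orders.csv/F000003']
```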

S3Sse (Optional) Data in S3 objects is stored unencrypted unless either the bucket policy specifies what encryption will be used or the application writing to S3 requests server-side encryption. An encryption key is associated with the user’s AWS account. That key (sometimes known as the “S3 key”) can be requested by specifying “True” or “S3” for the value of S3Sse. It is also possible to use specific keys managed by the AWS Key Management Service.
To use the S3-managed key shared by all users of a given AWS account, set this parameter:
  • S3Sse=True or S3Sse=S3 (equivalent)
To use a KMS-managed key, set these parameters:
  • S3Sse=KMS
  • S3KmsKeyId=desired KMS key ID

Values specified for S3Sse and S3KmsKeyId are ignored when reading S3 objects. If the user has access to the required key, read jobs will succeed whether these parameters are specified or not.
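For orientation, the following Python sketch maps the S3Sse and S3KmsKeyId settings onto the server-side encryption request headers defined by the Amazon S3 REST API. sse_headers is hypothetical; whether the access module sends exactly these headers, matches values case-insensitively, or rejects S3Sse=KMS without a key ID is not stated in this document, so those choices are assumptions of the sketch.

```python
def sse_headers(params: dict, writing: bool) -> dict[str, str]:
    """Map S3Sse/S3KmsKeyId to S3 server-side encryption headers (sketch).

    Reads ignore both parameters: the key is applied automatically if the
    user has permission to use it.
    """
    if not writing:
        return {}
    sse = params.get("S3Sse")
    if sse is None:
        return {}
    if sse.upper() in ("TRUE", "S3"):
        # S3-managed key (SSE-S3).
        return {"x-amz-server-side-encryption": "AES256"}
    if sse.upper() == "KMS":
        key_id = params.get("S3KmsKeyId")
        if not key_id:
            raise ValueError("S3Sse=KMS also needs S3KmsKeyId")  # sketch's choice
        return {
            "x-amz-server-side-encryption": "aws:kms",
            "x-amz-server-side-encryption-aws-kms-key-id": key_id,
        }
    raise ValueError(f"unrecognized S3Sse value: {sse!r}")


print(sse_headers({"S3Sse": "KMS", "S3KmsKeyId": "example-key-id"}, writing=True))
```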

S3TransportMode (Optional) Allows unencrypted traffic from the access module to S3 or an S3-compatible storage server. This feature is enabled by setting S3TransportMode to UnEncrypted (S3TransportMode=UnEncrypted). This parameter is case-sensitive.
A warning that unencrypted traffic was used is placed in the log.