17.10 - Controlling Object Sizes - Access Module

Teradata® Tools and Utilities Access Module Reference

Product
Access Module
Release Number
17.10
Published
October 2021
Language
English (United States)
Last Update
2021-11-02

The behavior of S3MaxObjectSize depends on the value of S3SinglePartFile.

When S3SinglePartFile=false:

S3MaxObjectSize can be used to cause the Teradata Access Module for S3 to advance to the next Fxxxxxx object name at a predetermined size. The default size is S3BufferSize*10000 (that is, approximately 80 gigabytes if the default S3BufferSize is used).

The value of S3MaxObjectSize can be an integer, or an integer followed immediately (with no space) by one of the following multipliers:
  • k (1000)
  • K (1024)
  • m (1000*1000)
  • M (1024*1024)
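
The multiplier rules above can be sketched in Python as follows. This is only an illustration of the arithmetic, not the access module's own parser; the function name parse_size and the choice to raise ValueError on malformed input are assumptions made for this example.

```python
import re

# Multipliers exactly as listed above (lowercase = decimal, uppercase = binary).
MULTIPLIERS = {"k": 1000, "K": 1024, "m": 1000 * 1000, "M": 1024 * 1024}

def parse_size(value: str) -> int:
    """Convert a value such as '100m' or '4096' to a byte count.

    Hypothetical helper: rejects anything other than digits optionally
    followed, with no space, by one of k, K, m, M.
    """
    match = re.fullmatch(r"(\d+)([kKmM]?)", value)
    if match is None:
        raise ValueError(f"not a valid size: {value!r}")
    number, suffix = match.groups()
    return int(number) * MULTIPLIERS.get(suffix, 1)

print(parse_size("100m"))  # 100000000
print(parse_size("100M"))  # 104857600
```

Note that the case of the suffix matters: 100m and 100M differ by almost 5 million bytes.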

For example, S3MaxObjectSize=100m causes the Teradata Access Module for S3 to advance to the next Fxxxxxx object when the current object reaches 100,000,000 bytes. The Teradata Access Module for S3 always writes an integral number of TPT buffers, so an object can be closed before the specified size is reached. In addition, the Teradata Access Module for S3 must write at least one buffer to each object, so even small values of S3MaxObjectSize may result in objects in the 1-8 megabyte range.
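
The effect of writing whole buffers can be approximated with the sketch below. It assumes, purely for illustration, that every buffer has the same fixed size; the function name approximate_object_size and the buffer sizes used are hypothetical, and the actual module may behave differently when buffer sizes vary.

```python
def approximate_object_size(max_object_size: int, buffer_size: int) -> int:
    """Estimate the size of a full object when only whole buffers are written.

    Assumption: all buffers are exactly buffer_size bytes. At least one
    buffer is always written, even if it exceeds max_object_size.
    """
    whole_buffers = max(1, max_object_size // buffer_size)
    return whole_buffers * buffer_size

# 100,000,000-byte limit with a hypothetical 8 MiB buffer: 11 whole buffers fit.
print(approximate_object_size(100_000_000, 8 * 1024 * 1024))  # 92274688

# A very small limit still produces an object of one full buffer.
print(approximate_object_size(1000, 1024 * 1024))  # 1048576
```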

When S3SinglePartFile=true:

This scenario does not use Fxxxxxx objects; instead, all S3 object names are derived from the base name specified by the S3Object parameter.

S3 objects are created as individual objects (without Fxxxxxx parts), and no object exceeds the size specified by the S3MaxObjectSize parameter. A unique sequential number, preceded by a hyphen, is appended to the base name given in S3Object.

For example, if S3Object=myweeklydata, then the objects generated will be named myweeklydata-001, myweeklydata-002, myweeklydata-003, ..., myweeklydata-nnn.

If there is a file extension, as in S3Object=myweeklydata.gz, then the sequential number is placed before the extension. So, the object names generated would be myweeklydata-001.gz, myweeklydata-002.gz, myweeklydata-003.gz, ..., myweeklydata-nnn.gz.

If there is more than one file extension, as in S3Object=myweeklydata.csv.gz, then the sequential number is placed before the first extension. So, the object names generated would be myweeklydata-001.csv.gz, myweeklydata-002.csv.gz, myweeklydata-003.csv.gz, ..., myweeklydata-nnn.csv.gz.
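
The naming rule described above can be sketched as follows. The function name numbered_object_name is hypothetical, and the sketch assumes that the first dot in the S3Object value marks the start of the extensions and that numbers are zero-padded to three digits as in the examples; how the module formats numbers above 999 is not stated here.

```python
def numbered_object_name(s3_object: str, sequence: int) -> str:
    """Insert '-NNN' before the first extension of s3_object, if any.

    Assumption: everything after the first '.' is treated as extensions.
    """
    base, dot, extensions = s3_object.partition(".")
    return f"{base}-{sequence:03d}{dot}{extensions}"

print(numbered_object_name("myweeklydata", 1))         # myweeklydata-001
print(numbered_object_name("myweeklydata.gz", 2))      # myweeklydata-002.gz
print(numbered_object_name("myweeklydata.csv.gz", 3))  # myweeklydata-003.csv.gz
```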

This feature makes it easier to read back the objects using wildcards.
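
As a rough illustration of why this helps, the sketch below uses Python's shell-style fnmatch matching to select every numbered object with a single pattern. It demonstrates the idea only; it is not the access module's own wildcard syntax, and the object list is made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical listing of a bucket's object names.
names = [
    "myweeklydata-001.csv.gz",
    "myweeklydata-002.csv.gz",
    "otherdata-001.csv.gz",
    "myweeklydata-003.csv.gz",
]

# Because the number sits before the extensions, one pattern covers all parts.
pattern = "myweeklydata-*.csv.gz"
matching = sorted(name for name in names if fnmatchcase(name, pattern))
print(matching)
```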

If multiple instances of the DataConnector operator are specified for the write scenario, the same numbering scheme applies, except that each instance starts at its own number (-001 for the first instance, -002 for the second, and so on, up to the number of instances) and then increments its sequential number by the total number of instances.

Thus, the object names would be as follows, assuming that three instances of the DataConnector operator are being used:
  • Instance 1: myweeklydata-001, myweeklydata-004, myweeklydata-007, ...
  • Instance 2: myweeklydata-002, myweeklydata-005, myweeklydata-008, ...
  • Instance 3: myweeklydata-003, myweeklydata-006, myweeklydata-009, ...
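
The interleaved numbering shown above amounts to each instance starting at its own instance number and stepping by the instance count. A minimal sketch, in which the function name instance_sequence_numbers is hypothetical and instances are assumed to be numbered from 1:

```python
from itertools import count, islice

def instance_sequence_numbers(instance: int, total_instances: int):
    """Yield the sequence numbers used by one instance (1-based)."""
    return count(start=instance, step=total_instances)

total = 3
for instance in range(1, total + 1):
    numbers = islice(instance_sequence_numbers(instance, total), 3)
    print(instance, [f"myweeklydata-{n:03d}" for n in numbers])
```

Because no two instances ever produce the same number, they can name objects without coordinating with one another.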

Because different instances can run at different speeds, gaps in the sequential numbers are possible when some instances process more data than others.