Compression and Decompression - Access Module

Teradata® Tools and Utilities Access Module Reference

Product
Access Module
Release Number
16.20
Published
November 2020
Language
English (United States)
Last Update
2020-11-18
dita:id
B035-2425
Product Category
Teradata Tools and Utilities

GZIP data compression is supported for both import and export.

For export, if the object name ends in .gz, the generated object or objects are compressed. When S3SinglePartFile=True, a single compressed object is created with the specified name, ending in .gz. When S3SinglePartFile=False, the apparent directory that holds the F000000, F000001, ... files has a name ending in .gz, but the individual objects in that apparent directory do not have a .gz suffix. These objects are compressed even though their names do not end in .gz, because the specified object name did end in .gz. They cannot be decompressed individually; they must be concatenated first. The connector does this automatically on a load. If the pieces are downloaded manually with "aws s3 cp", the retrieved pieces must be concatenated and the resulting file given a name ending in .gz.
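The manual case described above can be sketched in Python. This is a minimal illustration, not part of the access module: the function name and paths are hypothetical, and it assumes the pieces have already been downloaded into one local directory under their original F000000, F000001, ... names.

```python
import gzip
import shutil
from pathlib import Path


def join_and_decompress(parts_dir: str, joined_gz: str, output: str) -> None:
    """Concatenate the F* pieces in name order, then decompress the result."""
    # Assumption: every piece keeps its F-prefixed name, so sorting by name
    # restores the order in which the pieces were written.
    parts = sorted(
        p for p in Path(parts_dir).iterdir()
        if p.is_file() and p.name.startswith("F")
    )
    with open(joined_gz, "wb") as joined:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, joined)
    # Only the joined file is decompressed; a single piece is not treated
    # as a complete gzip stream.
    with gzip.open(joined_gz, "rb") as src, open(output, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

A call such as `join_and_decompress("export.gz", "joined.gz", "export.txt")` would write the joined archive and the decompressed data to the two hypothetical output paths.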

For import, if the object name ends in .gz, it is decompressed. When S3SinglePartFile=False and the specified object name ends in .gz, all the Fxxxxxx files are concatenated and decompressed as if they were a single object, even though the Fxxxxxx files themselves do not end in .gz (see the export description above). When S3SinglePartFile=True and a wildcard specification is not used, the object is decompressed as it is read if its name ends in .gz. When S3SinglePartFile=True and a wildcard specification is used, each matching object is inspected individually and is decompressed, or not, depending on whether its name has a .gz suffix.
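The import rules above can be summarized as a small decision function. This is only a sketch of the rules as stated here, not the access module's code: `plan_import`, its parameters, and the returned structure are hypothetical, and the caller is assumed to already know whether a wildcard was used and which objects matched.

```python
from typing import List, Tuple


def plan_import(
    object_name: str,
    single_part_file: bool,
    wildcard: bool,
    matches: List[str],
) -> List[Tuple[List[str], bool]]:
    """Return (objects read as one stream, decompress?) pairs."""
    if not single_part_file:
        # The Fxxxxxx pieces form one stream; the specified name decides.
        return [(list(matches), object_name.endswith(".gz"))]
    if wildcard:
        # Each match is judged by its own suffix.
        return [([name], name.endswith(".gz")) for name in matches]
    # A single named object is judged by its own suffix.
    return [([object_name], object_name.endswith(".gz"))]
```

For example, with S3SinglePartFile=True and a wildcard that matches `a.gz` and `b.csv`, the sketch returns two single-object streams, of which only the first is marked for decompression.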

Each object that needs decompression is decompressed individually. The objects are concatenated after the optional decompress operation and delivered to Teradata Parallel Transporter. A mixture of compressed and uncompressed objects is allowed, although it would be unusual. The concatenation of the results is a streaming operation and is not memory limited. The data is not landed on disk.

This method supports the compressed file format of some other cloud databases. For instance, a list of files ending in .gz that were produced by a single Redshift export can be read this way.

Checkpoint/Restart is implemented for compressed objects, but the seek phase of the restart is implemented by reading and decompressing the object until the correct location is found. This will probably still be faster than restarting the job from the beginning.
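The streaming behavior and the restart seek described above can be illustrated with Python generators. This is a hedged sketch under stated assumptions, not the access module's implementation: `stream_objects` and `skip_to` are hypothetical names, and each object is assumed to be available as a (name, binary file-like object) pair.

```python
import gzip
from typing import BinaryIO, Iterable, Iterator, Tuple


def stream_objects(
    objects: Iterable[Tuple[str, BinaryIO]], chunk_size: int = 65536
) -> Iterator[bytes]:
    """Yield the concatenated data, decompressing only the .gz objects."""
    for name, raw in objects:
        # Decompress per object, based on that object's own suffix.
        if name.endswith(".gz"):
            reader = gzip.GzipFile(fileobj=raw, mode="rb")
        else:
            reader = raw
        while True:
            chunk = reader.read(chunk_size)
            if not chunk:
                break
            yield chunk  # at most chunk_size bytes are buffered at a time


def skip_to(stream: Iterator[bytes], offset: int) -> Iterator[bytes]:
    """Discard the first `offset` decompressed bytes, then yield the rest."""
    remaining = offset
    for chunk in stream:
        if remaining >= len(chunk):
            # Still before the restart position: read and throw away.
            remaining -= len(chunk)
            continue
        yield chunk[remaining:]
        remaining = 0
```

Because a gzip stream cannot be entered at an arbitrary decompressed offset, `skip_to` has to read and discard everything before the restart position, which mirrors the seek phase described above.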