Parallel Reading of a Single Data Source - Parallel Transporter

Teradata® Parallel Transporter User Guide

Product
Parallel Transporter
Release Number
17.00
Published
August 31, 2020
Language
English (United States)
Last Update
2020-08-27
dita:mapPath
zae1544831938751.ditamap
dita:ditavalPath
tvt1507315030722.ditaval
dita:id
B035-2445
lifecycle
previous
Product Category
Teradata Tools and Utilities

Today, one of the most common loading scenarios is to load data from a single data source into the data warehouse. This single data source includes a file, an ODBC table, a message queue (e.g., MSMQ), data from a named pipe, and so on.

Teradata PT, due to its parallel and scalable architecture, makes it possible to both utilize additional CPU power for loading data and allow multiple data streams to be used to parallelize the data extraction process. This architecture provides a foundation for producer operators such as the Data Connector and the ODBC operators to read a single data source in a parallel and scalable manner.

You can specify multiple instances of the Data Connector to read a file by setting the “MultipleReaders” attribute to “Yes”. Prior to Teradata PT 15.0, the support of “MultipleReaders” requires each of the instances to read the same file and process a subset of rows in the file. Although this approach achieves scalable performance by reading data in parallel, it incurs excessive I/O and CPU cycles, and has the potential to cause I/O delays if the instances are not synchronized with each other (e.g., some instances run ahead of the other instances). When this “out of synch” issue occurs, the reading of data blocks may not be fulfilled from the file cache of the I/O system, thus causing I/O overhead due to multiple reads of the same data block by more than one instance.

With Teradata PT 15.0, an enhancement was made to the Data Connector operator so that the main instance is the only instance that reads a file and distributes data rows to the worker instances. As a consequence, there is only one read per data block and each block of data can be processed by multiple worker instances without causing excessive I/O and disk contention issues.

Once the main instance finishes filling a buffer with data rows, it sends the buffer to the worker instances. While the worker instances are processing the data from the buffer in parallel, the main instance begins filling another buffer. When all instances finish reading or processing the data buffers, the main and worker instances would swap buffers, and the same process repeats until all rows are read from the data source.

To maximize throughput performance, however, you need to tweak the following parameters to fit the need of a Teradata PT job:
  • RecordsPerBuffer attribute
  • Number of instances
  • Size of shared memory

The RecordsPerBuffer attribute is for users to define the number of records that the main instance can store in a data buffer for each worker instance. This data buffer is allocated from shared memory so that it can be shared among instances. During the data acquisition phase, the Data Connector operator allocates three shared memory buffers of equal size based on the row size and the RecordPerInstance value. The main instance uses these buffers to send rows to the worker instances in a parallel fashion. Due to the use of these shared memory buffers, a job that uses the “MultipleReaders” feature requires at least (6 x data buffer size) bytes of shared memory by default. This is in addition to the shared memory required for a TPT job not using the “MultipleReaders” feature. Hence, the tbuild -h runtime parameter should be utilized to increase the amount of shared memory reserved for the job, if memory allocation error occurs. For information about the use of the tbuild -h option, see "Teradata PT Utility Commands" in the Teradata® Parallel Transporter Reference, B035-2436.

To obtain an optimal value for RecordsPerBuffer, you need to tweak it based on the row size, the number of instances, and the size of the shared memory. If the value is set too low, performance will degrade due to excessive swapping of buffers between the main and worker instances. If the value is set too high, it could exceed the maximum limit of shared memory for a job. For more information about the use of this attribute, see "DataConnector Operator" in the Teradata® Parallel Transporter Reference, B035-2436.

To increase scalability and parallelism, you can adjust the number of instances of an operator in the APPLY-SELECT statement. While the higher number of instances usually results in better throughput performance, it could degrade performance if more instances than needed are used because they consume system resources such as CPU, semaphores, and shared memory. For strategies on choosing the optimal number of instances for an operator, see Determining System Resource Usage at the Job Level in Best Practices.

Similar to the Data Connector operator, the ODBC operator also allows users to extract data from a single ODBC table with multiple instances. The only difference between the Data Connector operator and the ODBC operator is that the former reads a single file, whereas the latter reads a single ODBC table. Both operators use shared memory buffers parallel processing to maximize the sequential read rate of a data source. While the throughput performance of the Data Connector is limited by the I/O efficiency of a file system, the throughput performance of the ODBC operator depends on the maximum number of rows that can be extracted from an ODBC table in a single ODBC “fetch” operation.

To facilitate the “fetch” efficiency, the ODBC operator provides the DataBlockSize attribute for users to set an optimal buffer size so as to allow the maximum number of rows to be extracted from a table in a single ODBC “fetch” operation. The DataBlockSize attribute functions exactly the same as the RecordsPerBuffer in the Data Connector operator. The ODBC operator uses the DataBlockSize value to allocate shared memory buffers for the main instance to send rows to the worker instances in a parallel fashion.

If the DataBlockSize value is set too low, it will incur excessive swapping of buffers between the main and the worker instances. If the DataBlockSize value is set too high, it could exceed the maximum limit of shared memory for a job. By default, the ODBC allocates two megabytes for each shared memory buffer. For more information about the use of this attribute, see "ODBC Operator" in the Teradata® Parallel Transporter Reference, B035-2436.