15.10 - Parallel Reading of a Single Data Source - Parallel Transporter

Teradata Parallel Transporter User Guide

prodname
Parallel Transporter
vrm_release
15.10
category
User Guide
featnum
B035-2445-035K

Today, one of the most common loading scenarios is to load data from a single data source into the data warehouse. This single data source includes a file, an ODBC table, a message queue (e.g., MSMQ), data from a named pipe, and so on.

Teradata PT, due to its parallel and scalable architecture, makes it possible to both utilize additional CPU power for loading data and allow multiple data streams to be used to parallelize the data extraction process. This architecture provides a foundation for producer operators such as the Data Connector and the ODBC operators to read a single data source in a parallel and scalable manner.

You can specify multiple instances of the Data Connector to read a file by setting the “MultipleReaders” attribute to “Yes”. Prior to Teradata PT 15.0, the support of “MultipleReaders” requires each of the instances to read the same file and process a subset of rows in the file. Although this approach achieves scalable performance by reading data in parallel, it incurs excessive I/O and CPU cycles, and has the potential to cause I/O delays if the instances are not synchronized with each other (e.g., some instances run ahead of the other instances). When this “out of synch” issue occurs, the reading of data blocks may not be fulfilled from the file cache of the I/O system, thus causing I/O overhead due to multiple reads of the same data block by more than one instance.

With Teradata PT 15.0, an enhancement was made to the Data Connector operator so that the master instance is the only instance that reads a file and distributes data rows to the slave instances. As a consequence, there is only one read per data block and each block of data can be processed by multiple slave instances without causing excessive I/O and disk contention issues.

Once the master instance finishes filling a buffer with data rows, it sends the buffer to the slave instances. While the slave instances are processing the data from the buffer in parallel, the master instance begins filling another buffer. When all instances finish reading or processing the data buffers, the master and slave instances would swap buffers, and the same process repeats until all rows are read from the data source.

To maximize throughput performance, however, you need to tweak the following parameters to fit the need of a Teradata PT job:

  • IOBufferSize attribute
  • RecordsPerBuffer attribute
  • Number of instance
  • Size of shared memory
  • The IOBufferSize attribute is for users to define the size of the I/O buffer that is used for storing data rows read from a file. If more rows are read into a buffer, then lesser I/O requests are required, which would result in better throughput performance. However, there is a limit for the buffer size to be set on different operating system environments. For more information about the use of this attribute, see the DataConnector Operator chapter in the Teradata Parallel Transporter Reference.

    The RecordsPerBuffer attribute is for users to define the number of records that the master instance can store in a data buffer for each slave instance. Unlike the I/O buffer mentioned earlier, this data buffer is allocated from shared memory so that it can be shared among instances. During the data acquisition phase, the Data Connector operator allocates three shared memory buffers of equal size based on the row size and the RecordPerInstance value. The master instance uses these buffers to send rows to the slave instances in a parallel fashion. Due to the use of these shared memory buffers, a job that uses the “MultipleReaders” feature requires at least (6 x IOBufferSize) bytes of shared memory by default. This is in addition to the shared memory required for a TPT job not using the “MultipleReaders” feature. Hence, the tbuild -h runtime parameter should be utilized to increase the amount of shared memory reserved for the job, if memory allocation error occurs. For information about the use of the tbuild -h option, see the Teradata PT Utility Commands chapter in Teradata Parallel Transporter Reference.

    To obtain an optimal value for RecordsPerBuffer, you need to tweak it based on the row size, the number of instances, and the size of the shared memory. If the value is set too low, performance will degrade due to excessive swapping of buffers between the master and slave instances. If the value is set too high, it could exceed the maximum limit of shared memory for a job. For more information about the use of this attribute, see the DataConnector Operator chapter in the Teradata Parallel Transporter Reference.

    To increase scalability and parallelism, you can adjust the number of instances of an operator in the APPLY-SELECT statement. While the higher number of instances usually results in better throughput performance, it could degrade performance if more instances than needed are used because they consume system resources such as CPU, semaphores, and shared memory. For strategies on choosing the optimal number of instances for an operator, see “Determining System Resource Usage at the Job Level” in the Best Practices chapter of this document.

    Similar to the Data Connector operator, the ODBC operator also allows users to extract data from a single ODBC table with multiple instances. The only difference between the Data Connector operator and the ODBC operator is that the former reads a single file, whereas the latter reads a single ODBC table. Both operators use shared memory buffers parallel processing to maximize the sequential read rate of a data source. While the throughput performance of the Data Connector is limited by the I/O efficiency of a file system, the throughput performance of the ODBC operator depends on the maximum number of rows that can be extracted from an ODBC table in a single ODBC “fetch” operation.

    To facilitate the “fetch” efficiency, the ODBC operator provides the DataBlockSize attribute for users to set an optimal buffer size so as to allow the maximum number of rows to be extracted from a table in a single ODBC “fetch” operation. The DataBlockSize attribute functions exactly the same as the RecordsPerBuffer in the Data Connector operator. The ODBC operator uses the DataBlockSize value to allocate shared memory buffers for the master instance to send rows to the slave instances in a parallel fashion.

    If the DataBlockSize value is set too low, it will incur excessive swapping of buffers between the master and the slave instances. If the DataBlockSize value is set too high, it could exceed the maximum limit of shared memory for a job. By default, the ODBC allocates two megabytes for each shared memory buffer. For more information about the use of this attribute, see the ODBC Operator chapter in the Teradata Parallel Transporter Reference.