Determining System Resource Usage at the Job Level
Determining the Use of the Number of Instances
Although most operators in Teradata PT can be scaled to use multiple instances for achieving maximum throughput, excessive use of instances can lead to over-parallelism, which can affect performance adversely. Each instance added to a job may introduce more data streams for data transfer, resulting in more shared memory, more semaphores, and additional system processes.
The following methods are recommended to manage the use of the number of instances:
In most loading scenarios, 1 to 4 instances of any given operator are sufficient. However, Directory Scan jobs may require more producer instances, and LOB/JSON/XML loading may require more consumer instances.
Determining the Use of Shared Memory
More instances of operators in a job require more allocated shared memory for data streams. By default, Teradata PT provides 10M of shared memory per job. To employ more producer or consumer instances to boost parallelism and scalability, allocate more shared memory. Use the tbuild -h option to increase the shared memory size. See “tbuild” in the Teradata Parallel Transporter Reference.
Use the following formula to calculate the size of shared memory required for a job with multiple producer and consumer instances:
Note: The formula used in Example 1 and Example 2 is for non-Buffer Mode loading.
[65000 x (Producer_count x Consumer_count) x 2] bytes + [65000 x (Producer_count + Consumer_count)] bytes
Example 1: Shared memory used by 2 producers and 2 consumers:
(65000 x 2 x 2 x 2) bytes + (65000 x (2 + 2)) bytes = 780000 bytes
Example 2: Shared memory used by 4 producers and 4 consumers:
(65000 x 4 x 4 x 2) bytes + (65000 x (4 + 4)) bytes = 2600000 bytes
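As a quick check, the non-Buffer Mode formula above can be sketched in Python. The function and constant names here are illustrative only and are not part of Teradata PT; the 65000-byte figure is taken directly from the formula.

```python
# Illustrative helper; the names are hypothetical, not a Teradata PT API.
BYTES_PER_UNIT = 65000  # constant taken from the formula above


def shared_memory_bytes(producer_count: int, consumer_count: int) -> int:
    """Estimate the shared memory (in bytes) for non-Buffer Mode loading."""
    # First term: 65000 x (producers x consumers) x 2
    stream_part = BYTES_PER_UNIT * (producer_count * consumer_count) * 2
    # Second term: 65000 x (producers + consumers)
    instance_part = BYTES_PER_UNIT * (producer_count + consumer_count)
    return stream_part + instance_part


print(shared_memory_bytes(2, 2))  # 780000
print(shared_memory_bytes(4, 4))  # 2600000
```

Because the first term grows with the product of the instance counts, doubling both counts more than triples the estimate in these two cases.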
For Buffer Mode loading, see “Determining the Size of Shared Memory for Buffer Mode Loading” below, which provides more information about shared memory allocation.
For a Directory Scan that reads a very large number of files from a directory, add the following amount of shared memory to account for the size of the checkpoint record the DataConnector operator creates:
1K + file_count * 580 bytes
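The Directory Scan allowance above can be estimated with a short Python sketch. It assumes that "1K" means 1024 bytes, which the text does not state explicitly; the function name is hypothetical.

```python
# Illustrative helper; the name is hypothetical, not a Teradata PT API.
BASE_BYTES = 1024      # assumption: "1K" is read as 1024 bytes
BYTES_PER_FILE = 580   # per-file figure from the formula above


def checkpoint_record_bytes(file_count: int) -> int:
    """Extra shared memory (bytes) for the Directory Scan checkpoint record."""
    return BASE_BYTES + file_count * BYTES_PER_FILE


# Example: a directory holding 10,000 files
print(checkpoint_record_bytes(10_000))  # 5801024
```

Under that assumption, a scan of 10,000 files adds roughly 5.8 million bytes on top of the data stream estimate.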
Determining Semaphore Usage per Job
Semaphores are used for synchronizing processes and access to resources in a parallel execution environment. For example, when a data stream is used to transfer data buffers from the producer instance (one process) to the consumer instance (another process), semaphores are used to synchronize the access to the data stream that the producer and consumer instances share. If more instances are used in a job, more semaphores are needed.
Use the following formula to calculate the number of semaphores required for a job with multiple producer and consumer instances:
Nprocs = MAX( 25, Consumer_count + Producer_count + 2)
Semaphores = 2 * (Nprocs + 3) + 5
Nprocs is the number of job processes, including the processes that the Teradata PT infrastructure uses.
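The two semaphore formulas above can be combined into a small Python sketch. The function name is illustrative and not part of Teradata PT.

```python
# Illustrative helper; the name is hypothetical, not a Teradata PT API.
def required_semaphores(producer_count: int, consumer_count: int) -> int:
    """Apply the Nprocs and Semaphores formulas from the text."""
    # Nprocs = MAX(25, Consumer_count + Producer_count + 2)
    nprocs = max(25, consumer_count + producer_count + 2)
    # Semaphores = 2 * (Nprocs + 3) + 5
    return 2 * (nprocs + 3) + 5


print(required_semaphores(2, 2))    # 61
print(required_semaphores(20, 10))  # 75
```

Because of the MAX term, the result stays at 61 until the combined producer and consumer count exceeds 23.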
Determining the Size of Shared Memory for Buffer Mode Loading
Buffer Mode in Teradata PT is a loading mechanism that transfers data buffers directly from the producer operator to the consumer operator, bypassing the CPU-intensive row-by-row processing in the Teradata PT infrastructure and thereby increasing throughput.
For a producer or consumer job to be eligible for Buffer Mode, the job script cannot contain filtering criteria such as CASE/WHEN or WHERE clauses in the Teradata PT SELECT statement.
Note: Not all operators support Buffer Mode. Currently, the Export, Select, ODBC, and DataConnector producer operators and the Load and DataConnector consumer operators support Buffer Mode. LOB/JSON/XML importing and exporting are not Buffer-Mode eligible.
The following are the typical operations that are Buffer-Mode eligible:
Buffer Mode also allows blocking of multiple buffers into a single data stream message so as to minimize buffer transfers in data streams. The main challenge with blocked Buffer Mode is determining a blocking factor, that is, the number of buffers in a message. The blocking factor is determined based on the following formula:
Buffers/Block = (MemoryPercent * TotalSharedMemory) / ((ProducerCount + (QueueDepth * ProducerCount + 1) * ConsumerCount) * BufferSize)
where MemoryPercent is the percentage of shared memory to be dedicated to data stream messages and QueueDepth is the maximum number of messages that can be placed on a data stream. The consumer operator sets the BufferSize dynamically.
Teradata PT provides a default blocking factor, but it may not be optimal because it is computed only from the default values of MemoryPercent (80), TotalSharedMemory (10M), and QueueDepth (2). To use a larger blocking factor, and so minimize the number of buffers transferred through data streams, increase the shared memory at the job level using the tbuild -h option.
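To see how shared memory size affects the blocking factor, the Buffers/Block formula can be sketched in Python. This is only an estimate under stated assumptions: MemoryPercent is treated as a percentage (80 means 80%), "10M" is read as 10 x 1024 x 1024 bytes, the result is truncated to a whole number of buffers, and the 64 KB buffer size is a hypothetical value, since the consumer operator sets BufferSize dynamically. The function name is illustrative and not part of Teradata PT.

```python
# Illustrative helper; the name and the 64 KB buffer size are assumptions.
MIB = 1024 * 1024  # assumption: "M" is read as 1024 * 1024 bytes


def buffers_per_block(
    producer_count: int,
    consumer_count: int,
    buffer_size: int,
    total_shared_memory: int = 10 * MIB,  # default cited in the text (10M)
    memory_percent: int = 80,             # default cited in the text
    queue_depth: int = 2,                 # default cited in the text
) -> int:
    """Estimate the blocking factor using the Buffers/Block formula."""
    # Share of shared memory dedicated to data stream messages
    usable = total_shared_memory * memory_percent // 100
    # ProducerCount + (QueueDepth * ProducerCount + 1) * ConsumerCount
    slots = producer_count + (queue_depth * producer_count + 1) * consumer_count
    # Truncate to a whole number of buffers (an assumption)
    return usable // (slots * buffer_size)


HYPOTHETICAL_BUFFER_SIZE = 64 * 1024  # not a documented value

print(buffers_per_block(1, 1, HYPOTHETICAL_BUFFER_SIZE))  # 32
print(buffers_per_block(1, 1, HYPOTHETICAL_BUFFER_SIZE,
                        total_shared_memory=20 * MIB))    # 64
```

With these assumed values, doubling the shared memory from 10M to 20M doubles the estimated blocking factor, while adding producer or consumer instances lowers it.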