15.10 - Determining System Resource Usage at the Job Level - Parallel Transporter

Teradata Parallel Transporter User Guide

Determining System Resource Usage at the Job Level

Determining the Use of the Number of Instances

Although most operators in Teradata PT can be scaled to multiple instances to achieve maximum throughput, excessive use of instances can lead to over-parallelism, which can adversely affect performance. Each instance added to a job may introduce more data streams for data transfer, requiring more shared memory, more semaphores, and additional system processes.

The following methods are recommended to manage the use of the number of instances:

  • Do not create more instances than needed, because each instance consumes system resources. Start with 2 instances and work your way up.
  • In most loading scenarios, 1 to 4 instances of any given operator are sufficient. However, some scenarios, such as Directory Scan and LOB/JSON/XML loading, may require more producer instances and more consumer instances, respectively.

  • Measure where a bottleneck may be occurring when data is being loaded. Teradata PT can be scaled to eliminate data I/O and load process CPU bottlenecks.
  • Read the TWB_STATUS private log, which displays statistics showing how much data was processed by each instance. Job performance is evaluated based on the number of CPU seconds and the elapsed time in seconds reported in the TWB_STATUS log.
  • Reduce the number of instances if you see underutilized operator instances. Both the operator private logs and the TWB_STATUS log provide detailed information at the instance level. See “Using the TWB_STATUS Private Log to Obtain Job Status” below.
    Determining the Use of Shared Memory

    More instances of operators in a job require more allocated shared memory for data streams. By default, Teradata PT provides 20M of shared memory per job. If you want to employ more producer or consumer instances to boost parallelism and scalability, you need to allocate more shared memory. The tbuild -h option can be used to increase shared memory size. See “tbuild” in the Teradata Parallel Transporter Reference.

    Use the following formula to calculate the size of shared memory required for a job with multiple producer/consumer instances:

    Note: The formula used in Example 1 and Example 2 is for non-Buffer Mode loading.

    [65000 x (Producer_count x Consumer_count) x 2] bytes + [65000 x (Producer_count + Consumer_count)] bytes 

    Example 1

    Shared memory used by 2 producers and 2 consumers:

    (65000 x 2 x 2 x 2) bytes + (65000 x (2 + 2)) bytes = 780000 bytes

    Example 2

    Shared memory used by 4 producers and 4 consumers:

    (65000 x 4 x 4 x 2) bytes + (65000 x (4 + 4)) bytes = 2600000 bytes
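The calculation above can be expressed as a short helper. This is an illustrative sketch (the function name is ours, not part of the product) that reproduces Example 1 and Example 2 for non-Buffer-Mode loading:

```python
def shared_memory_bytes(producer_count, consumer_count):
    """Non-Buffer-Mode shared memory estimate:
    [65000 x (P x C) x 2] + [65000 x (P + C)] bytes."""
    return (65000 * producer_count * consumer_count * 2
            + 65000 * (producer_count + consumer_count))

# Example 1: 2 producers, 2 consumers
print(shared_memory_bytes(2, 2))  # 780000

# Example 2: 4 producers, 4 consumers
print(shared_memory_bytes(4, 4))  # 2600000
```

If the result exceeds the default allocation per job, increase the shared memory size with the tbuild -h option.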

    For Buffer Mode loading, see “Determining the Size of Shared Memory for Buffer Mode Loading” below, which provides more information about shared memory allocation.

    For a Directory Scan that reads a very large number of files from a directory, add the following amount of shared memory to account for the size of the checkpoint record the DataConnector operator creates:

    1K + file_count * 580 bytes
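As a sketch, the extra allocation for the DataConnector checkpoint record can be computed as follows (assuming 1K means 1024 bytes; the function name is ours):

```python
def checkpoint_memory_bytes(file_count):
    # 1K fixed overhead (assumed to be 1024 bytes)
    # plus 580 bytes per file scanned from the directory
    return 1024 + file_count * 580

# A directory scan over 100,000 files:
print(checkpoint_memory_bytes(100000))  # 58001024
```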

    Determining Semaphore Usage per Job

    Semaphores are used for synchronizing processes and access to resources in a parallel execution environment. For example, when a data stream is used to transfer data buffers from the producer instance (one process) to the consumer instance (another process), semaphores are used to synchronize the access to the data stream that the producer and consumer instances share. If more instances are used in a job, more semaphores are needed.

    Use the following formula to calculate the number of semaphores required for a job with multiple producer and consumer instances:

    Nprocs = MAX( 25, Consumer_count + Producer_count + 2)
    Semaphores = 2 * (Nprocs + 3) + 5


    Nprocs is the number of job processes, including the processes that the Teradata PT infrastructure uses.
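The two formulas above can be sketched together in a small helper (the function name is ours, for illustration only):

```python
def semaphores_per_job(producer_count, consumer_count):
    # Nprocs counts producer and consumer instances plus 2 infrastructure
    # processes, with a floor of 25.
    nprocs = max(25, consumer_count + producer_count + 2)
    return 2 * (nprocs + 3) + 5

# With small instance counts, the 25-process floor dominates:
print(semaphores_per_job(2, 2))    # 61

# Larger jobs exceed the floor:
print(semaphores_per_job(16, 16))  # 79
```

Checking this estimate against the operating system's semaphore limits can help avoid job startup failures when running many Teradata PT jobs concurrently.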

    Determining the Size of Shared Memory for Buffer Mode Loading

    Buffer Mode in Teradata PT is a loading mechanism that transfers data buffers directly from the producer operator to the consumer operator, bypassing the CPU-intensive row-by-row processing in the Teradata PT infrastructure and thereby increasing throughput performance.

    For a producer or consumer job to be eligible for Buffer Mode, the job script cannot contain filtering criteria such as CASE/WHEN or WHERE clauses in the Teradata PT SELECT statement.

    Note: Not all operators support Buffer Mode. Currently, the Export, Select, ODBC, and DataConnector producer operators and the Load and DataConnector consumer operators support Buffer Mode. LOB/JSON/XML importing and exporting are not Buffer-Mode eligible.

    The following are the typical operations that are Buffer-Mode eligible:

  • Exporting rows from a Teradata Database table and:
      • Writing them to files
      • Loading them into another Teradata Database table
  • Extracting rows:
      • From files and loading them into a Teradata Database table
      • From an ODBC source table and loading them into a Teradata Database table
      • Through INMOD and access modules and loading them into a Teradata Database table

    Buffer Mode also allows blocking of multiple buffers into a single data stream message to minimize buffer transfers in data streams. The main challenge with blocked Buffer Mode is determining the blocking factor, that is, the number of buffers in a message. The blocking factor is determined based on the following formula:

    Buffers/Block = (MemoryPercent * TotalSharedMemory) / ((ProducerCount + (QueueDepth * ProducerCount + 1) * ConsumerCount) * BufferSize)

    where MemoryPercent is the percentage of shared memory to be dedicated to data stream messages and QueueDepth is the maximum number of messages that can be placed on a data stream. The consumer operator sets BufferSize dynamically.

    Teradata PT provides the default setting of the blocking factor, but it may not be optimal because it only takes the default values for the MemoryPercent (80), TotalSharedMemory (10M), and the QueueDepth (2) when deciding the blocking factor. If you want to use a larger blocking factor to minimize the number of buffers being transferred through data streams, you need to increase the shared memory at the job level using the tbuild -h option.
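The blocking-factor formula can be sketched as follows. The defaults mirror the values stated above (MemoryPercent 80, TotalSharedMemory 10M, QueueDepth 2); the 64 KB BufferSize in the usage example is purely an illustrative assumption, since the consumer operator sets it dynamically:

```python
def buffers_per_block(producer_count, consumer_count, buffer_size,
                      memory_percent=0.80,
                      total_shared_memory=10 * 1024 * 1024,
                      queue_depth=2):
    # Blocking factor = (MemoryPercent * TotalSharedMemory) /
    #   ((P + (QueueDepth * P + 1) * C) * BufferSize)
    denom = (producer_count
             + (queue_depth * producer_count + 1) * consumer_count) * buffer_size
    return int(memory_percent * total_shared_memory / denom)

# 2 producers, 2 consumers, assumed 64 KB buffers:
print(buffers_per_block(2, 2, 64 * 1024))  # 10
```

Raising total_shared_memory in this sketch (via tbuild -h in an actual job) increases the blocking factor proportionally, reducing the number of buffer transfers through the data streams.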