16.20 - Determining System Resource Usage at the Job Level - Parallel Transporter

Teradata® Parallel Transporter User Guide

Product
Parallel Transporter
Release Number
16.20
Published
August 2020
Language
English (United States)
Last Update
2020-08-27

Determining the Use of the Number of Instances

Although most operators in Teradata PT can be scaled to use multiple instances for achieving maximum throughput, excessive use of instances can lead to over-parallelism, which can affect performance adversely. Each instance added to a job may introduce more data streams for data transfer, resulting in more shared memory, more semaphores, and additional system processes.

The following methods are recommended for managing the number of instances:
  • Do not create more instances than needed because this will consume system resources. Start with 2 instances and work your way up.

    You probably only need 1 to 4 instances of any given operator in most loading scenarios. However, scenarios like Directory Scan and LOB/JSON/XML Loading may require more producer instances and consumer instances, respectively.

  • Measure where a bottleneck may be occurring when data is being loaded. Teradata PT can be scaled to eliminate data I/O and load process CPU bottlenecks.
  • Read the TWB_STATUS private log, which displays statistics showing how much data was processed by each instance. Job performance is evaluated based on the CPU seconds and the elapsed time in seconds reported in the TWB_STATUS log.
  • Reduce the number of instances if you see underutilized operator instances. Both the operator private logs and the TWB_STATUS log provide detailed information at the instance level. See "Using the TWB_STATUS Private Log to Obtain Job Status" in Managing and Monitoring Teradata PT Jobs.

Determining the Use of Shared Memory

More instances of operators in a job require more allocated shared memory for data streams. Teradata PT estimates the optimum amount of shared memory for the job from the information in the script (the total number of instances and the size of the schema).

The following formulas are used for determining the amount of shared memory for the job:
  • When a DataConnector Producer (e.g., file reader) is not specified in the script:
    TotalSharedMemorySize =  ( ( (ProducerCount + ConsumerCount)  x  (ProducerCount x QueueDepth  + 1) )  x  MaxRowSizeInSchema  )  x 1.3
  • When a DataConnector Producer (e.g., file reader) is specified in the script:
    TotalSharedMemorySize =  ( ( ( (ProducerCount + ConsumerCount)  x  (ProducerCount x QueueDepth  + 1) )  x  MaxRowSizeInSchema  ) x 1.3 ) + (3MB x DCProducerInstanceCount)
    where:
    • QueueDepth is currently set to 2.
    • MaxRowSizeInSchema is the maximum size of the schema for the job.

      If MaxRowSizeInSchema is less than 1MB, then 1MB is used as the value of MaxRowSizeInSchema.

Example 1

Shared memory used by 1 Export operator instance and 1 consumer instance:

(((1 + 1) * ((1 * 2) + 1)) * 1048576) * 1.3 = ~8,178,893 bytes (~8MB)

Example 2

Shared memory used by 2 Export operator instances and 2 consumer instances:

(((2 + 2) * ((2 * 2) + 1)) * 1048576) * 1.3 = ~27,262,976 bytes (~26MB)

Example 3

Shared memory used by 1 DC producer instance and 1 consumer instance:

((((1 + 1) * ((1 * 2) + 1)) * 1048576) * 1.3) + (3MB * 1) = ~11,324,621 bytes (~11MB)
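
The three examples can be reproduced with a short calculation. The following Python sketch is illustrative only and is not part of Teradata PT. The function name and parameter names are hypothetical, the grouping of terms follows the worked examples, and 1MB is assumed to be 1,048,576 bytes, as in those examples.

```python
MIB = 1024 * 1024                # assumption: 1MB = 1,048,576 bytes, as in the examples
QUEUE_DEPTH = 2                  # "currently set to 2" per this section
MIN_ROW_SIZE = 1 * MIB           # floor applied to MaxRowSizeInSchema
DC_PRODUCER_OVERHEAD = 3 * MIB   # extra 3MB per DataConnector Producer instance


def total_shared_memory_size(producer_count, consumer_count,
                             max_row_size_in_schema,
                             dc_producer_instance_count=0):
    """Estimate the job's shared memory in bytes (hypothetical helper)."""
    # Rows smaller than 1MB are treated as 1MB.
    row_size = max(max_row_size_in_schema, MIN_ROW_SIZE)
    # Grouping follows the worked examples: (P + C) x (P x QueueDepth + 1).
    factor = (producer_count + consumer_count) * (producer_count * QUEUE_DEPTH + 1)
    base = factor * row_size * 1.3
    return round(base + DC_PRODUCER_OVERHEAD * dc_producer_instance_count)


if __name__ == "__main__":
    print(total_shared_memory_size(1, 1, 64_000))     # Example 1
    print(total_shared_memory_size(2, 2, 64_000))     # Example 2
    print(total_shared_memory_size(1, 1, 64_000, 1))  # Example 3
```

The 64,000-byte row size is an arbitrary value below the 1MB floor, so each call uses 1,048,576 bytes as MaxRowSizeInSchema, matching the examples.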

Directory Scan Considerations

The Directory Scan feature requires the DataConnector Operator to store information about every file it processes in a Checkpoint Record at various points in the execution of the job. For a Directory Scan that reads a very large number of files (more than 500) from a directory, sometimes additional shared memory must be allocated. The best practice is to determine the amount of shared memory that is needed, and add that additional shared memory amount to the calculation as determined in “Determining the Use of Shared Memory”.

The following formula is used for determining the amount of shared memory needed for the Checkpoint Record:

((12K * FileCount) + 12K) * 2 bytes

The tbuild -h option can be used to increase shared memory size. See "tbuild" in Teradata® Parallel Transporter Reference, B035-2436.

Example

A directory scan of 1000 files would require:
((12K * 1000) + 12K) * 2 = ~24MB of extra shared memory
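
The Checkpoint Record calculation can be sketched the same way. This Python snippet is a minimal illustration, not part of Teradata PT. The function name is hypothetical, and "12K" is assumed to mean 12 x 1024 bytes, which the source does not state explicitly.

```python
K = 1024  # assumption: "12K" means 12 * 1024 bytes


def checkpoint_record_shared_memory(file_count):
    """Extra shared memory in bytes for a Directory Scan Checkpoint Record."""
    return ((12 * K * file_count) + 12 * K) * 2


if __name__ == "__main__":
    extra = checkpoint_record_shared_memory(1000)
    print(extra, "bytes, about", round(extra / (1024 * 1024), 1), "MB")
```

Add the result to the job's shared memory estimate before choosing a value for the tbuild -h option.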

Determining Semaphore Usage per Job

Semaphores are used for synchronizing processes and access to resources in a parallel execution environment. For example, when a data stream is used to transfer data buffers from the producer instance (one process) to the consumer instance (another process), semaphores are used to synchronize the access to the data stream that the producer and consumer instances share. If more instances are used in a job, more semaphores are needed.

Use the following formula to calculate the semaphores required for a job with multiple producer and consumer instances:

Nprocs = MAX( 25, Consumer_count + Producer_count + 2)
Semaphores = 2 * (Nprocs + 3) + 5

where:

Nprocs is the number of job processes, including the processes that the Teradata PT infrastructure uses.
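
As a quick check of the semaphore formula, the following Python sketch evaluates it for two job sizes. It is illustrative only; the function name is hypothetical and the instance counts are made-up inputs.

```python
def required_semaphores(producer_count, consumer_count):
    """Semaphores needed by a job, per the formula in this section."""
    # Nprocs never drops below 25, so small jobs all need the same number.
    nprocs = max(25, consumer_count + producer_count + 2)
    return 2 * (nprocs + 3) + 5


if __name__ == "__main__":
    print(required_semaphores(1, 1))    # Nprocs = 25
    print(required_semaphores(20, 20))  # Nprocs = 42
```

Because of the MAX( 25, ... ) term, the result changes only once the combined producer and consumer instance count exceeds 23.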