15.10 - Strategies for Balancing Sessions and Instances - Parallel Transporter

Teradata Parallel Transporter User Guide

prodname: Parallel Transporter
vrm_release: 15.10
category: User Guide
featnum: B035-2445-035K

Strategies for Balancing Sessions and Instances

No general recommendations or guidelines exist for determining the optimum number of sessions or instances without concrete performance data from the job itself, but the following strategies can help you find a good balance.

Balancing sessions and instances helps to achieve the best overall job performance without wasting resources.

  • Logging on unnecessary sessions is a waste of resources.
  • Starting more instances than needed is a waste of resources.
    Strategy 1

    Start with MaxSessions equal to the number of available AMPs or number of sessions you want to allocate for the job, using one instance of each operator (producers and consumer).

    1 Run the job and record how long it took for the job to complete.

    2 Then, increase the number of instances for the consumer operator by one. Do not change the number of sessions, because changing multiple variables at once makes the results difficult to compare.

    3 Rerun the job and examine the job output:

  • Did the job run faster?
  • Was the second consumer instance used?
  • How many rows were processed by the first vs. the second instance? (For example, were they about equal, or was one doing 80% of the work while the other was doing only 20%?) If the work was balanced, another instance might improve performance. If the second instance did not do much work, a third one is unlikely to be engaged and is likely to waste resources without much performance gain.

    4 Repeat the process of increasing the number of instances for the consumer operator until you are using as many instances as it needs.
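The balance check in step 3 can be sketched in Python. This is only an illustration, not part of Teradata PT: the function names and the `min_fraction_of_fair_share` threshold are hypothetical, and the row counts are assumed to be copied by hand from the job output.

```python
from typing import List


def row_shares(rows_per_instance: List[int]) -> List[float]:
    """Return each consumer instance's fraction of all rows processed."""
    if not rows_per_instance:
        raise ValueError("at least one instance is required")
    total = sum(rows_per_instance)
    if total == 0:
        return [0.0 for _ in rows_per_instance]
    return [rows / total for rows in rows_per_instance]


def worth_adding_instance(rows_per_instance: List[int],
                          min_fraction_of_fair_share: float = 0.5) -> bool:
    """Heuristic (assumed threshold): try another instance only if the
    least-busy instance handled a reasonable part of an even share."""
    shares = row_shares(rows_per_instance)
    fair_share = 1 / len(shares)
    return min(shares) >= min_fraction_of_fair_share * fair_share


# Hypothetical row counts for two consumer instances.
print(worth_adding_instance([500, 500]))  # balanced work
print(worth_adding_instance([800, 200]))  # the 80%/20% case from step 3
```

The threshold is a judgment call. Elapsed time for the whole job remains the deciding measurement.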

    Now it is time to look at the producers. Try increasing the number of instances of each producer operator, one operator at a time, to see whether it feeds data into the data stream at a higher rate.

    1 Increase the number of instances for each producer operator separately. Again, do not change the number of sessions.

    2 Rerun the job and compare the job output:

  • Did the job run faster? Remember this is the ultimate goal!
  • Was there a change in the number of consumer operator instances used? Because the work is always balanced across the producer instances, look at the consumer instances to see whether the change affected the job.
  • Was there a change in the balance of the consumer operator instances? You want to balance the number of rows being loaded across the number of instances, using as many instances as necessary.
    Note: Be careful not to sacrifice overall job performance for instance balance. Rows being read evenly across all instances does not necessarily mean that the whole job runs faster.

    3 Depending on your results, you may want to increase the number of instances for the producer or consumer operators.
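The comparison between two runs, including the caution in the note above, can be expressed as a small Python sketch. The `RunResult` record, the `imbalance` measure, and all figures are hypothetical. They stand in for values read from the job output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunResult:
    label: str
    elapsed_seconds: float
    consumer_rows: List[int]  # rows processed by each consumer instance


def imbalance(rows: List[int]) -> float:
    """Spread between the busiest and least-busy instance as a fraction
    of all rows; 0.0 means the rows were split perfectly evenly."""
    total = sum(rows)
    if total == 0:
        return 0.0
    return (max(rows) - min(rows)) / total


def compare(before: RunResult, after: RunResult) -> str:
    """Judge a change by elapsed time first, balance second."""
    if after.elapsed_seconds < before.elapsed_seconds:
        return "keep: job ran faster"
    if imbalance(after.consumer_rows) < imbalance(before.consumer_rows):
        return "revert: balance improved but the job did not run faster"
    return "revert: no improvement"


# Hypothetical measurements from two runs of the same job.
baseline = RunResult("1 producer instance", 120.0, [800, 200])
trial = RunResult("2 producer instances", 130.0, [500, 500])
print(compare(baseline, trial))
```

Checking elapsed time before balance reflects the point that a faster job is the ultimate goal.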

    Now that you know an acceptable number of instances, you can modify the value of MaxSessions to see if there is an impact.

    1 Decrease the value of MaxSessions. It is best to make MaxSessions a multiple of the number of instances for the operator so that the sessions are evenly distributed across the instances.

    2 Rerun the job and compare the output:

  • Did the job run faster?
  • Was there a change in the number of consumer operator instances used?
  • Was there a change in the balance of data in the consumer operator instances?

    3 Depending on the results, you may want to use the original MaxSessions, or continue experimenting. You may even want to revisit the number of instances you are using.
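To pick values for the MaxSessions experiment, it can help to list the multiples of the instance count up front. The following Python sketch does only that arithmetic. The function name is hypothetical, and the starting value and instance count are example numbers.

```python
from typing import List


def candidate_max_sessions(start: int, instances: int) -> List[int]:
    """Return the multiples of `instances` that do not exceed `start`,
    largest first, so sessions divide evenly across the instances."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    if start < instances:
        raise ValueError("start must be at least the number of instances")
    largest = (start // instances) * instances
    return list(range(largest, 0, -instances))


# Example: a job that started with MaxSessions = 16 and settled on 3 instances.
print(candidate_max_sessions(16, 3))  # [15, 12, 9, 6, 3]
```

If the starting value is already a multiple of the instance count, it appears first in the list. In that case, begin the experiment with the next value down.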

    Strategy 2

    Start with MaxSessions equal to the number of available AMPs or number of sessions allocated for the job, using four instances of each operator (producers and consumer).

    1 Run the job and examine the output:

  • How long did it take for the job to complete?
  • How many consumer operator instances are being used?
  • How many rows are being processed by each consumer operator instance? Look for balance without wasted resources.

    2 Make adjustments based on your results.

  • If the job is not using all the consumer operator instances:
      • Decrease the number of consumer instances to eliminate the unused ones.
      • Decrease the number of producer instances by one. Avoid doing anything too drastic, or it will be difficult to determine the optimal number.
  • If the job is using all the consumer operator instances, and the workload is balanced:
      • Try increasing the number of consumer operator instances.
  • If the job is using all the consumer operator instances, but the workload is not balanced:
      • Try increasing the number of producer operator instances.

    3 Rerun the job and compare the output:

  • Did the job run faster? Remember, this is the ultimate goal!
  • Was there a change in the number of consumer operator instances used?
  • Was there a change in the balance of data in the consumer operator instances?

    4 Repeat the process to optimize the number of producer and consumer instances.
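The adjustment rules in step 2 of Strategy 2 amount to a three-way decision, sketched here in Python. Two assumptions are not from the guide: an instance counts as "used" if it processed at least one row, and "balanced" means every instance handled at least half of an even share. Both the function name and the threshold are hypothetical.

```python
from typing import List


def next_adjustment(consumer_rows: List[int],
                    min_fraction_of_fair_share: float = 0.5) -> str:
    """Map per-instance row counts to the adjustment suggested in step 2."""
    if not consumer_rows:
        raise ValueError("at least one consumer instance is required")
    used = [rows for rows in consumer_rows if rows > 0]  # assumed meaning of "used"
    if len(used) < len(consumer_rows):
        return (f"decrease consumer instances to {len(used)} "
                "and decrease producer instances by one")
    fair_share = sum(consumer_rows) / len(consumer_rows)
    if min(consumer_rows) >= min_fraction_of_fair_share * fair_share:
        return "try increasing consumer instances"
    return "try increasing producer instances"


# Hypothetical row counts for four consumer instances.
print(next_adjustment([400, 300, 300, 0]))
print(next_adjustment([250, 250, 250, 250]))
print(next_adjustment([700, 100, 100, 100]))
```

After each suggested change, rerun the job and confirm that it actually ran faster before keeping the change.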

    Now that you know the best number of instances, you can modify the value of MaxSessions to see if there is an impact.

    1 Decrease the value of MaxSessions. It is best to make MaxSessions a multiple of the number of instances for the operator so that the sessions are evenly distributed across the instances.

    2 Rerun the job and compare the output:

  • Did the job run faster?
  • Was there a change in the number of consumer operator instances used?
  • Was there a change in the balance of data in the consumer operator instances?

    3 Depending on the results, you may want to use the original MaxSessions, or continue experimenting. You may even want to revisit the number of instances you are using.