Specifying Instances
You can specify the number of instances for an operator in the APPLY TO or SELECT
FROM statement in which it is referenced, using the form (operator_name [number of
instances]), as shown in the following example:
APPLY <DML>...TO OPERATOR (UPDATE_OPERATOR [2]...)
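As a fuller sketch of the form above, instance counts can be given on both the consumer side (APPLY TO) and the producer side (SELECT FROM) of a single statement. The producer name FILE_READER, the EMPLOYEE table, its columns, and the instance counts are hypothetical, chosen only for illustration:

```
/* Hypothetical names: EMPLOYEE table and columns, FILE_READER producer. */
APPLY
  ('INSERT INTO EMPLOYEE (EMP_ID, EMP_NAME) VALUES (:EMP_ID, :EMP_NAME);')
TO OPERATOR (UPDATE_OPERATOR [2])           /* request 2 consumer instances */
SELECT * FROM OPERATOR (FILE_READER [3]);   /* request 3 producer instances */
```

Omitting the bracketed number on either operator leaves that operator at its default of one instance.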
When determining the right number of instances for your job, note that producer
operators tend to use all of the instances specified in the script, while consumer
operators often use fewer instances than the number specified. This difference arises
because consumers and producers use instances differently:
Producers automatically balance the load across all instances, pumping data into the
data stream as fast as they can.
By default, consumers use only as many instances as needed. If one instance can
read and process the data in the data stream as quickly as the producers can write
it, the other instances are not used. If the first instance cannot keep up with
the producer operators, the second instance is engaged, and so on.
The -C command line option overrides this default behavior by instructing producer
operators and their underlying data streams to ship data blocks to target consumer
operators in a cyclical, round-robin manner, providing a more even distribution of
data to consumer operators.
Consider the following when specifying operator instances:
If the number of instances is not specified, the default is 1 instance per operator.
Experiment. Start by specifying only one or two instances for any given operator.
Teradata PT will start as many instances as specified, but it uses only as many as
needed.
Don't create more instances than needed, because instances consume system resources.
Read the Teradata PT log file, which displays statistics showing how much data was
processed by each instance. Reduce the number of instances if you see underutilized
instances of any operator. If all instances are used, add more and see whether the
job runs better.
If the number of instances exceeds the number of available sessions, the job aborts.
Therefore, when specifying multiple instances, make sure the MaxSessions attribute
is set high enough to provide at least one session per instance.
After the job runs, use the evaluation criteria shown in “Strategies for Balancing Sessions and Instances” on page 85 to help adjust and optimize the number of operator instances.
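The session requirement noted above (at least one session per instance) can be sketched in an operator definition as follows. This is a minimal illustration, not a complete job: the logon values, table names, and the MaxSessions value of 4 are placeholders, and your operator may need additional attributes.

```
/* Hypothetical definition: all attribute values are placeholders. */
DEFINE OPERATOR UPDATE_OPERATOR
TYPE UPDATE
SCHEMA *
ATTRIBUTES
(
  VARCHAR TdpId        = 'my_tdpid',
  VARCHAR UserName     = 'my_user',
  VARCHAR UserPassword = 'my_password',
  VARCHAR TargetTable  = 'EMPLOYEE',
  VARCHAR LogTable     = 'EMPLOYEE_LOG',
  INTEGER MaxSessions  = 4   /* must be >= the number of instances requested */
);

/* When referenced as (UPDATE_OPERATOR [2]), the job requests 2 instances, */
/* which fits within the 4 sessions allowed here.                          */
```

If the instance count were raised above the MaxSessions value, the job would abort for lack of sessions, as described above.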