To take advantage of TPT’s parallelism while also limiting the size of each object written to GCS, the same naming convention is used as described in Multiple Instances. The following example shows how the objects are created:
Assume you have 50 MB of data, want each object written to GCS to be no larger than 10 MB, and have specified 2 instances of the DataConnector operator (using Object=my_load_job as the object name):
- Instance 1 will create: my_load_job-001, my_load_job-003, my_load_job-005
- Instance 2 will create: my_load_job-002, my_load_job-004
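
The naming pattern in this example can be reproduced with a short Python sketch. It is only an illustration, not TPT code: the function name `plan_object_names` and its parameters are hypothetical, and it assumes that sequence numbers are handed out to instances in strict rotation, which is inferred from the example above. In an actual job, which instance writes which object may depend on how TPT distributes the data.

```python
def plan_object_names(total_bytes, max_object_bytes, instances, base_name):
    """Return a dict mapping instance number -> list of object names.

    Assumption: objects are numbered 001, 002, ... and assigned to
    instances in strict rotation, as in the example above.
    """
    # Number of objects needed so that none exceeds the size limit
    # (integer ceiling division).
    object_count = (total_bytes + max_object_bytes - 1) // max_object_bytes

    plan = {i: [] for i in range(1, instances + 1)}
    for seq in range(1, object_count + 1):
        instance = (seq - 1) % instances + 1  # rotate through instances
        plan[instance].append(f"{base_name}-{seq:03d}")
    return plan


MB = 1024 * 1024
plan = plan_object_names(50 * MB, 10 * MB, 2, "my_load_job")
for instance, names in plan.items():
    print(f"Instance {instance}: {', '.join(names)}")
```

With 50 MB of data and a 10 MB limit, five objects are needed, so instance 1 receives the odd-numbered names and instance 2 the even-numbered ones, matching the list above.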