A flow loads input data files in microbatches. You can optimize flow performance by tuning microbatches, using these options:
Option | Description |
---|---|
Checkpoint Files | Determines the number of files per microbatch. A larger setting (such as 200) results in larger microbatches, which may improve flow performance when load jobs consist of many small files. A smaller setting (such as 5) results in smaller microbatches, which may improve flow performance when load jobs consist of a few large files. |
Checkpoint Size | Determines the maximum size of each microbatch, in mebibytes (MiB), gibibytes (GiB), or tebibytes (TiB). Specify a number and a unit (for example, 10 MiB). |
If you use both options, Flow runs the microbatch as soon as either limit is reached, whichever comes first: the number of files per microbatch or the maximum microbatch size.
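
To illustrate how the two limits might interact, here is a minimal Python sketch of "whichever limit is reached first" batching. It is not Flow's implementation: the function names `parse_size` and `microbatches` are hypothetical, and the sketch assumes that a file which would push a microbatch past the size limit starts the next microbatch instead, which the description above does not specify.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Binary units accepted by the Checkpoint Size option.
UNITS = {"MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a setting such as '10 MiB' to a number of bytes."""
    number, unit = text.split()
    return int(float(number) * UNITS[unit])


def microbatches(
    files: Iterable[Tuple[str, int]],
    checkpoint_files: Optional[int] = None,
    checkpoint_size: Optional[str] = None,
) -> Iterator[List[str]]:
    """Group (name, size_in_bytes) pairs into microbatches.

    A microbatch closes when it holds `checkpoint_files` files or when
    adding the next file would exceed `checkpoint_size`, whichever
    happens first. The size rule is an assumption made for this sketch.
    """
    max_bytes = parse_size(checkpoint_size) if checkpoint_size else None
    batch: List[str] = []
    batch_bytes = 0
    for name, size in files:
        # Size limit: close the current batch before it would overflow.
        if batch and max_bytes is not None and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(name)
        batch_bytes += size
        # File-count limit: close the batch once it is full.
        if checkpoint_files is not None and len(batch) >= checkpoint_files:
            yield batch
            batch, batch_bytes = [], 0
    if batch:
        yield batch  # Remaining files form the final, partial microbatch.


MIB = 1024**2
example_files = [
    ("a", 4 * MIB), ("b", 4 * MIB), ("c", 4 * MIB),
    ("d", 1 * MIB), ("e", 1 * MIB), ("f", 1 * MIB),
]
result = list(microbatches(example_files, checkpoint_files=3,
                           checkpoint_size="10 MiB"))
print(result)  # [['a', 'b'], ['c', 'd', 'e'], ['f']]
```

In this example the first microbatch closes on the size limit (a third 4 MiB file would exceed 10 MiB), while the second closes on the file-count limit of 3.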
Factors that affect flow performance include the following:
- File format
- File size
- Row size
- Data complexity
- Schema complexity
Test different Checkpoint Files and Checkpoint Size settings until you find the best combination.
Specify Checkpoint Files and Checkpoint Size settings in
.