Before release 17.20, Vantage distributed Parquet files across AMPs based on file size, to balance the load across all AMPs. This distribution could cause skew if there were fewer files than AMPs or if the files differed significantly in size.
Now Vantage can distribute Parquet files across AMPs based on rowgroup boundaries (parallel distribution).
Benefits
- Eliminates skew, improving performance of retrieve queries from Parquet tables.
The improvement is significant when the number of files is not a multiple of the number of AMPs or when the difference between file sizes is large.
Considerations
- This feature is enabled by default.
To make Vantage distribute Parquet files across AMPs based on file size instead, set the query band BINPACKALGO4PARQUET=1 with SET QUERY_BAND.
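As a sketch of what setting this query band might look like, the statements below use the standard SET QUERY_BAND form with a quoted name=value pair. The FOR SESSION scope is an assumption made for illustration; this note does not say whether the setting is intended for the session or for a single transaction.

```sql
-- Revert to file-size-based distribution of Parquet files.
-- Assumption: session scope; FOR TRANSACTION may be the appropriate scope instead.
SET QUERY_BAND = 'BINPACKALGO4PARQUET=1;' FOR SESSION;

-- Clear the session query band to return to the default rowgroup-based distribution.
-- Note: this removes every name=value pair set for the session, not only this one.
SET QUERY_BAND = NONE FOR SESSION;
```

If the session already carries other query band pairs, include them in the same quoted string, because a plain SET QUERY_BAND replaces the existing session query band.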