Parallel Distribution of Parquet Files

Teradata Vantage™ - Analytics Database Release Summary - 17.20 What's New

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Analytics Database, Teradata Vantage
Release Number: 17.20
Published: June 2022
Language: English (United States)
Last Update: 2024-01-30
Product Category: Teradata Vantage

Before release 17.20, Vantage distributed Parquet files across AMPs based on file size, balancing the load across all AMPs. This distribution could cause skew if there were fewer files than AMPs or if files were of significantly different sizes.

Now Vantage can distribute Parquet files across AMPs based on rowgroup boundaries (parallel distribution).
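The difference between the two strategies can be illustrated with a toy model. The following sketch is not Vantage's actual placement algorithm, which this summary does not describe. It assumes a simple greedy assignment (largest unit first, to the least-loaded AMP) for both strategies, and the file sizes, AMP count, and rowgroup size are made-up values chosen for illustration.

```python
import heapq


def assign_greedy(unit_sizes, num_amps):
    """Assign each unit to the currently least-loaded AMP, largest unit first.

    This greedy rule is an assumption for illustration only.
    """
    heap = [(0, amp) for amp in range(num_amps)]  # (current load, AMP id)
    heapq.heapify(heap)
    loads = [0] * num_amps
    for size in sorted(unit_sizes, reverse=True):
        load, amp = heapq.heappop(heap)
        loads[amp] = load + size
        heapq.heappush(heap, (loads[amp], amp))
    return loads


def split_into_rowgroups(file_size, rowgroup_size):
    """Split a file into full rowgroups plus one smaller trailing rowgroup."""
    full, rest = divmod(file_size, rowgroup_size)
    return [rowgroup_size] * full + ([rest] if rest else [])


def skew(loads):
    """Ratio of the busiest AMP's load to the mean load (1.0 means no skew)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical workload: 3 files (sizes in MB) on a 4-AMP system.
file_sizes = [900, 200, 100]
num_amps = 4
rowgroup_size = 100  # assumed rowgroup size in MB

# File-based: whole files are the unit of distribution.
by_file = assign_greedy(file_sizes, num_amps)

# Rowgroup-based: each file is first cut at rowgroup boundaries.
rowgroups = [
    rg for size in file_sizes for rg in split_into_rowgroups(size, rowgroup_size)
]
by_rowgroup = assign_greedy(rowgroups, num_amps)

print("file-based loads:    ", by_file, "skew", round(skew(by_file), 2))
print("rowgroup-based loads:", by_rowgroup, "skew", round(skew(by_rowgroup), 2))
```

With these made-up numbers, the file-based assignment leaves one AMP idle and one holding 900 MB (skew 3.0), while the rowgroup-based assignment gives every AMP 300 MB (skew 1.0). The example combines both conditions named in this topic: fewer files than AMPs and files of very different sizes.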

Benefits

  • Eliminates skew, improving the performance of queries that retrieve data from Parquet tables.

    The improvement is significant when the number of files is not a multiple of the number of AMPs or when the difference between file sizes is large.

Considerations

  • This feature is enabled by default.

    To make Vantage revert to distributing Parquet files across AMPs based on file size, use SET QUERY_BAND to set BINPACKALGO4PARQUET=1.
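As a minimal sketch, client code could build the override statement as shown below. The block only composes the statement text and does not connect to a database. The `SET QUERY_BAND = '...' FOR SESSION;` form follows general Teradata query band syntax; the choice of session scope and the helper name `file_size_distribution_statement` are assumptions and are not taken from this release summary.

```python
def file_size_distribution_statement(scope: str = "SESSION") -> str:
    """Build a SET QUERY_BAND statement that sets BINPACKALGO4PARQUET=1.

    Hypothetical helper: only the name-value pair comes from the release
    summary; the scope argument is an assumption for illustration.
    """
    if scope not in ("SESSION", "TRANSACTION"):
        raise ValueError("scope must be 'SESSION' or 'TRANSACTION'")
    # Query band pairs are written as name=value; inside a quoted string.
    band = "BINPACKALGO4PARQUET=1;"
    return f"SET QUERY_BAND = '{band}' FOR {scope};"


# Print the statement so it can be reviewed before being sent to Vantage.
print(file_size_distribution_statement())
```

The returned string can then be passed to whatever SQL client or driver is in use. Check the SET QUERY_BAND documentation for your release before relying on a particular scope.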