Setting the Optimal Teradata Parallel Data Pump SERIALIZE Option
Setting the SERIALIZE option of the Teradata Parallel Data Pump utility to ON in the BEGIN LOAD statement serves two purposes.
First, it applies rows that share a primary index value in the order in which they occur in the input data source. You enable this by using the KEY option of the FIELD command to identify the primary index columns of the table, which forces rows with the same primary index value to go into the same session.
Second, because rows with the same primary index value go into the same session, it reduces hash lock contention among multiple sessions.
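The routing idea behind SERIALIZE can be illustrated with a small Python simulation. This is only a conceptual sketch, not how Teradata Parallel Data Pump is implemented: the function names (route_rows, sessions_per_key), the use of crc32 as a hash, and the round-robin distribution standing in for SERIALIZE OFF are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def route_rows(rows, key_of, num_sessions, serialize=True):
    """Assign each input row to a session index (hypothetical model).

    serialize=True : the session is derived from a hash of the key, so rows
                     with the same key always land in the same session.
    serialize=False: rows are dealt out round-robin (an assumed stand-in for
                     non-serialized distribution), so rows with the same key
                     can end up in different sessions.
    """
    sessions = defaultdict(list)
    for position, row in enumerate(rows):
        if serialize:
            # crc32 is an illustrative hash only, not Teradata's row hash.
            index = zlib.crc32(repr(key_of(row)).encode()) % num_sessions
        else:
            index = position % num_sessions
        # Appending keeps the input order within each session.
        sessions[index].append(row)
    return dict(sessions)


def sessions_per_key(assignment):
    """Count how many distinct sessions each key was routed to."""
    seen = defaultdict(set)
    for index, batch in assignment.items():
        for key, _operation in batch:
            seen[key].add(index)
    return {key: len(indexes) for key, indexes in seen.items()}


# Each row is (primary index value, operation), listed in input order.
rows = [
    ("acct1", "insert"), ("acct2", "insert"), ("acct1", "update"),
    ("acct3", "insert"), ("acct1", "delete"), ("acct2", "update"),
]

serialized = route_rows(rows, key_of=lambda r: r[0], num_sessions=3)
round_robin = route_rows(rows, key_of=lambda r: r[0], num_sessions=3,
                         serialize=False)

print(sessions_per_key(serialized))   # every key maps to exactly one session
print(sessions_per_key(round_robin))  # some keys are spread across sessions
```

In the serialized case, each key is handled by a single session, so its insert, update, and delete are applied in input order and no two sessions compete for the same row hash. In the round-robin case, the operations for one key are spread over several sessions, which is the situation that can produce out-of-order application and hash lock contention.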
The SERIALIZE option is most important for NUPI tables, especially when the data is highly non‑unique. Setting SERIALIZE to ON costs some additional CPU on the Teradata client.
For a NoPI table or a column‑partitioned table, the traditional hash lock contention issue no longer applies because the table does not have a primary index. If the order of data application is not important to you, you should set SERIALIZE to OFF.
For a NoPI table or column‑partitioned table, Teradata Database generally uses a single hash value for as many rows as it takes to exhaust the 44‑bit uniqueness values for a combined row partition. As a result, the rows on an AMP frequently all have the same row hash value in their rowID, so a row hash lock on a NoPI table or column‑partitioned table usually locks all the rows on that AMP. Typically, multiple Teradata Parallel Data Pump sessions running on the same AMP against the same NoPI table or column‑partitioned table block one another. Therefore, you should limit the number of Teradata Parallel Data Pump sessions to the number of AMPs in your system.