Difference between apply, map_row, map_partition, and udf
| DataFrame.apply() | DataFrame.map_row() | DataFrame.map_partition() | udf() |
|---|---|---|---|
| Executes on every row of a teradataml DataFrame on VantageCloud Lake. | Executes on every row of a teradataml DataFrame on VantageCloud Enterprise. | Executes on a group of rows of a teradataml DataFrame on VantageCloud Enterprise. | Executes on every row of a teradataml DataFrame on VantageCloud Enterprise. |
| Returns a teradataml DataFrame. | Returns a teradataml DataFrame. | Returns a teradataml DataFrame. | Returns a teradataml DataFrame column. |
| Teradata recommends using the same Python interpreter version, and the same versions of the Python libraries used inside the function, in the local client environment and in the server-side user environment. | Teradata recommends using the same Python interpreter version, and the same versions of the Python libraries used inside the function, in the local client environment and on VantageCloud Enterprise. | Teradata recommends using the same Python interpreter version, and the same versions of the Python libraries used inside the function, in the local client environment and on VantageCloud Enterprise. | Teradata recommends using the same Python interpreter version, and the same versions of the Python libraries used inside the function, in the local client environment and in the server-side user environment. |
| Lambda functions are supported. | Lambda functions are supported. | Lambda functions are supported. | Lambda functions are not supported. |
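To make the "Returns" and "Lambda functions" rows concrete, the following pandas-only sketch simulates locally, without a Vantage connection, how the output of each kind of user function is shaped. The helpers `simulate_map_row`, `simulate_map_partition`, and `simulate_udf` are hypothetical and are not teradataml APIs. They only mimic the output shapes described in the table, not how teradataml actually runs code on the server.

```python
import pandas as pd

# Small local sample standing in for a teradataml DataFrame
# (assumption: no Vantage connection; pandas only).
sample = pd.DataFrame({"id": [1, 2, 3, 4], "price": [10.0, 20.0, 30.0, 40.0]})


def simulate_map_row(df, row_func):
    """Hypothetical helper: call row_func once per row (a pandas Series)
    and stack the returned Series into a new DataFrame."""
    return pd.DataFrame([row_func(row) for _, row in df.iterrows()])


def simulate_map_partition(df, partition_func, partition_size=2):
    """Hypothetical helper: call partition_func once per group of rows
    and concatenate the returned DataFrames."""
    parts = [df.iloc[i:i + partition_size] for i in range(0, len(df), partition_size)]
    return pd.concat([partition_func(part) for part in parts], ignore_index=True)


def simulate_udf(df, scalar_func, *column_names):
    """Hypothetical helper: call scalar_func with the named column values of
    each row; each call returns one value, so the result is a single column."""
    values = zip(*(df[c] for c in column_names))
    return pd.Series([scalar_func(*v) for v in values], index=df.index)


# Row-wise function (a lambda is fine here): one Series per row -> DataFrame.
rows_out = simulate_map_row(
    sample, lambda row: pd.Series({"id": row["id"], "taxed": row["price"] * 1.1})
)

# Partition-wise function (a lambda is fine here): one DataFrame per group -> DataFrame.
parts_out = simulate_map_partition(
    sample, lambda part: pd.DataFrame({"rows_in_partition": [len(part)]})
)


# udf-style function: written as a named def, because the table states that
# udf() does not support lambdas. It returns a single value per row -> one column.
def discounted(price):
    return price * 0.9


col_out = simulate_udf(sample, discounted, "price")
```

The row and partition variants both end up as a DataFrame, while the udf-style variant yields one value per row, which is why udf() produces a column rather than a DataFrame.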
udf vs apply vs map_row vs map_partition: When to use what in teradataml
| UDF/Method | When to use |
|---|---|
| udf() | You can access each row's column data directly by specifying the column names as inputs to the Python function. With the other methods, you must design the Python function to read the data from a Series object or an iterator (TextFileReader object) and manipulate it accordingly. With udf(), the Python function can return only a single value, while DataFrame.map_row() and DataFrame.map_partition() (VantageCloud Enterprise and VantageCore) allow the Python function either to print output to standard output or to return objects such as 1-D or 2-D NumPy arrays, pandas Series, or pandas DataFrames. |
| DataFrame.apply() | |
| DataFrame.map_row() | |
| DataFrame.map_partition() | |
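The input-access difference described for udf() can be illustrated locally with pandas alone. In this sketch, which is an assumption and not teradataml code, a map_row-style function reads fields from a pandas Series, a map_partition-style function iterates over chunks from a pandas `TextFileReader` (created here with `pd.read_csv(..., chunksize=...)`), and a udf-style function simply receives the column values as named parameters. The function names `row_total`, `partition_total`, and `udf_total` and the CSV data are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV data standing in for the rows Vantage would pass to the function.
CSV_DATA = "id,price,qty\n1,10.0,2\n2,20.0,1\n3,30.0,4\n"


# map_row style: the function must pull each field out of a pandas Series.
def row_total(row):
    return pd.Series({"id": row["id"], "total": row["price"] * row["qty"]})


# map_partition style: the function must iterate over the chunks yielded by a
# TextFileReader and combine them itself.
def partition_total(reader):
    grand_total = 0.0
    for chunk in reader:
        grand_total += (chunk["price"] * chunk["qty"]).sum()
    return pd.DataFrame({"grand_total": [grand_total]})


# udf style: column values arrive directly as named parameters,
# and the function returns a single value.
def udf_total(price, qty):
    return price * qty


df = pd.read_csv(io.StringIO(CSV_DATA))

# One Series per row, stacked into a DataFrame.
row_results = pd.DataFrame([row_total(r) for _, r in df.iterrows()])

# read_csv with chunksize returns a TextFileReader that yields DataFrame chunks.
partition_result = partition_total(pd.read_csv(io.StringIO(CSV_DATA), chunksize=2))

# One scalar per row, taken from the named columns.
udf_results = [udf_total(p, q) for p, q in zip(df["price"], df["qty"])]
```

The udf-style function contains only the business logic, whereas the other two must also know how their input is packaged (a Series or a chunk iterator) before they can compute anything.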