The NOS Schema Evolution feature lets the foreign table schema evolve to follow structural changes in Parquet files. You can add, update, or remove columns at any position in the file without completely rebuilding the Data Definition Language (DDL). Use ALTER FOREIGN TABLE to incorporate the schema changes of the Parquet file into the DDL.
This feature enables flexible schema evolution for Parquet files and foreign table DDLs. It supports column sequencing (re-ordering, re-positioning, removing) when data is added to existing Parquet files.
Behavior Allowed in Parquet Files with and without Schema Evolution
Behavior | Index-Based Approach (without Schema Evolution) | Column Name-Based Approach (with Schema Evolution) |
---|---|---|
Re-order columns | Re-ordering columns does not give correct results. | Re-ordering columns gives correct results. |
Add columns at the beginning or middle of the file | Columns can be added only at the end of the file; adding them at the beginning or in the middle is not allowed. | Columns can be added at any position. |
Remove columns from the beginning or middle of the file | Columns can be removed only from the end of the file; removing them from the beginning or the middle is not allowed. | Columns can be removed from any position. |
Case-sensitive column name, e.g., EMPID, empid, EmpId | Duplicates of any kind are allowed. | Only duplicates that differ in case are allowed. |
Exact duplicate name | Duplicates of any kind are allowed. | Not supported, because processing exact duplicate names would be ambiguous. |
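The difference between the two approaches in the table above can be illustrated with a small Python sketch. This is not NOS code; the helper names `read_by_index` and `read_by_name`, the column names, and the sample row are all hypothetical and only model how positional and name-based column resolution behave when a file's columns are re-ordered and a new column is added at the beginning.

```python
# Hypothetical illustration only; this does not call or reproduce NOS internals.

def read_by_index(ddl_columns, row):
    """Index-based: the i-th DDL column takes the i-th value of the file row."""
    return {name: row[i] for i, name in enumerate(ddl_columns) if i < len(row)}

def read_by_name(ddl_columns, file_columns, row):
    """Name-based: each DDL column is looked up by name in the file's columns."""
    position = {name: i for i, name in enumerate(file_columns)}
    return {
        name: (row[position[name]] if name in position else None)
        for name in ddl_columns
    }

# Columns declared in the (assumed) foreign table definition.
ddl_columns = ["empid", "name", "dept"]

# A newer file: "region" was added at the beginning and the rest re-ordered.
file_columns = ["region", "dept", "empid", "name"]
row = ["EMEA", "Sales", 101, "Ana"]

by_index = read_by_index(ddl_columns, row)
by_name = read_by_name(ddl_columns, file_columns, row)

print(by_index)  # values land under the wrong column names
print(by_name)   # values follow their column names
```

In this sketch the positional read assigns `"EMEA"` to `empid`, while the name-based read returns `101` for `empid` regardless of where the column sits in the file.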
Behavior Allowed in Foreign Table Definition with and without Schema Evolution
Behavior | Index-Based Approach (without Schema Evolution) | Column Name-Based Approach (with Schema Evolution) |
---|---|---|
Add new column introduced in the Parquet file | Only columns that were added at the end of the Parquet file can be added. | Columns can be at any position in the Parquet file. In the DDL, the column is added at the end. |
Remove unwanted column from the Parquet file | Only columns at the end of the Parquet file can be removed. | Columns can be removed from any position in the Parquet file. |
User-created column | You can create a number of columns equal to or less than the number of columns in the actual Parquet file. Corresponding columns must have a valid data type that matches the Parquet file; otherwise, an error is returned. SELECT returns output for columns with valid names and data types. | You can create a maximum of 2048 columns in the DDL. No validation of the column name or data type is performed. |
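As a rough aid for deciding what to change in a foreign table definition, the following Python sketch compares a DDL column list with a Parquet file's column list by name. The function `plan_ddl_changes` and the column names are hypothetical, the sketch issues no ALTER FOREIGN TABLE statement, and it only assumes the behavior stated in the table above: new columns are appended at the end of the DDL, and the DDL is limited to 2048 columns.

```python
# Hypothetical planning helper; it does not generate or run any SQL.

MAX_DDL_COLUMNS = 2048  # limit stated for the column name-based approach

def plan_ddl_changes(ddl_columns, file_columns):
    """Return (to_add, to_drop, evolved) using case-sensitive name matching."""
    file_set = set(file_columns)
    ddl_set = set(ddl_columns)
    # Columns present in the file but missing from the DDL, in file order.
    to_add = [c for c in file_columns if c not in ddl_set]
    # Columns declared in the DDL that no longer exist in the file.
    to_drop = [c for c in ddl_columns if c not in file_set]
    # Surviving DDL columns keep their order; new columns go at the end.
    evolved = [c for c in ddl_columns if c in file_set] + to_add
    if len(evolved) > MAX_DDL_COLUMNS:
        raise ValueError("DDL would exceed 2048 columns")
    return to_add, to_drop, evolved

to_add, to_drop, evolved = plan_ddl_changes(
    ["empid", "name", "dept"],
    ["region", "empid", "dept", "salary"],
)
print(to_add)   # columns to add to the DDL
print(to_drop)  # columns that could be removed from the DDL
print(evolved)  # resulting DDL column order
```

Whether a column that has disappeared from the file should actually be dropped from the DDL is a choice for the table owner; the sketch only reports the candidates.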
Limitation
Column names in Parquet files are case-sensitive, so names that differ only in case are treated as distinct, but exact duplicates are not allowed. For example, a file can contain columns named EMPID, empid, and EmpID, but not two columns that are both named EMPID.
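A pre-check for this limitation could look like the following Python sketch. The function name `exact_duplicates` is hypothetical and the check is only an illustration of case-sensitive comparison; it is not part of NOS.

```python
from collections import Counter

def exact_duplicates(column_names):
    """Return column names that occur more than once, compared case-sensitively."""
    counts = Counter(column_names)
    return sorted(name for name, n in counts.items() if n > 1)

# Names that differ only in case are distinct, so nothing is reported.
ok = exact_duplicates(["EMPID", "empid", "EmpID"])

# EMPID appears twice with identical spelling, so it is reported.
bad = exact_duplicates(["EMPID", "EMPID", "empid"])

print(ok)
print(bad)
```

An empty result means the column list satisfies the limitation described above.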