Considerations for Defining Tables
A table acquires data attributes when you define its columns in a CREATE TABLE or ALTER TABLE statement with at least a name and a data type phrase.
Data attributes control the internal representation of stored data and determine how that data is presented to a user. You can use the FORMAT phrase to change the external representation returned by a SELECT query.
The table definition directly affects the performance of applications that access that table. Proper planning helps you define a table correctly at the time of creation. The following table lists several things to consider when planning your tables.
Issue |
Considerations |
Compression (Algorithmic) |
Specify the COMPRESS USING udfname option of the CREATE TABLE or ALTER TABLE statement to compress character data by table column and minimize storage. This option allows you to choose your own compression or decompression algorithms. Algorithmic compression works best on seldom-used data. See Database Design for details. |
Compression (Block-level) |
Specify the BLOCKCOMPRESSION = block_compression_option of the CREATE TABLE or ALTER TABLE statement to compress data blocks of primary and fallback tables to minimize storage. Software-based block-level compression works best on large volumes of seldom-accessed tables. See Database Design for details. |
Compression (Multivalue) |
Specify the COMPRESS phrase to compress up to 255 specific values or NULLs to zero space. (Nullable columns with nulls are automatically compressed, when COMPRESS is specified.) Use ALTER TABLE to immediately add, change, or delete compression on a column, column partition, or a table. See Database Design for details. |
Default value of a column |
When you create or alter a table, you can specify a default value for a column by using the DEFAULT function. Specifying a default value allows the system to return a value associated with the column. If you do not specify a value, the default is null. Note: You cannot assign a user-defined type (UDT) as an explicit default value to the column. You can assign the column to return USER, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP built-in values as the default value. For more information, see SQL Functions, Operators, Expressions, and Predicates. For more information on DEFAULT and NULLs returned to the host, see SQL Data Types and Literals. |
Default versus NULL for aggregation results |
During aggregation, the unspecified value represented by a: For example, assume you want the average salary of employees in Department 300 as follows: SELECT DeptNo, AVG(Salary)
If a salary is not known, the result differs depending on how the column is defined. If an unspecified value in the Salary column is defined with: You can do the following to further control how unknowns are handled during aggregation: NULLIF(SALARY,defaultvalue)
In this case, the defaultvalue is ignored in computing the average. COALESCE function definition. For example: COALESCE(Salary,defaultvalue)
where defaultvalue is the value to be used in computing the average. |
Defining consistent data types and column names |
Columns for the same data in different tables should have the same type and name for ease of identification. For example, if the column title is EmpNo in one table, it should be EmpNo in another table. Consistency of data across tables is critical. Some considerations include: If you change data in one table, data in other tables may be affected. For example, updating a department number in the Department table also affects the Employee table, which contains a DeptNo column. To maintain data integrity, you can use: |
IDENTITY column |
IDENTITY is an optional attribute used to generate a number for every row inserted into the table on which it is defined. An identity column does not have to be the first column in the table or defined as an index. The IDENTITY attribute may be specified such that the generated number will be unique. |
Joins involving NULLs |
If a column used in a join contains either NULLs or generic values, the results might be misleading. For example, assume you want to join the Employee and Department tables to obtain a listing of employees and their workplaces: SELECT Name, Loc
Undesired results may occur. The nulls in each table prevent matches. Employees with an unknown department number (Employee.DeptNo is NULL) and departments without employees are not listed. To list an employee with an unknown department number and a department with no employees, use a different default value to represent unknowns in the DeptNo column, as follows: 1 In the Employee table, add a mock “unknown” employee who has a DeptNo equal to the value of the default used for the Department table. 2 To the Department table, add a mock “unknown” department that has a DeptNo equal to the value of the default used for the Employee table. Note: In the example query, a full outer join could be used instead of the inner join to obtain information for non-matching rows when there is a NULL in DeptNo. (Also, as noted previously, you may use the COALESCE function to change a NULL to a value.) However, using outer joins to obtain non-matching rows may be more efficient. For further discussion on the implications of using NULL and for information on how to design efficient joins, see “Joins and Hash Indexes” in Database Design. |
NULL and the GROUP BY clause |
If you use a column defined as NULL for grouping aggregate results, the results may be confusing. For example, assume you submit this query to find the average salary for each department in the organization: SELECT DeptNo, AVG(Salary)
GROUP BY DeptNo;
The results can differ. If the DeptNo column is: Note: You can convert a NULL into a default value (with NULLIF) and a default value into NULL (with COALESCE), as explained previously for aggregated results. |
Nullability of a column |
It may not always be possible to specify a value when inserting a new data row. For example, a new employee may not immediately have a job title. You can explicitly define the nullability of a column with a NULL or NOT NULL phrase as follows: If no value is given for a column when a row is inserted, and the nullability of a column is: Note: It is best practice is specify NOT NULL for a column with a default value (if a default value is appropriate) unless there is a valid reason for allowing the column to be nullable. This saves space and allows for more efficient query plans in some cases. If the column does need to be nullable, be sure to determine, understand, and document the meaning of NULL for this column since null has various interpretations. If an application program requests data from a column (without NULL indicators) that allows nulls and a NULL is found, a substitute value that is compatible with the data type of the column is returned to the application instead of a null. The substitute value (a zero, blank, or zero-length element string) might be misleading, because NULL is an unknown. Note: Never use the phrase NOT IN on nullable columns always replace it with the phrase NOT EXISTS. If NOT IN is used, it will make the database work a lot harder. Also, note that any comparison to NULL results is an UNKNOWN value. NOT EXISTS, however, treats UNKNOWN values as FALSE as shown in EXPLAIN. For example: 1 in (1,2,NULL) -> 1=1 or 1=2 or 1=NULL -> TRUE OR FALSE OR UNKNOWN -> TRUE 1 not in (1,2,NULL) -> 1<>1 and 1<>2 and 1<>NULL -> TRUE and FALSE and UNKNOWN -> UNKNOWN For more information on unexpected answer sets, see “Behavior of Nulls for NOT IN” in SQL Functions, Operators, Expressions, and Predicates. |
Other considerations |
Consider: |
NoPI table |
Use a NoPI table when either you want to column-partition the table, a primary index does not add any value, you do not know yet what primary index to specify, or you need a staging table and restrictions on NoPI tables are acceptable. To create a staging table, use a NoPI table which is a table without a PI. Using a NoPI table for staging purposes improves performance when loading data from FastLoad and TPump jobs because the table is not hashed by a PI and the rows do not have to be sorted in any particular order or redistributed to any particular AMP. NoPI tables are useful for applications that load data into a staging table that needs to go through some sort of transformation or standardization. The data must be converted before being stored in another table. You can then apply the data from the staging table to a target table through an INSERT... SELECT, UPDATE-FROM or MERGE-INTO statement. It is best practice is to explicitly specify a PI or NoPI instead of relying on the default for the PI. The default may not be appropriate and may cause skewing of the data on the AMPs. For more information on how to create a NoPI table, see “CREATE TABLE” in SQL Data Definition Language. |