16.10 - Considerations for Defining Tables - Teradata Database

Teradata Database Administration

Product
Teradata Database
Release Number
16.10
Release Date
April 2018
Content Type
Administration
Publication ID
B035-1093-161K
Language
English (United States)

A table acquires data attributes when you define its columns in a CREATE TABLE or ALTER TABLE statement with at least a name and a data type phrase.

Data attributes control the internal representation of stored data and determine how that data is presented to a user. You can use the FORMAT phrase to change the external representation returned by a SELECT query.

The table definition directly affects the performance of applications that access that table. Proper planning helps you define a table correctly at the time of creation. The following table lists several things to consider when planning your tables.

Issue Considerations
Compression

(Algorithmic)

Specify the COMPRESS USING udfname option of the CREATE TABLE or ALTER TABLE statement to compress character data by table column and minimize storage. This option allows you to choose your own compression or decompression algorithms. Algorithmic compression works best on seldom-used data. See Database Design for details.
Compression

(Block-level)

Specify the BLOCKCOMPRESSION = block_compression_option of the CREATE TABLE or ALTER TABLE statement to compress data blocks of primary and fallback tables to minimize storage. Software-based block-level compression works best on large volumes of seldom-accessed tables. See Database Design for details.
Compression

(Multivalue)

Specify the COMPRESS phrase to compress up to 255 specific values or NULLs to zero space. (Nullable columns with nulls are automatically compressed, when COMPRESS is specified.)

Use ALTER TABLE to immediately add, change, or delete compression on a column, column partition, or a table. See Database Design for details.

Default value of a column When you create or alter a table, you can specify a default value for a column by using the DEFAULT function. Specifying a default value allows the system to return a value associated with the column. If you do not specify a value, the default is null.
You cannot assign a user-defined type (UDT) as an explicit default value to the column.

You can assign the column to return USER, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP built-in values as the default value. For more information, see SQL Functions, Operators, Expressions, and Predicates. For more information on DEFAULT and NULLs returned to the host, see SQL Data Types and Literals.

Default versus NULL for aggregation results During aggregation, the unspecified value represented by a:
  • NULL, which is ignored in the calculation.
  • Default value, which is included in the calculation.

For example, assume you want the average salary of employees in Department 300 as follows:

SELECT DeptNo, AVG(Salary)
FROM Employee
GROUP BY DeptNo
WHERE DeptNo = 300 ;

If a salary is not known, the result differs depending on how the column is defined. If an unspecified value in the Salary column is defined with the following:

  • A DEFAULT value, then the result includes the default value as if it were the actual value. This may be far from the average result if all values were known.
  • A NULL, then the result is the average of the salaries that are known. NULL is ignored. This may be closer to the actual average.

You can do the following to further control how unknowns are handled during aggregation:

  • Exclude substitutions by making NULL the default value for an unknown, then use the NULLIF function definition; for example:
    NULLIF(SALARY, defaultvalue )

    In this case, the defaultvalue is ignored in computing the average.

  • Change an unknown represented by a NULL into the default value, then use the

    COALESCE function definition. For example:

    COALESCE(Salary, defaultvalue )

    where defaultvalue is the value to be used in computing the average.

Defining consistent data types and column names Columns for the same data in different tables should have the same type and name for ease of identification. For example, if the column title is EmpNo in one table, it should be EmpNo in another table. Consistency of data across tables is critical. Some considerations include:
  • Changing data in one table.

    If you change data in one table, data in other tables may be affected. For example, updating a department number in the Department table also affects the Employee table, which contains a DeptNo column. To maintain data integrity, you can use:

    • A macro, a stored procedure, a trigger, or an application program to update all the tables affected by a change.
    • RI constraints. (For details, see Database Design and SQL Data Definition Language.)
  • Joining the columns of one table with the columns of another table. Ensure the following:
    • Join columns are of the same data type and size.
    • Data is consistent in both tables. For example, to join two tables on the Employee Name column, the notation must be identical in both tables (for instance, lastname firstinitial, such as “Smith H”).
    • All row-level security-protected tables referenced in a join contain the same constraints.
  • Using tables to store large data objects. Create a table with LOB type columns. This includes:
    • Character Large Object (CLOB)- A CLOB column can store character data, such as simple text, HTML, or XML documents.
    • Binary Large Object (BLOB)- A BLOB column can store binary objects, such as graphics, video clips, files, and documents.
IDENTITY column IDENTITY is an optional attribute used to generate a number for every row inserted into the table on which it is defined. An identity column does not have to be the first column in the table or defined as an index. The IDENTITY attribute may be specified such that the generated number will be unique.
Joins involving NULLs If a column used in a join contains either NULLs or generic values, the results might be misleading. For example, assume you want to join the Employee and Department tables to obtain a listing of employees and their workplaces:
SELECT Name, Loc
FROM Employee, Department
WHERE Employee.DeptNo = Department.DeptNo;

Undesired results may occur. The nulls in each table prevent matches. Employees with an unknown department number (Employee.DeptNo is NULL) and departments without employees are not listed.

To list an employee with an unknown department number and a department with no employees, use a different default value to represent unknowns in the DeptNo column, as follows:

  1. In the Employee table, add a mock “unknown” employee who has a DeptNo equal to the value of the default used for the Department table.
  2. To the Department table, add a mock “unknown” department that has a DeptNo equal to the value of the default used for the Employee table.
In the example query, a full outer join could be used instead of the inner join to obtain information for non-matching rows when there is a NULL in DeptNo. (Also, as noted previously, you may use the COALESCE function to change a NULL to a value.) However, using outer joins to obtain non-matching rows may be more efficient.

For further discussion on the implications of using NULL and for information on how to design efficient joins, see “Joins and Hash Indexes” in Database Design.

NULL and the GROUP BY clause If you use a column defined as NULL for grouping aggregate results, the results may be confusing. For example, assume you submit this query to find the average salary for each department in the organization:
SELECT DeptNo, AVG(Salary) 
FROM Employee
GROUP BY DeptNo;

The results can differ, depending on the definition of the DeptNo column:

  • If DeptNo is allowed to be NULL, and two employees have not yet been assigned a department number, then the result lists the computed average for those two employees under a NULL department number. This might be confusing.
  • If DeptNo is defined with DEFAULT and the specified constant is meaningful (such as Unknown), this result is more valid.
You can convert a NULL into a default value (with NULLIF) and a default value into NULL (with COALESCE), as explained previously for aggregated results.
Nullability of a column It may not always be possible to specify a value when inserting a new data row. For example, a new employee may not immediately have a job title. You can explicitly define the nullability of a column with a NULL or NOT NULL phrase as follows:

If no value is given for a column when a row is inserted, and the nullability of a column is the following:

  • Defined as NOT NULL and a DEFAULT value is not specified, then the INSERT statement returns an error.
  • Not defined, then a NULL is supplied automatically.
It is best practice is specify NOT NULL for a column with a default value (if a default value is appropriate) unless there is a valid reason for allowing the column to be nullable. This saves space and allows for more efficient query plans in some cases. If the column does need to be nullable, be sure to determine, understand, and document the meaning of NULL for this column since null has various interpretations.

If an application program requests data from a column (without NULL indicators) that allows nulls and a NULL is found, a substitute value that is compatible with the data type of the column is returned to the application instead of a null. The substitute value (a zero, blank, or zero-length element string) might be misleading, because NULL is an unknown.

Never use the phrase NOT IN on nullable columns always replace it with the phrase NOT EXISTS. If NOT IN is used, it will make the database work a lot harder.

Also, note that any comparison to NULL results is an UNKNOWN value. NOT EXISTS, however, treats UNKNOWN values as FALSE as shown in EXPLAIN.

For example:

1 in (1,2,NULL) -> 1=1 or 1=2 or 1=NULL -> TRUE OR FALSE OR UNKNOWN -> TRUE

1 not in (1,2,NULL) -> 1<>1 and 1<>2 and 1<>NULL -> TRUE and FALSE and UNKNOWN -> UNKNOWN

For more information on unexpected answer sets, see “Behavior of Nulls for NOT IN” in SQL Functions, Operators, Expressions, and Predicates.

Other considerations Consider the following:
  • Whether or not to have a primary index. (See the NoPI row in this table.)
  • The columns to use in the primary index and whether the PI is unique or not.
  • If the table has a primary index, whether to have one or more levels of row partitioning.
  • If there is no primary index, whether to have column partitioning and, optionally, one or more levels of row partitioning.
  • What secondary indexes to define, if any, and whether they are unique.
  • What constraints are needed on each of the columns.
  • Row header compression for column partitions that have COLUMN format can reduce storage costs and enhance system performance. However, in some cases, such as wide column partitions that do not benefit from autocompression, ROW format (no row header compression) may be better.
  • Whether to allow concurrent reads on committed rows in a table while loading. A table must be set up as load isolated to use this feature. A load-isolated table requires more space than a non-isolated table and requires regular maintenance to remove logically deleted rows. For more information, see About Reading Committed Data While Loading to the Same Table.
NoPI table Use a NoPI table when either you want to column-partition the table, a primary index does not add any value, you do not know yet what primary index to specify, or you need a staging table and restrictions on NoPI tables are acceptable.

To create a staging table, use a NoPI table which is a table without a PI. Using a NoPI table for staging purposes improves performance when loading data from FastLoad and TPump jobs because the table is not hashed by a PI and the rows do not have to be sorted in any particular order or redistributed to any particular AMP.

NoPI tables are useful for applications that load data into a staging table that needs to go through some sort of transformation or standardization. The data must be converted before being stored in another table. You can then apply the data from the staging table to a target table through an INSERT... SELECT, UPDATE-FROM or MERGE-INTO statement.

It is best practice is to explicitly specify a PI or NoPI instead of relying on the default for the PI. The default may not be appropriate and may cause skewing of the data on the AMPs.

For more information on how to create a NoPI table, see “CREATE TABLE” in SQL Data Definition Language.