A table acquires data attributes when you define its columns in a CREATE TABLE or ALTER TABLE statement with at least a name and a data type phrase.
Data attributes control the internal representation of stored data and determine how that data is presented to a user. You can use the FORMAT phrase to change the external representation returned by a SELECT query.
The table definition directly affects the performance of applications that access that table. Proper planning helps you define a table correctly at the time of creation. The following table lists things to consider when planning your tables.
Issue | Considerations |
---|---|
Compression (Algorithmic) |
Specify the COMPRESS USING udfname option of the CREATE TABLE or ALTER TABLE statement to compress character data by table column and minimize storage. This option allows you to choose your own compression or decompression algorithms. Algorithmic compression works best on seldom-used data. See Algorithmic Compression. |
Compression (Block-level) |
Specify the BLOCKCOMPRESSION = block_compression_option of the CREATE TABLE or ALTER TABLE statement to compress data blocks of primary and fallback tables to minimize storage. Software-based block-level compression works best on large volumes of seldom-accessed tables. See Software-Based Block-Level Compression. For Object File System tables, BLOCKCOMPRESSION has a default value that you cannot change.
|
Compression (Multivalue) |
Specify the COMPRESS phrase to compress up to 255 specific values or NULLs to zero space. (Nullable columns with nulls are automatically compressed, when COMPRESS is specified.) Use ALTER TABLE to immediately add, change, or delete compression on a column, column partition, or a table. See Multivalue Compression. |
Default value of a column | When you create or alter a table, you can specify a default value for a column by using the DEFAULT function. Specifying a default value allows the system to return a value associated with the column. If you do not specify a value, the default is null. |
Default or NULL for aggregation results | During aggregation, the unspecified value represented by a:
For example, assume you want the average salary of employees in Department 300 as follows: SELECT DeptNo, AVG(Salary)
FROM Employee
GROUP BY DeptNo
WHERE DeptNo = 300 ;
If a salary is not known, the result differs depending on how the column is defined. If an unspecified value in the Salary column is defined with the following:
You can do the following to further control how unknowns are handled during aggregation:
|
Defining consistent data types and column names | Columns for the same data in different tables must have the same type and name for ease of identification. For example, a column named EmpNo in one table must be named EmpNo in another table. Consistency of data across tables is critical. Considerations include:
|
IDENTITY column | IDENTITY is an optional attribute used to generate a number for every row inserted into the table. An identity column does not have to be the first column in the table or defined as an index. The IDENTITY attribute may be specified such that the generated number is unique. Identity column is supported on both the Block File System and Object File System. However, you can only run queries related to an identity column table on the primary cluster.
|
Joins involving NULLs | If a column used in a join contains either NULLs or generic values, the results may be misleading. For example, assume you want to join the Employee and Department tables to get a listing of employees and their workplaces:SELECT Name, Loc
FROM Employee, Department
WHERE Employee.DeptNo = Department.DeptNo;
Undesired results may occur. The nulls in each table prevent matches. Employees with an unknown department number (Employee.DeptNo is NULL) and departments without employees are not listed. To list an employee with an unknown department number and a department with no employees, use a different default value to represent unknowns in the DeptNo column, as follows:
In the example query, a full outer join can be used instead of the inner join to get information for non-matching rows when there is a NULL in DeptNo. (Also, as noted previously, you may use the COALESCE function to change a NULL to a value.) However, using outer joins to get non-matching rows may be more efficient.
For further discussion on the implications of using NULL and for information on how to design efficient joins, see Join Indexes. |
NULL and the GROUP BY clause | If you use a column defined as NULL for grouping aggregate results, the results may be confusing. For example, assume you submit this query to find the average salary for each department in the organization:SELECT DeptNo, AVG(Salary)
FROM Employee
GROUP BY DeptNo;
The results can differ, depending on the definition of the DeptNo column:
You can convert a NULL into a default value (with NULLIF) and a default value into NULL (with COALESCE), as explained previously for aggregated results.
|
Nullability of a column | Specifying a value when inserting a new data row may not be possible. For example, a new employee may not immediately have a job title. You can explicitly define the nullability of a column with a NULL or NOT NULL phrase as follows: If no value is given for a column when a row is inserted, and the nullability of a column is the following:
Teradata recommends specifying NOT NULL for a column with a default value (if a default value is appropriate) unless there is a valid reason for allowing the column to be nullable. This saves space and may allow more efficient query plans. If the column does need to be nullable, be sure to determine, understand, and document the meaning of NULL for this column, because null has multiple interpretations.
If an application program requests data from a column (without NULL indicators) that allows nulls and a NULL is found, a substitute value that is compatible with the data type of the column is returned to the application instead of a null. The substitute value (a zero, blank, or zero-length element string) may be misleading, because NULL is an unknown. Do use the phrase NOT IN on nullable columns, use NOT EXISTS. NOT IN makes the database work much harder.
Any comparison to NULL results is an UNKNOWN value. NOT EXISTS, however, treats UNKNOWN values as FALSE, as shown in EXPLAIN. For example: 1 in (1,2,NULL) -> 1=1 or 1=2 or 1=NULL -> TRUE OR FALSE OR UNKNOWN -> TRUE 1 not in (1,2,NULL) -> 1<>1 and 1<>2 and 1<>NULL -> TRUE and FALSE and UNKNOWN -> UNKNOWN |
Other considerations | Consider the following:
|
NoPI table | Use a NoPI table when either you want to column-partition the table, a primary index does not add any value, you do not know yet what primary index to specify, or you need a staging table and restrictions on NoPI tables are acceptable. To create a staging table, use a NoPI table which is a table without a PI. Using a NoPI table for staging purposes improves performance when loading data from FastLoad and TPump jobs because the table is not hashed by a PI and the rows do not have to be sorted in a specified order or redistributed to a specified AMP. NoPI tables are useful for applications that load data into a staging table that must go through transformation or standardization. The data must be converted before being stored in another table. You can then apply the data from the staging table to a target table through an INSERT ... SELECT, UPDATE-FROM or MERGE INTO statement. Explicitly specify a PI or NoPI instead of relying on the default for the PI. The default may not be appropriate and may cause skewing of the data on the AMPs. |