15.10 - Glossary - Teradata Database

Teradata Database SQL Request and Transaction Processing

prodname
Teradata Database
vrm_release
15.10
category
Programming Reference
User Guide
featnum
B035-1142-151K

Numerics

2PC

Two-Phase Commit

A method of ensuring that updates in a distributed database management system either commit to all target nodes in the transaction or all roll back.

See Introduction to Teradata for details.

2PL

Two-Phase Locking
A method of locking database objects that ensures serializability, thus preserving the consistency of the database. The 2 phases are the growing phase, during which all locks on database objects are acquired, and the shrinking phase, during which those locks are dropped.

See “Two-Phase Locking Defined” on page 669 for details.

A

ACCESS Lock

Permits a user to have READ access to an object that might already be locked for READ or WRITE. An ACCESS lock does not restrict access by another user except when an EXCLUSIVE lock is required; therefore it is sometimes referred to as a dirty READ lock.

A user requesting an ACCESS lock disregards all data consistency issues. Because ACCESS and WRITE locks are compatible, the data might be undergoing updates while the user who requested the access is reading it. Therefore, any query that requests an ACCESS lock might return incorrect or inconsistent results.

The Archive/Recovery utility can also place a HUT ACCESS Lock, or ROW RANGE Lock on a database resource.

ACCESS locks are the least restrictive locks available for use on a Teradata system.

See “Teradata Database Locking Levels and Severities” on page 687 for details.

Access Plan

The component of a Query Plan that contains information about how to access rows optimally.

ACID

Atomicity, Consistency, Isolation, Durability. The four fundamental properties all transactions in a relational database management system must possess.

See “ACID Properties of Transactions” on page 647 for details.

ACM

Association for Computing Machinery (http://www.acm.org)

The leading US-based professional society for computer professionals.

Alternate Key

Any Candidate Key for a table that is not selected to be the Primary Key.

AMP

Access Module Processor vproc
The set of software services that controls the file system and data management components of a Teradata Database.

AMP Steps

A set of machine language directives to the AMP software derived primarily from the Optimizer White Tree. The term AMP steps is a synonym for Concrete Steps.

ANSI

American National Standards Institute (http://www.ansi.org)

A US‑based umbrella standards organization based in Washington, D.C., that defines, certifies, and administers the SQL standard.

The ANSI SQL standards are available for purchase at the following website: http://webstore.ansi.org/default.aspx.

The ANSI SQL standard is also recognized by the ISO.

API

Application Programming Interface

A set of software services with a well‑defined program interface.

Arity

The number of columns in a relation.

Arity is a synonym for Degree.

ARSA Algorithm

An initialism for a four‑stage process used by Teradata Database to aggregate data. The stages are as follows.

1
Aggregate locally on each AMP participating in the aggregation.
2
Redistribute the local aggregations to their target AMPs.
3
Sort the redistributed aggregations.

This stage of the process also eliminates duplicate rows from the aggregation.

4
Aggregate globally to compute the sum aggregate total.

ASCII

American Standard Code for Information Interchange

A standard seven-bit code designed to establish compatibility between various types of data processing equipment. Originally proposed in 1963, ASCII is documented by the following standards: ISO-14962-1997 and ANSI-X3.4-1986(R1997).

The standard ASCII character set defines 128 decimal numbers ranging from 0 through 127, inclusive. The individual characters are assigned to alphanumerics, punctuation marks, and a set of commonly used special characters.

There is also an extended ASCII character set consisting of an additional 128 decimal numbers ranging from 128 through 255, inclusive. These characters are assigned to additional special, mathematical, graphic, and “foreign” characters.

Because ASCII uses only 7 bits, it is possible to use the 8th bit for parity checking.

Compare with EBCDIC.

Attribute

Attributes are the set of constructs that constitute the heading of a relation. Each attribute is made up of a column_name:domain_name pair, where domain_name signifies the data type for the column. When attributes are transferred from the logical realm into the physical realm, they are usually referred to as columns.

AWP

AMP Worker Task

B

Bidirectional Inheritance

The property of base tables and their underlying indexes being able to inherit and use existing statistics from one another when either database object in a pair has no existing interval histogram statistics.

If both database objects have existing interval histogram statistics, the Optimizer uses the set with the more recent collection timestamp.

See “Statistical Inheritance by Hash and Join Indexes” on page 220 for details.

Bin

See Bucket or Interval.

Black Tree

Synonym for SynTree. See “Syntaxer” on page 45 for details.

BLOB

Binary Large Object

A data object, usually larger than 64K, that contains only binary data such as pictures, movies, or music.

Compare with CLOB.

Blocked Request

A request that needs to access a database resource that is currently locked by another request that is imposing a lock that prevents that access. As a result, the system places the request in an I/O wait state until the lock on the resource it needs is released.

Note that blocked requests do not time out.

Compare with “Deadlock” on page 890. See “Blocked Requests” on page 710 for details.

BNF

Backus-Naur Form or Backus Normal Form

A metalanguage invented by John Backus of IBM and simplified by Peter Naur of Regnecentralen, the Danish computing institute, that is used to express context-free grammars. In other words, BNF is a formal way to describe formal languages. In this specific case, the formal language being described is the computer programming language SQL.

Broad Join Index

A Covering join index whose definition includes one or more tables that is not specified in the query it covers. To qualify for this kind of coverage, the join to an extra table only adds more columns to the final results and does not change the rows in the join index itself.

A wide range of queries can make use of a broad join index, especially when there are foreign key‑primary key relationships defined between the fact table and the dimension tables that enable the index to be used to cover queries over a subset of dimension tables.

BTEQ

Basic Teradata Query facility.
A Teradata request and script processing facility based on the CLIv2 API.

See Basic Teradata Query Reference for details.

BTET

BTET represents BEGIN TRANSACTION … END TRANSACTION, the SQL statements that must delimit all explicit transactions in Teradata session mode. You can optionally code transaction initiation and termination explicitly in Teradata session mode, but not in ANSI session mode. In ANSI session mode, all transactions begin implicitly, but must be terminated explicitly.

BTET is also an option for the Preprocessor2 TRANSACT and -tr commands signifying that the transaction protocol the application is to use is Teradata session mode (see Teradata Preprocessor2 for Embedded SQL Programmer Guide for details).

Bucket

A synonym for Interval in an Interval Histogram.

BYNET

Banyan Network - The high speed interconnect channel between the nodes of a Teradata MPP system.

C

Cache

A buffer that stores data to enable future requests for that data to be served more quickly.

Cache Hit

A case where the data required to serve a request is found to be cached.

Cache Miss

A case where the data required to serve a request is not found to be cached. In this case, the data must be retrieved from an AMP, which is a slower process.

Candidate Key

Any key for a relational table that is eligible to be selected as its Primary Key. Also see Alternate Key.

Cardinality

The number, or estimated number, of rows in a relation. The relation can be a base table, a materialized global temporary table, a volatile table, a spool file, or an index subtable.

Sometimes used to describe the number of rows per AMP that satisfy a predicate condition.

Character String Literal

See String Literal.

CHECKSUM Lock

Placed in response to a user-specified LOCKING FOR CHECKSUM request modifier (see SQL Data Manipulation Language) when using cursors in embedded SQL.

CHECKSUM locking is identical to ACCESS locking except that it adds checksums to the rows of a spool file to allow a test of whether a row in the cursor has been modified by another user or session at the time an update is being made through the cursor.

CJK

Chinese, Japanese, and Korean

A common abbreviation used to represent the multibyte character sets used to write the Chinese, Japanese, and Korean languages.

CLIv2

Call Level Interface Version 2.

The Teradata API for presenting SQL requests to Teradata Database and receiving their results.

See Teradata Call-Level Interface Version 2 Reference for Mainframe-Attached Systems or Teradata Call-Level Interface Version 2 Reference for Workstation-Attached Systems for details.

CLOB

Character Large Object

A data object, usually larger than 64K, that contains only character data such as XML or other text files.

Compare with BLOB.

Column Correlation

The degree to which the values in one column depend on the values from another column in the same table.

Column correlation is something different than the correlation used in statistical analysis, where it is a bivariate measure of the association between 2 variables. Statistical correlation ranges between -1.00 and 1.00, with -1.00 defining a perfect negative linear relationship, 1.00 defining a perfect positive linear relationship, and 0.00 defining no relationship between the 2 variables.

In contrast to the statistical definition, column correlation captures the mapping relationship of values from 2 columns.

COLUMN Format

The format of the physical row used as a Container in a column partition. The format includes an indication of the autocompression used, if any, an optional multi-value compression dictionary, a series of column partition values, and optional sets of autocompression bits: one set for each column partition value in the Container. This format, along with column partitioning, provides a column-store in Teradata Database. See Database Design for more detailed information.

Compare with ROW Format.

Column Partition

A set of one or multiple table columns grouped into a set of Column Partition Values or in a Subrow and stored in a Physical Row that is referred to as a Container, which represents the values and nulls of a column partition.

See ROW Format and COLUMN Format for definitions of the various ways that a Physical Row can be specified for a column partition.

Column‑Partitioned Object

A table or join index that has no primary index but is partitioned into rows with either COLUMN Format or a mix of COLUMN format and ROW Format.

Column Partitioning

A method of vertically partitioning sets of columns of a table.

Column Partition Value

The values of one or more columns in a column partition projected from a table row.

If the column partition is specified or system-determined to have a COLUMN Format, a column partition value is a just a column value or null if the column partition only has a single column, or multiple column values and nulls if the column partition has more than one column. One or more column partition values can be represented in a Container.

If the column partition is specified or system-determined to have a ROW Format, the column partition is represented using Subrows and the column partition value is the set of column values and nulls in the Subrow. There is one column partition value represented per Subrow.

Combined Partitioning Expression

An expression that combines all of the Partitioning Expressions of a PARTITION BY clause and the column partition number for a partitioned table, which is 1, into a single expression.

The combined partitioning expression for a nonpartitioned table is 0.

The combined partitioning expression is such that ordering on the combined partitioning expression value for rows would have the same ordering for rows if they were ordered by the value of the first partitioning expression, the second partitioning expression, and so on.

Combined Partition Number

The result of the Combined Partitioning Expression for a specific set of values of the partitioning columns.

If a table is column‑partitioned, its combined partition number is the combined partitioned number for the table row using one as the column partition number for a combined partition number of a table row. The combined partition number is adjusted to the partition number of a specific column partition when determining the combined partition number for that column partition. For a nonpartitioned table, the combined partition number is 0.

Complete Transaction History

A sequence of operations that reflects the execution of multiple transactions, including a transaction‑terminating COMMIT or ROLLBACK for each transaction in the Transaction History.

Compressed Interval Histogram

A family of histograms that combines High-Biased Intervals with Equal-Height Intervals and Master Records. The name of this family of histograms is somewhat misleading because unlike the other histogram types, there is no similarly named Interval (because its intervals are a mix of high-biased and equal-height intervals).

The High-Biased Intervals are always stored in Buckets 1 through n, where n represents the highest numbered Bucket containing High-Biased Interval information.

Equal-Height Intervals begin at bucket n + 1 in a compressed interval histogram.

See “Types of Interval Histograms Used By Teradata Database” on page 167 for details.

Compression Ratio

When you collect statistics on a column partition, the Optimizer computes the ratio of the number of physical data blocks to the estimated number of logical data blocks as a compression ratio.

The number of physical data blocks is the actual number of blocks occupied by the physical partitions of a column partition.

Teradata Database estimates the number of logical data blocks based on the number of active rows in the column‑partitioned object and the estimated logical row size of the column‑partitioned object.

If no PARTITION statistics have been collected for the column‑partitioned object, the Optimizer estimates a compression ratio using values taken from a Dynamic AMP Sample.

Concrete Steps

A set of machine language directives to the AMP software derived primarily from the Optimizer White Tree.

These are the steps created when OptApply binds USING request modifier data parcels into the Plastic Steps produced earlier by the Generator. The term concrete steps is a synonym for AMP Steps.

See “Definition of Concrete Steps” on page 68 for details.

Condition

See Predicate.

Connecting Condition

See Connecting Term.

Connecting Term

A connecting term is a predicate that connects a query and its immediate subquery. Connecting terms have the following additional properties.

  • They can specify any comparison operator.
  • The specified comparison operator can be qualified by an ANY or ALL qualifier.
  • If the comparison operator is qualified by ANY or ALL, then the subquery can return multiple rows.
  • If the comparison operator is not qualified by ANY or ALL, then the subquery can return only a single row.
  • In the following example, y1 = ANY (SELECT y2 FROM t2) is the connecting term.

       SELECT x1
       FROM t1
       WHERE y1=ANY (SELECT y2
                     FROM t2);

    Constant Expression

    An SQL expression that does not make any column references.

    Container

    A Physical Row that has a COLUMN Format. A Container contains a representation of a series of Column Partition Values for a column partition that is specified to have a COLUMN Format or that has a COLUMN Format by default.

    A container can be either a single-column container or a multicolumn container. A series of containers with increasing row ID values represents a table column or set of table columns.

    Compare withSubrow.

    Correct Transaction History

    A Serializable Transaction History. This means that the read and write operations of a set of transactions can be reordered until the read and write operations of each transaction are together without affecting the values they read.

    Correlated Join

    A correlated join is a class of join methods developed to process correlated subqueries.

    For each type of correlated join, the right table is a collection of groups and a left row can be returned once for each group.

    Correlating Term

    A correlating term is a predicate that references an outer table from a subquery.

    In the following example, x1 = x2 is the correlating term. Column x1 correlates with table t1 in the outer (main) query block, and column x2 correlates with table t2 in the inner (subquery) query block:

       SELECT * FROM t1 WHERE y1 IN (SELECT y2 FROM t2 WHERE x1 = x2);

    Correlation Name

    A name alias assigned to a table or view name that is specified in an SQL request. Correlation names are also sometimes called range variables.

    Correlation names are associated with a table or view only within the context in which they are defined. Once you specify a correlation name for a table or view name, you must specify that correlation name in any qualified reference to a column of that instance of the table or view instead of specifying the actual table or view name.

    Cost Factor

    An environmental variable defining a system configuration element or a system performance element. There are 2 types of cost factors:

    Cost Prediction Parameter

    A Cost Profile constant used by Optimizer cost functions to predict operation costs.

    Cover

    A condition in which all the column data requested by a query can be obtained by index‑only access, typically using a hash, join, or nonunique secondary index to cover the query.

    CPU Cost Variable

    One of several possible variables that describes various CPU costs for accessing, sorting, or building rows using various methods.

    D

    Deadlock

    Consider two requests, 1 and 2, that are running concurrently. Suppose both requests need to lock database resource A and database resource B. If 1 locks B before 2, and 2 locks A before 1, neither request 1 nor request 2 can access the other resource it needs, and both must wait for the lock on the resource they need to access to be released before they can proceed.

    Deadlock is resolved in Teradata Database by aborting the most recently submitted, or younger, request, and rolling back its updates. Teradata Database does not detect HUT locks; instead, the system returns error code 2631 to the application, so your code must be written to detect error code 2631 and if it is found, to resubmit the rolled back transaction.

    Compare with Blocked Request.

    See “Deadlock” on page 722 for details.

    Degree

    The number of attributes, or columns, in a relation. If a relation has a degree of 5, then it has 5 columns.

    Degree is a synonym for Arity.

    Delayed Row Partition Elimination

    Row partition elimination can occur with conditions comparing a partitioning column to a USING variable or built-in function. This cannot be done when building a plan that is cached, because a cached plan needs to be general enough to handle changes in these values in subsequent executions.

    In certain cases, row partition elimination can be delayed until the finalized plan is built from a cached plan using the values for this specific execution of the plan, and this deferred row partition elimination is called delayed row partition elimination.

    Delayed row partition elimination is similar to Static Row Partition Elimination in that it is performed by the Optimizer, but it is done at a later stage of the optimization process.

    Delayed row partition elimination is unlike Dynamic Row Partition Elimination because it is done as part of the optimization process undertaken by the Optimizer, not done post optimization by the AMP‑based database software.

    Also see Row Partition Elimination.

    See “Delayed Row Partition Elimination” on page 338 for details.

    Derived Statistics

    Statistics that are transformed from various constraint sources, including query predicates, and then adjusted dynamically at each stage of the query optimization process. Derived Statistics are propagated from optimization stage to optimization stage by means of a data structure that contains both the relevant static interval histogram statistics and the dynamically adjusted derived statistics.

    See “Derived Statistics” on page 210 for details.

    Derived Table

    A derived table is a transitory table that is created dynamically from one or more other tables by evaluating a query expression or table expression. The semantics of derived tables are identical to the semantics of views.

    Derived tables must always be defined with a correlation name.

    DevX

    Teradata Developer Exchange. A community-oriented technical website that connects Teradata engineers with developers and customers who are building and deploying solutions on Teradata platforms. The site features downloads, articles, blogs, forums, and best practices for a wide variety of topics.

    The URL for DevX is http://developer.teradata.com.

    Direct PPI Merge Join

    A join method in which the table or join index of interest is not spooled in preparation for a Merge Join, but instead is done directly. The Optimizer might choose a direct merge join when at minimum all columns of the primary index are specified in equality join terms.

    Direct Join

    A binary join operation for which the relation of interest is not spooled in preparation for the join.

    For example, a direct merge join is a join in which the relation of interest is not spooled in preparation for a merge join. Similarly, a direct product join is a product join in which the relation of interest is not spooled in preparation for a product join.

    Directory Block

    In a Hilbert R‑tree used for a NUSI on a geospatial column, a directory block is a component of the multilevel hierarchy of the index, which has n-1 directory levels and 1 leaf level. Each directory block contains Directory Rows.

    Each Hilbert R‑tree contains one root directory block and one or more non‑root directory blocks.

    Compare with Leaf Block.

    Directory Row

    In a Hilbert R‑tree used for a NUSI on a geospatial column, a directory row points to the child directory block in the level below it.

    Compare with Leaf Row.

    Dirty Read

    A dirty read is an uncommitted dependency. Dirty read describes a situation in which one transaction is permitted to retrieve, or even update, a value that has been updated by a second concurrently running transaction that has not yet committed. The problem arises because the second transaction might never commit. If the second transaction were to roll back, or be rolled back by the system for some reason external to the transaction itself, the database would be left in an inconsistent state.

    Discretionary Access Control

    An access control mechanism that assigns privileges to objects of several different types. The privileges can be assigned at the system level, database level, table level, or column level as well as to other object types defined in the system. Privileges are then granted to users and roles. The granted privileges allow a specific set of actions to be taken against the objects on which they are granted.

    Disk Array Throughput

    A measure of the throughput for a particular disk array I/O operation expressed in input:output operations per second.

    Disk Delay Definition

    One of several possible elapsed time measures for various disk input:output operations expressed in milliseconds.

    Dispatcher

    The PE query processing component that takes the concrete steps produced by OptApply and creates AMP steps, which it then passes across the BYNET for processing by an AMP set.

    The Dispatcher also manages any abort processing that might be required, as well as receiving the results of the AMP Steps from the BYNET and returning them to the requesting application.

    See “Dispatcher” on page 69 for details.

    DOM

    Document Object Model.

    The platform‑ and language‑neutral de facto W3C standard object model used to represent text that is structured using XML and similar markup language formats.

    DPE

    See Dynamic Row Partition Elimination.

    Dynamic AMP Sample

    When there are no statistics available to quantify the demographics of a column set or index, or when existing statistics are determined to be stale, the Optimizer selects a single AMP to sample for statistics using an algorithm based on the table ID. By inference, these statistics are assumed to represent the global statistics for the sampled column. Also done where statistics already exist to get a more current estimate.

    Dynamic Query Plan

    A Specific Plan that is generated incrementally by the Optimizer using the dynamic feedback of either the results from intermediate spool files or from their statistics. A dynamic plan consists of 2 or more plan fragments. Dynamic query plans are generated by the Incremental Planning and Execution (IPE) framework. Compare with Static Query Plan.

    Dynamic Row Partition Elimination

    A form of dynamic re‑optimization performed by the AMP software when query conditions reference values in other tables that permit row partition elimination, but which cannot be determined at the time a query is initially optimized.

    See Row Partition Elimination, Delayed Row Partition Elimination, and Static Row Partition Elimination.

    Also see “Row Partition Elimination” on page 317 for details.

    E

    E2I

    External-to-Internal

    EBCDIC

    Extended Binary-Coded Decimal Interchange Code

    An 8-bit code for alphanumerics, punctuation marks, and special characters devised by IBM Corporation as an alternative to ASCII. EBCDIC and ASCII use different coding schemes to define their respective character sets, and EBCDIC defines some special characters that are not defined in ASCII. EBCDIC is only used by IBM computing equipment.

    Because EBCDIC is an 8-bit coding scheme, it is not possible to perform parity checks using the 8th bit.

    Compare with ASCII.

    Eliminated Partition

    A partition that is skipped for a particular query because the Optimizer has determined that it contains no qualifying rows for row partitioning or no columns are referenced in the partition for column partitioning.

    See Row Partition Elimination.

    Equal-Height Interval

    An interval containing column statistics normalized across the distribution in such a way that the graph of the distribution of the number of rows as a function of interval number is flat.

    This is achieved by varying the width of each interval so it contains approximately the same number of rows (but usually different attribute value ranges) as its neighbors.

    Also known as an equal-depth interval.

    See “Types of Interval Histograms Used By Teradata Database” on page 167 for details.

    Equal‑Height Interval Histogram

    A family of histograms characterized by approximately equal cardinalities and a variable attribute value range per bucket. More formally, an equal‑height interval histogram is an array of ordered equal‑height intervals.

    Also known as equal‑depth interval histograms.

    See “Types of Interval Histograms Used By Teradata Database” on page 167 for details.

    EquiSet

    In join planning, an EquiSet is a set of columns that was equated in predicates from a previous join. Propagating EquiSets to subsequent join operations in a query for reuse is part of the “Derived Statistics” framework.

    See “Using Join Predicate Redundancy to Derive Column Demographics After Each Binary Join” on page 236 for details.

    EUC

    Extended UNIX Code

    A character encoding scheme specified by ISO standard ISO-2022.

    The EUC code set uses control characters to identify characters in some of the character sets. The encoding rules are based on the ISO-2022 definition for the encoding of 7-bit and 8-bit data. The EUC code set uses control characters to separate some of the character sets.

    The various UTF-n formats are defined partly by the EUC standard and partly by the various parts of ISO standard ISO-8859.

    EVL

    EVaLuator Machine. The EVL (pronounced evil) machine is a code set that behaves like a pseudocomputer that processes a defined set of pseudoinstructions. It takes the White Tree produced by the Optimizer and converts it to an EVL tree, which can then either be interpreted directly or used to generate machine code that can be assembled to produce assembled EVL code.

    Exclusion Join

    An exclusion join is a Product Join, Merge Join, or Hash Join where only the rows that do not satisfy (are NOT IN) any condition specified in the request are joined.

    Exclusion Hash Join

    A type of Hash Join where only the rows that do not satisfy (are NOT IN) any condition specified in the request are joined.

    Exclusion Merge Join

    A type of Merge Join where only the rows that do not satisfy (are NOT IN) any condition specified in the request are joined.

    Exclusion Product Join

    A type of Product Join where only the rows that do not satisfy (are NOT IN) any condition specified in the request are joined.

    EXCLUSIVE Lock

    Placed only on a database or table when the object is undergoing structural changes (for example, a column is being created or dropped) or when a database is being restored by the Archive/Recovery utility, in which case a HUT EXCLUSIVE Lock is placed on the resource (see Teradata Archive/Recovery Utility Reference for details).

    An EXCLUSIVE lock restricts access to the object by any other user and is the most restrictive lock available for use on a Teradata system.

    See “Teradata Database Locking Levels and Severities” on page 687 for details.

    Expression Mapping

    A process used by the Optimizer to map an expression it finds within a non‑matching predicate expression to an index column on which statistics have been collected. Compare with Predicate Matching.

    External Cost Parameter

    One of several hardware or configuration details about the configuration of a Teradata system.

    External Partition Number

    The value computed by the partitioning expression for a row in a table with row partitioning. Compare with Internal Partition Number and Combined Partition Number.

    F

    Fast Path Local Nested Join

    A Nested Join that returns multiple join rows because there can be multiple row IDs from the right NUSI subtable for each pair of left and right table rows. The process followed by a fast path local nested join is as follows.

    1
    Read a row from the left base table and record its hash value.
    2
    Read the next row from the right NUSI subtable that has a row hash value that is greater than or equal to that of the left base table row.
     

    IF the row hash values are …

    THEN …

    equal

    join the 2 rows.

    not equal

    use the larger row hash value to read the row from the other table.

    FIFO

    First-In-First-Out

    A type of queue in which the first entries placed in the sequence are also the first read from it.

    Foreign Key

    A means of establishing referential integrity between tables in a relational database. A foreign key in a child table is typically the logical primary key of its parent table. If it is not the primary key for the parent table, then it is one of its alternate keys.

    Compare with Primary Key.

    See Database Design for details.

    FTS

    See Full‑Table Scan.

    Full‑Table Scan

    A situation in which indexes are not used to access table values. Instead, every row of the specified table is touched during the table scan.

    Full table scans are used whenever a request condition set does not specify an indexed column and whenever statistics are collected.

    Future Date

    When a row containing a date value is inserted into a table after date statistics have been collected on that table, the date value is said to be a future date. Note that this cannot be a true future date; it is only futuristic with respect to when the date statistics were last collected on the particular date column. This means that “future” dates range from the date statistics were last collected on the date column through the current date, inclusive.

    If such a date value is not part of a composite column or index statistics set, the Optimizer can use extrapolation to estimate its current statistics; however, it cannot apply date extrapolation for multicolumn or multicolumn index statistics.

    See “Using Extrapolation to Replace Stale Statistics” on page 288 for details.

    G

    Generator

    The PE query processing component that takes the White Tree produced by the Optimizer and creates plastic steps, which it then passes to Steps Packaging.

    Plastic steps are, except for statement literals, a data-free skeletal tree of AMP directives derived from the white tree. The completed plastic steps tree is then passed to Steps Packaging for further processing.

    Steps packaging adds context to the plastic steps by integrating various user‑ and session‑specific information. If any data parcels were passed to the Parser via a USING request modifier (see “USING Request Modifier” in SQL Data Manipulation Language), then that data is also added to the steps tree. The final product of this process is referred to as plastic steps. The plastic steps are then passed to OptApply for further processing.

    See “Generator” on page 65 for details.

    Generic Plan

    An optimizer plan that is based on not having peeked at parameterized USING values during query parsing.

    Compare with Specific Plan.

    See “Peeking at Parameterized Values in the Data Parcel” on page 28 for further information.

    GeoGrid

    The geospatial population statistical data from spatial objects in x-y space that is stored in a statistics histogram (compare with Hilbert R‑tree, which contains the summary statistical and demographic data for a Geospatial Index). You can think of the statistical snapshot of population data as rectangle in x-y space, bounded by a set of (Xmin, Ymin) and (Xmax, Ymax) coordinates. This rectangle, representing the “universe” at that instant, can then be divided into a number of equally‑sized Grids to store geospatial data.

    Geospatial Index

    A NUSI that is defined on a column that has a geospatial data type. Geospatial indexes are implemented using GeoGrids for their population component and Hilbert R‑trees for their summary component.

    Scalar NUSIs use Index Key values that are equal to their base table values, while geospatial NUSIs use index keys that only approximate their base table values.

    Global Index

    A join index (see Join Index) defined with the ROWID keyword to reference the corresponding base table rows.

    See Database Design for details.

    Global Temporary Table

    A table defined in the dictionary that produces a materialized instance of itself for each session that references it. The rows of a Global Temporary Table are not persistent across sessions.

    Grid

    An equal‑sized rectangular component of an MBR GeoGrid that contains AMP‑local population data for a Geospatial Index.

    GROUP READ Lock

    A special type of HUT READ lock, implemented internally as a ROW RANGE Lock that permits other users to update a table during an all‑AMP or cluster‑level Archive operation on tables that have an after‑image journal.

    To do this, the Archive/Recovery utility does a moving read on the subject table with the following process stages:

    1
    Place a HUT ACCESS lock at the table level.
    2
    Read a small group of rows (roughly 64 KB) using a HUT READ lock.
    3
    Archive the HUT READ‑locked rows.
    4
    Release the HUT READ lock on those rows.
    5
    Place a HUT READ lock on the next group of rows.
    6
    Repeat stages 2 through 5 until the Archive operation is completed.

    The utility releases the intermediate HUT READ locks, but you must release the GROUP READ lock using a RELEASE LOCK Archive command in order to release the table‑level HUT ACCESS lock set in stage 1.

    Also see ROW RANGE Lock and HUT GROUP READ Lock.

    H

    Hash Index

    A vertical partition of a base table having properties similar to a non‑sparse single‑table Join Index.

    Unlike the primary index, which is stored in‑line with the row it indexes, hash indexes are stored in separate subtables that must be maintained by the system. Hash index subtables also consume disk space, so you should monitor your queries periodically using EXPLAIN modifiers to determine whether the Optimizer is using any of the hash indexes you designed for them. If not, you should either drop those indexes or rewrite your queries in such a way that the Optimizer does use them.

    See Database Design for details.

    Hash Join

    A join method that operates by creating a hash table for the smaller relation in the join and then probing the larger table for matches with hash table rows.

    High-Biased Interval

    An interval used to characterize a skewed value set for a column.

    Any attribute value that is significantly skewed (see Loner and the statistic used to define loners in “Interval Histogram Terminology” on page 163) is summarized by a high-biased interval.

    Each high-biased interval contains statistics for at most 2 attribute row values.

    See “Types of Interval Histograms Used By Teradata Database” on page 167 for details.

    High‑Biased Interval Histogram

    A family of histograms characterized by many loner buckets as seen, for example, with a multimodal or otherwise-skewed attribute value distribution.

    See “Types of Interval Histograms Used By Teradata Database” on page 167 for details.

    Hilbert Curve

    A continuous space-filling curve used to map between two‑dimensional and one‑dimensional space.

    Hilbert R‑tree

    An indexing method for multidimensional database objects. For Teradata Database, Hilbert R‑trees contain summary statistical data that is used to index geospatial columns as NUSIs. Like NUSI subtables for scalar data, Hilbert R‑trees are always AMP‑local. Compare with GeoGrid. Because geospatial NUSIs can be updated, the type of Hilbert R‑tree used to implement them is a dynamic R‑tree as opposed to a static R‑tree.

    Teradata Database implements Hilbert R‑tree indexes on top of the Teradata file system rather than within it.

    Hilbert Value

    A linear value that can be associated with an n‑dimensional coordinate. For the Hilbert R‑trees used for NUSI index statistics, the number of dimensions is 2. A Teradata Database routine generates the Hilbert value by mapping the (x, y) coordinates of the center point of the Minimum Bounding Rectangle (MBR) to a linear value on a Hilbert Curve.

    A Hilbert R‑tree generates a Hilbert value for each MBR by calculating the Hilbert value associated with its centerpoint. It then makes its navigational decisions based on which Hilbert value associated with a directory block row is closest to the Hilbert value associated with the entry being inserted into the index

    Histogram

    A graphic means for representing distributions as a function of the number of elements per an arbitrarily determined interval width.

    Histograms are often called bar charts. Each bar in a histogram represents the number of rows for the defined interval. Histogram intervals are sometimes referred to as bins or buckets because they contain a number of values that summarize the demographics for the values that fall into the range defined for the interval.

    In relational query optimization theory, the term is used to describe the rows in a dictionary table or system catalog that store the buckets defined by the particular intervals used to characterize the frequency distribution of the attribute values for a column set.

    All histograms described in this book are frequency histograms. Each bucket in a frequency histogram contains some number of tokens representing the number of rows that have the attribute values belonging to its range.

    See “Interval Histograms” on page 162 for details.

    HUT ACCESS Lock

    See ACCESS Lock and HUT Lock.

    HUT EXCLUSIVE Lock

    See EXCLUSIVE Lock and HUT Lock.

    HUT GROUP READ Lock

    See GROUP READ Lock, ROW RANGE Lock, and HUT Lock.

    HUT Lock

    Host UTility (HUT) locks are the class of locks placed on Teradata Database resources by the Archive/Recovery utility and related utility commands.

    HUT ACCESS, HUT READ, and HUT GROUP READ locks are the only severities of HUT locks available for an Archive operation, while HUT EXCLUSIVE and HUT WRITE locks are the only severities available for a Recovery, Build, Copy, Rollback, or Rollforward operation (see Teradata Archive/Recovery Utility Reference for details).

    HUT READ Lock

    See READ Lock, ROW RANGE Lock, and HUT Lock.

    HUT WRITE Lock

    See WRITE Lock and HUT Lock.

    I - J

    I2E

    Internal-to-External

    IEEE

    Institute of Electrical and Electronics Engineers (http://www.ieee.org)

    The leading US-based professional society for electrical and electronics engineers.

    The largest of its member societies is the IEEE Computer Society (http://www.computer.org), whose mission is similar to that of the ACM.

    IMS

    Information Management System. The hierarchical database management product developed by IBM Corporation in the late 1960s.

    Inclusion Join

    An inclusion join is a Product Join, Merge Join, or Hash Join where the first right table row that matches the left row is joined to it.

    Inclusion Hash Join

    A type of Hash Join where the first right table row that matches the left row is joined to it.

    Inclusion Merge Join

    A type of Merge Join where the first right table row that matches the left row is joined to it.

    Inclusion Product Join

    A type of Product Join where the first right table row that matches the left row is joined to it.

    Incremental Planning and Execution (IPE)

    An infrastructure in the Optimizer used to process single‑row access and scalar uncorrelated subqueries. IPE infrastructure satisfies the following use cases.

  • Executes single‑row accesses and scalar subqueries and inserts the results of those operations into the main query.
  • Enables optimizations such as partition elimination, access path selection, join index rewrites, transitive closure, constant folding, and improved estimations.
  • Index

    A physical mechanism used to store and access table data. An index is defined, maintained, and accessed on the basis of a set of table columns.

    These columns are individually called index columns, and are collectively referred to as the key of the index or the Index Key.

    Index Key

    The set of columns on which an Index is based.

    Scalar NUSIs use index key values that are equal to their base table values, while geospatial NUSIs use index keys that only approximate their base table values.

    Index key is a term that is used internally only and is not the same thing as a Primary Key or Foreign Key.

    Index Value

    The value of the Index Key. For a specific data row, its index value is the set of column values and nulls of the column set that defines the Index Key. If every index column value is not null, the index value is said to be wholly non‑null. Also see Index.

    Initialization Parameter

    A Cost Profile parameter used during the Cost Profile Activation process to initialize Cost Prediction Parameter values.

    Inner Join

     

    You can specify ordinary, cross, or self-join inner joins against complex data type columns by specifying geospatial join predicates in the WHERE-clause of the request just as you would for a scalar comparison. While cross joins do not specify a WHERE clause condition, ordinary and self joins do specify a WHERE clause. To do this for a geospatial join condition, just specify a complex join predicate.

    Internal Partition Number

    A value calculated from the Combined Partition Number of the Combined Partitioning Expression that is used to number partitions internally. Teradata Database places the internal partition number for each row of a PPI table or column‑partitioned table or join index in the Row ID of each Physical Row.

    This number might be the same as the Combined Partition Number if no modification is needed. Modification of the internal partition number for a row is required for a single-level partitioning expression that consists solely of a CASE_N or RANGE_N function. The partitions for the NO RANGE [OR UNKNOWN], NO CASE [OR UNKNOWN], and UNKNOWN options are placed at fixed internal partitions (internal partition numbers 2, 2, and 1, respectively), with the partitions for ranges and conditions following, beginning at internal partition number 3 initially.

    Modification of an internal partition number might also occur for multilevel partitioning. For both single-level and multilevel partitioning, additional modification is required to retain existing internal partition numbers after an ALTER TABLE request that drops or adds ranges or partitions.

    An internal partition number in a Row ID referring to a table row indicates a column partition number value of 1 by convention. This can be modified to indicate a specific column partition to access that column partition.

    Interval

    A bounded, non‑overlapping set of attribute value cardinalities in an Interval Histogram.

    Interval Histogram

    A graphic means for representing Cardinality distributions for row attributes in a relational database management system.

    Index

    A physical mechanism used to store and access data in a table.

    An index is defined, maintained and accessed on the basis of a set of one or more table columns. These columns are individually called an index column and collectively called the key of the index or the Index Key.

    Index Key

    The set of one or more columns on which an Index is based.

    An index key is not the same thing as the key constraints called a Primary Key or Foreign Key.

    Interval

    With respect to an interval histogram, an interval is a bounded, non‑overlapping set of attribute value counts.

    Interval Histogram

    See Histogram. Also see “Interval Histograms” on page 162.

    IPE

    See Incremental Planning and Execution (IPE).

    ISO

    International Organization for Standardization http://www.iso.org

    An international umbrella standards organization based in Geneva, Switzerland, that also certifies the ANSI SQL standard.

    The following passage from the ISO website explains why the name of the organization does not match its (apparent) initialism: “Because “International Organization for Standardization” would have different abbreviations in different languages (“IOS” in English, “OIN” in French for Organisation internationale de normalisation), it was decided at the outset to use a word derived from the Greek isos, meaning “equal”. Therefore, whatever the country, whatever the language, the short form of the organization's name is always ISO.”

    Join

     

    Join Index

    An index that represents either a join result or a single table projection with (possibly) different distribution of rows to the AMPs and ordering on the AMPs. A join index is a vertical partition of a base table that can, depending on how it is defined, create various types of prejoins of tables, including sparse and aggregate forms. Join indexes cannot be queried directly by an SQL request; instead, they are used by the Query Rewrite Subsystem and the Optimizer to enhance the performance of any queries they Cover.

    A join index is maintained in a separate subtable that is distinct from its underlying base table set. The system maintains join indexes automatically, so maintenance to a base table is automatically reflected in any join indexes defined on that base table.

    A join index that only vertically partitions a base table is referred to as a single‑table join index.

    A join index that prejoins 2 or more base tables is referred to as a multitable join index.

    Both types of join index can be created in sparse or aggregate forms and can have a subset of their columns compressed.

    A join index can have a nonpartitioned primary index, a partitioned primary index, or no primary index. A join index that is a NoPI Object object must be a Column‑Partitioned Object, defined on a single-table, a non‑aggregate index, and it must not be row‑compressed.

    Unlike the primary index, which is stored in‑line with the row it indexes, join indexes are stored in separate subtables that must be maintained by the system. Join index subtables also consume disk space, so you should monitor your queries periodically using EXPLAIN modifiers to determine whether the Optimizer is using any of the join indexes you designed for them. If not, you should either drop those indexes or rewrite your queries in such a way that the Optimizer does use them.

    Although join indexes are essentially a type of data table, they cannot be accessed directly using SQL DML requests. Their use is reserved for the Query Rewrite Subsystem and the Optimizer.

    Join Kind

    A set of join algorithms. For example, a join kind of Merge Join is a classification that includes all of the following join algorithms: fast path inner merge join, slow path inner merge join, fast path left outer merge join, slow path left outer merge join, and so on.

    Join Method

    A synonym for Join Algorithm.

    Join Plan

    The component of a Query Plan that contains information about how to join rows optimally.

    Key

    An Attribute set that uniquely identifies each Tuple in a Relation. Implicitly synonymous with Primary Key, though it applies equally well to any Candidate Key or Foreign Key. See Primary Key.

    This term is sometimes used to describe other things. For example, the set of columns that defines an index is referred to as an Index Key, the set of values that Teradata Database sorts a set of rows on is referred to as a sort key, and so on.

    L

    Leaf Block

    In a Hilbert R‑tree used for a NUSI on a geospatial column, a leaf block is a component of the single leaf level the index contains. Each leaf block contains Leaf Rows.

    Leaf Row

    The index subtable rows contained in a Leaf Block that store statistical and demographic information for their Geospatial Index. The general form of a geospatial index is the Index Key (MBR) followed by a list of base table row ID list.

    Left Relation

    The first relation participating in a join step. The left table is the first relation listed in the EXPLAIN text of the join. Compare with Right Relation.

    Load Isolation

    Part of the definition of a table that allows committed rows to be read at the same time that new data is being loaded into the table. Non-committed rows are logically added or deleted. Load isolation is specified with the WITH CONCURRENT LOAD ISOLATION option of a CREATE/ALTER TABLE request.

    LOB

    Large Object

    Any data object that is larger than the maximum row size for Teradata Database. There are 2 types of LOB: BLOB and CLOB.

    Local Nested Join

    A local nested join implies several things.

  • If necessary, the resulting rows of a nested join are redistributed by row hashing the row ID of the right relation rows.
  • The row ID is used to retrieve the data rows from the right relation.
  • Only local nested joins can result in a Row ID Join. A row ID join is needed if and only if a Nested Join is carried out and only row IDs for rows in the right relation are retrieved.

    Lock

    A mechanism for regulating the accessibility of a system resource. Locks are implemented as a form of Semaphore. In the context of Teradata Database, a lock is a mechanism for regulating concurrent access to a database resource.

    Teradata Database locks have 2 rectangular components.

  • Level
  • Severity
  • See “Database Locks, Two-Phase Locking, and Serializability” on page 669 for details.

    Lock Manager

    A mechanism that manages the placement and release of locks on database objects.

    Locking Level

    The level of a lock refers to the database object on which it enforces some level of severity. Note that the system sets and enforces HUT locks only at the table and database levels.

    The range of levels for the locks available for use in Teradata Database are as follows.

     

    Most Encompassing

    Database

    Table

    Row hash

    Least Encompassing

    See “Teradata Database Locking Levels and Severities” on page 687 for details.

    Locking Severity

    The severity of a lock refers to how exclusive it is in accepting or rejecting other locking severities. Note that the system sets and enforces HUT lucks only at the EXCLUSIVE, READ, and GROUP READ severities.

    The range of severities for the locks available for use in Teradata Database is as follows.

     

    Most Exclusive

    EXCLUSIVE

    WRITE

    READ

    CHECKSUM
    ACCESS

    Least Exclusive

    See “Teradata Database Locking Levels and Severities” on page 687 for details.

    Also see EXCLUSIVE Lock, GROUP READ Lock, READ Lock, and ROW RANGE Lock.

    Loner

    An attribute value in an interval histogram whose frequency in the sampled population deviates significantly from a defined criterion; an unusually frequent value indicating significant frequency skew. By definition, no more than one loner is stored per High-Biased Interval.

    By this definition, the maximum number of loners in a Teradata Database histogram is 400.

    In practice, a column might produce more than 400 loners, so the implemented upper bound on the number of loners per interval histogram is 398 (or 199 intervals).

    This leaves at least one equal-height interval to account for the remaining values in the distribution.

    See “Interval Histograms” on page 162.

    LT/ST

    Large Table/Small Table (join)

    An optimized join type used to join fact (large) tables with their satellite (small) dimension tables.

    See “Star Join Optimization” on page 488 for details.

    M - N

    MBR

    Initialism for Minimum Bounding Rectangle.

    Merge Join

    A merge join retrieves rows from 2 tables by processing equality joins that probe one of the tables using either row key or row hash values from the qualified rows in the other table being joined. Merge joins assumes that the probed table is sorted either by row key or row hash. The operation then puts the joined rows onto a common AMP based on the row key or row hash of the join columns. Teradata Database sorts the rows into join column row key or row hash sequence, then joins those rows that have matching join column row key or row hash values.

    Minimum Bounding Rectangle

    All geometric shapes can be approximated by a rectangle, referred to as an MBR, whose edges enclose the object. The MBR for a Geospatial Index is an Index Key in the Hilbert R‑tree subtable that contains the index data.

    Minus All Join

    The minus all join is used to implement MINUS, INTERSECT, and outer joins.

    The process applied by the minus all join algorithm is as follows.

    1
    Distribute and sort the left and right relations based on their column_1 values.
    2
    For each left table row, start at the current relation table row and read until a row having a value >= the left relation column_1 value is found.
    3
    If the right relation row column_1 > the left relation row column_1 or if no more right relation rows are found, then return the left relation row.

    Miscellaneous Cost Parameters

    A set of miscellaneous DBS Control record values for a Teradata system such as free space percentages, maximum number of parse tree segments, Data Dictionary cache size, and so on. These parameters are expressed in various units, depending on the individual cost parameter.

    Multicolumn Container

     

    A Container with a subset of 2 or more columns of a table row. This kind of Container contains a series of Column Partition Value. Each Column Partition Value has a format similar to the ROW Format used for a regular row or Subrow but without a row header. A Column Partition Value in this case does not correspond to a Physical Row but is a structure that can repeat within a Container.

    Multicolumn Partition

    A Column Partition with 2 or more columns of a table row.

    Multicolumn Partition Value

    The values and nulls of columns in a multicolumn partition projected from a table row.

    If the multicolumn partition is specified or system-determined to have a COLUMN Format, one or more multicolumn-partition values and nulls can be represented in a Container.

    If the multicolumn partition is specified or system-determined to have a ROW Format, the column partition is represented using Subrows and the multicolumn-partition value is the set of column values and nulls in the Subrow. There is one multicolumn-partition value represented per Subrow.

    Compare with Single‑Column Partition Value.

    Multilevel Partitioning

    A partitioning scheme where partitions at any one level are subpartitioned. Multiple partitioning level specifications, specified as either a partitioning expression or a COLUMN, are used to define the partitioning. No more than one level can be specified as COLUMN.

    Nested Join

    A join for which the WHERE conditions specify an equijoin with a constant value for a unique index in one table and those conditions also match some column of that single row to a primary or secondary index of the second table.

    The nested join is one of the most optimally performing joins available because it is the only join type that need not touch all AMPs in order to join the relations.

    Network Cost Variable

    One of several possible measures of message distribution and duplication costs and overheads for a Teradata system expressed in milliseconds.

    NoPI Object

    A table or join index that has no primary index or primary AMP index. All access to NoPI rows is by means of a full‑table scan unless you define Secondary Indexes on the table that are used in conditions for queries against it, or in the case of NoPI tables, Join Indexes that cover queries made against it.

    A column‑partitioned table or join index is a NoPI database object.

    See Database Design for details.

    NP‑Complete

    NP‑completeness is a complexity class in computational complexity theory. The abbreviation NP signifies nondeterministic polynomial time. If a problem is NP‑complete, its solution can be verified in polynomial time on a non‑deterministic Turing machine. This implies that while an NP‑complete problem is solvable and verifiable, its solution or verification can take a very long time to compute.

    Null

    A construct used to represent missing information. Nulls are not values, so the term null value is incorrect usage. The truth value of nulls is usually, but not always, UNKNOWN (see Database Design for a list of exceptions).

    Null Filtering Condition

    Any predicate that evaluates to FALSE for nulls is referred to as a null filtering condition, or NFC.

    NUPI

    NonUnique Primary Index

    A Primary Index that is not uniquely constrained. NUSIs are often used to position rows from multiple tables on the same AMP to facilitate join operations on those tables.

    See Database Design for details.

    NUSI

    NonUnique Secondary Index

    An AMP‑local Secondary Index designed to be used for set (multirow) selection rather than single row selection.

    See Database Design for details.

    O

    Object Use Counts

    Object use counts are a measure of the usage of various database objects in a Teradata Database system.

    Teradata Database maintains 2 types of object use counts.

  • User object use counts
  • These counts track the usage of various database objects from the time they were last reset. You can reset these counts at intervals of your choice using either the ClearUserUseCount system macro or the ClearUserStatCount macro.

  • System object use counts
  • These counts track the usage of system-selected database objects. Only Teradata Database can reset these counts. The reset interval depends on the optimizations for which the counts are used.

    OCC

    Optimistic Concurrency Control. An experimental method of transaction concurrency control that does not use locking. Commercial RDBMSs do not use OCC.

    See Chapter 6: “Transaction Processing and Locking” for further information.

    OCES

    Optimizer Cost Estimation Subsystem.

    See “Cost Optimization” on page 304 for details.

    ODBC

    Open DataBase Connectivity.

    A de facto standard API for communicating between client applications and relational databases using SQL.

    The ANSI SQL standard API, referred to as CLI (sometimes as SQL/CLI), or Call‑Level Interface, is based on the ODBC specification. Beginning with ODBC version 3.0, the ODBC standard and the ANSI SQL Call Level Interface standards became identical.

    OLTP

    OnLine Transaction Processing

    Operation Tree

    The parse tree created by the Optimizer by transforming the ResTree for an SQL request. A synonym for White Tree.

    OptApply

    The PE query processing component that takes the plastic steps produced by the Generator and produces Concrete Steps after applying any data parcels carried by the request from a USING request modifier (see SQL Data Manipulation Language).

    Steps packaging passes the concrete steps to the Dispatcher for assignment to the AMPs.

    See “OptApply” on page 67 for details.

    Optimizer

    The PE query processing component that takes the ResTree´ produced by Query Rewrite Subsystem and creates optimized access and, when necessary, join plan for the query.

    The Parse Tree output of the Optimizer is known as the White Tree. which then becomes the input to the Generator or, if the optimized request is prefaced by the EXPLAIN request modifier (see “Chapter 5 Interpreting the Output of the EXPLAIN Request Modifier” and “EXPLAIN Request Modifier” in SQL Data Manipulation Language), is converted to a textual translation of the plan.

    In this case, the system adds additional costing information about various operations the Optimizer does not cost in order to both make the EXPLAIN text output more comprehensive and to provide information for the benefit of the Teradata dynamic workload management software, which can process that data with its Query Estimator tool to estimate the resources required to process a given query or query workload.

    The Generator and OptApply are also components of the Optimizer.

    See Chapter 1: “Request Parsing” and Chapter 2: “Query Rewrite and Optimization” for details.

    Optimizer Environment Variable

    One of several possible measures of hardware configuration information such as the number of CPUs and vprocs per node; maximum, minimum, and average number of MIPS per CPU; number of nodes per clique; number of disk arrays per clique, and so on expressed in unitless decimal values. These variables are components of the method used to determine various optimizer cost factors.

    Optimizer Weights and Scales

    A set of weighting of CPU, disk, and network contributions to optimizing a request plus various scaling factors that are used to normalize disk array and instruction path length contributions expressed in unitless decimal values.

    Outer Join

    A join between 2 tables that retains and reports joined rows whether they match specified conditions or not. Teradata Database reports unmatched columns as nulls.

    P

    PA

    See Primary AMP (PA) Index.

    Parameterized PK Request

    A request that contains only the following:

  • Parameterized PK statements
  • Null statements
  • See SQL Data Manipulation Language for the definition of null statement.

    See “Peeking at Parameterized Values in the Data Parcel” on page 28 for details.

    Parameterized PK Statement

    A single SQL statement with a set of USING variables that are used in equality predicate conditions on a nonpartitioned primary index, Partitioned Primary Index, or USI of a single table with no ORed conditions.

    See “Peeking at Parameterized Values in the Data Parcel” on page 28 for details.

    Parser

    The PE‑based software system responsible for parsing, optimizing, and dispatching SQL requests.

    See Dispatcher, Generator, OptApply, Optimizer, Query Rewrite Subsystem, Resolver, and Syntaxer for details.

    Also see “The SQL Parser” on page 16.

    Parse Tree

    A tree data structure that maps 1:1 with an SQL request submitted to the Parser. Also known as a SynTree. The initial parse tree for a request is further transformed in stages to a ResTree and then to an Operation Tree before being further transformed into plastic steps.

    See Chapter 1: “Request Parsing” and “Parse Tree Representations of an SQL Request” on page 129 for details.

    Partial Cover

    A Hash Index, Join Index, or nonunique secondary index (NUSI) that covers some, but not all, of the columns referenced in a query. In this case, the system must join to the relevant base table to pick up the remaining columns required to complete the Cover.

    See Database Design for details.

    Partial Redistribution Partial Duplication

    See PRPD.

    PARTITION

    A system‑derived column created dynamically from the Combined Partition Number stored in the row ID field in the row header for each row in a table. The PARTITION column that is returned in the result set of a DML request contains the combined partition number for the row it represents. Teradata Database does not store the combined partition number for a row as an actual column in the row, but derives the number from the internal partition number that is stored or implied in the row ID field of the row header. For a nonpartitioned table, the internal partition number is 0 and is implied by the flag bit settings in the row header.

    See Database Design for details.

    Partitioned Primary Index

    A Primary Index for a table or join index that first distributes rows to the AMPs as they would be by a nonpartitioned primary index, then distributes them to a set of row and column partitions that are determined by the DBA and specified using a PARTITION BY clause in a CREATE TABLE or ALTER TABLE request (see “CREATE TABLE” and “ALTER TABLE” in SQL Data Definition Language for details).

    Partitioned primary indexes can be partitioned at a single level or at multiple levels.

    A combined partitioning expression is derived by Teradata Database from the individual partitioning expressions. The hash of the primary index orders rows within a combined partition of the combined partitioning expression.

    PPIs are very useful for various types of queries, for example range queries.

    See Database Design for details.

    Partitioning Column

    A column that is a member of the set of partitioning columns. A partitioning column is referenced in one or more of the partitioning expressions specified by a PARTITION BY clause. Note that a partitioning column can also be a primary index column.

    Partitioning Expression

    An optional expression based on a column set from a base table or join index that can be specified in the PARTITION BY clause to determine the placement of rows from that table or join index in a particular Row Partition on an AMP once it has been hashed or otherwise distributed to that AMP.

    PDE

    Parallel Database Extensions

    A virtual machine layer between the Teradata Database software and the Teradata file system and the underlying operating system.

    The PDE presents a common interface to the Teradata Database software that permits the RDBMS and file system to be more easily ported to different operating systems.

    PDF

    See Probability Density Function.

    PE

    Parsing Engine vproc

    The set of software services that controls the query processing and session management components of a Teradata Database.

    Performance Constant

    A specification of the data transfer rate for each type of storage medium and network interconnection supported by Teradata. The values of these constants are used by the Optimizer to determine the best method of processing a given request. These values can be statically modified to reflect new hardware configurations when necessary.

    Physical Row

    A row as seen by the file system and identified by a row ID. A physical row contains a row header that contains the row ID of the physical row, followed by a sequence of bytes. A physical row can have ROW Format, COLUMN Format, secondary index format, table header format, or compressed join index format, or one several special purpose formats. See Database Design for details.

    PI

    See Primary Index.

    Pink Tree

    A partially transformed Red Tree that has not become a completed White Tree.

    See “Query Rewrite” on page 53 and “Query Rewrite” on page 72 for details.

    PK

    See Primary Key.

    Plan Fragment

    The series of AMP steps generated for a Request Fragment. A Dynamic Query Plan for a request consists of 2 or more plan fragments. The Optimizer uses feedback from the execution of a plan fragment except for the last plan fragment of a dynamic plan to determine the next request fragment to process and how to generate and optimize the plan fragment for that next request fragment.

    Plastic Steps

    Plastic steps are, except for statement literals, a data-free skeletal tree of compiled machine language AMP directives derived from the White Tree. The completed plastic steps tree is then passed to the Request Cache and to OptApply for further processing into Concrete Steps.

    See “Definition of Plastic Steps” on page 66 for details.

    Point‑in‑Time Qualifier

    A qualifier for a request made on a temporal table that specifies to return only those rows that meet the period specified by the qualifier. Typical point‑in‑time qualifiers are CURRENT and AS OF.

    PPI

    See Partitioned Primary Index.

    Predicate

    A truth‑valued function that describes either a property of objects or a relationship among objects as represented by a set of variables. In relational database management systems, predicates are also referred to as conditions, search conditions, or terms.

    The purpose of predicates in SQL queries is to enhance selectivity, narrowing the results set for a query to only those rows that are relevant to the desired outcome.

    In SQL, predicates are the specifications made in the WHERE clause of any query (including those specified with AND or OR operators), the ON clause of a join or MERGE request specification, the HAVING clause of an aggregation, or the QUALIFY clause of an ordered analytic function.

    Internally, query optimizers often add predicates to requests in order to create optimization opportunities that would otherwise not be available. This application is derived from work done in the fields of logic and constraint databases.

    Predicate Matching

    A process used by the Optimizer to determine matching predicates in a query and in an index on which statistics have been collected in order to use the statistics collected on the secondary, index predicate to estimate cardinality of the primary, base table predicate. Compare with Expression Mapping.

    Predicate Redundancy

    A redundant Predicate is one that does not contribute additional selectivity to a query because other predicates with equivalent results are specified.

    See “Join Predicate Redundancy Handling” on page 251 for details.

    Primary AMP (PA) Index

    A set of columns in a table or join index that determines the distribution of data rows to AMPs. The AMP to which a row is distributed is based on hashing the values of the specified columns. Unlike a Primary Index, the hash is not used in determining the order of rows on the AMP. A table or join index with a PA must be column partitioned and can also be row partitioned. The partitioning may be multilevel.

    See Database Design for details.

    Primary Index

    An Index that determines the distribution of data rows to AMPs for a table, join index, or hash index, and the order of rows on the AMP. If the table is row-partitioned, the primary index determines the order of rows within a combined partition on an AMP. If the table is column-partitioned, the primary index determines order of the column partition values within a combined partition on an AMP. The distribution and ordering is based on a hashing algorithm.

    A table or join index can have one primary index. A hash index must have a primary index. A table or join index with a primary index can also be partitioned. The partitioning can be column or row partitioning or both, and it may be multilevel.

    Compare with Primary Key.

    See Database Design for details.

    Primary Key

    A set of columns in a table whose values make each row in that table unique.

    Primary keys are a logical, not physical, concept that are often, but not necessarily, used as the primary index for a table when it is physically implemented.

    A table can have multiple candidate keys, but only one primary key. Those candidate keys that are not defined as the primary key for a table are referred to as alternate keys.

    Relationships between primary and foreign keys are often used to establish referential integrity between tables. These relationships are also frequently exploited by the Optimizer to enhance query performance.

    See Database Design for details.

    PRPD

    Partial Redistribution Partial Duplication

    A join strategy for dealing with equality joins of skewed tables that Splits the 2 tables in a join into 4 tables, with the first split containing the rows with the skewed values from table_1 locally and their matching rows from table_2 distributed to all AMPs and the second split containing the remaining non‑skewed rows from table_1 and table_2.

    Teradata Database first joins the rows within each split and then merges them into the final join result.

    For more information, see “Strategies for Joining Skewed Tables on an Equality Join Condition” on page 393.

    Probability Density Function

    A function that describes the relative likelihood of the occurrence of a continuous random variable at a given point in the distribution. The probability density function of the Gaussian probability distribution is the frequently seen bell curve.

    Product Join

    A product join compares every qualifying row from one relation to every qualifying row from the other relation and saves the rows that match the WHERE predicate filter. Because all rows of the left relation in the join must be compared with all rows of the right relation, the system always duplicates the smaller relation on all AMPs, and if the entire spool does not fit into available memory, Teradata Database must read the same data blocks more than once. Reading the same data block multiple times is a very costly operation.

    This operation is called a product join because the number of comparisons needed is the algebraic product of the number of qualifying rows in the 2 relations.

    PROJECT Operator

    A relational algebra operator whose function corresponds to the select list clause of an SQL SELECT request.

    Pseudoindex

    A type of histogram in DBC.Indexes that stores statistics for multiple columns. Single‑column statistics are stored in histograms in DBC.TVFields. See Data Dictionary for information about these dictionary tables and the system views you can use to query them.

    Q

    QCD

    Query Capture Database. A system‑installed database of tables into which various request information can be captured.

    See Chapter 7: “Query Capture Facility” for details.

    QCF

    Query Capture Facility. A set of tools and tables that can be used to capture various information about SQL requests, which can then be subjected to further analysis.

    See Chapter 7: “Query Capture Facility” for details.

    QITS

    Queue Insertion TimeStamp

    A required, user‑defined column that must be defined for all queue tables.

    QSN

    Queue Sequence Number

    A useful, but not required, column that can be defined for queue tables.

    Query Block

    In the context of Query Rewrite, any view, a SELECT request or subquery, or an INSERT, DELETE, UPDATE, ABORT, ROLLBACK, CREATE TABLE AS, or MERGE request is considered to constitute a query block.

    Query Plan

    A query plan is the composite Access Plan and Join Plan generated by the Optimizer. See Operation Tree or White Tree.

    A query plan is either a static plan or a Dynamic Query Plan.

    Query Rewrite Subsystem

    The PE query processing component that takes the ResTree produced by the Resolver and rewrites, or transforms, it in such a way that it is both semantically equivalent and runs faster. Query Rewrite then passes the revised ResTree, called ResTree´, to the Optimizer.

    See “Query Rewrite” on page 53 and “Query Rewrite” on page 72 for details.

    R

    RDBMS

    Relational Database Management System

    A database management system based on relational set theory and the theorems, axioms, and operators provided by first order predicate logic. The set theoretic and logical foundation for an RDBMS provide a mathematically sound, predictable set of tools for managing data.

    READ Lock

    Placed in response to a SELECT request.

    Archive requests can also place a HUT READ Lock or a HUT GROUP READ Lock on database resources.

    A READ lock restricts access by users who require EXCLUSIVE or WRITE locks.

    See “Teradata Database Locking Levels and Severities” on page 687 for details.

    Also see ROW RANGE Lock.

    Red Tree

    The version of the parse tree produced by the Resolver. Synonym for ResTree.

    The version of the red tree produced by the Resolver is further processed by Query Rewrite, which produces a rewritten, but semantically identical, red tree that it passes to the Optimizer.

    See “Query Rewrite” on page 53 and “Query Rewrite” on page 72 for details.

    Red Tree´

    The transformed version of the Red Tree produced by Query Rewrite. Synonym for ResTree´.

    Red Tree´ is semantically identical to Red Tree, but runs faster because it has been rewritten to eliminate unnecessary operations and to convert other operations to simpler or more explicit forms.

    See “Query Rewrite” on page 53 and “Query Rewrite” on page 72 for details.

    Regular Row

    A table Row stored as a single Physical Row with ROW Format.

    Relation

    The combination of a heading and Tuple body (in which the Tuples are distinct) defined over a common set of Attributes.

    A relation is sometimes said to n-ary, where n is the number of Attributes (also known as the Arity or Degree of the relation).

    Tables depict representations of n‑ary relations; they are not the same thing as n‑ary relations. Confusion over this distinction often leads people to say that relations are flat, or 2‑dimensional, when they are, in fact, n‑dimensional. More formally, “if relation r has n Attributes, then each Tuple in r represents a point in a certain n‑dimensional space (and the relation overall represents a set of such points)” Date (2005, p. 47). When relations are transferred from the logical domain to the physical domain, they are usually called tables; however, in query optimization the term is often used to describe any table, view, hash index, join index, or spool file.

    Releasing a Join

    Conversion from INNER JOIN syntax to comma syntax is referred to as releasing a join.

    Remote Nested Join

    A remote nested join is used for the case in which a WHERE condition specifies a constant value for a unique index of one table, and the conditions might also match some column of that single row to the primary or secondary index of a second table.

    Request

    One or more SQL statements submitted to Teradata Database as a single unit of work.

    Requests are the semantic elements of Teradata SQL, in contrast to Statements, which are its syntactic elements.

    Request Fragment

    An individual component of a complete request that is planned and executed incrementally by the Incremental Planning and Execution (IPE) framework.

    Residual Statistics

    Statistics that have been collected in the past. The term does not necessarily imply that such statistics are stale, though it suggests that they might be. The Optimizer applies various evaluation methods internally to determine whether existing statistics are stale or not.

    See Stale Statistics for a definition of what determines whether existing statistics are stale or not.

    Resolver

    The PE query processing component that takes the SynTree produced by the Syntaxer and fleshes it out with information about any required data conversions and security checks, adds column names and notes any underlying relationships with other database objects, and then passes the more fleshed out tree, now known as the ResTree, to Query Rewrite Subsystem.

    See “Resolver” on page 48 for details.

    ResTree

    The version of the parse tree produced by the Resolver. Synonym for Red Tree.

    See “Resolver” on page 48 for details.

    ResTree´

    The transformed version of the Red Tree produced by Query Rewrite.

    ResTree´ is semantically identical to ResTree, but runs faster because it has been rewritten to eliminate unnecessary operations and to convert other operations to simpler or more explicit forms.

    See “Query Rewrite” on page 53 and “Query Rewrite” on page 72 for details.

    RI

    Referential Integrity

    A method of ensuring that no parent‑child relationship data is ever orphaned in a relational database. Referential integrity uses the parent‑child relationships between a Primary Key and a Foreign Key to prevent child table rows from ever being orphaned from deleted parent table rows.

    Teradata Database supports 3 different kinds of relational integrity constraints:

  • Referential Integrity constraint
  • This is the standard RI constraint defined by the ANSI SQL standard.

  • Batch Referential Integrity constraint
  • This is a special Teradata Database form of RI that is less expensive to enforce in terms of system resources than standard referential integrity because it is enforced as an all‑or‑nothing operation (the entire transaction must complete successfully) rather than on a row-by-row basis, as standard referential integrity is checked.

  • Referential Constraint
  • This is a special Teradata Database form of RI, sometimes informally referred to as soft RI, that specifies constraints the Optimizer can use to optimize queries, but which are not enforced by the system.

    Temporal Relationship Constraints are a special kind of Referential Constraint that can be defined between a Valid Time or bitemporal parent table and a non‑temporal or Transaction Time child table. For more information, see ANSI Temporal Table Support and Temporal Table Support.

    The Optimizer often uses referential integrity constraints to enhance query performance.

    See Database Design for details.

    Right Relation

    The second relation participating in a join step. The right table is the second relation listed in the EXPLAIN text of the join. Compare with Left Relation.

    Rolling Column

    In the context of extrapolating cardinality estimates, a rolling column is one that has a constant number of rows per value and a varying number of unique values. Examples of rolling columns are those having a DATE or TIMESTAMP data type. For these columns, the demographics of existing data never changes, and only new data can add new distinct values to the column. Compare with Static Column.

    Root Block

    Teradata Database uses the root block of a Hilbert R‑tree to maintain the context for the entire R‑tree. There is only one root block per AMP-local instance of a Hilbert R‑tree.

    Row

    A set of tuple values or nulls for the attributes (columns) of a table. The number of columns is fixed for a table. This is a logical concept.

    A table row is represented as a Regular Row for an object that is not Column Partitioned.

    A table row is split across Column Partitions for a Column Partitioned object. The value set from the table row for a Column Partition are represented in a Container if the Column Partition has a COLUMN Format or a Subrow if the Column Partition has a ROW Format.

    ROW Format

    A format for a Physical Row, Regular Row, or Subrow that consists of row length, row ID, flag byte, presence bits, and a fixed number of column values. This is the traditional Teradata row format.

    Compare with COLUMN Format.

    Row ID

    The row ID makes each row in a table uniquely identifiable even if it otherwise duplicates one or more other rows in the table.

    A row ID includes an internal partition number and is 8-bytes logically, but can be stored as 0 bytes for a nonpartitioned table or join index and as 2-bytes for 2-byte partitioning.

    For NoPI, the 64-bits are split up as 20 bits of hash bucket and 44 bits of uniqueness instead 32 bits of hash value and 32 bits of uniqueness.

    When a row is inserted into a primary‑indexed table, the file system stores the 32‑bit row hash value of its primary index in place with the column data for the row.

    Because row hash values are not necessarily unique, the file system generates a unique 32‑bit numeric value for each row called the Uniqueness Value. Teradata Database appends each Uniqueness Value it generates to its partner row hash value, forming a unique row ID that enables rows having the same Rowkey value to be distinguished from one another.

    Row ID Join

    The row ID join is a special form of the Nested Join. The Optimizer selects a row ID join instead of a nested join when the first condition in the query specifies a literal for the first relation. This value is then used to select a small number of rows which are then equijoined with a secondary index from the second relation.

    Rowkey

    The internal partition number and hash code portion of a row ID for a primary‑indexed table or join index.

    The internal partition number is composed of the result of passing the partitioning column values of the row to the combined partitioning expression defined for a PPI table and mapping to an internal partition number as needed. If a relation is not partitioned, then its internal partition number is 0.

    The hash code is the result of passing the primary index column values of a row through the Teradata hashing algorithm.

    Rowkey‑Based Merge Join

    As for single‑level partitioning, a rowkey‑based merge join for multilevel partitioning requires equality conditions on all the primary index columns and partitioning columns of the 2 relations. This is true for both character and non‑character PPIs.

    To be eligible for a rowkey‑based merge join, both relations must also have the same partitioning, which again is true for both character and non‑character PPIs. Otherwise, one of the relations must be spooled and partitioned to impose equal partitioning between the two.

    Rowkey‑based merge joins are generally not feasible for join operations that involve a PPI table that has a large number of partitions.

    Row Partition

    An AMP‑based cluster of table rows that share the same value, or row partition number, for their evaluation by the Partitioning Expression for that table.

    Each row is inserted into a row partition based on the evaluation of the user‑defined partitioning expression for the table. Rows are first hashed to an AMP and, if the table or join index has a primary index, they are then stored within each row partition on that AMP in the order of their hash value and uniqueness.

    See Database Design for details.

    Row Partition Elimination

    An automatic optimization in which the Optimizer determines, based on query conditions and a partitioning expression, that some row partitions for that partitioning expression cannot contain qualifying rows, and causes those row partitions to be skipped. Row partitions that are skipped for a particular query are called eliminated row partitions (see Eliminated Partition). When multiple partitioning expressions are defined on a table or join index, the system can combine row partition elimination at each of the levels to further reduce the subsets of data that need to be scanned.

    If a table or join index also column partitioning, column partition elimination can further reduce the subset of data that needs to be scanned. Generally, the greatest benefit of partitioning is obtained from row partition elimination.

    Also see the following glossary definitions: Delayed Row Partition Elimination, Dynamic Row Partition Elimination, and Static Row Partition Elimination.

    See “Row Partition Elimination” on page 317 for details.

    Row Partition Number

    An identifier computed by the partitioning expression for a partitioned table or join index and used to uniquely identify each row partition for that partitioning level of the partitioned table or join index.

    ROW RANGE Lock

    This is the internal implementation of the HUT GROUP READ Lock. The severity of a ROW RANGE lock is the same as that of a READ lock.

    R‑tree

    See Hilbert R‑tree.

    S

    Satisfiability

    In formal logic, a formula is said to be satisfiable with respect to a class of interpretations if it is possible to find an interpretation that makes the formula true. The problems of Satisfiability and Transitive Closure are inherently related (see “Predicate Simplification” on page 94 for details).

    Search Condition

    See Predicate.

    Secondary Index

    A vertically partitioned subset of base table columns used to facilitate data manipulation operations.

    Unlike the Primary Index, which is stored in‑line with the row it indexes, secondary indexes are stored in separate subtables that must be maintained by the system. Secondary index subtables consume disk space, so you should monitor your queries periodically using EXPLAIN modifiers to determine whether the Optimizer is using any of the secondary indexes you designed for them. If not, you should either drop those indexes or rewrite your queries in such a way that the Optimizer does use them.

    Teradata does not define the term secondary index in the same way it is commonly used to describe an unclustered index in an indexing system based on B+ trees.

    There are 2 types of secondary index: USI and NUSI.

    See Database Design for details.

    Select List

    The list of columns, expressions, or both to be projected from a set of tables, views, or both in a SELECT request or Subquery. Sometimes referred to as a project list or projection list.

    SELECT Operator

    A relational algebra operator whose function corresponds to the WHERE clause of SQL.

    Selectivity

    A measure of the ability of a predicate or index to return a highly discriminating subset of rows from a table. The higher the selectivity, the fewer rows retrieved.

    A predicate or index that retrieves many rows is said to have low selectivity. By definition, a predicate or index with low selectivity typically accesses more than one row per data block from the table on which it is defined.

    Older definitions of low selectivity stated that a predicate or index with low selectivity accesses more than ten percent of the rows in the table on which it is defined, but the greatly increased size of newer data block configurations makes this definition obsolete, because access rates of 1.0 or even 0.1 percent frequently occur for predicates and indexes with low selectivity.

    A predicate or index that retrieves few rows is said to be highly selective. A highly selective predicate or index is one that does not access all of the data blocks for the table on which it is defined. Older definitions stated that a highly selective predicate or index typically accesses fewer than ten percent of the rows in the table on which it is specified or defined, but this definition is no longer accurate for the same reasons that the old definition of low selectivity is obsolete. The more selective a predicate or index is, the more useful it is for enhancing performance.

    Self‑Join

    A self‑join is a join operation where a table or view is joined to itself.

    Semaphore

    A protected variable that provides a simple abstraction for controlling access by multiple processes to a common resource in a parallel programming environment.

    Semijoin

    A join operation in which all data that is unnecessary for answering a query is removed by the relational operations Select and Project before the join is considered. More specifically, a semijoin is a join for which rows from only one table in the join are returned to the requestor. The act of pushing projections in the Query Rewrite Subsystem is also referred to as a semijoin.

    In SQL, a semijoin is written using a subquery and a Connecting Term.

    Teradata Database refers to semijoins more specifically as either Inclusion Joins or Exclusion Joins, and supports them for both Product Join and Merge Join types.

    Serializable

    The property of concurrent database accesses by transactions such that any arbitrary serial execution of those transactions preserves the integrity of the database is called serializability. Isolation was originally a synonym for serializable, but its usage in the ANSI SQL standard has muddied things somewhat. See “Serializability Defined” on page 670 for details.

    Violations of serializability can occur with DML operations on temporal tables that use Current semantics or that specify the CURRENT VALIDTIME request modifier whether the session transaction isolation level is set to SERIALIZABLE or not. See Temporal Table Support for details.

    SHARE Lock

    SHARE lock is a deprecated synonym for READ Lock.

    SI

    See Secondary Index.

    SIAM

    Society for Industrial and Applied Mathematics (http://www.siam.org). The leading US-based professional society for applied mathematics professionals.

    SIG

    Special Interest Group.

    SIGACT

    ACM Special Interest Group on Algorithms and Computation Theory, the ACM SIG devoted to theoretical computer science (including theoretical issues of importance to database management, complexity theory, and formal logic) (http://www.sigact.org/).

    SIGART

    ACM Special Interest Group for ARTificial Intelligence (http://www.sigart.org/).

    SIGFIDET

    ACM Special Interest Group for FIle DEscription and Translation, the predecessor organization to ACM SIGMOD.

    SIGKDD

    ACM Special Interest Group for Knowledge Discovery and Data Mining (http://www.kdd.org/)

    SIGMOD

    ACM Special Interest Group on Management of Data, the ACM SIG devoted to database management issues (http://www.sigmod.org/).

    Single‑Column Partition

    A Column Partition with only one column of a table row.

    Single‑Column Partition Value

    The value of the column in a Single‑Column Partition projected from a table row.

    If the Single‑Column Partition is specified explicitly or is system-determined to have a COLUMN Format, one or more single-column-partition values can be represented in a Container.

    If the Single‑Column Partition is specified or system-determined to have a ROW Format, the column partition is represented using Subrows and the single-column-partition value is the column value in the Subrow. There is one single-column-partition value represented per Subrow.

    Compare with Multicolumn Partition Value.

    Single‑Level Partitioning

    A partitioning scheme with one level of partitioning, which is defined by a single partitioning expression.

    Single‑Window Merge Join

    A single‑window merge join is very similar to a Sliding‑Window Merge Join except that the number of populated partitions after static or delayed row partition elimination has been applied for each table in the join is small enough to be able to handle all the partitions at once. This could be either a join of a PPI table to a PPI table or a PPI table to a spool.

    The 2 relations must be joined with equality conditions on the primary index and there must be a limited number of participating partitions. The number of participating partitions is estimated based on the estimated number of populated partitions after any Static Partition Elimination. The maximum number of combined partitions processed in as a single set is determined in the same way as for single‑level partitioning. This calculation is based on the setting of the DBS Control flag PPICacheThrP.

    Skew

    A measure of the asymmetry of the distribution of a set of attribute values or their cardinalities.

    Formally, skewness is the third moment of the Probability Density Function for a population of attribute value counts. The first 2 moments of a PDF are its mean and standard deviation, respectively, while the fourth moment is kurtosis.

    With respect to skew in parallel databases, there are several possible types of skew:

  • Attribute value skew refers to skew that is inherent in the data. An example might be a column that can have only 2 values such as TRUE or FALSE.
  • Partition skew refers to skew that results from an uneven distribution of data across the AMPs.
  • The difference in usage is apparent from the context. As used in this manual, the term usually refers to the partition skew that occurs when the primary index for a table is defined on a column set that is highly nonunique.

    See “Interval Histograms” on page 162 for further information.

    Sliding Window

    In a Windowed Read, the sliding window is the set of partitions that is currently being read, and which can slide to the next set of contiguous partitions after all rows in the current window have been processed. The partition in the sliding window that is currently being read from is transparent to the Teradata client.

    Sliding‑Window Merge Join

    A form of Merge Join that uses a Sliding Window algorithm.

    Slow Path Local Nested Join

    The following list documents the process applied by the slow path nested join algorithm.

    1
    Read each row from the left relation.
    2
    Evaluate each left relation row against the right relation index value.
    3
    Retrieve the right relation index rows that correspond to the matching right relation index entries.
    4
    Retrieve the row IDs for the right relation rows to be joined with left relation rows from the qualified right relation index rows.
    5
    Read the right relation data rows using the retrieved row IDs.
    6
    Produce the join rows.
    7
    Produce the final join using the left relation rows and the right relation row IDs.

    Specific Plan

    An optimizer plan that is based on having peeked at parameterized USING values during query parsing.

    Compare with Generic Plan.

    See “Peeking at Parameterized Values in the Data Parcel” on page 28 for further information.

    Split

    For a PRPD join strategy, Teradata Database segregates the rows from the 2 tables into 2 splits. A split in a PRPD join strategy is either a portion of skewed rows from one of the tables and their matching rows from the other table or the remaining non‑skewed rows from both tables being equijoined. Teradata Database first joins the rows within each split and then merges the result rows from the joins within each split into the final set of joined rows. For more information, see “Strategies for Joining Skewed Tables on an Equality Join Condition” on page 393.

    SQL

    The programming language used to create relational database objects (Data Definition Language, or DDL), to manipulate their contents (Data Manipulation Language, or DML), and to define their security attributes (Data Control Language, or DCL).

    Now preferably pronounced ess-kew-ell, the language was originally named SEQUEL (Structured English QUEry Language) and was pronounced the same way as it was spelled. According to the ANSI SQL standard, SQL does not stand for Structured Query Language, as is commonly thought. The name is not an abstraction and does not represent anything other than the characters S, Q, and L. As Hugh Darwen states it, “SQL is just a name and no great significance attaches to that name” (Darwen, 2009, page 230, note i).

    Stale Statistics

    Interval histogram statistics that no longer represent an accurate description of the column sets on which they were originally collected.

    See “Optimal Times to Collect or Recollect Statistics” on page 192 and “Stale Statistics” on page 273 for further information.

    Statement

    Statements are the syntactic elements of Teradata SQL, while Requests are its semantic elements.

    Statement Independence

    A feature in which a single error in a multistatement request does not rollback the entire request, but only the erring request.

    Teradata Database also supports statement independence for BTET transactions in Teradata session mode and for multiple SQL requests in a COMMIT‑terminating transaction in ANSI session mode.

    Static Column

    In the context of extrapolating cardinality estimates, a static column is one that has a varying number of rows per value and a constant number of unique values. Compare with Rolling Column.

    Static Query Plan

    A Query Plan generated by the Optimizer for a request based on static information such as collected statistics, Dynamic AMP Samples, and Object Use Counts. Once a static plan is generated, it is executed without modification. A static plan can be a Specific Plan or a Generic Plan plan. Static query plans are not generated by the Incremental Planning and Execution (IPE) framework. Compare with Dynamic Query Plan.

    Static Row Partition Elimination

    Partition elimination undertaken by the Optimizer during the primary optimization process, as opposed to being delayed until the time the finalized plan is built from a cached plan using built‑in function values or USING request modifier variable values for the specific execution of the plan as is done in Delayed Row Partition Elimination.

    See “Static Row Partition Elimination” on page 325 for details.

    Also see Dynamic Row Partition Elimination and Row Partition Elimination.

    String Literal

    Zero or more alphanumeric characters enclosed by APOSTROPHE characters.

    Subquery

    A SELECT … FROM … WHERE expression that is nested within another SELECT … FROM … WHERE expression.

    Subrow

    A physical row formed from a subset of the column values of a table row. The column values are for a set of one or more columns corresponding to the set of columns defined for a column partition that is specified to have ROW Format or that has ROW Format by default. A subrow is a Physical Row with ROW Format. A series of subrows with increasing row ID values represents a table column or set of table columns as a Column Partition.

    Compare with Container.

    Subtable

    A collection of rows stored in data blocks that belongs to a specific element of a table. Common examples of subtables include secondary index subtables and fallback subtables. A given table can have multiple subtables associated with it.

    Synoptic Data Structure

    A data structure that contains summary, or synopsis, metadata.

    In the case of an interval histogram, the object is actually a file structure rather than a data structure, meaning that it is stored on disk, not just manipulated in memory.

    See “Interval Histograms” on page 162 for details.

    Syntaxer

    The PE query processing component that checks the validity of the syntax for the SQL text in a request parcel and then, if the syntax is valid, converts the text to a parse tree.

    See “Syntaxer” on page 45 for details.

    SynTree

    The skeletal parse tree for an SQL request created by the “Syntaxer.”

    See Chapter 2: “Query Rewrite and Optimization” for details.

    Also see Black Tree.

    T

    Term

    See Predicate

    Ten Percent Rule

    A guideline for recollecting statistics that recommends recollection whenever the demographics of a table or partition change by 10% or more. For any specific table or join index, this percentage might need to be decreased or increased based on the actual rate of change of the demographics over time, but 10% is a reasonable starting point as a guideline.

    It is possible for the cardinality of a table to grow by more than 10 percent, but for the relative proportion of its rows with particular values not to change. As a result, even a change in the table demographics of this magnitude might not affect the query plan generated by the Optimizer.

    The Ten Percent Rule also applies at the row partition level for row‑partitioned tables. If more than 10 percent of the rows are added to or deleted from a partition, then statistics should be recollected on the index. For partitioned tables, any refreshment operation should include the system-derived PARTITION column.

    TLE

    Target Level Emulation. TLE is a component of Teradata System Emulation Tool that is used to specify random AMP sample (RAS), cost parameters, and cost profiles at various levels for a connected test system. The tool permits you to create multiple emulation scenarios covering a wide range of testing possibilities on a single test system.

    See Teradata System Emulation Tool User Guide for details.

    Transaction

    An ACID unit of work.

    Transaction History

    A model of what the Transaction Scheduler sees formed by interleaving the read and write operations of a set of transactions.

    Transaction Manager

    A mechanism that controls transactional operations in a relational database management system.

    Transaction Scheduler

    A mechanism that controls the concurrent execution of interleaved transactions by restricting the order in which the various read, write, commit, and rollback operations of that interleaved set executes.

    Transitive Closure

    In axiomatic set theory, the transitive closure of a set X is the smallest transitive set that contains X. By generalizing this concept to graph theory, it can be said that with respect to query optimization, transitive closure is the construction of a data structure that makes it possible to answer reachability questions about a query graph. For example, is it possible to get from vertex α of a query graph to vertex δ of the same query graph in one or more hops?

    The problems of Satisfiability and Transitive Closure are inherently related (see “Predicate Simplification” on page 94 for details).

    Trusted Parallel Application

    Frequently abbreviated TPA. An application that Teradata has certified to run safely on Teradata Database. The Teradata Database software itself is a TPA. Such applications are frequently referred to by the initialism TPA.

    TSET

    Teradata System Emulation Tool. A client tool used to copy demographic information such as environmental cost factors and statistics from a production system to a test system in order to facilitate testing by emulating the production system environment.

    See Teradata System Emulation Tool User Guide for details.

    Tuple

    The set of all Attribute values associated with an instance of a Relation is a tuple or, more properly, a tuple value. When tuples are transferred from the logical domain to the physical domain, they are usually called rows.

    U - V

    UDI Counts

    UDI counts track the number of update, delete, and insert operations made on tables since their last reset timestamp. They provide the change in table cardinality from the last time the UDI counts were reset. This information is useful for internal database optimizations, database administrators, and application users.

    Teradata Database maintains 2 types of UDI counts.

  • User UDI counts
  • These counts track the number of update, delete, and insert operations made on tables since they were last reset. You can reset these counts at intervals of your choice using the ClearUserUDICount system macro.

  • System UDI counts
  • These counts track the number of update, delete, and insert operations made on tables since they were last reset. Only Teradata Database can reset these counts.

    UDT

    User‑Defined Type

    A data type defined by someone other than Teradata. UDTs come in 2 variations: Distinct and Structured.

    See SQL Data Definition Language and SQL External Routine Programming for further information about UDTs.

    UCS-2

    Universal Coded Character Set containing 2 bytes

    Unique Join Index

    A join index defined with a unique primary index.

    Unique join indexes can be user‑defined or system‑defined (see “Using a Unique Join Index in the Access Path for a Query” on page 267, and “CREATE JOIN INDEX” in SQL Data Definition Language for more information).

    Uniqueness Value

    A 32‑bit unique value for a primary index or secondary index, or 44-bits for a NoPI object generated by the file system and appended to the Rowkey for each row to make it distinguishable from other rows in the same table that have the same Rowkey value.

    Nonpartitioned NoPI Object

    A table that does not have a primary index and is not partitioned.

    UPI

    Unique Primary Index

    A Primary Index that is uniquely constrained. The rows from a table defined with a UPI tend to be distributed more evenly across the AMPs than rows from a table defined with a NUPI.

    See Database Design for details.

    USI

    Unique Secondary Index

    A Secondary Index designed to facilitate single‑row access.

    PRIMARY KEY and UNIQUE constraints are implemented internally as USIs for all non‑temporal tables (see Database Design for details) and as single‑table join indexes for most temporal tables. For more information, see ANSI Temporal Table Support and Temporal Table Support.

    Value‑Equivalent Row

    Two rows are value‑equivalent if they have exactly the same value in all of their columns except their VALIDTIME or TRANSACTIONTIME columns (or both).

    Volatile Table

    A type of temporary table whose definition and data are only accessible within their own session. Teradata Database does not retain volatile table definitions or data across sessions.

    vproc

    Virtual Process

    The Teradata architecture is based on a common node configuration. Each Trusted Parallel Application node can run one or more PE and AMP vprocs.

    W - X

    W3C

    World Wide Web Consortium.

    W3C is the principal standards organization for the World Wide Web.

    WAL

    Write Ahead Log or Write Ahead Logging.

    A transaction logging scheme maintained by the File System in which a write cache for disk writes of permanent data is maintained using log records instead of writing the actual data blocks at the time a transaction is processed. Multiple log records representing transaction updates can then be batched together and written to disk with a single I/O, thus achieving a large savings in I/O operations and enhancing system performance as a result.

    White Tree

    The optimized parse tree for an SQL request. White tree is a synonym for Operation Tree. A textual version of the white tree is returned for an EXPLAIN modifier submitted for a request.

    See Chapter 2: “Query Rewrite and Optimization” for details.

    Windowed Read

    A simultaneous read from multiple partitions of a row-partitioned PI table in which the rows are presented in row hash order. Also referred to as a single‑window read or sliding‑window read.

    WRITE Lock

    Placed in response to an INSERT, UPDATE, or DELETE request.

    A WRITE lock restricts access by other users (except for applications that are not concerned with data consistency and choose to override the automatically applied WRITE lock by specifying a less restrictive ACCESS lock). See “Teradata Database Locking Levels and Severities” on page 687 for details.

    Various Archive utility commands can also place a HUT WRITE Lock on database resources. See Teradata Archive/Recovery Utility Reference for details.

    Xerces

    A software library used to parse and manipulate XML.

    Xerces implements a number of standard APIs for XML parsing, including DOM.

    See http://xerces.apache.org/ for details.