Assume the following abstract SQL query.
SELECT column_1, column_2 FROM table_1 WHERE unique_primary_index = column_value;
This query instructs Teradata Database to retrieve the single row having the unique primary index value column_value and then to display the values for column_1 and column_2 of that row.
Process for Locating a Row Using Its Unique Primary Index
The following process describes how Teradata Database locates the row having the unique primary index value column_value. Teradata Database follows the same process if the primary index is nonunique. The only difference is that the query might retrieve more than a single row in that case because multiple rows could have the same value for column_value.
- The Resolver does a data dictionary lookup to determine the table ID for table_1, then adds it to the parse tree for later use by the Generator.
- The Generator hashes the primary index value column_value, computing its 32-bit row hash value.
- The Generator builds a 3-part message from the following information.
- Table ID for table_1
- The number of bits for the row hash value is always 32, but the number of bits used to represent the hash bucket number and remainder vary depending on the number of hash buckets defined for the system.
IF the system has this many hash buckets … THEN the row hash for column_value is 32 bits wide divided between the hash bucket number and remainder as follows … 65,536
- 16-bit hash bucket number
- 16-bit remainder
- 20-bit hash bucket number
- 12-bit remainder
- Data value for column_value
- The Generator scans the hash map to determine which AMP owns the has bucket the row belongs to.
- The message is inserted into an AMP step, the Dispatcher places it on the BYNET, and sends it point-to-point to the AMP identified by the hash map.
- The file system on the receiving AMP uses the table ID and row hash value as a key to scan its master index for the cylinder number that contains the data block in which the row is stored.
- The file system first determines if the cylinder index is cached.
WHEN the cylinder index is … THEN the file system … cached scans it for the data block. not cached retrieves it from disk and scans for the data block (see stage 8).
- The file system uses the table ID, row hash value, and cylinder number as a key to scan the cylinder index for the data block address known to contain the row being retrieved.
- The file system determines if the data block is cached.
WHEN the data block is … THEN the file system … cached scans it for the row. not cached retrieves it from disk and scans for the row (see stage 10).
- The AMP uses the row hash value and primary index value as a key to scan the data block for the row being retrieved. The AMP checks to see if the row contains the desired values, and if it does, returns the values for column_1 and column_2 to the requestor.
Primary Index Reads
The following flowchart illustrates the flow of a generic primary index read. The flowchart begins at the point that the data block containing the searched row has been located.
Example: Primary Index Read
This example illustrates how the Teradata file system uses the primary index to eliminate synonyms from the returned result by performing a row hash search.
The original query is the following:
SELECT * FROM employee WHERE employee_number = 3755;
At this point in the process, the master index lookup showed that the row representing employee number 3755 is stored on cylinder 169. The cylinder index lookup showed that the row is in the six sector block beginning with sector number 789.
The following graphic illustrates the logical representation of that data block.
The file system must locate the row having a row hash value of 1000 and a primary index data value of 3755. The following process describes how the file system undertakes this final task.
- The file system begins scanning the data block beginning at sector 789.
- The first row found with a row hash value of 1000 has an index data value of 1006. This is a hash synonym for the searched row.
Because this is not the row specified by the WHERE clause predicate, the scan continues.
- The next row found with a row hash value of 1000 has an index data value of 3755.
Because this is the requested row and it was accessed using a UPI, the data block scan terminates and the row is returned to the requestor. Had the index been a NUPI, the file system would have had to read another row to know that no further rows matched.