15.00 - Database Locks, Two-Phase Locking, and Serializability - Teradata Database

Teradata Database SQL Request and Transaction Processing

Product
Teradata Database
Release Number
15.00
Content Type
Programming Reference
User Guide
Publication ID
B035-1142-015K
Language
English (United States)

Database Locks, Two-Phase Locking, and Serializability

Database Locks Defined

A lock is a device, usually implemented as a form of semaphore, that relational database management systems use to manage concurrent access to database objects by interleaved transactions running in multiple parallel sessions. Among the information contained in a lock is the identity of the database object it is locking, the identity of the transaction holding it, and its level and severity.

You can think of the level and severity of a lock as guarantees made to the transaction, assuring it that the objects for which it has requested locks are isolated from illegal interventions that might otherwise be made by other concurrently running transactions on those objects.

Locks cannot be placed on the individual row partitions of a row‑partitioned table, only on either the entire table or on individual row hash values.

Two-Phase Locking Defined

The solution to the illegal intervention problems is two‑phase locking. Better put, two‑phase locking is a solution to concurrency problems. Quoting Fekete et al. (2005, page 493), “Many database researchers think of concurrency control as a solved problem, since there exists a proven set of sufficient conditions for serializability. The problem is that those sufficient conditions can lead to concurrency control bottlenecks … [The issue of lower isolation level concurrency settings] … poses a new task for the theorist: to discover how to guarantee correctness at the least cost for such lower isolation levels.

The two-phase locking protocol (Eswaran et al., 1976) is the heart of modern transaction processing in relational database management systems. A locking protocol is defined as two‑phase if it does not request additional locks for a transaction after it releases the locks it already holds.

This locking protocol is the foundation of serializability. Without two-phase locking, serializability cannot exist.

The two phases of 2PL are.

  • The growing phase, during which locks on database objects are acquired.
  • The shrinking phase, during which the previously acquired locks are released.
  • This is sometimes called the Two‑Phase Rule.

     

                  Phase Number

                         Action Taken By the Transaction Scheduler

                             1

    Take locks.

                             2

    Release locks.

    In practice, the shrinking phase is a point rather than a period; the Lock Manager drops all locks held by a transaction at the moment that transaction commits or finishes rolling back.

    Depending on the current session mode and the API supporting the application, the Lock Manager releases the locks held by a transaction at the point any of the following events occurs.

  • A commit, whether implicit or explicitly specified.
  • The following are all equivalent to a transaction commit.

  • The system encounters a SEMICOLON character at the end of a line of SQL text.
  • Although the SQL language does not enforce any line‑oriented structural rules, recall that Teradata Database recognizes only requests, not SQL statements per se, and a multistatement request terminates only when a SEMICOLON character is the last character in a line (see “Transactions, Requests, and Statements” on page 799).

  • The system encounters either of the following SQL statements.
  • COMMIT (ANSI session mode only)
  • END TRANSACTION (Teradata session mode only)
  • Whether the COMMIT or END TRANSACTION request also requires a terminating SEMICOLON character depends on the API used by the application (see “Processing Multistatement Requests” on page 800).

  • The system successfully performs a macro or stored procedure.
  • A rollback or abort, whether implicit or explicitly specified.
  • The following are both equivalent to a roll back or abort operation.

  • Teradata Database encounters either of the following SQL requests.
  • ROLLBACK
  • ABORT
  • Whether the ROLLBACK or ABORT request also requires a terminating SEMICOLON character depends on the API used by the application (see “Processing Multistatement Requests” on page 800).

  • Teradata Database aborts the transaction for reasons external to the transaction itself.
  • Unlike database locks, which the system releases automatically, HUT locks are released only when you explicitly specify the RELEASE LOCK option with your Archive/Recovery‑related utility command or if your job successfully submits a RELEASE LOCK utility command after the Archive/Recovery‑related operation has completed. See Teradata Archive/Recovery Utility Reference for details.

    If a transaction must roll back, the 2PL protocol provides the advantage that the locks it holds on the objects whose updates must be undone need not be reacquired because, by definition, they are held until the transaction either commits or completes its rollback.

    This protocol is formally referred to as strict two‑phase locking, because the locks are released only when a transaction commits or rolls back. This action is always undertaken by the system because there are no SQL statements for releasing locks.

    Computing methods tend to be optimizations rather than absolutes, and 2PL is no exception to that generalization. The principal concurrency problems that 2PL prevents are elaborated by five classic problems of transaction processing, usually named as follows.

  • The lost update problem.
  • The uncommitted dependency problem.
  • The inconsistent analysis problem.
  • The dirty read problem.
  • The deadlock problem (see “Deadlock” on page 849 and “Pseudo Table Locks” on page 843).
  • Increased system overhead to administer locking.
  • Decreased concurrency.
  • The impact of the lower concurrency that locks introduce is reduced greatly in a system like Teradata Database that supports multiple levels of locking granularity.

    Consult any standard transaction processing textbook such as Gray and Reuter (1993), Bernstein and Newcomer (1997), or Weikum and Vossen (2002) for more information about these canonical concurrency problems.

    Serializability Defined

    The property of concurrent database accesses by transactions such that any arbitrary serial execution of those transactions preserves the integrity of the database is called serializability. The following definition of serializability is equivalent: although a given set of transactions executes concurrently, it appears to each transaction T in the set that the other member transactions executed either before T, or after T, but not both (paraphrased slightly from Gray and Reuter, 1993).

    Teradata Database ensures serializability for non‑temporal transactions as long as the current isolation level for the session is SERIALIZABLE (see “ACID Properties of Transactions” on page 795 and the SET SESSION CHARACTERISTICS statement in SQL Data Definition Language for details).

    Serializability violations can occur with DML operations that use Current semantics on temporal tables or with DML operations that specify the CURRENT VALIDTIME request modifier whether the session transaction isolation level is set to SERIALIZABLE or not. See Temporal Table Support for details of when and why this can occur and how you can work around it.

    For example, suppose table_A is a checking accounts table and table B is a savings accounts table. Suppose one transaction, Tx1, needs to move 400.00 USD from the checking account of a bank customer to the savings account of the same customer. Suppose another concurrently running transaction, Tx2, performs a credit check on the same bank customer.

    For the sake of illustration, assume the three states of the two accounts seen in the following table.

     

        State

                 Checking Account Amount

                  Savings Account Amount

           1

                                   $900.00

                                  $100.00

           2

                                  $500.00

                                  $100.00

           3

                                  $500.00

                                  $500.00

  • State 1 depicts the initial state of the accounts.
  • State 2 depicts an intermediate condition.
  • Tx1 has withdrawn $400.00 from the checking account, but has not yet deposited the funds in the savings account.

  • State 3 depicts the final state of the accounts.
  • Tx1 has deposited the $400.00 withdrawn from the checking account into the savings account.

    Without two‑phase locking, Tx2 can read the two accounts at state 2 and come to the conclusion that the customer balance is too low to justify permitting the purchase for which the credit check was determining the viability. But the reality is that this customer still has $1000 in the two accounts and should have qualified.

    This is an example of the dirty read phenomenon.

    The condition that 2PL ensures is serializability. When serializability is in force for non‑temporal transactions, the effect of these concurrently running transactions is the same as what would occur if they ran one after the other in series.

    The two possibilities are as follows.

    1 Tx1 runs.

    2 Tx2 runs.

    In this scenario, Tx2 only sees state 3, so the customer passes the credit check.

    1 Tx2 runs.

    2 Tx1 runs.

    In this scenario, Tx2 only sees state 1, so the customer passes the credit check.

    It makes no difference which scenario actually takes place, even in a distributed system, as long as the order is the same everywhere and both result in a consistent state for the database. The important thing to understand from this is that serializability ensures only consistent states for the database, not some particular ordering of transaction execution.

    Two‑phase locking of database objects (see “Two-Phase Locking Defined” on page 804) is sufficient, but not necessary, to ensure serializability of non‑temporal transactions.

    The term serializable was originally a synonym for the ACID property known as Isolation (see “ACID Properties of Transactions” on page 795, “Levels of Isolation Defined” on page 808, and the SET SESSION CHARACTERISTICS statement in SQL Data Definition Language), but the usage of the term isolation in the ANSI SQL standard has been broadened to include the concept of isolation levels, so it is no longer synonymous with serializable, at least not as the terms are used in the ANSI SQL Foundation standard (see International Standard ISO/IEC 9075-2:2008(E), Part 2: Foundation (SQL/Foundation), 2008).

    Serializability describes a correct transaction schedule, meaning a schedule whose effect on the database is the same as that of some arbitrary serial schedule.

    Levels of Isolation Defined

    The ANSI SQL-2008 standard defines isolation level as follows: “The isolation level of an SQL‑transaction defines the degree to which the operations on SQL-data or schemas in that SQL‑transaction are affected by the effects of and can affect operations on SQL-data or schemas in concurrent SQL-transactions (International Standard ISO/IEC 9075-2:2008(E), Part 2: Foundation (SQL/Foundation), 2008, page 124).” Note that isolation level is a concept related to concurrently running transactions and how well their updates are protected from one another as a system processes their respective transactions.

    Serializability defines transaction isolation. A transaction is either isolated from other concurrently running transactions or it is not. If you can achieve greater concurrency at the expense of imperfect isolation by using a lower isolation level, while at the same time being certain that you can avoid concurrency errors, then there is no reason not to run under that isolation level. The result is that you use CPU resources more effectively, while still guaranteeing serializable execution for the specific workload implemented in these transactions.

    The ANSI SQL standard formalizes what it refers to as four isolation levels for transactions. To be precise, this section of the standard defines SERIALIZABLE and three weaker, non‑serializable isolation levels that permit certain prohibited operation sequences to occur. Date and Darwen (1997) point out that in the context in which this usage was first introduced into ANSI SQL (isolation levels were first introduced with the SQL:1992 standard), it was inappropriate because it applied only to single transactions, while serializability, as defined in the database management research literature, is a property of the interleaved execution of a set of concurrent transactions. Furthermore, the original definition of isolation in the ANSI SQL:1992 standard is synonymous with the definition of serializable, so by definition it cannot have levels, particularly levels that are not serializable.

    The standard collectively refers to these prohibited operation sequences as phenomena. Note that the ANSI isolation levels are defined in terms of these phenomena, not in terms of locking, even though all commercial RDBMSs implement transaction isolation using locks. The O’Neils call this isolation level snapshot isolation. It does not exhibit any of the ANSI‑defined phenomena, but is also not truly serializable (see Berenson et al., 1995; Fekete et al., 2004).

    The 3 defined phenomena are dirty read, non‑repeatable read, and phantom read, respectively.

    The non‑serializable isolation levels ANSI defines are the following.

  • Read Uncommitted
  • Read Committed
  • Repeatable Read
  • The following table is taken from the ANSI SQL standard with slight modification (see Table 8 on page 125 of International Standard ISO/IEC 9075-2:2008(E), Part 2: Foundation (SQL/Foundation), 2008). It specifies the phenomena that are or are not possible for each isolation level.

     

          Isolation Level

             Dirty Read

      Non‑Repeatable Read

          Phantom Read

    Read Uncommitted

    Possible

    Possible

    Possible

    Read Committed

    Not possible

    Possible

    Possible

    Repeatable Read

    Not possible

    Not possible

    Possible

    Serializable

    Not possible

    Not possible

    Not possible

    Teradata Database does not support the isolation levels READ COMMITTED and REPEATABLE READ.

    The READ UNCOMMITTED isolation level is implemented using an ACCESS level lock (see “ACCESS” on page 820 and “SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL” in SQL Data Definition Language).

    Berenson et al. (1995) have published a thorough criticism of the ANSI isolation levels scheme, including the fact that it permits a “dirty write,” though it does not even mention this possibility. Furthermore, their paper notes that in spite of the fact that the intent of developing these isolation levels was to define them in an implementation‑free manner, the definitions rely entirely on known locking behaviors.

    Another problem with the ANSI SQL:2011 isolation level definitions is that while the first three levels are defined in terms of phenomena, the SERIALIZABLE level also adds the condition that not only must a transaction history not allow the three phenomena, but it must also be serializable. This is not insignificant, because, as Berenson et al. and Fekete et al. note, the O’Neils had discovered a new form of isolation that does not have any of the ANSI phenomena problems, but which is also not serializable. As mentioned earlier, this new form of isolation is commonly referred to as snapshot isolation.

    The authors have formalized a reinterpretation of the language in the standard in an attempt to express what they think its authors really meant, though later thought suggests this is an unachievable task. O’Neil (2004) notes that there seem to be an “infinite number of Isolation Levels, and this demonstrates there is no finite set of phenomena that can characterize them all.”

    O’Neil reaches this conclusion after having spent years working on the problem and after having thought he had correctly redefined the ANSI isolation levels in such a way that not only were the previously noted problems with the definitions remedied, but the levels themselves had been defined in such a way that they were no longer tied to locking implementations, but could also be implemented in terms of optimistic and multiversion concurrency control schemes (Adya et al., 2000).

    Changing the Transaction Isolation Level for Read‑Only Operations

    Sometimes you might be willing to give up a level of transaction isolation insurance in return for better performance. While this makes no sense for operations that write data, it can sometimes make sense to permit dirty read operations, particularly if you are only interested in gaining a general impression of some aspect of the data rather than obtaining consistent, reliable, repeatable results.

    This is a very important consideration, and it should not be taken lightly. The overall qualitative workload of the session must be examined carefully before making the determination of whether to default to ACCESS‑level locking for read‑only operations or not. For example, consider a session in which a MultiLoad IMPORT job is running. Because of the way MultiLoad updates table rows during its acquisition phase (see Teradata MultiLoad Reference for details), using ACCESS locks to query the target table of the MultiLoad job during an acquisition phase can produce extremely inaccurate result sets. In this case, the results probably would not provide even a reasonable impression of the table data. The mechanism MultiLoad uses for this is to downgrade the default lock for read‑only operations from a severity of READ to a severity of ACCESS.

    Teradata Database provides methods for allowing the possibility of dirty reads at two different levels: the individual request and the session.

     

    TO set the default read‑only locking severity for this level …

    USE this method …

    individual request

    LOCKING request modifier.

    See SQL Data Manipulation Language for details of the syntax and usage of this request modifier.

    session

    SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL statement.

    See SQL Data Definition Language for details of the syntax and usage of this statement.

    Note that the global application of ACCESS locking for read operations when SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL is set to READ UNCOMMITED depends on the setting of the DBS Control parameter AccessLockForUncomRead.

    When the parameter is set FALSE, SELECT operations within INSERT, DELETE, MERGE, and UPDATE requests set READ locks, while when the parameter is set TRUE, the same SELECT operations set ACCESS locks. See Utilities and “SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL” in SQL Data Definition Language for details.