15.00 - The Concept of Recursion - Teradata Database

Teradata Database SQL Data Definition Language Detailed Topics

Product
Teradata Database
Release Number
15.00
Content Type
Programming Reference
Publication ID
B035-1184-015K
Language
English (United States)

The Concept of Recursion

A recursive view is a named query expression whose definition is self‑referencing (see “Building a Recursive View” on page 489).

The self‑referencing property provides a simple and elegant method for searching a table set by means of iterative self‑join and set operations. You can think of a recursive view as a persistent view definition that includes a retrieval from that same view. (see SQL Data Manipulation Language for documentation of the DML, or temporary, form of recursive view, the WITH RECURSIVE clause).

Recursive views allow SQL queries to define a hierarchical search on a graph of (possibly) unknown depth, making it possible to find all possible paths in the data. In graph theory, this is described as expressing the transitive closure of the graph.

Suppose you need to determine all part numbers for all parts that are components at any level of a given multicomponent part. This type of problem is generally referred to as a bill of material or parts explosion problem, but it is only one particular case of a more general family of problems that also includes organization charts, family trees, the representation of discussion forum threads by a web or electronic mail client, transportation and communication networks, authorization graphs, document hierarchies, and software invocation structures. In mathematics, the issue is referred to as computing transitive closure.

For example, suppose you have the following table definition.

     CREATE TABLE flights (
       source      VARCHAR(40),
       destination VARCHAR(40),
       carrier     VARCHAR(40),
       cost        DECIMAL(5,0));

After populating the table with data, its contents look like this.

 

You must now solve the following problem: Find all the destinations that can be reached from Paris.

The most straightforward method to solve this class of problems is to use recursion, a class of algorithms characterized by self‑reference and a terminal condition that can be either implicit or specified explicitly.

In SQL, recursive queries are typically built using 4 components.

  • A non‑recursive seed statement.
  • A recursive statement.
  • A connection operator.
  • The only valid set connection operator in a recursive view definition is UNION ALL.

  • A terminal condition to prevent infinite recursion.
  • Because this is a type of query that will be posed frequently, you decide to create a recursive view to handle it rather than performing a simple recursive query (see SQL Data Manipulation Language for information about recursive queries). The view definition is as follows.

        CREATE RECURSIVE VIEW reachable_from (source,destination,depth) AS (
          SELECT root.source, root.destination, 0 AS depth
          FROM flights AS root
          WHERE root.source = 'Paris'
        UNION ALL
          SELECT in1.source, out1.destination, in1.depth + 1
          FROM reachable_from AS in1, flights AS out1
          WHERE in1.destination = out1.source
          AND   in1.depth <= 100);

    Because the recursion is inherent in the definition of the view reachable_from, all you need do is perform a simple SELECT statement on the view to get the desired answer, as is demonstrated below.

    In this recursive view definition, the following query is the non‑recursive or seed query.

         SELECT root.source, root.destination
         FROM flights AS root
         WHERE root.source = 'Paris'

    The following query is the recursive component of the recursive view definition.

         SELECT in1.source, out1.destination, in1.depth + 1
         FROM reachable_from AS in1, flights AS out1
         WHERE in1.destination = out1.source;

    Notice that the recursive component refers to itself (reachable_from being the name of the recursive view of which it is the recursive component) in the FROM clause.

    The UNION ALL set operator in the definition unions the results of the non‑recursive and recursive query components to produce the view being defined.

    Having defined the recursive view named reachable_from, all you need to do to answer the question “find all the destinations that can be reached from Paris” is to run the following simple SELECT statement.

         SELECT DISTINCT source, destination
         FROM reachable_from;

    The answer set is as follows.

     

    The concepts of query recursion are best explained using the mathematical construct called fixpoint semantics. It is this concept that provides the theoretical basis for the recursive properties of the SQL language.

    The semantics of query recursion are derived from the following mathematical analogy.

        If F: t  t is a function from some given type to the same type, then a fixpoint of F is a
        value of x such that f(x) = x.

    For example, consider the following example. What are the fixpoints of the following function?

    The fixpoints for this function are the values -1 and 2.

    The same principle can be applied to explain the fixpoint definition for recursive queries as follows.

    A query Q can be visualized as a mathematical function F(x) that takes 1 table, T, as input and computes another table, T1, as its output. A fixpoint of such a query Q is a table T such that Q(T) = T1.

    In ordinary language, a fixpoint is the point at which further iteration through the data fails to make any additional changes to the result. In relational terms, a fixpoint is reached when further attempts to identify more rows to insert into the final result table are unsuccessful because no further rows can be found.

    Unlike the previously described mathematical function that has 2 fixpoint solutions, it is necessary to restrict query Q to have the property that only 1 solution can be computed. Again resorting to mathematical terminology, it is true that a unique fixpoint solution exists iff query Q increases monotonically.

    In mathematics, a monotonic progression is one that either never increases (always decreases or remains the same) or never decreases (always increases or remains the same).
    For SQL, the definition of monotonicity is limited to a progression that never decreases. Explicitly, the number of rows in the working answer set never decreases: either the number stays the same or it grows larger as the recursion continues. Similarly, the field values of rows in the working answer set must never change.

    One such example of violating monotonicity is the use of aggregates in a recursive query. For example, suppose you want to calculate the maximum price of a flight from Los Angeles to Boston using an iterative query. The first iteration computes a direct flight that costs 100 USD. The row in the result set has columns for origin, destination, and MAX(price). If a subsequent iteration produces a row with a new maximum value (such as LAX, BOS, $200), then the row in the answer set must change because 100 USD is no longer the maximum cost for that particular trip. Changing a field value in this situation is a clear violation of monotonicity.

    Various restrictions on SQL recursion ensure that Q can only increase monotonically.

    One way to compute the fixpoint of query Q is the following procedure.

    1 Start with an empty table, represented symbolically as T←  0.

    2 Evaluate query Q over the current contents of table T.

     

    IF the query result is …

    THEN …

    identical to T

    T is the fixpoint.

    not identical to T

    you have the T query result, which is not the fixpoint.

    Go to Step 2.

    In other words, the fixpoint of a recursive query is reached when further efforts to identify more rows to insert into the result can find no such rows.

    Based on the preceding explanation, a recursive query can be described as a query that is divided into the following processing phases.

    1 Create an initial result set.

    2 Perform a recursion that reaches a fixpoint on the initial result set.

    3 Perform a final query on the result set derived from the recursion, the results of which are returned as the final result set.

    To better understand these concepts, try to visualize these execution phases using the simple algorithm for evaluating a recursive query indicated below. The purpose of presenting this algorithm is just to help you to gain a conceptual understanding of one particular recursive query and its execution. The algorithm presented is designed to solve recursive queries, of which recursive views can be regarded as a special case (see SQL Data Manipulation Language for more information about recursive queries), of the following type.

         WITH RECURSIVE reachable_from (destin, cost, depth) AS
           (SELECT root.destin, root.cost, 0 AS depth
            FROM flights AS root
            WHERE root.source = 'Paris'
         UNION ALL
            SELECT result.destin, seed.cost + result.cost, seed.depth + 1 
                   AS depth
            FROM reachable_from AS seed, flights AS result);
            WHERE Seed.Destin = Result.Source 
            AND   Depth <= 20
           )
         SELECT * FROM Reachable_From;

    Note that depth checking code is not necessary for this particular query, but is presented as an example of a safe coding technique that avoids the possibility of infinite recursion, which is an issue you must always be aware of. See “Preventing Infinite Recursion Due To Cyclic Data” on page 498 and “Preventing Infinite Recursion With Acyclic Data” on page 500. Note that Teradata Database uses the smallest possible data type for any variable that is not cast to a specific type. This could present a problem if you were to set a depth counter that was initialized to a value of 0, but also specified a condition that permitted the recursion to loop to a depth of 128. The largest value the default data type for this counter can hold is 127 because when you initialize the variable to 0, the system assigns it a data type of BYTEINT by default.

    To avoid this problem, you can explicitly cast the initial value of the counter to a data type of INTEGER, which supports far larger numbers than you could possibly require to control runaway recursion. See SQL Data Types and Literals for more information about default data types in the Teradata Database.

    The following pseudocode represents the previously mentioned algorithm for solving the recursion problem provided here, where the following assertions apply.

  • T specifies the base relation named flights.
  • The relations named spool_1, spool_2, spool_3, and spool_4 specify the spool files the system generates in the evaluation process.
  • The first statement constitutes the initialization phase of the algorithm.
  • The statements within the L1 block make up the recursion phase of the algorithm.
  • The statement on L2 is third phase of the algorithm, where the last query returns the final result set.
  • The contents of spool_4 are the final result set that is returned to the user.
  •  
    /* Insert all seed rows from the base relation into Spool_1. The function 
    /* function S retrieves all seed rows from the base relation Flights 
    /* (Root) */
     
    Spool_1 = S (T); 
     
    /* The contents of Spool_1 result is appended to Spool_2 */
     
    L1:  Spool_2 = Spool_2 + Spool_1; 
     
         If (NoRowsAppendedToSpool_2) 
     
    /* If no more rows are added in Spool_2, jump to L2 */
     
            GoTo L2;
     
    /* Spool_1 and T are joined based on the join condition specified in  */
    /* the WHERE clause, e.g.,                                            */
    /* WHERE Seed.Destin = Result.Source and Depth <= 20                  */
    /* and the result is appended to Spool_3.                             */
     
         Spool_3 = Spool_1 x T; 
         Spool_1 = 0;       /* Delete all rows from Spool_1               */
         Spool_1 = Spool_3; /* Append the contents of Spool_3 into Spool_1 */
         Spool_3 = 0;       /* Delete all rows from Spool_2                */
         GoTo L1;
     
    L2:  Spool_4 = Spool_2  /* Append the contents of Spool_2 into Spool_4 */
         Spool_2 = 0;       /* Delete the contents of Spool_2              */