Recursion Overview | CREATE RECURSIVE VIEW | Teradata Vantage - The Concept of Recursion - Advanced SQL Engine - Teradata Database

SQL Data Definition Language Detailed Topics

Product
Advanced SQL Engine
Teradata Database
Release Number
17.05
17.00
Published
June 2020
Language
English (United States)
Last Update
2021-01-24
dita:mapPath
jpx1556733107962.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1184
lifecycle
previous
Product Category
Teradata Vantage™

A recursive view is a named query expression whose definition is self-referencing. See Building a Recursive View. For information about the WITH RECURSIVE clause, see Teradata Vantage™ - SQL Data Manipulation Language, B035-1146.

The self-referencing property provides a method for searching a table set by means of iterative self-join and set operations. You can think of a recursive view as a persistent view definition that includes a retrieval from that same view.

Recursive views allow SQL queries to define a hierarchical search to find all possible paths in the data.

Suppose you need to determine all part numbers for all parts that are components at any level of a multicomponent part. This can also apply to a bill of materials, organization charts, family trees, the representation of discussion forum threads by a web or electronic mail client, transportation and communication networks, authorization graphs, document hierarchies, and software invocation structures.

For example, suppose you have the following table definition.

CREATE TABLE flights (
  source      VARCHAR(40),
  destination VARCHAR(40),
  carrier     VARCHAR(40),
  cost        DECIMAL(5,0));

After populating the table with data, its contents look like this.

Flights
Source Destination Carrier Cost
 Paris  Detroit  KLM 7
 Paris  New York  KLM 6
 Paris  Boston  American Airlines 8
 New York  Chicago  American Airlines 2
 Boston  Chicago  American Airlines 6
 Detroit  San Jose  American Airlines 4
 Chicago  San Jose  American Airlines 2

You must now solve the following problem: Find all the destinations that can be reached from Paris.

The most straightforward method to solve this class of problems is to use recursion, a class of algorithms characterized by self-reference and a terminal condition that can be either implicit or specified explicitly.

In SQL, recursive queries are typically built using these components.
  • A non-recursive seed statement.
  • A recursive statement.
  • A connection operator. The only valid set connection operator in a recursive view definition is UNION ALL.
  • A terminal condition to prevent infinite recursion.

Because this is a type of query that will be posed frequently, you decide to create a recursive view to handle it rather than performing a simple recursive query. For information about recursive queries, see Teradata Vantage™ - SQL Data Manipulation Language, B035-1146. The view definition is as follows.

CREATE RECURSIVE VIEW reachable_from (source,destination,depth) AS (
  SELECT root.source, root.destination, 0 AS depth
  FROM flights AS root
  WHERE root.source = 'Paris'
UNION ALL
  SELECT in1.source, out1.destination, in1.depth + 1
  FROM reachable_from AS in1, flights AS out1
  WHERE in1.destination = out1.source
  AND   in1.depth <= 100);

Because the recursion is inherent in the definition of the view reachable_from, you can perform a simple SELECT statement on the view. In this recursive view definition, the following statement is the non-recursive or seed statement.

SELECT root.source, root.destination
FROM flights AS root
WHERE root.source = 'Paris'

The following query is the recursive component of the recursive view definition.

SELECT in1.source, out1.destination, in1.depth + 1
FROM reachable_from AS in1, flights AS out1
WHERE in1.destination = out1.source;

Notice that the recursive component refers to itself (reachable_from being the name of the recursive view of which it is the recursive component) in the FROM clause.

The UNION ALL set operator in the definition unions the results of the non-recursive and recursive query components to produce the view being defined.

Having defined the recursive view named reachable_from, you can run the following simple SELECT statement to find all the destinations that can be reached from Paris.

SELECT DISTINCT source, destination
FROM reachable_from;

The answer set is as follows.

Source Destination
   Paris    Detroit
   Paris    New York
   Paris    Boston
   Paris    Chicago
   Paris    San Jose

In mathematics, a monotonic progression is one that either never increases (always decreases or remains the same) or never decreases (always increases or remains the same). For SQL, the definition of monotonicity is limited to a progression that never decreases. Explicitly, the number of rows in the working answer set never decreases: either the number stays the same or it grows larger as the recursion continues. Similarly, the field values of rows in the working answer set must never change.

One such example of violating monotonicity is the use of aggregates in a recursive query. For example, suppose you want to calculate the maximum price of a flight from Los Angeles to Boston using an iterative query. The first iteration computes a direct flight that costs 100 USD. The row in the result set has columns for origin, destination, and MAX(price). If a subsequent iteration produces a row with a new maximum value (such as LAX, BOS, $200), then the row in the answer set must change because 100 USD is no longer the maximum cost for that particular trip. Changing a field value in this situation is a clear violation of monotonicity.

Various restrictions on SQL recursion ensure that Q can only increase monotonically.

One way to compute the fixpoint of query Q is the following procedure.

  1. Start with an empty table, represented symbolically as T←0.
  2. Evaluate query Q over the current contents of table T.
    IF the query result is … THEN …
    identical to T T is the fixpoint.
    not identical to T you have the T← query result, which is not the fixpoint.

    Go to Step 2.

In other words, the fixpoint of a recursive query is reached when further efforts to identify more rows to insert into the result can find no such rows.

Based on the preceding explanation, a recursive query can be described as a query that is divided into the following processing phases.

  1. Create an initial result set.
  2. Perform a recursion that reaches a fixpoint on the initial result set.
  3. Perform a final query on the result set derived from the recursion to return the final result set.

This algorithm shows one particular recursive query and its execution. The algorithm presented is designed to solve recursive queries, of which recursive views can be regarded as a special case (see Teradata Vantage™ - SQL Data Manipulation Language, B035-1146 for more information about recursive queries), of the following type.

WITH RECURSIVE reachable_from (destin, cost, depth) AS
  (SELECT root.destin, root.cost, 0 AS depth
   FROM flights AS root
   WHERE root.source = 'Paris'
UNION ALL
   SELECT result.destin, seed.cost + result.cost, seed.depth + 1
          AS depth
   FROM reachable_from AS seed, flights AS result);
   WHERE Seed.Destin = Result.Source
   AND   Depth <= 20
  )
SELECT * FROM Reachable_From;

Note that depth checking code is not necessary for this particular query, but is presented as an example of a safe coding technique that avoids the possibility of infinite recursion, which is an issue you must always be aware of. See Preventing Infinite Recursion Due To Cyclic Data and Preventing Infinite Recursion With Acyclic Data. Note that Teradata Database uses the smallest possible data type for any variable that is not cast to a specific type. This could present a problem if you were to set a depth counter that was initialized to a value of 0, but also specified a condition that permitted the recursion to loop to a depth of 128. The largest value the default data type for this counter can hold is 127 because when you initialize the variable to 0, the system assigns it a data type of BYTEINT by default.

To avoid this problem, you can explicitly cast the initial value of the counter to a data type of INTEGER, which supports far larger numbers than you could possibly require to control runaway recursion. See Teradata Vantage™ - Data Types and Literals, B035-1143 for more information about default data types in the database.

The following pseudocode represents the previously mentioned algorithm for solving the recursion problem provided here, where the following assertions apply.
  • T specifies the base relation named flights.
  • The relations named spool_1, spool_2, spool_3, and spool_4 specify the spools the system generates in the evaluation process.
  • The first statement constitutes the initialization phase of the algorithm.
  • The statements within the L1 block make up the recursion phase of the algorithm.
  • The statement on L2 is third phase of the algorithm, where the last query returns the final result set.
  • The contents of spool_4 are the final result set that is returned to the user.
    /* Insert all seed rows from the base relation into Spool_1. The function
    /* function S retrieves all seed rows from the base relation Flights
    /* (Root) */
    
    Spool_1 = S (T);
    
    /* The contents of Spool_1 result is appended to Spool_2 */
    
    L1:  Spool_2 = Spool_2 + Spool_1;
    
         If (NoRowsAppendedToSpool_2)
    
    /* If no more rows are added in Spool_2, jump to L2 */
    
           GoTo L2;
    
    /* Spool_1 and T are joined based on the join condition specified in  */
    /* the WHERE clause, e.g.,                                            */
    /* WHERE Seed.Destin = Result.Source and Depth <= 20                  */
    /* and the result is appended to Spool_3.                             */
    
        Spool_3 = Spool_1 x T;
         Spool_1 = 0;       /* Delete all rows from Spool_1               */
        Spool_1 = Spool_3; /* Append the contents of Spool_3 into Spool_1 */
        Spool_3 = 0;       /* Delete all rows from Spool_2                */
        GoTo L1;
    
    L2:  Spool_4 = Spool_2  /* Append the contents of Spool_2 into Spool_4 */
        Spool_2 = 0;       /* Delete the contents of Spool_2              */