Recursion Overview | CREATE RECURSIVE VIEW | VantageCloud Lake - The Concept of Recursion - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
ft:locale
en-US
ft:lastEdition
2024-12-11
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

A recursive view is a named query expression whose definition is self-referencing. See Building a Recursive View. For information about recursive queries, see Using Recursive Queries.

The self-referencing property provides a method for searching a table set by means of iterative self-join and set operations. You can think of a recursive view as a persistent view definition that includes a retrieval from that same view.

Recursive views allow SQL queries to define a hierarchical search to find all possible paths in the data.

Suppose you need to determine all part numbers for all parts that are components at any level of a multicomponent part. This can also apply to a bill of materials, organization charts, family trees, the representation of discussion forum threads by a web or electronic mail client, transportation and communication networks, authorization graphs, document hierarchies, and software invocation structures.

For example, suppose you have the following table definition.

CREATE TABLE flights (
  source      VARCHAR(40),
  destination VARCHAR(40),
  carrier     VARCHAR(40),
  cost        DECIMAL(5,0));

After populating the table with data, its contents look like this.

Flights
Source Destination Carrier Cost
 Paris  Detroit  KLM 7
 Paris  New York  KLM 6
 Paris  Boston  American Airlines 8
 New York  Chicago  American Airlines 2
 Boston  Chicago  American Airlines 6
 Detroit  San Jose  American Airlines 4
 Chicago  San Jose  American Airlines 2

You must now solve the following problem: Find all the destinations that can be reached from Paris.

The most straightforward method to solve this class of problems is to use recursion, a class of algorithms characterized by self-reference and a terminal condition that can be either implicit or specified explicitly.

In SQL, recursive queries are typically built using these components.
  • A non-recursive seed statement.
  • A recursive statement.
  • A connection operator. The only valid set connection operator in a recursive view definition is UNION ALL.
  • A terminal condition to prevent infinite recursion.

Because this type of query is frequent, you decide to create a recursive view to handle it rather than performing a simple recursive query. The view definition is as follows.

CREATE RECURSIVE VIEW reachable_from (source,destination,depth) AS (
  SELECT root.source, root.destination, 0 AS depth
  FROM flights AS root
  WHERE root.source = 'Paris'
UNION ALL
  SELECT in1.source, out1.destination, in1.depth + 1
  FROM reachable_from AS in1, flights AS out1
  WHERE in1.destination = out1.source
  AND   in1.depth <= 100);

Because the recursion is inherent in the definition of the view reachable_from, you can perform a simple SELECT statement on the view. In this recursive view definition, the following statement is the non-recursive or seed statement.

SELECT root.source, root.destination
FROM flights AS root
WHERE root.source = 'Paris'

The following query is the recursive component of the recursive view definition.

SELECT in1.source, out1.destination, in1.depth + 1
FROM reachable_from AS in1, flights AS out1
WHERE in1.destination = out1.source;

The recursive component refers to itself (reachable_from is the name of the recursive view of which it is the recursive component) in the FROM clause.

The UNION ALL set operator in the definition unions the results of the non-recursive and recursive query components to produce the view being defined.

Having defined the recursive view named reachable_from, you can run the following simple SELECT statement to find all the destinations that can be reached from Paris.

SELECT DISTINCT source, destination
FROM reachable_from;

The answer set is as follows.

Source Destination
   Paris    Detroit
   Paris    New York
   Paris    Boston
   Paris    Chicago
   Paris    San Jose

In mathematics, a monotonic progression is one that either never increases (decreases or remains the same) or never decreases (increases or remains the same). For SQL, the definition of monotonicity is limited to a progression that never decreases. Explicitly, the number of rows in the working answer set never decreases: either the number stays the same or it grows larger as the recursion continues. Similarly, the field values of rows in the working answer set must never change.

One example of violating monotonicity is using aggregates in a recursive query. For example, suppose you want to calculate the maximum price of a flight from Los Angeles to Boston using an iterative query. The first iteration computes a direct flight that costs 100 USD. The row in the result set has columns for origin, destination, and MAX(price). If a subsequent iteration produces a row with a new maximum value, the row in the answer set changes because 100 USD is no longer the maximum cost for that trip. Changing a field value is a violation of monotonicity.

Restrictions on SQL recursion make sure that Q can only increase monotonically.

One way to compute the fixpoint of query Q is the following procedure.
  1. Start with an empty table, represented symbolically as T←0.
  2. Evaluate query Q over the current contents of table T.

    If the query result is identical to T, T is the fixpoint. If not, you have the T← query result, which is not the fixpoint—go to step 2.

That is, the fixpoint of a recursive query is reached when further efforts to identify more rows to insert into the result can find no such rows.

Based on the preceding explanation, a recursive query can be described as a query that is divided into the following processing phases.
  1. Create an initial result set.
  2. Perform a recursion that reaches a fixpoint on the initial result set.
  3. Perform a final query on the result set derived from the recursion to return the final result set.

This algorithm shows one recursive query and its execution. The algorithm presented is designed to solve recursive queries, of which recursive views can be regarded as a special case, of the following type.

WITH RECURSIVE reachable_from (destin, cost, depth) AS
  (SELECT root.destin, root.cost, 0 AS depth
   FROM flights AS root
   WHERE root.source = 'Paris'
UNION ALL
   SELECT result.destin, seed.cost + result.cost, seed.depth + 1
          AS depth
   FROM reachable_from AS seed, flights AS result);
   WHERE Seed.Destin = Result.Source
   AND   Depth <= 20
  )
SELECT * FROM Reachable_From;

Depth checking code is unnecessary for this query, but is an example of a safe coding technique that avoids the possibility of infinite recursion. See Preventing Infinite Recursion Due to Cyclic Data and Preventing Infinite Recursion with Acyclic Data. Vantage uses the smallest possible data type for any variable that is not cast to a specific type. This can be a problem if you set a depth counter that was initialized to a value of 0, but also specified a condition that permitted the recursion to loop to a depth of 128. The largest value the default data type for this counter can hold is 127 because when you initialize the variable to 0, the system assigns it a data type of BYTEINT by default.

To avoid this problem, you can explicitly cast the initial value of the counter to a data type of INTEGER, which supports far larger numbers than you can possibly require to control runaway recursion. See Data Type Conversions.

The following pseudocode represents the previously mentioned algorithm for solving the recursion problem provided here, where the following assertions apply.
  • T specifies the base relation named flights.
  • The relations named spool_1, spool_2, spool_3, and spool_4 specify the spools the system generates in the evaluation process.
  • The first statement constitutes the initialization phase of the algorithm.
  • The statements within the L1 block make up the recursion phase of the algorithm.
  • The statement on L2 is third phase of the algorithm, where the last query returns the final result set.
  • The contents of spool_4 are the final result set that is returned to the user.
    /* Insert all seed rows from the base relation into Spool_1. The function
    /* function S retrieves all seed rows from the base relation Flights
    /* (Root) */
    
    Spool_1 = S (T);
    
    /* The contents of Spool_1 result is appended to Spool_2 */
    
    L1:  Spool_2 = Spool_2 + Spool_1;
    
         If (NoRowsAppendedToSpool_2)
    
    /* If no more rows are added in Spool_2, jump to L2 */
    
           GoTo L2;
    
    /* Spool_1 and T are joined based on the join condition specified in  */
    /* the WHERE clause, for example,                                            */
    /* WHERE Seed.Destin = Result.Source and Depth <= 20                  */
    /* and the result is appended to Spool_3.                             */
    
        Spool_3 = Spool_1 x T;
         Spool_1 = 0;       /* Delete all rows from Spool_1               */
        Spool_1 = Spool_3; /* Append the contents of Spool_3 into Spool_1 */
        Spool_3 = 0;       /* Delete all rows from Spool_2                */
        GoTo L1;
    
    L2:  Spool_4 = Spool_2  /* Append the contents of Spool_2 into Spool_4 */
        Spool_2 = 0;       /* Delete the contents of Spool_2              */