A recursive view is a named query expression whose definition is self-referencing. See Building a Recursive View. For information about the WITH RECURSIVE clause, see Teradata Vantage™ - SQL Data Manipulation Language, B035-1146.
The self-referencing property provides a method for searching a table set by means of iterative self-join and set operations. You can think of a recursive view as a persistent view definition that includes a retrieval from that same view.
Recursive views allow SQL queries to define a hierarchical search to find all possible paths in the data.
Suppose you need to determine all part numbers for all parts that are components at any level of a multicomponent part. This can also apply to a bill of materials, organization charts, family trees, the representation of discussion forum threads by a web or electronic mail client, transportation and communication networks, authorization graphs, document hierarchies, and software invocation structures.
For example, suppose you have the following table definition.
CREATE TABLE flights ( source VARCHAR(40), destination VARCHAR(40), carrier VARCHAR(40), cost DECIMAL(5,0));
After populating the table with data, its contents look like this.
Flights | |||
---|---|---|---|
Source | Destination | Carrier | Cost |
Paris | Detroit | KLM | 7 |
Paris | New York | KLM | 6 |
Paris | Boston | American Airlines | 8 |
New York | Chicago | American Airlines | 2 |
Boston | Chicago | American Airlines | 6 |
Detroit | San Jose | American Airlines | 4 |
Chicago | San Jose | American Airlines | 2 |
You must now solve the following problem: Find all the destinations that can be reached from Paris.
The most straightforward method to solve this class of problems is to use recursion, a class of algorithms characterized by self-reference and a terminal condition that can be either implicit or specified explicitly.
- A non-recursive seed statement.
- A recursive statement.
- A connection operator. The only valid set connection operator in a recursive view definition is UNION ALL.
- A terminal condition to prevent infinite recursion.
Because this is a type of query that will be posed frequently, you decide to create a recursive view to handle it rather than performing a simple recursive query. For information about recursive queries, see Teradata Vantage™ - SQL Data Manipulation Language, B035-1146. The view definition is as follows.
CREATE RECURSIVE VIEW reachable_from (source,destination,depth) AS ( SELECT root.source, root.destination, 0 AS depth FROM flights AS root WHERE root.source = 'Paris' UNION ALL SELECT in1.source, out1.destination, in1.depth + 1 FROM reachable_from AS in1, flights AS out1 WHERE in1.destination = out1.source AND in1.depth <= 100);
Because the recursion is inherent in the definition of the view reachable_from, you can perform a simple SELECT statement on the view. In this recursive view definition, the following statement is the non-recursive or seed statement.
SELECT root.source, root.destination FROM flights AS root WHERE root.source = 'Paris'
The following query is the recursive component of the recursive view definition.
SELECT in1.source, out1.destination, in1.depth + 1 FROM reachable_from AS in1, flights AS out1 WHERE in1.destination = out1.source;
Notice that the recursive component refers to itself (reachable_from being the name of the recursive view of which it is the recursive component) in the FROM clause.
The UNION ALL set operator in the definition unions the results of the non-recursive and recursive query components to produce the view being defined.
Having defined the recursive view named reachable_from, you can run the following simple SELECT statement to find all the destinations that can be reached from Paris.
SELECT DISTINCT source, destination FROM reachable_from;
The answer set is as follows.
Source | Destination |
---|---|
Paris | Detroit |
Paris | New York |
Paris | Boston |
Paris | Chicago |
Paris | San Jose |
In mathematics, a monotonic progression is one that either never increases (always decreases or remains the same) or never decreases (always increases or remains the same). For SQL, the definition of monotonicity is limited to a progression that never decreases. Explicitly, the number of rows in the working answer set never decreases: either the number stays the same or it grows larger as the recursion continues. Similarly, the field values of rows in the working answer set must never change.
One such example of violating monotonicity is the use of aggregates in a recursive query. For example, suppose you want to calculate the maximum price of a flight from Los Angeles to Boston using an iterative query. The first iteration computes a direct flight that costs 100 USD. The row in the result set has columns for origin, destination, and MAX(price). If a subsequent iteration produces a row with a new maximum value (such as LAX, BOS, $200), then the row in the answer set must change because 100 USD is no longer the maximum cost for that particular trip. Changing a field value in this situation is a clear violation of monotonicity.
Various restrictions on SQL recursion ensure that Q can only increase monotonically.
One way to compute the fixpoint of query Q is the following procedure.
- Start with an empty table, represented symbolically as T←0.
- Evaluate query Q over the current contents of table T.
IF the query result is … THEN … identical to T T is the fixpoint. not identical to T you have the T← query result, which is not the fixpoint. Go to Step 2.
In other words, the fixpoint of a recursive query is reached when further efforts to identify more rows to insert into the result can find no such rows.
Based on the preceding explanation, a recursive query can be described as a query that is divided into the following processing phases.
- Create an initial result set.
- Perform a recursion that reaches a fixpoint on the initial result set.
- Perform a final query on the result set derived from the recursion to return the final result set.
This algorithm shows one particular recursive query and its execution. The algorithm presented is designed to solve recursive queries, of which recursive views can be regarded as a special case (see Teradata Vantage™ - SQL Data Manipulation Language, B035-1146 for more information about recursive queries), of the following type.
WITH RECURSIVE reachable_from (destin, cost, depth) AS (SELECT root.destin, root.cost, 0 AS depth FROM flights AS root WHERE root.source = 'Paris' UNION ALL SELECT result.destin, seed.cost + result.cost, seed.depth + 1 AS depth FROM reachable_from AS seed, flights AS result); WHERE Seed.Destin = Result.Source AND Depth <= 20 ) SELECT * FROM Reachable_From;
Note that depth checking code is not necessary for this particular query, but is presented as an example of a safe coding technique that avoids the possibility of infinite recursion, which is an issue you must always be aware of. See Preventing Infinite Recursion Due To Cyclic Data and Preventing Infinite Recursion With Acyclic Data. Note that Teradata Database uses the smallest possible data type for any variable that is not cast to a specific type. This could present a problem if you were to set a depth counter that was initialized to a value of 0, but also specified a condition that permitted the recursion to loop to a depth of 128. The largest value the default data type for this counter can hold is 127 because when you initialize the variable to 0, the system assigns it a data type of BYTEINT by default.
To avoid this problem, you can explicitly cast the initial value of the counter to a data type of INTEGER, which supports far larger numbers than you could possibly require to control runaway recursion. See Teradata Vantage™ - Data Types and Literals, B035-1143 for more information about default data types in the database.
- T specifies the base relation named flights.
- The relations named spool_1, spool_2, spool_3, and spool_4 specify the spools the system generates in the evaluation process.
- The first statement constitutes the initialization phase of the algorithm.
- The statements within the L1 block make up the recursion phase of the algorithm.
- The statement on L2 is third phase of the algorithm, where the last query returns the final result set.
- The contents of spool_4 are the final result set that is returned to the user.
/* Insert all seed rows from the base relation into Spool_1. The function /* function S retrieves all seed rows from the base relation Flights /* (Root) */ Spool_1 = S (T); /* The contents of Spool_1 result is appended to Spool_2 */ L1: Spool_2 = Spool_2 + Spool_1; If (NoRowsAppendedToSpool_2) /* If no more rows are added in Spool_2, jump to L2 */ GoTo L2; /* Spool_1 and T are joined based on the join condition specified in */ /* the WHERE clause, e.g., */ /* WHERE Seed.Destin = Result.Source and Depth <= 20 */ /* and the result is appended to Spool_3. */ Spool_3 = Spool_1 x T; Spool_1 = 0; /* Delete all rows from Spool_1 */ Spool_1 = Spool_3; /* Append the contents of Spool_3 into Spool_1 */ Spool_3 = 0; /* Delete all rows from Spool_2 */ GoTo L1; L2: Spool_4 = Spool_2 /* Append the contents of Spool_2 into Spool_4 */ Spool_2 = 0; /* Delete the contents of Spool_2 */