Geospatial Join Predicates - Teradata Vantage NewSQL Engine - 16.20

Teradata Vantageā„¢ Geospatial Data Types

prodname
Teradata Database
Teradata Vantage NewSQL Engine
vrm_release
16.20
category
Programming Reference
featnum
B035-1181-162K
Geospatial join predicates cause the optimizer to consider executing a nested join to speed access to the data required to satisfy the geospatial query. Supported join predicates, excluding ST_Distance and ST_3DDistance, use any of the following forms (where GeoColExpression contains a column from Table2):
SELECT a, b 
FROM Table1, Table2
WHERE Table1.GeoCol.SupportedGeoMethod(Table2.GeoCol)=1
or
SELECT a, b 
FROM Table1, Table2
WHERE Table1.GeoCol.SupportedGeoMethod(GeoColExpression)=1
or
SELECT a, b 
FROM Table1, Table2
WHERE GeoColExpression.SupportedGeoMethod(Table1.GeoCol)=1
or
SELECT a, b 
FROM Table2 JOIN Table1
      ON 
      Table1.GeoCol.SupportedGeoMethod(Table2.GeoCol)=1
The expressions must be set to evaluate to 1 (true) for these to qualify as join predicates.
Geospatial distance predicates, those using ST_Distance or ST_3DDistance, use any of the following forms:
SELECT a, b
FROM Table1, Table2
WHERE Table1.GeoCol.SupportedGeoDistanceMethod(Table2.GeoCol) < DistanceLiteral


or
SELECT a, b
FROM Table1, Table2

WHERE Table1.GeoCol.SupportedGeoDistanceMethod(GeoColExpression) < DistanceLiteral
or
SELECT a, b
FROM Table1, Table2

WHERE GeoColExpression.SupportedGeoDistanceMethod(Table1.GeoCol) < DistanceLiteral
or
SELECT a, b
FROM Table2 JOIN Table1
ON

Table1.GeoCol.SupportedGeoDistanceMethod(Table2.GeoCol)  < DistanceLiteral
You can use either the < or <= operator to form these predicates.
Syntax Element Description
Table1 and Table2 Tables containing geospatial data columns.
GeoCol An ST_Geometry column.
At least one side of the join, the owner expression or the argument, must be an ST_Geometry column that has a geospatial index.
GeoColExpression An expression that evaluates to an ST_Geometry value and contains one reference to a column in one of the tables being joined.
SupportedGeoMethod Any of the geospatial methods, excluding ST_Distance and ST_3DDistance, listed in the Overview to this discussion.
SupportedGeoDistanceMethod Either ST_Distance or ST_3DDistance.
DistanceLiteral A floating point value representing a distance.

Example: Using Geospatial Join Predicates

The following example shows the use of geospatial join predicates to determine which points in one table lie within polygons defined in a second table. For such a query the Teradata Optimizer would consider creating a nested join to speed data access.

CREATE TABLE DB_TEST.T1
(a INTEGER, b char(20000), GeoCol ST_Geometry)
INDEX (GeoCol);

CREATE TABLE DB_TEST.T2
(pkey INTEGER, buffer char(20000), Geom ST_Geometry)
INDEX (Geom);


INSERT INTO DB_TEST.T1 (1, 'Teradata01', 'POINT(10 20)');
INSERT INTO DB_TEST.T1 (1, 'Teradata02', 'POINT(20 30)');

INSERT INTO DB_TEST.T2 (1, 'Teradata01', 'POLYGON((3 3, 3 8, 8 8, 8 3, 3 3))');
INSERT INTO DB_TEST.T2 (1, 'Teradata02', 'POLYGON((15 15, 15 35, 35 35, 35 15, 15 15))');
SELECT a,b (format 'x(12)'), geocol
FROM T2
INNER JOIN T1 ON T1.GeoCol.ST_WITHIN(T2.Geom)= 1;

          a b            GeoCol
----------- ------------ -----------------
          1 Teradata02   POINT (20 30)

The number of rows inserted into the table and the nature of the geospatial data itself determines whether the Optimizer chooses to use a geospatial index. In this simple example, with only two rows in each table, the geospatial index would probably not be used. With significantly larger tables or more complex shapes, and depending on the nature and configuration of the database, the Optimizer could use the index to speed query processing. If it did use the index, the EXPLAIN would look similar to this, where step 5 shows the geospatial index use:

Explanation
---------------------------------------------------------------------------
  1) First, we lock DB_TEST.T1 for read on a reserved RowHash to
     prevent global deadlock.
  2) Next, we lock DB_TEST.T2 for read on a reserved RowHash to
     prevent global deadlock.
  3) We lock DB_TEST.T1 for read, and we lock DB_TEST.T2 for read.
  4) We do an all-AMPs RETRIEVE step from DB_TEST.T2 by way of an
     all-rows scan with a condition of ("DB_TEST.T2.b > 1") into
     Spool 2 (all_amps), which is duplicated on all AMPs. The size of
     Spool 2 is estimated with no confidence to be 116 rows (1,168,468
     bytes). The estimated time for this step is 0.67 seconds.
  5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to DB_TEST.T1 by way of Spatial
     index # 4 "Retrieving from CDT indexed column DB_TEST.T1.sp via
     CDT key expression {LeftTable}.sp .ST_MBR ()"extracting row ids
     only. Spool 2 and DB_TEST.T1 are joined using a nested join,
     with a join condition of ("(1=1)"). The result goes into Spool 3
     (all_amps), which is built locally on the AMPs. Then we do a SORT
     to order Spool 3 by field Id 1. The size of Spool 3 is estimated
     with no confidence to be 1 row (10,083 bytes). The estimated time
     for this step is 0.01 seconds.
  6) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of an
     all-rows scan, which is joined to DB_TEST.T1 by way of an
     all-rows scan with a condition of ("DB_TEST.T1.b > 1"). Spool
     3 and DB_TEST.T1 are joined using a row id join, with a join
     condition of ("(sp .ST_WITHIN ({RightTable}.sp ))= 1"). The
     result goes into Spool 1 (group_amps), which is built locally on
     the AMPs. Then we do a SORT to order Spool 1 by the sort key in
     spool field1. The size of Spool 1 is estimated with no confidence
     to be 1 row (95 bytes). The estimated time for this step is 0.05 
     seconds.
  7) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
     -> The contents of Spool 1 are sent back to the user as the result of
     statement 1. The total estimated time is 0.73 seconds.

Using Both an ST_Geometry Column and a Column Expression in a Geospatial Join

The following example shows that one side of a geospatial join, either the owner expression or the argument, can be a column expression that contains a column and evaluates to an ST_Geometry value. The other side of the join must be a geospatial column that has a geospatial index. The example uses the ST_DISTANCE predicate. g2.geo.ST_Buffer(1) is an expression containing a geospatial column from g2 that generates an ST_Geometry object. g1.geo is an ST_Geometry column in g1 that has a geospatial index.
SELECT * FROM geotable g1, geotable2 g2
WHERE g1.geo.ST_Distance (g2.geo.ST_Buffer(1)) < 5
ORDER BY g1.a;

How the Optimizer Determines Nested Join Utilization

The Teradata Optimizer weighs the costs against the benefits of using a nested join strategy for satisfying a query. The Optimizer uses factors such as the following to judge whether to construct a nested join:
  • The cardinality and size of the duplicated table.

    If there is a geospatial index defined on only one of the tables involved in the join operation, the Optimizer will duplicate the non-indexed table.

    If there is a geospatial index on both tables, the optimizer will calculate the costs of duplicating each table and using the index of the other table, and choose the most cost-effective combination.

  • The cardinality and size of the row ID spool.

    The number of rows populating the row ID spool depends on the selectivity of the join.

    The row size of a row lying within the row ID spool depends on the number of columns belonging to the duplicated table, which were referenced in the query.

For more information on how the Teradata Optimizer chooses execution paths for queries, see Teradata Vantageā„¢ SQL Request and Transaction Processing, B035-1142.