15.10 - Geospatial Predicates and the Optimizer - Teradata Database

Teradata Database SQL Geospatial Types

prodname
Teradata Database
vrm_release
15.10
category
Programming Reference
featnum
B035-1181-151K

Geospatial NUSIs can be used by the Teradata Optimizer to speed access to geospatial data for queries that use single-table predicates and join predicates. For more information on Geospatial Indexes, see “Geospatial Indexes” on page 18.

The following geospatial methods can be used to form predicates:

  • MBB_Filter
  • MBR_Filter
  • ST_3DDistance
  • ST_Contains
  • ST_Crosses
  • ST_Distance
  • ST_Equals
  • ST_Intersects
  • ST_Overlaps
  • ST_Touches
  • ST_Within
  • Within_MBB
  • Intersects_MBB
  • ST_Relate (if this predicate is configured to execute one of the above methods)
  • Geospatial single-table predicates cause the optimizer to evaluate a geospatial index, if one exists, as a potential access path to the geospatial data. In these circumstances, the index can greatly speed access to the data and execution of the query. Supported single-table predicates use either of the two following forms:

    SELECT a, b FROM TableName
    WHERE TableName.GeoCol.ST_SupportedGeoMethod(GeospatialLiteralExpression)=1

    Note: The expression must be set to evaluate to 1 (true) for this to qualify as a single-table predicate.

    or

    SELECT a, b, FROM TableName
    WHERE TableName.GeoCol IS NULL
     

    Syntax Element

    Description

    TableName

    Table containing a geospatial data column.

    GeoCol

    Geospatial data column defined as one of the ST_GEOMETRY types.

    ST_SupportedGeoMethod

    One of the geospatial methods listed above.

    GeospatialLiteralExpression

    An arbitrary geospatial literal expression that can be folded (simplified) in to a geospatial literal value.

    Examples  

    The following examples show the use of geospatial single-table predicates that enable the optimizer to use a geospatial index for data access.

    CREATE TABLE StatT2
     (a INT,b CHAR(20000),sp ST_Geometry)
     INDEX(sp);

    INSERT StatT2(1, 'Teradata01','Point(10 10 10)');

    The number of rows inserted into the table and the nature of the Geospatial Data itself will play a significant role in whether or not the Optimizer chooses to use the index.

    SELECT a,b(format 'x(10)'), sp
    FROM StatT2 
    WHERE StatT2.sp.ST_WITHIN(CAST(
    'POLYGON((0 0 0, 0 50 50, 50 50 50, 50 0 50, 0 0 0))'
    AS ST_Geometry)) = 1;
     
    SELECT a,b (format 'x(10)'), sp
    FROM StatT2 
    WHERE StatT2.sp.ST_WITHIN(NEW ST_GEOMETRY(
     'POLYGON((0 0 0, 0 50 50, 50 50 50, 50 0 50, 0 0 0))')) = 1;
     
    SELECT a,b (format 'x(10)'), sp 
    FROM StatT2
    WHERE StatT2.sp.ST_WITHIN(NEW ST_GEOMETRY(
     'POLYGON((0 0, 0 20, 20 20, 20 0, 0 0))').ST_INTERSECTION(
     'POLYGON((0 0, 0 15, 15 15, 15 0, 0 0))')) = 1;
     

    The following shows an example of EXPLAIN output showing the optimizer utilizing a geospatial index.

    Explanation
    -----------------------------------------------------------------------
    1) First, we lock JOSHUA.StatT2 for read on a reserved RowHash to
    prevent global deadlock.
    2) Next, we lock JOSHUA.StatT2 for read.
    3) We do an all-AMPs SUM step to aggregate from JOSHUA.StatT2 by way
    of Spatial index # 4 "Retrieving from CDT indexed column
    JOSHUA.StatT2.sp via CDT key expression NEW ST_GEOMETRY (
    'POLYGON((0 0, 0 20, 20 20, 20 0, 0 0))'(VARCHAR(64000), CHARACTER
    SET LATIN, NOT CASESPECIFIC)).ST_INTERSECTION (CAST(('POLYGON((9
    9, 9 18, 18 18, 9 9))'(VARCHAR(64000), CHARACTER SET LATIN, NOT
    CASESPECIFIC)) AS SYSUDTLIB.ST_Geometry)).ST_MBR ()"with a
    residual condition of ("(JOSHUA.StatT2.sp .ST_WITHIN (NEW
    ST_GEOMETRY ('POLYGON((0 0, 0 20, 20 20, 20 0, 0
    0))'(VARCHAR(64000), CHARACTER SET LATIN, NOT
    CASESPECIFIC)).ST_INTERSECTION (CAST(('POLYGON((9 9, 9 18, 18 18,
    9 9))'(VARCHAR(64000), CHARACTER SET LATIN, NOT CASESPECIFIC)) AS
    SYSUDTLIB.ST_Geometry))))= 1"), and the grouping identifier in
    field 1027. Aggregate Intermediate Results are computed globally,
    then placed in Spool 3. The size of Spool 3 is estimated with no
    confidence to be 3 rows (63 bytes). The estimated time for this
    step is 0.07 seconds.
    4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
    an all-rows scan into Spool 1 (group_amps), which is built locally
    on the AMPs. Then we do a SORT to order Spool 1 by the sort key
    in spool field1. The size of Spool 1 is estimated with no
    confidence to be 3 rows (96 bytes). The estimated time for this
    step is 0.04 seconds.
    5) Finally, we send out an END TRANSACTION step to all AMPs involved
    in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of
    statement 1. The total estimated time is 0.11 seconds.

    How the Optimizer Determines Index Utilization

    The Teradata Optimizer weighs the cost and benefit of using the alternative data paths to execute a query.

    If a table has a secondary index, the Optimizer can use the index when executing a query against that table, rather than performing a more costly full table scan. By using the index, the Optimizer can reduce the number of rows that must be considered for predicate qualification, usually resulting in faster access to the necessary data.

    Although use of the index can reduce the amount of data that needs to be scanned, there is a cost to using the index due to writing, reading, and manipulating an auxiliary Row ID spool.

    The Optimizer uses factors such as the following to judge whether to use an available index:

  • The selectivity of the single-table predicate determines how many rows of the table are selected. It determines the cardinality of the Row ID spool. The Optimizer is more likely to use an index to satisfy a query that uses more selective single-table predicates.
  • The accuracy with which the Optimizer can determine the selectivity of the predicate depends on whether statistics have been collected on the geospatial column, and whether the collected statistics reflect the current data demographics.

  • The size of the Row ID Spool row, which depends on the number of columns present in the selectivity list or WHERE-clause. This factor, together with the selectivity, determines the number of IO blocks present in the Row ID spool.
  • The size of the table rows. In general, larger rows make it more cost effective for the Optimizer to use the index.
  • For more information on how the Teradata Optimizer chooses execution paths for queries, see SQL Request and Transaction Processing.

    Geospatial join predicates cause the optimizer to consider executing a nested join to speed access to the data required to satisfy the geospatial query. Supported join predicates use either of the two following forms:

    SELECT a, b FROM Table1, Table2
    WHERE Table1.GeoCol.ST_SupportedGeoMethod(Table2.GeoCol)=1

    or

    SELECT a, b, FROM Table2
    JOIN Table1 ON Table2.GeoCol.SupportedGeoMethod(Table2.GeoCol)=1

    Note: The expressions must be set to evaluate to 1 (true) for these to qualify as a join predicates.

     

    Syntax Element

    Description

    Table1 and Table2

    Tables containing geospatial data columns.

    GeoCol

    Geospatial data column defined as one of the ST_GEOMETRY types.

    Note: Supported predicate methods must be invoked directly from simple column references, and only simple column references can be passed to the predicate methods.

    ST_SupportedGeoMethod

    One of the geospatial methods listed above.

    Examples  

    The following examples show the use of geospatial join predicates for which the Teradata Optimizer would consider creating a nested join to speed data access.

     
    CREATE TABLE T1
     (a INTEGER, b char(20000), GeoCol ST_Geometry)
     INDEX (GeoCol);
    CREATE TABLE T2
     (pkey INTEGER, buffer char(20000), Geom ST_Geometry)
     INDEX (Geom);
     
    INSERT INTO T1
     (1,'Teradata01', 'Point(10 20 30)');

    The number of rows inserted into the table and the nature of the Geospatial Data itself will play a significant role in whether or not the Optimizer chooses to use the index.

    SELECT a,b (format 'x(12)'), GeoCol 
    FROM T2, T1
    WHERE T1.GeoCol.ST_WITHIN(T2.geom)= 1;

    SELECT a,b (format 'x(12)'), geocol
    FROM T2
    INNER JOIN T1 ON T1.GeoCol.ST_WITHIN(T2.Geom)= 1;

    The following shows an example of EXPLAIN output showing the optimizer utilizing a nested join.

    Explanation
    ------------------------------------------------------------------------
    1) First, we lock JOSHUA.T1 for read on a reserved RowHash to
    prevent global deadlock.
    2) Next, we lock JOSHUA.T2 for read on a reserved RowHash to
    prevent global deadlock.
    3) We lock JOSHUA.T1 for read, and we lock JOSHUA.T2 for read.
    4) We do an all-AMPs RETRIEVE step from JOSHUA.T2 by way of an
    all-rows scan with a condition of ("JOSHUA.T2.b > 1") into
    Spool 2 (all_amps), which is duplicated on all AMPs. The size of
    Spool 2 is estimated with no confidence to be 116 rows (1,168,468
    bytes). The estimated time for this step is 0.67 seconds.
    5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
    all-rows scan, which is joined to JOSHUA.T1 by way of Spatial
    index # 4 "Retrieving from CDT indexed column JOSHUA.T1.sp via
    CDT key expression {LeftTable}.sp .ST_MBR ()"extracting row ids
    only. Spool 2 and JOSHUA.T1 are joined using a nested join,
    with a join condition of ("(1=1)"). The result goes into Spool 3
    (all_amps), which is built locally on the AMPs. Then we do a SORT
    to order Spool 3 by field Id 1. The size of Spool 3 is estimated
    with no confidence to be 1 row (10,083 bytes). The estimated time
    for this step is 0.01 seconds.
    6) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of an
    all-rows scan, which is joined to JOSHUA.T1 by way of an
    all-rows scan with a condition of ("JOSHUA.T1.b > 1"). Spool
    3 and JOSHUA.T1 are joined using a row id join, with a join
    condition of ("(sp .ST_WITHIN ({RightTable}.sp ))= 1"). The
    result goes into Spool 1 (group_amps), which is built locally on
    the AMPs. Then we do a SORT to order Spool 1 by the sort key in
    spool field1. The size of Spool 1 is estimated with no confidence
    to be 1 row (95 bytes). The estimated time for this step is 0.05 seconds.
    7) Finally, we send out an END TRANSACTION step to all AMPs involved
    in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of
    statement 1. The total estimated time is 0.73 seconds.

     

    How the Optimizer Determines Nested Join Utilization

    The Teradata Optimizer weighs the costs against the benefits of using a nested join strategy for satisfying a query. The Optimizer uses factors such as the following to judge whether to construct a nested join:

  • The cardinality and size of the duplicated table.
  • If there is a geospatial index defined on only one of the tables involved in the join operation, the Optimizer will duplicate the non-indexed table.

    If there is a geospatial index on both tables, Table1 and Table2, the optimizer will calculate the costs of duplicating each table and using the index of the other table, and choose the most cost-effective combination.

  • The cardinality and size of the row ID spool.
  • The number of rows populating the row ID spool depends on the selectivity of the join. The row size of a row lying within the row ID spool depends on the number of columns belonging to the duplicated table, which were referenced in the query.

    For more information on how the Teradata Optimizer chooses execution paths for queries, see SQL Request and Transaction Processing.