Geospatial NUSIs can be used by the Teradata Optimizer to speed access to geospatial data for queries that use single-table predicates and join predicates. For more information on Geospatial Indexes, see “Geospatial Indexes” on page 18.
The following geospatial methods can be used to form predicates:
Geospatial single-table predicates cause the optimizer to evaluate a geospatial index, if one exists, as a potential access path to the geospatial data. In these circumstances, the index can greatly speed access to the data and execution of the query. Supported single-table predicates use either of the two following forms:
SELECT a, b FROM TableName
WHERE TableName.GeoCol.ST_SupportedGeoMethod(GeospatialLiteralExpression)=1
Note: The expression must be set to evaluate to 1
(true) for this to qualify as a single-table predicate.
or
SELECT a, b, FROM TableName
WHERE TableName.GeoCol IS NULL
Syntax Element |
Description |
TableName |
Table containing a geospatial data column. |
GeoCol |
Geospatial data column defined as one of the ST_GEOMETRY types. |
ST_SupportedGeoMethod |
One of the geospatial methods listed above. |
GeospatialLiteralExpression |
An arbitrary geospatial literal expression that can be folded (simplified) in to a geospatial literal value. |
Examples
The following examples show the use of geospatial single-table predicates that enable the optimizer to use a geospatial index for data access.
CREATE TABLE StatT2
(a INT,b CHAR(20000),sp ST_Geometry)
INDEX(sp);
INSERT StatT2(1, 'Teradata01','Point(10 10 10)');
The number of rows inserted into the table and the nature of the Geospatial Data itself will play a significant role in whether or not the Optimizer chooses to use the index.
SELECT a,b(format 'x(10)'), sp
FROM StatT2
WHERE StatT2.sp.ST_WITHIN(CAST(
'POLYGON((0 0 0, 0 50 50, 50 50 50, 50 0 50, 0 0 0))'
AS ST_Geometry)) = 1;
SELECT a,b (format 'x(10)'), sp
FROM StatT2
WHERE StatT2.sp.ST_WITHIN(NEW ST_GEOMETRY(
'POLYGON((0 0 0, 0 50 50, 50 50 50, 50 0 50, 0 0 0))')) = 1;
SELECT a,b (format 'x(10)'), sp
FROM StatT2
WHERE StatT2.sp.ST_WITHIN(NEW ST_GEOMETRY(
'POLYGON((0 0, 0 20, 20 20, 20 0, 0 0))').ST_INTERSECTION(
'POLYGON((0 0, 0 15, 15 15, 15 0, 0 0))')) = 1;
The following shows an example of EXPLAIN output showing the optimizer utilizing a geospatial index.
Explanation
-----------------------------------------------------------------------
1) First, we lock JOSHUA.StatT2 for read on a reserved RowHash to
prevent global deadlock.
2) Next, we lock JOSHUA.StatT2 for read.
3) We do an all-AMPs SUM step to aggregate from JOSHUA.StatT2 by way
of Spatial index # 4 "Retrieving from CDT indexed column
JOSHUA.StatT2.sp via CDT key expression NEW ST_GEOMETRY (
'POLYGON((0 0, 0 20, 20 20, 20 0, 0 0))'(VARCHAR(64000), CHARACTER
SET LATIN, NOT CASESPECIFIC)).ST_INTERSECTION (CAST(('POLYGON((9
9, 9 18, 18 18, 9 9))'(VARCHAR(64000), CHARACTER SET LATIN, NOT
CASESPECIFIC)) AS SYSUDTLIB.ST_Geometry)).ST_MBR ()"with a
residual condition of ("(JOSHUA.StatT2.sp .ST_WITHIN (NEW
ST_GEOMETRY ('POLYGON((0 0, 0 20, 20 20, 20 0, 0
0))'(VARCHAR(64000), CHARACTER SET LATIN, NOT
CASESPECIFIC)).ST_INTERSECTION (CAST(('POLYGON((9 9, 9 18, 18 18,
9 9))'(VARCHAR(64000), CHARACTER SET LATIN, NOT CASESPECIFIC)) AS
SYSUDTLIB.ST_Geometry))))= 1"), and the grouping identifier in
field 1027. Aggregate Intermediate Results are computed globally,
then placed in Spool 3. The size of Spool 3 is estimated with no
confidence to be 3 rows (63 bytes). The estimated time for this
step is 0.07 seconds.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. Then we do a SORT to order Spool 1 by the sort key
in spool field1. The size of Spool 1 is estimated with no
confidence to be 3 rows (96 bytes). The estimated time for this
step is 0.04 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.11 seconds.
How the Optimizer Determines Index Utilization
The Teradata Optimizer weighs the cost and benefit of using the alternative data paths to execute a query.
If a table has a secondary index, the Optimizer can use the index when executing a query against that table, rather than performing a more costly full table scan. By using the index, the Optimizer can reduce the number of rows that must be considered for predicate qualification, usually resulting in faster access to the necessary data.
Although use of the index can reduce the amount of data that needs to be scanned, there is a cost to using the index due to writing, reading, and manipulating an auxiliary Row ID spool.
The Optimizer uses factors such as the following to judge whether to use an available index:
The accuracy with which the Optimizer can determine the selectivity of the predicate depends on whether statistics have been collected on the geospatial column, and whether the collected statistics reflect the current data demographics.
For more information on how the Teradata Optimizer chooses execution paths for queries, see SQL Request and Transaction Processing.
Geospatial join predicates cause the optimizer to consider executing a nested join to speed access to the data required to satisfy the geospatial query. Supported join predicates use either of the two following forms:
SELECT a, b FROM Table1, Table2
WHERE Table1.GeoCol.ST_SupportedGeoMethod(Table2.GeoCol)=1
or
SELECT a, b, FROM Table2
JOIN Table1 ON Table2.GeoCol.SupportedGeoMethod(Table2.GeoCol)=1
Note: The expressions must be set to evaluate to 1
(true) for these to qualify as a join predicates.
Syntax Element |
Description |
Table1 and Table2 |
Tables containing geospatial data columns. |
GeoCol |
Geospatial data column defined as one of the ST_GEOMETRY types. Note: Supported predicate methods must be invoked directly from simple column references, and only simple column references can be passed to the predicate methods. |
ST_SupportedGeoMethod |
One of the geospatial methods listed above. |
Examples
The following examples show the use of geospatial join predicates for which the Teradata Optimizer would consider creating a nested join to speed data access.
CREATE TABLE T1
(a INTEGER, b char(20000), GeoCol ST_Geometry)
INDEX (GeoCol);
CREATE TABLE T2
(pkey INTEGER, buffer char(20000), Geom ST_Geometry)
INDEX (Geom);
INSERT INTO T1
(1,'Teradata01', 'Point(10 20 30)');
The number of rows inserted into the table and the nature of the Geospatial Data itself will play a significant role in whether or not the Optimizer chooses to use the index.
SELECT a,b (format 'x(12)'), GeoCol
FROM T2, T1
WHERE T1.GeoCol.ST_WITHIN(T2.geom)= 1;
SELECT a,b (format 'x(12)'), geocol
FROM T2
INNER JOIN T1 ON T1.GeoCol.ST_WITHIN(T2.Geom)= 1;
The following shows an example of EXPLAIN output showing the optimizer utilizing a nested join.
Explanation
------------------------------------------------------------------------
1) First, we lock JOSHUA.T1 for read on a reserved RowHash to
prevent global deadlock.
2) Next, we lock JOSHUA.T2 for read on a reserved RowHash to
prevent global deadlock.
3) We lock JOSHUA.T1 for read, and we lock JOSHUA.T2 for read.
4) We do an all-AMPs RETRIEVE step from JOSHUA.T2 by way of an
all-rows scan with a condition of ("JOSHUA.T2.b > 1") into
Spool 2 (all_amps), which is duplicated on all AMPs. The size of
Spool 2 is estimated with no confidence to be 116 rows (1,168,468
bytes). The estimated time for this step is 0.67 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to JOSHUA.T1 by way of Spatial
index # 4 "Retrieving from CDT indexed column JOSHUA.T1.sp via
CDT key expression {LeftTable}.sp .ST_MBR ()"extracting row ids
only. Spool 2 and JOSHUA.T1 are joined using a nested join,
with a join condition of ("(1=1)"). The result goes into Spool 3
(all_amps), which is built locally on the AMPs. Then we do a SORT
to order Spool 3 by field Id 1. The size of Spool 3 is estimated
with no confidence to be 1 row (10,083 bytes). The estimated time
for this step is 0.01 seconds.
6) We do an all-AMPs JOIN step from Spool 3 (Last Use) by way of an
all-rows scan, which is joined to JOSHUA.T1 by way of an
all-rows scan with a condition of ("JOSHUA.T1.b > 1"). Spool
3 and JOSHUA.T1 are joined using a row id join, with a join
condition of ("(sp .ST_WITHIN ({RightTable}.sp ))= 1"). The
result goes into Spool 1 (group_amps), which is built locally on
the AMPs. Then we do a SORT to order Spool 1 by the sort key in
spool field1. The size of Spool 1 is estimated with no confidence
to be 1 row (95 bytes). The estimated time for this step is 0.05 seconds.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.73 seconds.
How the Optimizer Determines Nested Join Utilization
The Teradata Optimizer weighs the costs against the benefits of using a nested join strategy for satisfying a query. The Optimizer uses factors such as the following to judge whether to construct a nested join:
If there is a geospatial index defined on only one of the tables involved in the join operation, the Optimizer will duplicate the non-indexed table.
If there is a geospatial index on both tables, Table1 and Table2, the optimizer will calculate the costs of duplicating each table and using the index of the other table, and choose the most cost-effective combination.
The number of rows populating the row ID spool depends on the selectivity of the join. The row size of a row lying within the row ID spool depends on the number of columns belonging to the duplicated table, which were referenced in the query.
For more information on how the Teradata Optimizer chooses execution paths for queries, see SQL Request and Transaction Processing.