The examples use a sample river flow data set. To use your own data, replace the table and column names, and authorization object. See Variable Substitutions for Examples for the credentials and location values for the sample data set.
- To run NOS-related commands, log on to the database as a user with the required privileges.
- If it does not exist, create the foreign table or ask your database administrator to create the foreign table called riverflow_json_path: See Filtering on the Payload Column of a Foreign Table.
-
Create the view of the foreign table:
REPLACE VIEW riverflowview_json AS ( SELECT CAST($path.$siteno AS CHAR(10)) TheSite, CAST($path.$year AS CHAR(4)) TheYear, CAST($path.$month AS CHAR(2)) TheMonth, CAST($path.$day AS CHAR(2)) TheDay, CAST(payload.Flow AS FLOAT) ( FORMAT '-ZZZ9.99') Flow, CAST(payload.GageHeight AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight1, CAST(payload.Precipitation AS FLOAT) ( FORMAT '-ZZZ9.99') Precipitation, CAST(payload.Temp AS FLOAT) ( FORMAT '-ZZZ9.99') Temperature, CAST(payload.Velocity AS FLOAT) ( FORMAT '-ZZZ9.99') Velocity, CAST(payload.BatteryVoltage AS FLOAT) ( FORMAT '-ZZZ9.99') BatteryVoltage, CAST(payload.GageHeight2 AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight2 FROM riverflow_json_path WHERE TheSite = payload.site_no);
-
Query the view:
SELECT TOP 2 * FROM riverflowview_json;
Results:
TheSite TheYear TheMonth TheDay Site_no Flow GageHeight1 Precipitation Temperature Velocity BatteryVoltage ------- ------- -------- ------ -------- ----- ----------- ------------- ----------- -------- -------------- 09400815 2018 07 10 09400815 0.00 0.00 0.00 ? ? 12.60 09400815 2018 07 10 09400815 0.00 - 0.01 0.00 ? ? 0.00
Columns with missing attributes return NULL values when the record does not contain the attribute.
-
Filter on the view to find the number of sites with the site number of 09396100:
SELECT thesite,COUNT(*) FROM riverflowview_json WHERE thesite='09396100' GROUP BY 1;
Result:
TheSite Count(*) ---------- ----------- 09396100 3086
-
Run EXPLAIN on the query to see how the filtering is done:
EXPLAIN SELECT thesite,COUNT(*) FROM riverflowview_json WHERE thesite='09396100' GROUP BY 1;
Result:
Explanation -------------------------------------------------------------- […] 3) We do a single-AMP RETRIEVE step from NOS_USR.riverflow_json_path in view riverflowview_json metadata by way of an all-rows scan with a condition of ("(TD_SYSFNLIB.NosExtractVarFromPath ( NOS_USR.riverflow_json_path in view riverflowview_json.Location, '/S3/td-usgs.s3.amazonaws.com/', 2 )(CHAR(10), CHARACTER SET UNICODE, NOT CASESPECIFIC))= '09396100 '") into Spool 3 (all_amps), which is built locally on that AMP. Then we do a SORT to order Spool 3 by the sort key as the field_id list( 3, 2). The size of Spool 3 is estimated with no confidence to be 38 rows (26,790 bytes). The estimated time for this step is 0.74 seconds. 4) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 3 (Last Use) by way of an all-rows scan into Spool 2 (all_amps), which is binpacked and redistributed by size to all AMPs in TD_Map1. The size of Spool 2 is estimated with no confidence to be 38 rows ( 27,094 bytes). The estimated time for this step is 0.16 seconds. 5) We do an all-AMPs SUM step in TD_MAP1 to aggregate from NOS_USR.riverflow_json_path in view riverflowview_json by way of external metadata in Spool 2 (Last Use) with a condition of ("((TD_SYSFNLIB.NosExtractVarFromPath ( NOS_USR.riverflow_json_path in view riverflowview_json.Location, '/S3/td-usgs.s3.amazonaws.com/', 2 ))= (NOS_USR.riverflow_json_path in view riverflowview_json.Payload .JSONEXTRACTVALUE ( '$.site_no'(VARCHAR(32000), CHARACTER SET UNICODE, NOT CASESPECIFIC)) )) AND ((TD_SYSFNLIB.NosExtractVarFromPath ( NOS_USR.riverflow_json_path in view riverflowview_json.Location, '/S3/td-usgs.s3.amazonaws.com/', 2 )(CHAR(10), CHARACTER SET UNICODE, NOT CASESPECIFIC))= (NOS_USR.riverflow_json_path in view riverflowview_json.Payload .JSONEXTRACTVALUE ( '$.site_no'(VARCHAR(32000), CHARACTER SET UNICODE, NOT CASESPECIFIC))))"), and the grouping identifier in field 1. Aggregate Intermediate Results are computed globally, then placed in Spool 5 in TD_Map1. The size of Spool 5 is estimated with no confidence to be 1,085 rows (1,521,170 bytes). The estimated time for this step is 0.52 seconds. […]
(3) is doing the path filtering. It is using the constant 09396100 as a path filtering expression to build the metadata spool. The metadata spool is the spool table that identifies the list of objects the query will actually process.
(5) is doing traditional row filtering. It is compares the site number extracted from the location string to the value in the JSON payload (the actual data in the object store).