The following examples show accessing JSON-formatted data from Teradata's public Azure Blob storage container.
The following table shows the LOCATION values for the Teradata-supplied public external object stores that contain JSON data. You can run the examples using any of these locations, after first setting up the function mapping here Setting Up to Run the Examples.
Platform | LOCATION |
---|---|
Amazon S3 | /s3/td-usgs-public.s3.amazonaws.com/JSONDATA/ |
Azure Blob storage | /az/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/JSONDATA/ |
Google Cloud Storage | /gs/storage.googleapis.com/td-usgs/JSONDATA/ |
If you want to use your own external object store, see Variable Substitutions for Examples for examples of LOCATION, ACCESS_ID/USER, and ACCESS_KEY/PASSWORD.
Using SHOW AUTHORIZATION
SHOW AUTHORIZATION NOS_USR.DefAuth;
If the authorization object does not exist, run the steps in Setting Up to Run the Examples.
Setting Up to Run JSON Examples
CREATE FOREIGN TABLE riverflow_json , EXTERNAL SECURITY DEFINER TRUSTED DefAuth ( LOCATION VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC , Payload JSON INLINE LENGTH 32000 CHARACTER SET UNICODE ) USING( LOCATION('/az/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/JSONDATA/') );
Sampling External Data In Place Using READ_NOS
SELECT TOP 2 payload.* FROM READ_NOS_FM ( ON ( SELECT CAST( NULL AS JSON ) ) USING LOCATION('/az/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/JSONDATA/') ) AS D;
Sample result:
Payload.* ------------------------------------------------------------------- ["09380000","2018-07-04 00:00","17700","10.24","10.6","678","0.00"] ["09380000","2018-06-30 00:00","15900","9.92","10.4","685","0.00"]
Sampling External Data In Place Using a Foreign Table
SELECT TOP 2 * FROM riverflow_json;
Sample result:
LOCATION -------------------------------------------------------------- /AZ/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/JSONDATA/09396100/2018/07/14.json /AZ/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/JSONDATA/09396100/2018/07/14.json
Using JSON_KEYS to See Header Names
SELECT DISTINCT * FROM JSON_KEYS ( ON (SELECT payload FROM riverflow_json) ) AS jsonKeys;
Sample result:
JSONKeys ---------------- "Velocity" "datetime" "Precipitation" "Flow" "BatteryVoltage" "WaterVelocity" "Conductance" "Temp" "GageHeight2" "GageHeight" "site_no"
Finding Maximum and Minimum Values
SELECT MAX(CAST(payload.Flow AS FLOAT) ( FORMAT '-ZZZZ9.99')) , MIN(CAST(payload.Flow AS FLOAT) ( FORMAT '-(5)ZZZ9.99')) FROM riverflow_json;
Sample result:
Maximum(Payload.Flow) Minimum(Payload.Flow) --------------------- --------------------- 19000.00 -9.72
Querying for a Specific Set of Rows
SELECT MAX(CAST(payload.Flow AS FLOAT) ( FORMAT '-ZZZZ9.99')) FROM riverflow_json WHERE payload.site_no = '09394500';
Sample result:
Maximum(Payload.Flow) ---------------------- 1230.00
Previewing Data Using READ_NOS
SELECT TOP 2 payload FROM READ_NOS_FM ( ON ( SELECT CAST(NULL AS JSON) ) USING LOCATION('/az/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/JSONDATA/') RETURNTYPE('NOSREAD_RECORD') ) AS D;
Sample result:
Payload -------------------------------------------------------------------- { "site_no":"09380000", "datetime":"2018-07-04 00:00", "Flow":"17700", "GageHeight":"10.24", "Temp":"10.6", "Conductance":"678", "Precipitation":"0.00"} { "site_no":"09380000", "datetime":"2018-07-03 00:00", "Flow":"17600", "GageHeight":"10.23", "Temp":"10.6", "Conductance":"674", "Precipitation":"0.00"}
Note, the payload column is truncated in the sample result.
Viewing the Key Structure
SELECT TOP 2 LOCATION FROM READ_NOS_FM ( USING RETURNTYPE ('NOSREAD_KEYS') LOCATION('/az/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/JSONDATA/') ) AS D;
Sample result:
Location ------------------------------------------------------------ /AZ/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/JSONDATA/09380000/2018/07/04.json /AZ/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/JSONDATA/09380000/2018/06/30.json
Loading External Data into the Database Using CREATE TABLE AS...WITH DATA
CREATE MULTISET TABLE riverflowprecip_json AS ( SELECT CAST(Payload.site_no AS CHAR(8)) SiteNo, CAST(Payload.Flow AS FLOAT) ( FORMAT '-ZZZZ9.99') Flow, CAST(Payload.GageHeight AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight, CAST(Payload.datetime AS TIMESTAMP(0) FORMAT 'YYYY-MM-DDBHH:MI') Date_Time FROM riverflow_json WHERE Payload.Precipitation > 0 ) WITH DATA NO PRIMARY INDEX;
Display the number of rows in the table:
SELECT COUNT(*) FROM riverflowprecip_json;
Sample result:
Count(*) ----------- 396
See Loading External Data into the Database Using CREATE TABLE AS...WITH DATA.
Loading External Data into the Database Using READ_NOS and CREATE TABLE AS
CREATE MULTISET TABLE riverflowperm_json AS ( SELECT CAST(payload.site_no AS CHAR(8)) Site_no, CAST(payload.Flow AS FLOAT) ( FORMAT '-ZZZ9.99') Flow, CAST(payload.GageHeight AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight1, CAST(payload.Precipitation AS FLOAT) ( FORMAT '-ZZZ9.99') Precipitation, CAST(payload.Temp AS FLOAT) ( FORMAT '-ZZZ9.99') Temperature, CAST(payload.Velocity AS FLOAT) ( FORMAT '-ZZZ9.99') Velocity FROM READ_NOS_FM ( ON (SELECT CAST(NULL AS JSON)) USING LOCATION('/az/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/JSONDATA/') ) AS D ) WITH DATA;
Display the number of rows in the table:
SELECT COUNT (*) FROM riverflowperm_json;
Sample result:
Count(*) ----------- 36301
See Loading External Data into the Database Using READ_NOS and CREATE TABLE AS.
Loading External Data into the Database Using INSERT SELECT
CREATE TABLE RiverFlowPermInsert_json (DateTS CHAR(20) ,SiteNo CHAR(8) ,Flow FLOAT FORMAT '-ZZZZ9.99' ,Velocity FLOAT FORMAT '-ZZZ9.99' ,GageHeight FLOAT FORMAT '-ZZZ9.99' ,Temp FLOAT FORMAT '-ZZZ9.99' ) PRIMARY INDEX (SiteNo);
Insert the external data into the database table:
INSERT INTO RiverFlowPermInsert_json SELECT payload.datetime ,payload.site_no ,Cast (payload.Flow AS FLOAT) ,Cast (payload.Velocity AS FLOAT) ,Cast (payload.GageHeight AS FLOAT) ,Cast (payload.Temp AS FLOAT) WHERE payload.site_no='09429070' FROM riverflow_json;
Query the data from the database table:
SELECT TOP 2 * FROM RiverFlowPermInsert_json;
Sample result:
DateTS SiteNo Flow Velocity GageHeight Temp -------------------- -------- -------- -------- ---------- ------- 2018-07-02 01:00 09429070 149.00 0.78 5.75 78.80 2018-07-02 00:30 09429070 145.00 0.77 5.68 78.90
See Loading External Data into the Database Using INSERT SELECT.
Joining External Data and Database Tables
Create a database dimension table to join to:
CREATE SET TABLE rivernames ( site_no CHAR(8) CHARACTER SET LATIN NOT CASESPECIFIC, name CHAR(60) CHARACTER SET LATIN NOT CASESPECIFIC) UNIQUE PRIMARY INDEX ( site_no );
Create the foreign table or ask your database administrator to create the foreign table that is used to populate the dimension table:
CREATE FOREIGN TABLE nos_rivernames , EXTERNAL SECURITY DEFINER TRUSTED DefAuth ( Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC, PAYLOAD DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV ) USING ( LOCATION('/az/akiaxox5jikeotfww4ul.blob.core.windows.net/RIVERS/rivers.csv') );
Populate the dimension table:
INSERT INTO rivernames SELECT payload..site_no, payload..name FROM nos_rivernames;
Join the dimension table and external data (foreign table):
SELECT DISTINCT name(CHAR(60)) FROM riverflow_json rf, rivernames rn WHERE rf.payload.site_no = rn.site_no AND rf.payload.Precipitation > 0.1 ORDER BY 1;
Sample result:
name ---------------------------------- CIBECUE CREEK NEAR OVERGAARD COLORADO RIVER AT LEES FERRY GILA RIVER AT KELVIN LITTLE COLORADO RIVER AT WOODRUFF NEW RIVER NEAR ROCK SPRINGS NEWMAN CANYON ABOVE UPPER LAKE MARY POLACCA WASH NEAR SECOND MESA PUERCO RIVER NEAR CHAMBERS SALT RIVER NEAR CHRYSOTILE SANTA MARIA RIVER NEAR BAGDAD
Filtering Using A Path Filter in the CREATE FOREIGN TABLE Definition
CREATE FOREIGN TABLE riverflow_json_pathfilter
, EXTERNAL SECURITY DEFINER TRUSTED DefAuth
(
Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
Payload JSON INLINE LENGTH 32000 CHARACTER SET UNICODE
)
USING (
LOCATION('/az/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/JSONDATA/site_no')
PATHPATTERN('$data/$siteno/$year/$month/$day')
);
Where site_no is a site number value; for example, 0938.
Show the table definition:
SHOW TABLE riverflow_json_pathfilter;
See Filtering Using a Path Filter in the CREATE FOREIGN TABLE Definition.
Filtering on the Payload Column of a Foreign Table
Create a foreign table or ask your database administrator to create the foreign table using a custom PATHPATTERN:
CREATE FOREIGN TABLE riverflow_json_path , EXTERNAL SECURITY DEFINER TRUSTED DefAuth ( Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC, Payload JSON INLINE LENGTH 32000 CHARACTER SET UNICODE ) USING ( LOCATION('/az/akiaxox5jikeotfww4ul.blob.core.windows.net/td-usgs/JSONDATA/') PATHPATTERN('$data/$siteno/$year/$month/$day') );
Run a query using a payload filter only:
SELECT TOP 2 CAST(payload.GageHeight AS FLOAT)( FORMAT '-ZZZ9.99') TheGageHeight, CAST(payload.Flow AS FLOAT)(FORMAT '-ZZZZ9.99') TheFlow FROM riverflow_json_path WHERE payload.site_no = '09394500';
Sample result:
TheGageHeight TheFlow ------------- --------- 6.39 45.60 5.04 3.53
Run a similar query using equivalent path values in the filter:
SELECT TOP 2 CAST(payload.GageHeight AS FLOAT)( FORMAT '-ZZZ9.99') TheGageHeight, CAST (payload.Flow AS FLOAT)(FORMAT '-ZZZZ9.99') TheFlow FROM riverflow_json_path WHERE $path.$siteno = '09394500';
Sample result:
TheGageHeight TheFlow ------------- --------- 6.40 62.90 5.04 3.53
Querying a Foreign Table with Path Filtering
Use path filtering to narrow down search results:
SELECT TOP 2 payload.* FROM riverflow_json_path WHERE $PATH.$year = '2018' AND $PATH.$month = '07' AND $PATH.$day = '01.json' AND $PATH.$siteno = '09394500';
Sample result:
Payload.* --------------------------------------------------------------------------- ["09394500","2018-07-01 00:15","0.00","4.75","0.00","4.81"] ["09394500","2018-07-01 00:00","0.01","4.75","0.00","4.82"]
Combining Path and Payload Filtering
Combine path and payload filtering:
SELECT COUNT(*) FROM riverflow_json_path WHERE payload.GageHeight > 5 AND $path.$siteno = '09394500' AND $path.$year = '2018' AND $path.$month = '07';
Sample result:
Count(*) ----------- 1602
Creating a Basic View
Create the view of the foreign table:
CREATE VIEW riverflowview_json AS ( SELECT CAST(payload.Flow AS FLOAT) ( FORMAT '-ZZZ9.99') Flow, CAST(payload.GageHeight AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight1, CAST(payload.Precipitation AS FLOAT) ( FORMAT '-ZZZ9.99') Precipitation, CAST(payload.Temp AS FLOAT) ( FORMAT '-ZZZ9.99') Temperature, CAST(payload.Velocity AS FLOAT) ( FORMAT '-ZZZ9.99') Velocity, CAST(payload.BatteryVoltage AS FLOAT) ( FORMAT '-ZZZ9.99') BatteryVoltage, CAST(payload.GageHeight2 AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight2 FROM riverflow_json_path );
Query the view:
SELECT TOP 2 * FROM riverflowview_json;
Sample result:
Flow GageHeight1 Precipitation Temperature Velocity BatteryVoltage GageHeight2 -------- ----------- ------------- ----------- -------- -------------- ----------- 186.00 2.05 0.00 ? ? ? 2.05 232.00 2.16 0.00 ? ? ? 2.16
Using Path Variables as Columns in a View
REPLACE VIEW riverflowview_json AS ( SELECT CAST($path.$siteno AS CHAR(10)) TheSite, CAST($path.$year AS CHAR(4)) TheYear, CAST($path.$month AS CHAR(2)) TheMonth, CAST(SUBSTR($path.$day, 1, 2) AS CHAR(2)) TheDay, CAST(payload.Flow AS FLOAT) ( FORMAT '-ZZZ9.99') Flow, CAST(payload.GageHeight AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight1, CAST(payload.Precipitation AS FLOAT) ( FORMAT '-ZZZ9.99') Precipitation, CAST(payload.Temp AS FLOAT) ( FORMAT '-ZZZ9.99') Temperature, CAST(payload.Velocity AS FLOAT) ( FORMAT '-ZZZ9.99') Velocity, CAST(payload.BatteryVoltage AS FLOAT) ( FORMAT '-ZZZ9.99') BatteryVoltage, CAST(payload.GageHeight2 AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight2 FROM riverflow_json_path);
Using a Foreign Table View with Path Filtering
Replace the view of the foreign table:
REPLACE VIEW riverflowview_json AS ( SELECT CAST($path.$siteno AS CHAR(10)) TheSite, CAST($path.$year AS CHAR(4)) TheYear, CAST($path.$month AS CHAR(2)) TheMonth, CAST($path.$day AS CHAR(2)) TheDay, CAST(payload.Flow AS FLOAT) ( FORMAT '-ZZZ9.99') Flow, CAST(payload.GageHeight AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight1, CAST(payload.Precipitation AS FLOAT) ( FORMAT '-ZZZ9.99') Precipitation, CAST(payload.Temp AS FLOAT) ( FORMAT '-ZZZ9.99') Temperature, CAST(payload.Velocity AS FLOAT) ( FORMAT '-ZZZ9.99') Velocity, CAST(payload.BatteryVoltage AS FLOAT) ( FORMAT '-ZZZ9.99') BatteryVoltage, CAST(payload.GageHeight2 AS FLOAT) ( FORMAT '-ZZZ9.99') GageHeight2 FROM riverflow_json_path WHERE TheSite = payload.site_no);
Query the view:
SELECT TOP 2 * FROM riverflowview_json;
Sample result:
TheSite TheYear TheMonth TheDay Flow GageHeight1 Precipitation Temperature Velocity BatteryVoltage GageHeight2 ---------- ------- -------- ------ -------- ----------- ------------- ----------- -------- -------------- ----------- 09429070 2018 07 02 156.00 5.52 ? 79.10 0.86 ? ? 09396100 2018 07 14 232.00 2.16 0.00 ? ? ? 2.16
Filter the view:
SELECT thesite,COUNT(*) FROM riverflowview_json WHERE thesite='09396100' GROUP BY 1;
Sample result:
TheSite Count(*) ---------- ----------- 09396100 3086
Run EXPLAIN on the query to see how the filtering is done:
EXPLAIN SELECT thesite,COUNT(*) FROM riverflowview_json WHERE thesite='09396100' GROUP BY 1;
Sample result:
Explanation -------------------------------------------------------------- […] 3) We do a single-AMP RETRIEVE step from NOS_USR.riverflow_json_path in view riverflowview_json metadata by way of an all-rows scan with a condition of ("(TD_SYSFNLIB.NosExtractVarFromPath ( NOS_USR.riverflow_json_path in view riverflowview_json.Location, '/S3/td-usgs.s3.amazonaws.com/', 2 )(CHAR(10), CHARACTER SET UNICODE, NOT CASESPECIFIC))= '09396100 '") into Spool 3 (all_amps), which is built locally on that AMP. Then we do a SORT to order Spool 3 by the sort key as the field_id list( 3, 2). The size of Spool 3 is estimated with no confidence to be 38 rows (26,790 bytes). The estimated time for this step is 0.74 seconds. 4) We do an all-AMPs RETRIEVE step in TD_Map1 from Spool 3 (Last Use) by way of an all-rows scan into Spool 2 (all_amps), which is binpacked and redistributed by size to all AMPs in TD_Map1. The size of Spool 2 is estimated with no confidence to be 38 rows ( 27,094 bytes). The estimated time for this step is 0.16 seconds. 5) We do an all-AMPs SUM step in TD_MAP1 to aggregate from NOS_USR.riverflow_json_path in view riverflowview_json by way of external metadata in Spool 2 (Last Use) with a condition of ("((TD_SYSFNLIB.NosExtractVarFromPath ( NOS_USR.riverflow_json_path in view riverflowview_json.Location, '/S3/td-usgs.s3.amazonaws.com/', 2 ))= (NOS_USR.riverflow_json_path in view riverflowview_json.Payload .JSONEXTRACTVALUE ( '$.site_no'(VARCHAR(32000), CHARACTER SET UNICODE, NOT CASESPECIFIC)) )) AND ((TD_SYSFNLIB.NosExtractVarFromPath ( NOS_USR.riverflow_json_path in view riverflowview_json.Location, '/S3/td-usgs.s3.amazonaws.com/', 2 )(CHAR(10), CHARACTER SET UNICODE, NOT CASESPECIFIC))= (NOS_USR.riverflow_json_path in view riverflowview_json.Payload .JSONEXTRACTVALUE ( '$.site_no'(VARCHAR(32000), CHARACTER SET UNICODE, NOT CASESPECIFIC))))"), and the grouping identifier in field 1. Aggregate Intermediate Results are computed globally, then placed in Spool 5 in TD_Map1. The size of Spool 5 is estimated with no confidence to be 1,085 rows (1,521,170 bytes). The estimated time for this step is 0.52 seconds. […]