All foreign tables have a generic PATHPATTERN; for example, PATHPATTERN('$Var1/$Var2/$Var3/$Var4/$Var5/$Var6/$Var7/$Var8/$Var9/$Var10/$Var11').
To make path filtering in a query easier, explicitly specify a PATHPATTERN with meaningful names when you create the foreign table. For example, to match the path names selected in the river flow example, use ('$data/$siteno/$year/$month/$day'). PATHPATTERNs are useful for limiting results, instead of returning all the objects in an external object store.
The examples use a sample river flow data set. To use your own data, replace the table and column names, and authorization object. See Variable Substitutions for Examples for the credentials and location values for the sample data set.
- To run NOS-related commands, log on to the database as a user with the required privileges.
- If it does not exist, create the foreign table or ask your database administrator to create the foreign table called riverflow_parquet_path. See: Filtering External Parquet Data From a Foreign Table.
-
Use path filtering to narrow down search results. To see the data for a site number on a particular date, query the path attributes:
SELECT TOP 2 * FROM riverflow_parquet_path WHERE $PATH.$year = '2018' AND $PATH.$month = '07' AND $PATH.$day = '01.parquet' AND $PATH.$siteno = '09394500';
The $PATH.$day variable represents the actual object in the path segment name ( data / siteno / year / month / day). As such, it often has a file extension as part of the name, as shown in the example.
Your result will be similar to this, reflecting your external object store location:
Location /S3/td-usgs.s3.amazonaws.com/PARQUETDATA/09394500/2018/07/01.parquet GageHeight2 4.81 Flow 0.00 site_no 9394500 datetime 2018-07-01 00:15 Precipitation 0.00 GageHeight 4.75 Location /S3/td-usgs.s3.amazonaws.com/PARQUETDATA/09394500/2018/07/01.parquet GageHeight2 4.82 Flow 0.01 site_no 9394500 datetime 2018-07-01 00:00 Precipitation 0.00 GageHeight 4.75
For readability, the output is displayed vertically.Note that the more specific you are with path filtering, the more selective are the results, potentially saving processing time and external storage access costs.
-
Filter on the siteno using path filtering:
SELECT TOP 2 GageHeight TheGageHeight , Flow TheFlow FROM riverflow_parquet_path WHERE $path.$siteno = '09394500';
Result:
TheGageHeight TheFlow ------------- --------- 4.75 0.01 5.04 3.53