Reading all the objects in a bucket can be time-consuming and expensive. An alternative solution is to filter the object store using one or both of the following filtering methods:
- Path Filtering
In path filtering, by default the folders are referred to as $var1, $var2, and so on.For CSV data, we attempt to use the column name, if it is the same as the folder name, instead of the generic $varX variable, for example, WHERE $path.$site_no='09380000'.
- A path is a series of values that constitute a multi-level key (for example, site_no, year, month, day) which identifies one or more objects within an object store.
- Path filtering lets you filter and reduce the amount of objects you bring back to the database by providing specified values for different levels in a path key (for example, WHERE $path.$site_no='09380000')
- Path filtering is the most efficient way to read a selected portion on object store.
In the example we show the generic syntax with the generic PATHPATTERN.
You create a foreign table against this structure like this:
CREATE FOREIGN TABLE table_name , EXTERNAL SECURITY authorization_object USING ( LOCATION('YOUR-OBJECT-STORE-URI') PATHPATTERN('$var1/$var2/$var3/$var4/$var5') );PATHPATTERN is optional. As an example, the PATHPATTERN in the CREATE FOREIGN statement matches the structure of the object store shown in the diagram.
You can look at the table definition with SHOW TABLE:
SHOW TABLE table_name;The table definition for the object store folder in the figure shows that the variables point to specific folders and files in the folder:
- $var1 points to the CSVDATA folder
- $var2 points to the 09380000 folder
- $var3 points to the 2018 folder
- $var4 points to the 06 folder
- $var5 points to the files, such as 27.csv, 28.csv, and so on
You can change $var names in the foreign table definition with PATHPATTERN, if you know the structure of your data.
CREATE FOREIGN TABLE table_name , EXTERNAL SECURITY authorization_object USING ( LOCATION('YOUR-OBJECT-STORE-URI') PATHPATTERN('$datatype/$site_no/$year/$month/$filename') );
If you defined the table with meaningful names, you use those names instead of $varX when referring to a $path. for example, see Using Path Variables as Columns in a View.
- Payload Filtering
- The payload column within a foreign table contains all of the values within an object.
- In payload filtering you apply a filter to the data itself and not to the partitioning (path key) of the object store.
- When filtering is applied to a column in the data (such as WHERE Flow > '15'), all objects in the object store have to be brought inside the database and be transformed prior to applying filtering.
- When given a choice, this is usually less efficient because it will incur greater effort at both the object store level and the database level than does path filtering.