These examples are for demonstration purposes only. For a fully runnable set of WRITE_NOS examples, see Teradata Vantageā¢ - Native Object Store Getting Started Guide, B035-1214.
Write All Data from a Vantage Table to External Storage With Partitioning
The following example writes all the data from a native relational database table to external object storage. The data is partitioned (sorted) by the Vantage Advanced SQL Engine before being written, and the objects containing the data in external storage are named to reflect the sorted data they contain.
SELECT NodeId, AmpId, Sequence, ObjectName, ObjectSize, RecordCount FROM WRITE_NOS_FM ( ON ( SELECT * FROM RiverFlowPerm ) PARTITION BY SiteNo ORDER BY SiteNo USING LOCATION('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/') STOREDAS('PARQUET') NAMING('DISCRETE') MANIFESTFILE('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/flowdata_manifest.json') INCLUDE_ORDERING('TRUE') MAXOBJECTSIZE('4MB') COMPRESSION('GZIP') ) AS d ORDER BY AmpId;Four rows from an example result set are presented here in a vertical layout (column names on the left) for a more efficient presentation:
NodeId 33 AmpId 0 Sequence 1 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/09497500/object_33_0_1.parquet ObjectSize 27682 RecordCount 2944 NodeId 33 AmpId 0 Sequence 1 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/09513780/object_33_0_1.parquet ObjectSize 19136 RecordCount 2941 NodeId 33 AmpId 0 Sequence 1 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/09424900/object_33_0_1.parquet ObjectSize 23174 RecordCount 2946 NodeId 33 AmpId 0 Sequence 1 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/flowdata_manifest.json ObjectSize 513 RecordCount ? [...]
Things to note in this example:
- Each row of the result set describes one object written to external storage. The data objects with names ending in the .parquet extension contain data from Vantage. The object with name ending in the .json extension is the manifest file, which is written after the other data objects.
- Each data object contains data from one partition created according to the PARTITION BY clause. In this case there is one partition for each data collection site, that is, for each SiteNo column value.
- The NAMING('DISCRETE') parameter causes the object names to include the partition name, in this case the object names include the values for SiteNo. In the object name,YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/09497500/object_33_0_1.parquet, "09497500" is the SiteNo of the data contained by this object.
- Because the query included INCLUDE_ORDERING('TRUE'), the object data includes data from the ORDER BY column, in this case the site number from which the data in that row was collected.
- The MANIFESTFILE option causes a manifest file to be written to external storage. The manifest file lists all objects that WRITE_NOS wrote.
Write All Data from a Vantage Table to External Storage with Hashing
SELECT NodeId, AmpId, Sequence, ObjectName, ObjectSize, RecordCount FROM WRITE_NOS_FM ( ON ( SELECT * FROM RiverFlowPerm ) HASH BY SiteNo LOCAL ORDER BY SiteNo USING LOCATION('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/') STOREDAS('PARQUET') NAMING('RANGE') MANIFESTFILE('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/flowdata_manifest.json') INCLUDE_ORDERING('TRUE') INCLUDE_HASHBY('TRUE') MAXOBJECTSIZE('4MB') COMPRESSION('GZIP') ) AS d ORDER BY AmpId;
Four rows from an example result set are presented here in a vertical layout (column names on the left) for a more efficient presentation:
NodeId 33 AmpId 0 Sequence 1 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/09394500/09513780/object_33_0_1.parquet ObjectSize 79398 RecordCount 11778 NodeId 33 AmpId 1 Sequence 1 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/09380000/09429070/object_33_1_1.parquet ObjectSize 84999 RecordCount 8926 NodeId 33 AmpId 2 Sequence 1 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/09396100/09400815/object_33_2_1.parquet ObjectSize 47659 RecordCount 6765 NodeId 33 AmpId 0 Sequence 1 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/flowdata_manifest.json ObjectSize 624 RecordCount ? [...]
Things to note in this example:
- Each row of the result set describes one object written to external storage. The data objects with names ending in the .parquet extension contain data from Vantage. The object with name ending in the .json extension is the manifest file, which is written after the other data objects.
- Each object contains data grouped according to to the HASH BY clause. In this example it determines how the database table row data is distributed and grouped to different AMPs using the hash of the SiteNo column values. In this case there is one partition for the data on each AMP, which may include a combination of data rows from different collection sites.
- The NAMING('RANGE') parameter causes the object names to include the upper and lower values in the range of site numbers which data the object contains.In the object name,YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/09394500/09513780/object_33_0_1.parquet contains data from multiple site numbers between site 09394500 and 9513780, inclusive.
- Because the query included INCLUDE_ORDERING('TRUE') and INCLUDE_HASHBY('TRUE'), the object data includes data from the ORDER BY and HASH BY columns. In this case both are the same column, SiteNo, so the data contained in the objects includes the the site number from which the data in that row was collected.
- The MANIFESTFILE option causes a manifest file to be written to external storage. The manifest file lists all objects that WRITE_NOS wrote.