WRITE_NOS | Examples | Teradata Vantage - 17.10 - WRITE_NOS Examples - Advanced SQL Engine - Teradata Database

Teradata Vantage™ - SQL Operators and User-Defined Functions

Product
Advanced SQL Engine
Teradata Database
Release Number
17.10
Release Date
July 2021
Content Type
Administration
Programming Reference
Publication ID
B035-1210-171K
Language
English (United States)

These examples are for demonstration purposes only. For a fully runnable set of WRITE_NOS examples, see Teradata Vantage™ - Native Object Store Getting Started Guide, B035-1214.

Write All Data from a Vantage Table to External Storage With Partitioning

The following example writes all the data from a native relational database table to external object storage. The data is partitioned (sorted) by the Vantage Advanced SQL Engine before being written, and the objects containing the data in external storage are named to reflect the sorted data they contain.

SELECT NodeId, AmpId, Sequence, ObjectName, ObjectSize, RecordCount
FROM WRITE_NOS_FM (
 ON  ( SELECT * FROM RiverFlowPerm )
  PARTITION BY SiteNo ORDER BY SiteNo
 USING
  LOCATION('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/')
  STOREDAS('PARQUET')
  NAMING('DISCRETE')
  MANIFESTFILE('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/flowdata_manifest.json')
  INCLUDE_ORDERING('TRUE')
  MAXOBJECTSIZE('4MB')
  COMPRESSION('GZIP')
) AS d 
ORDER BY AmpId;
Four rows from an example result set are presented here in a vertical layout (column names on the left) for a more efficient presentation:
     NodeId          33
      AmpId           0
   Sequence           1
 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/09497500/object_33_0_1.parquet
 ObjectSize       27682
RecordCount        2944

     NodeId          33
      AmpId           0
   Sequence           1
 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/09513780/object_33_0_1.parquet
 ObjectSize       19136
RecordCount        2941

     NodeId          33
      AmpId           0
   Sequence           1
 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/09424900/object_33_0_1.parquet
 ObjectSize       23174
RecordCount        2946

     NodeId          33
      AmpId           0
   Sequence           1
ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/flowdata_manifest.json
ObjectSize          513
RecordCount           ?

[...]
Things to note in this example:
  • Each row of the result set describes one object written to external storage. The data objects with names ending in the .parquet extension contain data from Vantage. The object with name ending in the .json extension is the manifest file, which is written after the other data objects.
  • Each data object contains data from one partition created according to the PARTITION BY clause. In this case there is one partition for each data collection site, that is, for each SiteNo column value.
  • The NAMING('DISCRETE') parameter causes the object names to include the partition name, in this case the object names include the values for SiteNo. In the object name,YOUR-STORAGE-ACCOUNT/RiverFlowPerm_PartitionBy/09497500/object_33_0_1.parquet, "09497500" is the SiteNo of the data contained by this object.
  • Because the query included INCLUDE_ORDERING('TRUE'), the object data includes data from the ORDER BY column, in this case the site number from which the data in that row was collected.
  • The MANIFESTFILE option causes a manifest file to be written to external storage. The manifest file lists all objects that WRITE_NOS wrote.

Write All Data from a Vantage Table to External Storage with Hashing

SELECT NodeId, AmpId, Sequence, ObjectName, ObjectSize, RecordCount
FROM WRITE_NOS_FM (
 ON  ( SELECT * FROM RiverFlowPerm )
  HASH BY SiteNo LOCAL ORDER BY SiteNo
 USING
  LOCATION('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/')
  STOREDAS('PARQUET')
  NAMING('RANGE')
  MANIFESTFILE('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/flowdata_manifest.json')
  INCLUDE_ORDERING('TRUE')
  INCLUDE_HASHBY('TRUE')
  MAXOBJECTSIZE('4MB')
  COMPRESSION('GZIP')
) AS d 
ORDER BY AmpId;

Four rows from an example result set are presented here in a vertical layout (column names on the left) for a more efficient presentation:

     NodeId          33
      AmpId           0
   Sequence           1
 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/09394500/09513780/object_33_0_1.parquet
 ObjectSize       79398
RecordCount       11778

     NodeId          33
      AmpId           1
   Sequence           1
 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/09380000/09429070/object_33_1_1.parquet
 ObjectSize       84999
RecordCount        8926

     NodeId          33
      AmpId           2
   Sequence           1
 ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/09396100/09400815/object_33_2_1.parquet
 ObjectSize       47659
RecordCount        6765

     NodeId          33
      AmpId           0
   Sequence           1
ObjectName YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/flowdata_manifest.json
ObjectSize          624
RecordCount           ?

[...]
Things to note in this example:
  • Each row of the result set describes one object written to external storage. The data objects with names ending in the .parquet extension contain data from Vantage. The object with name ending in the .json extension is the manifest file, which is written after the other data objects.
  • Each object contains data grouped according to to the HASH BY clause. In this example it determines how the database table row data is distributed and grouped to different AMPs using the hash of the SiteNo column values. In this case there is one partition for the data on each AMP, which may include a combination of data rows from different collection sites.
  • The NAMING('RANGE') parameter causes the object names to include the upper and lower values in the range of site numbers which data the object contains.In the object name,YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/09394500/09513780/object_33_0_1.parquet contains data from multiple site numbers between site 09394500 and 9513780, inclusive.
  • Because the query included INCLUDE_ORDERING('TRUE') and INCLUDE_HASHBY('TRUE'), the object data includes data from the ORDER BY and HASH BY columns. In this case both are the same column, SiteNo, so the data contained in the objects includes the the site number from which the data in that row was collected.
  • The MANIFESTFILE option causes a manifest file to be written to external storage. The manifest file lists all objects that WRITE_NOS wrote.