Example: Write All Rows to External Object Storage with Hashing | NOS | Teradata Vantage

Teradata Vantage™ - Native Object Store Getting Started Guide

Product: Advanced SQL Engine, Teradata Database
Release Number: 17.05
Published: January 2021
Language: English (United States)
Last Update: 2021-01-22
dita:mapPath: zws1595641486108.ditamap
dita:ditavalPath: hoy1596145193032.ditaval
dita:id: B035-1214
Lifecycle: previous
Product Category: Software, Teradata Vantage
The example shows how to write all the rows in a table from Advanced SQL Engine to a Teradata-supported external object store.

    Prerequisites

  1. If not already done, set up an authorization to access the external object store. See Setting Up Authorization for WRITE_NOS Examples.
  2. If not already done, set up a database table to extract data from. You can use one of your existing tables or set up a test table to use with the examples. See Setting Up a Test Table.
  3. If not already done, set up or obtain access to the external object store where you want to write data. Follow the instructions from your external storage vendor.
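
    For step 1, a minimal sketch of the authorization object follows. The name DefAuth and both credential strings are placeholders, not values from this guide; the exact form for your platform is covered in Setting Up Authorization for WRITE_NOS Examples.

    ```sql
    -- Hypothetical authorization object; substitute the access
    -- credentials for your external object store.
    CREATE AUTHORIZATION DefAuth AS DEFINER TRUSTED
    USER 'YOUR-ACCESS-KEY-ID'          -- access key or storage account name
    PASSWORD 'YOUR-SECRET-ACCESS-KEY'; -- secret key or account key
    ```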
    Example: Write All Rows to External Object Store, Use HASH BY and NAMING('RANGE')

    Write all rows using HASH BY SiteNo, NAMING('RANGE'):
    SELECT NodeId, AmpId, Sequence, ObjectName, ObjectSize, RecordCount
    FROM WRITE_NOS_FM (
     ON  ( SELECT * FROM RiverFlowPerm )
      HASH BY SiteNo LOCAL ORDER BY SiteNo
     USING
      LOCATION('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/')
      STOREDAS('PARQUET')
      COMPRESSION('GZIP')
      NAMING('RANGE')
      INCLUDE_ORDERING('TRUE')
      INCLUDE_HASHBY('TRUE')
      MAXOBJECTSIZE('4MB')
    ) AS d 
    ORDER BY AmpId;

    Replace YOUR-STORAGE-ACCOUNT in the LOCATION clause with the URI of the external object store where you want to write the data; for example, an Amazon S3 location takes the form /S3/s3.amazonaws.com/your-bucket/path/.

    Things to note in this example:
    • Each row of the result set describes one object written to external storage that contains data from Vantage.
    • Each object contains data grouped according to the HASH BY clause. In this example, rows of the database table are distributed to AMPs using the hash of the SiteNo column values, so each AMP holds one partition of the data, which may include rows from several collection sites.
    • NAMING('RANGE') causes each object name to include the lowest and highest site numbers found in the rows that object contains.

      For example, the object /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy/09394500/09513780/object_33_0_1.parquet contains data from multiple sites with numbers between 09394500 and 09513780, inclusive.

    • Because the query included INCLUDE_ORDERING('TRUE') and INCLUDE_HASHBY('TRUE'), the objects include data from the ORDER BY and HASH BY columns. In this case both are the same column, SiteNo, so the data contained in the objects includes the values from the SiteNo column in each row of data, indicating where the data in that row was collected.

    Your result will be similar to:

         NodeId          33
          AmpId           0
       Sequence                    1
     ObjectName /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy/09394500/09513780/object_33_0_1.parquet
     ObjectSize                79398
    RecordCount                11778
    
         NodeId          33
          AmpId           1
       Sequence                    1
     ObjectName /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy/09380000/09429070/object_33_1_1.parquet
     ObjectSize                84999
    RecordCount                 8926
    
         NodeId          33
          AmpId           2
       Sequence                    1
     ObjectName /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy/09396100/09400815/object_33_2_1.parquet
     ObjectSize                47659
    RecordCount                 6765
    [...]

    For readability, the output is displayed vertically.
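
    Because WRITE_NOS_FM returns one row per object written, its result can be aggregated like any derived table. The following sketch totals rows and bytes per AMP; note that it performs the write again, so it points LOCATION at a separate, hypothetical path (RiverFlowPerm_HashBy_Totals).

    ```sql
    -- Aggregate the WRITE_NOS_FM result to see totals per AMP.
    SELECT AmpId, SUM(RecordCount) AS TotalRows, SUM(ObjectSize) AS TotalBytes
    FROM WRITE_NOS_FM (
     ON  ( SELECT * FROM RiverFlowPerm )
      HASH BY SiteNo LOCAL ORDER BY SiteNo
     USING
      LOCATION('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy_Totals/')
      STOREDAS('PARQUET')
      COMPRESSION('GZIP')
      NAMING('RANGE')
      INCLUDE_ORDERING('TRUE')
      INCLUDE_HASHBY('TRUE')
      MAXOBJECTSIZE('4MB')
    ) AS d
    GROUP BY AmpId
    ORDER BY AmpId;
    ```

    The sum of TotalRows across all AMPs should equal the result of SELECT COUNT(*) FROM RiverFlowPerm.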

    Example: Write All Rows to External Object Store, Use HASH BY and NAMING('DISCRETE')

    Write all rows using HASH BY SiteNo, Conductance and NAMING('DISCRETE'):
    SELECT NodeId, AmpId, Sequence, ObjectName, ObjectSize, RecordCount
    FROM WRITE_NOS_FM (
     ON  ( SELECT * FROM RiverFlowPerm )
      HASH BY SiteNo, Conductance LOCAL ORDER BY SiteNo
     USING
      LOCATION('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy_Discrete/')
      STOREDAS('PARQUET')
      COMPRESSION('GZIP')
      NAMING('DISCRETE')
      INCLUDE_ORDERING('TRUE')
      INCLUDE_HASHBY('TRUE')
      MAXOBJECTSIZE('4MB')
    ) AS d 
    ORDER BY AmpId;

    Replace YOUR-STORAGE-ACCOUNT in the LOCATION clause with the URI of the external object store where you want to write the data; for example, an Amazon S3 location takes the form /S3/s3.amazonaws.com/your-bucket/path/.

    Things to note in this example:
    • Each row of the result set describes one object written to external storage that contains data from Vantage.
    • Each object contains data grouped according to the HASH BY clause. In this example, rows of the database table are distributed to AMPs using the hash of the SiteNo and Conductance column values; the data on each AMP may include rows from several collection sites.
    • NAMING('DISCRETE') causes each object name to include the partition value; in this example, the object names include the SiteNo values. In the object name /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy_Discrete/09424900/object_33_0_1.parquet, "09424900" is the SiteNo of the data contained in this object.
    • Because the query included INCLUDE_ORDERING('TRUE') and INCLUDE_HASHBY('TRUE'), the objects include data from the ORDER BY and HASH BY columns. In this case the ORDER BY column is SiteNo and the HASH BY columns are SiteNo and Conductance, so each row of data in the objects includes those column values, indicating where the data in that row was collected.

    Your result will be similar to:

         NodeId          33
          AmpId           0
       Sequence                    3
     ObjectName /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy_Discrete/09424900/object_33_0_3.parquet
     ObjectSize                23551
    RecordCount                 2946
    
         NodeId          33
          AmpId           0
       Sequence                    5
     ObjectName /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy_Discrete/09513780/object_33_0_5.parquet
     ObjectSize                18668
    RecordCount                 2941
    
         NodeId          33
          AmpId           0
       Sequence                    4
     ObjectName /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy_Discrete/09497500/object_33_0_4.parquet
     ObjectSize                27504
    RecordCount                 2944
    
    [...]

    For readability, the output is displayed vertically.
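
    To confirm the discrete layout, you can list the objects that were written by using READ_NOS. The sketch below assumes a READ_NOS_FM function mapping configured like the WRITE_NOS_FM mapping used above (see Setting Up Authorization for WRITE_NOS Examples); replace LOCATION as before. Each returned Location should carry a single SiteNo in its path.

    ```sql
    -- List the objects created by the NAMING('DISCRETE') example.
    SELECT Location, ObjectLength
    FROM READ_NOS_FM (
     USING
      LOCATION('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy_Discrete/')
      RETURNTYPE('NOSREAD_KEYS')
    ) AS d
    ORDER BY Location;
    ```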