Write All Rows to External Storage with Hashing | NOS | Teradata Vantage - 17.05 - Example: Offload All Data to External Object Storage with Hashing - Teradata Database

Teradata Vantageā„¢ - Native Object Store Getting Started Guide

prodname
Advanced SQL Engine
Teradata Database
vrm_release
17.05
created_date
January 2021
category
Programming Reference
featnum
B035-1214-175K
The example shows how to write all the rows in a table from Advanced SQL Engine to a Teradata-supported external object store.

    Prerequisites

  1. If not already done, set up an authorization to access external object store. See Setting Up Authorization for WRITE_NOS Examples.
  2. If not already done, set up a database table to extract data from. You can use one of your existing tables or set up a test table to use with the examples. See Setting Up a Test Table.
  3. If not already done, set up or obtain access to the external object store where you want to write data. Follow the instructions from your external storage vendor.
  4. Example: Write All Rows to External Object Store, Use HASH BY and NAMING('RANGE')

  5. Write all rows using HASH BY SiteNo, NAMING('RANGE'):
    SELECT NodeId, AmpId, Sequence, ObjectName, ObjectSize, RecordCount
    FROM WRITE_NOS_FM (
     ON  ( SELECT * FROM RiverFlowPerm )
      HASH BY SiteNo LOCAL ORDER BY SiteNo
     USING
      LOCATION('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy/')
      STOREDAS('PARQUET')
      COMPRESSION('GZIP')
      NAMING('RANGE')
      INCLUDE_ORDERING('TRUE')
      INCLUDE_HASHBY('TRUE')
      MAXOBJECTSIZE('4MB')
    ) AS d 
    ORDER BY AmpId;

    Replace LOCATION with the URI to the external object store where you want to write the data.

    Things to note in this example:
    • Each row of the result set describes one object written to external storage that contains data from Vantage.
    • Each object contains data grouped according to to the HASH BY clause. In this example it determines how the database table row data is distributed and grouped to different AMPs using the hash of the SiteNo column values. In this case there is one partition for the data on each AMP, which may include a combination of data rows from different collection sites.
    • NAMING('RANGE') causes the object names to include the upper and lower values in the range of site numbers which data the object contains.

      In the object name /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy/09394500/09513780/object_33_0_1.parquet contains data from multiple site numbers between sites 09394500 and 09513780, inclusive.

    • Because the query included INCLUDE_ORDERING('TRUE') and INCLUDE_HASHBY('TRUE'), the objects include data from the ORDER BY and HASH BY columns. In this case both are the same column, SiteNo, so the data contained in the objects includes the values from the SiteNo column in each row of data, indicating where the data in that row was collected.

    Your result will be similar to:

         NodeId          33
          AmpId           0
       Sequence                    1
     ObjectName /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy/09394500/09513780/object_33_0_1.parquet
     ObjectSize                79398
    RecordCount                11778
    
         NodeId          33
          AmpId           1
       Sequence                    1
     ObjectName /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy/09380000/09429070/object_33_1_1.parquet
     ObjectSize                84999
    RecordCount                 8926
    
         NodeId          33
          AmpId           2
       Sequence                    1
     ObjectName /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy/09396100/09400815/object_33_2_1.parquet
     ObjectSize                47659
    RecordCount                 6765
    [...]

    For readability, the output is displayed vertically.

  6. Example: Write All Rows to External Object Store, Use HASH BY and NAMING('DISCRETE')

  7. Write all rows using HASH BY and NAMING('DISCRETE'):
    SELECT NodeId, AmpId, Sequence, ObjectName, ObjectSize, RecordCount
    FROM WRITE_NOS_FM (
     ON  ( SELECT * FROM RiverFlowPerm )
      HASH BY SiteNo, Conductance LOCAL ORDER BY SiteNo
     USING
      LOCATION('YOUR-STORAGE-ACCOUNT/RiverFlowPerm_HashBy_Discrete/')
      STOREDAS('PARQUET')
      COMPRESSION('GZIP')
      NAMING('DISCRETE')
      INCLUDE_ORDERING('TRUE')
      INCLUDE_HASHBY('TRUE')
      MAXOBJECTSIZE('4MB')
    ) AS d 
    ORDER BY AmpId;

    Replace LOCATION with the URI to the external object store where you want to write the data.

    Things to note in this example:
    • Each row of the result set describes one object written to external storage that contains data from Vantage.
    • Each object contains data grouped according to to the HASH BY clause. In this example it determines how the database table row data is distributed and grouped to different AMPs using the hash of the SiteNo column values. In this case there is one partition for the data on each AMP, which may include a combination of data rows from different collection sites.
    • NAMING('DISCRETE') causes the object names to include the partition name, in this case the object names include the values for SiteNo. In the object name, /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_Discrete/09424900/object_33_0_1.parquet, "09424900" is the SiteNo of the data contained by this object.
    • Because the query included INCLUDE_ORDERING('TRUE') and INCLUDE_HASHBY('TRUE'), the objects include data from the ORDER BY and HASH BY columns. In this case both are the same column, SiteNo, so the data contained in the objects includes the values from the SiteNo column in each row of data, indicating where the data in that row was collected.

    Your result will be similar to:

         NodeId          33
          AmpId           0
       Sequence                    3
     ObjectName /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy_Discrete/09424900/object_33_0_3.parquet
     ObjectSize                23551
    RecordCount                 2946
    
         NodeId          33
          AmpId           0
       Sequence                    5
     ObjectName /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy_Discrete/09513780/object_33_0_5.parquet
     ObjectSize                18668
    RecordCount                 2941
    
         NodeId          33
          AmpId           0
       Sequence                    4
     ObjectName /S3/s3.amazonaws.com/ie-writenos-bucket/RiverFlowPerm_HashBy_Discrete/09497500/object_33_0_4.parquet
     ObjectSize                27504
    RecordCount                 2944
    
    [...]

    For readability, the output is displayed vertically.