READ_NOS returns Parquet Schema | SELECT SQL Statement | Teradata Vantage - Example: Using READ_NOS with NOSREAD_PARQUET_SCHEMA Return Type - Advanced SQL Engine - Teradata Database

SQL Data Manipulation Language

Product
Advanced SQL Engine
Teradata Database
Release Number
17.00
Published
September 2020
Language
English (United States)
Last Update
2021-01-23
dita:mapPath
qtb1554762060450.ditamap
dita:ditavalPath
lze1555437562152.ditaval
dita:id
B035-1146
lifecycle
previous
Product Category
Teradata Vantage™

This example shows how to use the READ_NOS table operator with the NOSREAD_PARQUET_SCHEMA return type to display information about Parquet-formatted external data.

Sample Query and Output

This example uses the Basic Teradata Query BTEQ utility to submit requests to the Teradata Vantage Advanced SQL Engine. The .foldline and .sidetitles BTEQ options format the results so the column headings appear at the left side of the corresponding column values.

 BTEQ -- Enter your SQL request or BTEQ command:
.foldline

 BTEQ -- Enter your SQL request or BTEQ command:
.sidetitles on

For information about BTEQ, see Basic Teradata® Query Reference, B035-2414.

Create an authorization for access to the external data:

BTEQ -- Enter your SQL request or BTEQ command:
CREATE AUTHORIZATION DefAuth_S3
 AS DEFINER TRUSTED
USER 'YOUR-ACCESS-KEY-ID'
PASSWORD 'YOUR-SECRET-ACCESS-KEY';

For information about CREATE AUTHORIZATION, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.

Create a function mapping for the READ_NOS table operator. The mapping is a function alias for your users that can serve to hide the AUTHORIZATION object name from function users. This provides best-practice security for external authorization credentials.

BTEQ -- Enter your SQL request or BTEQ command:
CREATE FUNCTION MAPPING READ_NOS_Parquet_schema
FOR READ_NOS
EXTERNAL SECURITY DEFINER TRUSTED DefAuth
USING 
  LOCATION,
  STOREDAS('PARQUET'),
  FULLSCAN,
  RETURNTYPE('NOSREAD_PARQUET_SCHEMA');
You must define any parameters for the function you want users to be able to specify when they call your mapped (aliased) function. For information on CREATE FUNCTION MAPPING, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.

To get schema information for the external Parquet data, call the READ_NOS function mapping, passing the LOCATION information identifying the external object store data, and FULLSCAN value indicating that you want Vantage to scan all the data for variable-length data types to determine the size of the largest-length value in the Parquet data.

SELECT * 
FROM READ_NOS_Parquet_schema(
 USING
   LOCATION ('/S3/td-usgs/s3.amazonaws.com/PARQUETDATA/09394500/2018/06/27.parquet')
   FULLSCAN ('TRUE')
)
as dt;

Below is a portion of the output, showing data type information for every column of Parquet data. Your result will be similar to the following:

        Location  /S3/s3.amazonaws.com/td-usgs/PARQUETDATA/09394500/2018/06/27.parquet
        ParquetColumnName  GageHeight2
ParquetColumnPhysicalType  DOUBLE
 ParquetColumnLogicalType  NONE
             TDColumnType  FLOAT
   ParquetColumnPrecision            0
       ParquetColumnScale            0
   ParquetColumnMinLength            0
   ParquetColumnMaxLength            0
      ParquetColumnIsNull     1
         ParquetColumnPos     1
 
                 Location  /S3/s3.amazonaws.com/td-usgs/PARQUETDATA/09394500/2018/06/27.parquet
        ParquetColumnName  Flow
ParquetColumnPhysicalType  DOUBLE
 ParquetColumnLogicalType  NONE
             TDColumnType  FLOAT
   ParquetColumnPrecision            0
       ParquetColumnScale            0
   ParquetColumnMinLength            0
   ParquetColumnMaxLength            0
      ParquetColumnIsNull     1
         ParquetColumnPos     2
 
                 Location  /S3/s3.amazonaws.com/td-usgs/PARQUETDATA/09394500/2018/06/27.parquet
        ParquetColumnName  site_no
ParquetColumnPhysicalType  INT64
P arquetColumnLogicalType  NONE
             TDColumnType  BIGINT
   ParquetColumnPrecision            0
       ParquetColumnScale            0
   ParquetColumnMinLength            0
   ParquetColumnMaxLength            0
      ParquetColumnIsNull     1
         ParquetColumnPos     3
 
                 Location  /S3/s3.amazonaws.com/td-usgs/PARQUETDATA/09394500/2018/06/27.parquet
        ParquetColumnName  datetime
ParquetColumnPhysicalType  BYTE_ARRAY
ParquetColumnLogicalType  UTF8
             TDColumnType  VARCHAR
   ParquetColumnPrecision            0
       ParquetColumnScale            0
   ParquetColumnMinLength           16
   ParquetColumnMaxLength        16
      ParquetColumnIsNull     1
         ParquetColumnPos     4

Notice also that the VARCHAR column shows a ParquetColumnMaxLength of 16 characters, because that is the length of the largest value in that column of the Parquet data. If the FULLSCAN value had been set to FALSE, the ParquetColumnMaxLength value would have been 32000 characters, the Vantage default maximum length value for VARCHAR data types using the UNICODE server character set.