This example shows how to use the READ_NOS table operator with the NOSREAD_PARQUET_SCHEMA return type to display information about Parquet-formatted external data.
Sample Query and Output
This example uses the Basic Teradata Query BTEQ utility to submit requests to the Teradata Vantage Advanced SQL Engine. The .foldline and .sidetitles BTEQ options format the results so the column headings appear at the left side of the corresponding column values.
BTEQ -- Enter your SQL request or BTEQ command: .foldline BTEQ -- Enter your SQL request or BTEQ command: .sidetitles on
For information about BTEQ, see Basic Teradata® Query Reference, B035-2414.
Create an authorization for access to the external data:
BTEQ -- Enter your SQL request or BTEQ command: CREATE AUTHORIZATION DefAuth_S3 AS DEFINER TRUSTED USER 'YOUR-ACCESS-KEY-ID' PASSWORD 'YOUR-SECRET-ACCESS-KEY';
For information about CREATE AUTHORIZATION, see Teradata Vantage™ - SQL Data Definition Language Syntax and Examples, B035-1144.
Create a function mapping for the READ_NOS table operator. The mapping is a function alias for your users that can serve to hide the AUTHORIZATION object name from function users. This provides best-practice security for external authorization credentials.
BTEQ -- Enter your SQL request or BTEQ command: CREATE FUNCTION MAPPING READ_NOS_Parquet_schema FOR READ_NOS EXTERNAL SECURITY DEFINER TRUSTED DefAuth USING LOCATION, STOREDAS('PARQUET'), FULLSCAN, RETURNTYPE('NOSREAD_PARQUET_SCHEMA');
To get schema information for the external Parquet data, call the READ_NOS function mapping, passing the LOCATION information identifying the external object store data, and FULLSCAN value indicating that you want Vantage to scan all the data for variable-length data types to determine the size of the largest-length value in the Parquet data.
SELECT * FROM READ_NOS_Parquet_schema( USING LOCATION ('/S3/td-usgs/s3.amazonaws.com/PARQUETDATA/09394500/2018/06/27.parquet') FULLSCAN ('TRUE') ) as dt;
Below is a portion of the output, showing data type information for every column of Parquet data. Your result will be similar to the following:
Location /S3/s3.amazonaws.com/td-usgs/PARQUETDATA/09394500/2018/06/27.parquet
ParquetColumnName GageHeight2
ParquetColumnPhysicalType DOUBLE
ParquetColumnLogicalType NONE
TDColumnType FLOAT
ParquetColumnPrecision 0
ParquetColumnScale 0
ParquetColumnMinLength 0
ParquetColumnMaxLength 0
ParquetColumnIsNull 1
ParquetColumnPos 1
Location /S3/s3.amazonaws.com/td-usgs/PARQUETDATA/09394500/2018/06/27.parquet
ParquetColumnName Flow
ParquetColumnPhysicalType DOUBLE
ParquetColumnLogicalType NONE
TDColumnType FLOAT
ParquetColumnPrecision 0
ParquetColumnScale 0
ParquetColumnMinLength 0
ParquetColumnMaxLength 0
ParquetColumnIsNull 1
ParquetColumnPos 2
Location /S3/s3.amazonaws.com/td-usgs/PARQUETDATA/09394500/2018/06/27.parquet
ParquetColumnName site_no
ParquetColumnPhysicalType INT64
P arquetColumnLogicalType NONE
TDColumnType BIGINT
ParquetColumnPrecision 0
ParquetColumnScale 0
ParquetColumnMinLength 0
ParquetColumnMaxLength 0
ParquetColumnIsNull 1
ParquetColumnPos 3
Location /S3/s3.amazonaws.com/td-usgs/PARQUETDATA/09394500/2018/06/27.parquet
ParquetColumnName datetime
ParquetColumnPhysicalType BYTE_ARRAY
ParquetColumnLogicalType UTF8
TDColumnType VARCHAR
ParquetColumnPrecision 0
ParquetColumnScale 0
ParquetColumnMinLength 16
ParquetColumnMaxLength 16
ParquetColumnIsNull 1
ParquetColumnPos 4
Notice also that the VARCHAR column shows a ParquetColumnMaxLength of 16 characters, because that is the length of the largest value in that column of the Parquet data. If the FULLSCAN value had been set to FALSE, the ParquetColumnMaxLength value would have been 32000 characters, the Vantage default maximum length value for VARCHAR data types using the UNICODE server character set.