Data in JSON or CSV Format | Native Object Store | Teradata Python Package - Data in JSON or CSV Format - Teradata Package for Python

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.00
Published
November 2021
Language
English (United States)
Last Update
2022-01-14
dita:mapPath
bol1585763678431.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
B700-4006
lifecycle
previous
Product Category
Teradata Vantage
Foreign table created in JSON or comma-separated values (CSV) format usually contains two columns:
  • Location
  • Payload

User can create a teradataml DataFrame on a foreign table using "DataFrame()" or "DataFrame.from_table()", the same way to create a teradataml DataFrame on a regular table. With the created DataFrame, user can easily access the data in these columns and process the data using teradataml DataFrame API or other Python packages.

The following sections show how to create teradataml DataFrames on NOS foreign tables.

Create teradataml DataFrame on NOS foreign table in JSON data format

Assume that the following foreign table has been created on JSON data in Amazon S3 bucket:
CREATE MULTISET FOREIGN TABLE riverflow ,FALLBACK ,
     EXTERNAL SECURITY DEFINER TRUSTED AUTH_OBJECT ,
     MAP = TD_MAP1
     (
      Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
      Payload JSON(8388096) INLINE LENGTH 32000 CHARACTER SET UNICODE)
USING
(
      LOCATION  ('/s3/s3.amazonaws.com/td-usgs/DATA/')
      PATHPATTERN  ('$data/$siteno/$year/$month/$day')
);
  • Create a dataframe on foreign table.
    # Create dataframe on foreign table which contains data in JSON format on S3
    >>> riverflow = DataFrame.from_table("riverflow")
  • Display the columns in the DataFrame.
    # As seen from create table statement, table has two columns 'Location' and 'Payload'
    >>> riverflow.columns
    ['Location', 'Payload']
  • Check the types of the columns.
    # Let's check their types
    >>> riverflow.dtypes
    Location    str
    Payload     str
  • Print the content of the table.
    # Let's print the DataFrame.
    # Equivalent SQL: 
    #      SELECT * FROM riverflow
    >>> riverflow.head().to_pandas()
    	Location                                                Payload
    0	/S3/s3.amazonaws.com/td-usgs/DATA/09380000/201...	{ "site_no":"09380000", "datetime":"2018-06-27...
    1	/S3/s3.amazonaws.com/td-usgs/DATA/09380000/201...	{ "site_no":"09380000", "datetime":"2018-06-27...
    2	/S3/s3.amazonaws.com/td-usgs/DATA/09380000/201...	{ "site_no":"09380000", "datetime":"2018-06-27...
    3	/S3/s3.amazonaws.com/td-usgs/DATA/09380000/201...	{ "site_no":"09380000", "datetime":"2018-06-27...
    4	/S3/s3.amazonaws.com/td-usgs/DATA/09380000/201...	{ "site_no":"09380000", "datetime":"2018-06-27...
    5	/S3/s3.amazonaws.com/td-usgs/DATA/09380000/201...	{ "site_no":"09380000", "datetime":"2018-06-27...
    6	/S3/s3.amazonaws.com/td-usgs/DATA/09380000/201...	{ "site_no":"09380000", "datetime":"2018-06-27...
    7	/S3/s3.amazonaws.com/td-usgs/DATA/09380000/201...	{ "site_no":"09380000", "datetime":"2018-06-27...
    8	/S3/s3.amazonaws.com/td-usgs/DATA/09380000/201...	{ "site_no":"09380000", "datetime":"2018-06-27...
    9	/S3/s3.amazonaws.com/td-usgs/DATA/09380000/201...	{ "site_no":"09380000", "datetime":"2018-06-27...

Create teradataml DataFrame on NOS foreign table in CSV data format

Assume that the following foreign table has been created on CSV data in Amazon S3 bucket:
CREATE FOREIGN TABLE riverflowcsv
, EXTERNAL SECURITY DEFINER TRUSTED AUTH_OBJECT
(
  Location VARCHAR(2048) CHARACTER SET UNICODE CASESPECIFIC,
  PAYLOAD DATASET INLINE LENGTH 64000 STORAGE FORMAT CSV)
USING
(
  LOCATION('/s3/s3.amazonaws.com/td-usgs/CSVDATA/')
  PATHPATTERN  ('$data/$siteno/$year/$month/$day')
);
  • Create a dataframe on foreign table.
    # Create dataframe on foreign table which contains data in CSV format on S3
    >>> riverflow = DataFrame.from_table("riverflowcsv")
  • Display the columns in the DataFrame.
    # As seen from create table statement, table has two columns 'Location' and 'Payload'
    >>> riverflow.columns
    ['Location', 'Payload']
  • Check the types of the columns.
    # Let's check their types
    >>> riverflow.dtypes
    Location    str
    Payload     str
  • Print the content of the table.
    # Let's print the DataFrame.
    # Equivalent SQL: 
    #      SELECT * FROM riverflow
    >>> riverflow.head().to_pandas()
    	Location			                                  PAYLOAD
    0	/S3/s3.amazonaws.com/td-usgs/CSVDATA/09380000/...	Temp,Flow,site_no,datetime,Conductance,Precipi...
    1	/S3/s3.amazonaws.com/td-usgs/CSVDATA/09380000/...	Temp,Flow,site_no,datetime,Conductance,Precipi...
    2	/S3/s3.amazonaws.com/td-usgs/CSVDATA/09380000/...	Temp,Flow,site_no,datetime,Conductance,Precipi...
    3	/S3/s3.amazonaws.com/td-usgs/CSVDATA/09380000/...	Temp,Flow,site_no,datetime,Conductance,Precipi...
    4	/S3/s3.amazonaws.com/td-usgs/CSVDATA/09380000/...	Temp,Flow,site_no,datetime,Conductance,Precipi...
    5	/S3/s3.amazonaws.com/td-usgs/CSVDATA/09380000/...	Temp,Flow,site_no,datetime,Conductance,Precipi...
    6	/S3/s3.amazonaws.com/td-usgs/CSVDATA/09380000/...	Temp,Flow,site_no,datetime,Conductance,Precipi...
    7	/S3/s3.amazonaws.com/td-usgs/CSVDATA/09380000/...	Temp,Flow,site_no,datetime,Conductance,Precipi...
    8	/S3/s3.amazonaws.com/td-usgs/CSVDATA/09380000/...	Temp,Flow,site_no,datetime,Conductance,Precipi...
    9	/S3/s3.amazonaws.com/td-usgs/CSVDATA/09380000/...	Temp,Flow,site_no,datetime,Conductance,Precipi...

How to access actual data and path variables

Though teradataml provides direct access to the data in foreign tables, actual data that resides in 'Payload' column and path variables for the foreign table can be accessed in one of the following ways: