Accessing Columns and Path Variables using DataFrame.from_query | NOS teradataml - Accessing Columns and Path Variables using DataFrame.from_query() - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for Python
Release Number
20.00
Published
March 2024
Language
English (United States)
Last Update
2024-10-10
dita:mapPath
nvi1706202040305.ditamap
dita:ditavalPath
plt1683835213376.ditaval
dita:id
rkb1531260709148
lifecycle
latest
Product Category
Teradata Vantage

User can access actual columns and path variables using DataFrame.from_query() that pass a SELECT query with each column from JSON or CSV data projected from foreign table. Each column must be typecast to a valid type and then aliased to the appropriate column name. This allows user to access actual columns and keys in the JSON or CSV data. It is up to the user on what must be selected in SELECT query passed to "DataFrame.from_query()": columns, attributes, keys from JSON or CSV data and path variables.

Example for JSON data

SELECT query with each column from JSON selected.

# Select query with each column from JSON selected. Each column is type casted to a valid type and 
# then aliased to the required column name. Notice, we are selecting each attribute including path variables.

query = """SELECT CAST($path.$siteno AS CHAR(10)) TheSite,
       CAST($path.$year AS CHAR(4)) TheYear,
       CAST($path.$month AS CHAR(2)) TheMonth,
       CAST($path.$day AS CHAR(2)) TheDay,
       CAST(payload.site_no AS CHAR(8)) Site_no,
       CAST(payload.Flow AS FLOAT) Flow,
       CAST(payload.GageHeight AS FLOAT) GageHeight1,
       CAST(payload.Precipitation AS FLOAT) Precipitation,
       CAST(payload.Temp AS FLOAT) Temperature,
       CAST(payload.Velocity AS FLOAT) Velocity,
       CAST(payload.BatteryVoltage AS FLOAT) BatteryVoltage,
       CAST(payload.GageHeight2 AS FLOAT) GageHeight2
FROM riverflow"""

Create DataFrame from the query and display the head of the DataFrame.

>>> wrk1df = DataFrame.from_query(query)
>>> wrk1df.head().to_pandas()
     TheSite   TheYear TheMonth TheDay Site_no	 Flow	GageHeight1 Precipitation Temperature Velocity BatteryVoltage GageHeight2
0	09380000	2018	07	14	09380000	11400.0	8.99	 0.0	 11.4	    None	 None	       None
1	09380000	2018	07	27	09380000	11100.0	8.91 	0.0	 11.0	    None	 None	       None
2	09380000	2018	07	25	09380000	18600.0	10.41	 0.0	11.7	    None	 None	       None
3	09380000	2018	06	29	09380000	16700.0	10.07 	0.0	11.3	    None	 None	       None
4	09380000	2018	07	03	09380000	18900.0	10.46	 0.0	11.0	    None	 None	       None
5	09380000	2018	07	13	09380000	11000.0	8.90	 0.0	 10.8	    None	 None	       None
6	09380000	2018	07	15	09380000	11400.0	8.99 	0.0	 12.0	    None	 None	       None
7	09380000	2018	07	18	09380000	11300.0	8.96	 0.0	 10.7	    None	 None	       None
8	09380000	2018	06	30	09380000	    0.0	0.00 	0.0	 11.6	    None	 None	       None
9	09380000	2018	07	16	09380000	16700.0	10.07	 0.0	11.0	    None	 None	       None

Example for CSV data

SELECT query with each column from CSV selected.

# Select query with each column from CSV selected. Each column is type casted to a valid type and 
# then aliased to the required column name. Notice, we are selecting each attribute including path variables.

query = """SELECT CAST($path.$siteno AS CHAR(10)) TheSite,
       CAST($path.$year AS CHAR(4)) TheYear,
       CAST($path.$month AS CHAR(2)) TheMonth,
       CAST($path.$day AS CHAR(2)) TheDay,
       CAST(payload..site_no AS CHAR(8)) Site_no,
       CAST(payload..Flow AS FLOAT) Flow,
       CAST(payload..GageHeight AS FLOAT) GageHeight1,
       CAST(payload..Precipitation AS FLOAT) Precipitation,
       CAST(payload..Temp AS FLOAT) Temperature,
       CAST(payload..Velocity AS FLOAT) Velocity,
       CAST(payload..BatteryVoltage AS FLOAT) BatteryVoltage,
       CAST(payload..GageHeight2 AS FLOAT) GageHeight2
FROM riverflowcsv"""

Create DataFrame from the query and display the head of the DataFrame.

>>> wrk1df = DataFrame.from_query(query)
>>> wrk1df.head().to_pandas()
	TheSite	TheYear TheMonth TheDay Site_no	Flow	GageHeight1 Precipitation	Temperature Velocity BatteryVoltage GageHeight2
0	09380000	2018	06	30	09380000	11700.0	9.04	  0.0	10.0	    None	None	       None
1	09380000	2018	07	04	09380000	11400.0	8.99	  0.0	11.9	    None	None	       None
2	09380000	2018	07	11	09380000	18700.0	10.42	  0.0	11.2	    None	None	       None
3	09380000	2018	06	29	09380000	11600.0	9.02  	0.0	10.2	    None	None	       None
4	09380000	2018	07	06	09380000	16600.0	10.05	  0.0	10.8	    None	None	       None
5	09380000	2018	07	22	09380000	11500.0	9.00	  0.0	10.9	    None	None	       None
6	09380000	2018	07	11	09380000	15200.0	9.79	  0.0	10.9	    None	None	       None
7	09380000	2018	06	30	09380000	10000.0	8.65	  0.0	11.7	    None	None	       None
8	09380000	2018	07	15	09380000	11400.0	8.98  	0.0	11.8	    None	None	       None
9	09380000	2018	06	28	09380000	17000.0	10.12  	0.0	11.6	    None	None	       None