User can access actual columns and path variables using DataFrame.from_query() that pass a SELECT query with each column from JSON or CSV data projected from foreign table. Each column must be typecast to a valid type and then aliased to the appropriate column name. This allows user to access actual columns and keys in the JSON or CSV data. It is up to the user on what must be selected in SELECT query passed to "DataFrame.from_query()": columns, attributes, keys from JSON or CSV data and path variables.
Example for JSON data
SELECT query with each column from JSON selected.
# Select query with each column from JSON selected. Each column is type casted to a valid type and # then aliased to the required column name. Notice, we are selecting each attribute including path variables. query = """SELECT CAST($path.$siteno AS CHAR(10)) TheSite, CAST($path.$year AS CHAR(4)) TheYear, CAST($path.$month AS CHAR(2)) TheMonth, CAST($path.$day AS CHAR(2)) TheDay, CAST(payload.site_no AS CHAR(8)) Site_no, CAST(payload.Flow AS FLOAT) Flow, CAST(payload.GageHeight AS FLOAT) GageHeight1, CAST(payload.Precipitation AS FLOAT) Precipitation, CAST(payload.Temp AS FLOAT) Temperature, CAST(payload.Velocity AS FLOAT) Velocity, CAST(payload.BatteryVoltage AS FLOAT) BatteryVoltage, CAST(payload.GageHeight2 AS FLOAT) GageHeight2 FROM riverflow"""
Create DataFrame from the query and display the head of the DataFrame.
>>> wrk1df = DataFrame.from_query(query) >>> wrk1df.head().to_pandas()
TheSite TheYear TheMonth TheDay Site_no Flow GageHeight1 Precipitation Temperature Velocity BatteryVoltage GageHeight2 0 09380000 2018 07 14 09380000 11400.0 8.99 0.0 11.4 None None None 1 09380000 2018 07 27 09380000 11100.0 8.91 0.0 11.0 None None None 2 09380000 2018 07 25 09380000 18600.0 10.41 0.0 11.7 None None None 3 09380000 2018 06 29 09380000 16700.0 10.07 0.0 11.3 None None None 4 09380000 2018 07 03 09380000 18900.0 10.46 0.0 11.0 None None None 5 09380000 2018 07 13 09380000 11000.0 8.90 0.0 10.8 None None None 6 09380000 2018 07 15 09380000 11400.0 8.99 0.0 12.0 None None None 7 09380000 2018 07 18 09380000 11300.0 8.96 0.0 10.7 None None None 8 09380000 2018 06 30 09380000 0.0 0.00 0.0 11.6 None None None 9 09380000 2018 07 16 09380000 16700.0 10.07 0.0 11.0 None None None
Example for CSV data
SELECT query with each column from CSV selected.
# Select query with each column from CSV selected. Each column is type casted to a valid type and # then aliased to the required column name. Notice, we are selecting each attribute including path variables. query = """SELECT CAST($path.$siteno AS CHAR(10)) TheSite, CAST($path.$year AS CHAR(4)) TheYear, CAST($path.$month AS CHAR(2)) TheMonth, CAST($path.$day AS CHAR(2)) TheDay, CAST(payload..site_no AS CHAR(8)) Site_no, CAST(payload..Flow AS FLOAT) Flow, CAST(payload..GageHeight AS FLOAT) GageHeight1, CAST(payload..Precipitation AS FLOAT) Precipitation, CAST(payload..Temp AS FLOAT) Temperature, CAST(payload..Velocity AS FLOAT) Velocity, CAST(payload..BatteryVoltage AS FLOAT) BatteryVoltage, CAST(payload..GageHeight2 AS FLOAT) GageHeight2 FROM riverflowcsv"""
Create DataFrame from the query and display the head of the DataFrame.
>>> wrk1df = DataFrame.from_query(query) >>> wrk1df.head().to_pandas()
TheSite TheYear TheMonth TheDay Site_no Flow GageHeight1 Precipitation Temperature Velocity BatteryVoltage GageHeight2 0 09380000 2018 06 30 09380000 11700.0 9.04 0.0 10.0 None None None 1 09380000 2018 07 04 09380000 11400.0 8.99 0.0 11.9 None None None 2 09380000 2018 07 11 09380000 18700.0 10.42 0.0 11.2 None None None 3 09380000 2018 06 29 09380000 11600.0 9.02 0.0 10.2 None None None 4 09380000 2018 07 06 09380000 16600.0 10.05 0.0 10.8 None None None 5 09380000 2018 07 22 09380000 11500.0 9.00 0.0 10.9 None None None 6 09380000 2018 07 11 09380000 15200.0 9.79 0.0 10.9 None None None 7 09380000 2018 06 30 09380000 10000.0 8.65 0.0 11.7 None None None 8 09380000 2018 07 15 09380000 11400.0 8.98 0.0 11.8 None None None 9 09380000 2018 06 28 09380000 17000.0 10.12 0.0 11.6 None None None