Read JSON File

Teradata® pyspark2teradataml User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2024
Language: English (United States)
Last Update: 2024-04-11
Product Category: Teradata Vantage

When reading a JSON file in teradatamlspk, the file must be in cloud storage.

The output differs from PySpark. See the following examples.

PySpark

>>> spark.read.options(header=True).json('path.json').show()
+-----+----+-----+----+-----+-----+
| col1|col2|col22|col3|col32| col4|
+-----+----+-----+----+-----+-----+
| val1|val2| null|val3| null| val4|
|val12|null|val22|null|val32|val42|
+-----+----+-----+----+-----+-----+

teradatamlspk

>>> spark.read.options(authorization = {"Access_ID": id, "Access_Key": key}).json(path = "/connector/bucket.endpoint/[key_prefix]").show()
+--------------------+---------------+---------------+----------------+------------+----------+--------------------+
|            Location|ObjectVersionId|ObjectTimeStamp|OffsetIntoObject|ObjectLength|ExtraField|             Payload|
+--------------------+---------------+---------------+----------------+------------+----------+--------------------+
|/S3/s3.amazonaws.com|           None|           None|              67|          70|      None|{"col1": "val12", "c|
|/S3/s3.amazonaws.com|           None|           None|               1|          64|      None|{"col1": "val1", "co|
+--------------------+---------------+---------------+----------------+------------+----------+--------------------+
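In the teradatamlspk output, each JSON record arrives as a string in the Payload column rather than as separate columns. The following is a minimal sketch, in plain Python, of how such payload strings could be aligned into the column layout PySpark shows. It assumes the payload strings are already available on the client, and the two sample payloads are hypothetical values chosen to match the rows in the PySpark output above. The helper name payloads_to_rows is illustrative and is not part of teradatamlspk.

```python
import json

# Hypothetical payload strings, assumed to match the two records
# shown in the PySpark output above.
payloads = [
    '{"col1": "val12", "col22": "val22", "col32": "val32", "col4": "val42"}',
    '{"col1": "val1", "col2": "val2", "col3": "val3", "col4": "val4"}',
]

def payloads_to_rows(payloads):
    """Parse JSON strings and align them on the sorted union of their keys."""
    records = [json.loads(p) for p in payloads]
    # Collect every key seen in any record, in sorted order.
    columns = sorted({key for record in records for key in record})
    # Keys missing from a record become None.
    rows = [[record.get(col) for col in columns] for record in records]
    return columns, rows

columns, rows = payloads_to_rows(payloads)
print(columns)
for row in rows:
    print(row)
```

Keys that are absent from a record become None, which corresponds to the null cells in the PySpark output.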

If the JSON file is in the local file system, load it into Vantage as shown in the following example.

>>> import pandas as pd
>>> from teradataml import copy_to_sql
>>> # Build a pandas DataFrame from the local data.
>>> dict_values = {"id": [1, 2, 3], "int_col": [21, 22, 23], "float_col": [21.2, 22.6, 23.5], "str_col": ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Florence Briggs Thayer)", "Palsson, Master. Gosta Leonard"]}
>>> pandas_df = pd.DataFrame(dict_values)
>>> # Copy the pandas DataFrame to a Vantage table, replacing the table if it already exists.
>>> copy_to_sql(pandas_df, table_name="new_table", if_exists="replace")
>>> # Create a teradatamlspk DataFrame from the new table.
>>> df = teraspark_session.createDataFrame("new_table")
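The example above builds dict_values inline. When the data actually sits in a local JSON file, one possible way to produce the same column-oriented dictionary is sketched below using only the Python standard library. It assumes the file is in JSON Lines form (one JSON object per line). The helper name json_lines_to_columns, the file name sample.json, and the sample records are hypothetical and serve only to make the sketch runnable.

```python
import json
import os
import tempfile

def json_lines_to_columns(path):
    """Read a JSON Lines file into a dict mapping column name to a list of values."""
    with open(path, encoding="utf-8") as handle:
        records = [json.loads(line) for line in handle if line.strip()]
    # Preserve the order in which keys first appear.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # Keys missing from a record become None.
    return {col: [record.get(col) for record in records] for col in columns}

# Write a small sample file so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(sample_path, "w", encoding="utf-8") as handle:
    handle.write('{"id": 1, "int_col": 21, "float_col": 21.2}\n')
    handle.write('{"id": 2, "int_col": 22, "float_col": 22.6}\n')

dict_values = json_lines_to_columns(sample_path)
print(dict_values)
```

The resulting dictionary can be passed to pd.DataFrame and then to copy_to_sql, as in the example above. pandas can also read such a file directly with pd.read_json(path, lines=True).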