When reading a JSON file in teradatamlspk, the file must be in cloud storage.
The output also differs from PySpark: PySpark parses the JSON records into columns, while teradatamlspk returns object metadata along with the raw JSON text in a Payload column. See the following examples.
PySpark
>>> spark.read.json('path.json').show()
+-----+----+-----+----+-----+-----+
| col1|col2|col22|col3|col32| col4|
+-----+----+-----+----+-----+-----+
| val1|val2| null|val3| null| val4|
|val12|null|val22|null|val32|val42|
+-----+----+-----+----+-----+-----+
teradatamlspk
>>> spark.read.options(authorization = {"Access_ID": id, "Access_Key": key}).json(path = "/connector/bucket.endpoint/[key_prefix]").show()
+--------------------+---------------+---------------+----------------+------------+----------+--------------------+
|            Location|ObjectVersionId|ObjectTimeStamp|OffsetIntoObject|ObjectLength|ExtraField|             Payload|
+--------------------+---------------+---------------+----------------+------------+----------+--------------------+
|/S3/s3.amazonaws.com|           None|           None|              67|          70|      None|{"col1": "val12", "c|
|/S3/s3.amazonaws.com|           None|           None|               1|          64|      None|{"col1": "val1", "co|
+--------------------+---------------+---------------+----------------+------------+----------+--------------------+
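Because teradatamlspk returns the raw JSON text in the Payload column rather than parsed columns, you may need to extract individual fields yourself. The following is a minimal sketch, assuming teradatamlspk exposes PySpark-compatible from_json and schema types under the import paths shown; the import paths, schema, and column names are assumptions for illustration, not confirmed teradatamlspk API, so check the documentation for your version.

>>> from teradatamlspk.sql.functions import col, from_json
>>> from teradatamlspk.sql.types import StructType, StructField, StringType
>>> # Schema describing the JSON records stored in Payload (illustrative).
>>> schema = StructType([StructField("col1", StringType()),
...                      StructField("col2", StringType())])
>>> df = spark.read.options(authorization = {"Access_ID": id, "Access_Key": key}).json(path = "/connector/bucket.endpoint/[key_prefix]")
>>> # Parse the Payload string and promote its fields to top-level columns.
>>> df.select(from_json(col("Payload"), schema).alias("rec")).select("rec.col1", "rec.col2").show()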
Using format(), the equivalent call is:
>>> spark.read.options(authorization = {"Access_ID": id, "Access_Key": key}).format("json").load(path = "/connector/bucket.endpoint/[key_prefix]").show()
If the JSON file is in the local file system, load it to Vantage as shown in the following example.
>>> spark.read.json('path.json').show()
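If you prefer to stage the data in Vantage explicitly, one alternative route is to read the file locally and copy it to a Vantage table. This is a sketch, assuming the teradataml package and pandas are installed and a teradataml context has already been created; the table name and file layout options are illustrative.

>>> import pandas as pd
>>> from teradataml import copy_to_sql
>>> # Read the local JSON; adjust orient/lines to match the file layout.
>>> pdf = pd.read_json('path.json', lines=True)
>>> # Copy the data into a Vantage table (table name is illustrative).
>>> copy_to_sql(df=pdf, table_name="json_data", if_exists="replace")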