When reading a CSV file in teradatamlspk, the file can be in cloud storage or in the local file system.
PySpark
pyspark_session.read.options(header=True).csv(r'admissions_train.csv').show()
teradatamlspk
spark.read.options(authorization = {"Access_ID": id, "Access_Key": key}).csv(path = "/connector/bucket.endpoint/[key_prefix]").show()
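The `id` and `key` values passed in the `authorization` option above are cloud credentials. A minimal sketch, assuming hypothetical environment variable names (not part of teradatamlspk), of building that dictionary without hardcoding secrets in the script:

```python
import os

# Hypothetical variable names used for illustration only; adapt to
# your deployment. setdefault lets the sketch run standalone.
os.environ.setdefault("CLOUD_ACCESS_ID", "example-id")
os.environ.setdefault("CLOUD_ACCESS_KEY", "example-key")

# The dict passed to .options(authorization=...) in the example above.
authorization = {
    "Access_ID": os.environ["CLOUD_ACCESS_ID"],
    "Access_Key": os.environ["CLOUD_ACCESS_KEY"],
}
print(sorted(authorization))  # ['Access_ID', 'Access_Key']
```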
If the CSV file is in the local file system, a schema is mandatory; teradatamlspk does not infer the schema. Load the file into Vantage as shown in the following example.
>>> from teradatamlspk.sql.types import StructType, StructField, FloatType
>>> schema = StructType([
...     StructField("long", FloatType(), nullable=True),
...     StructField("lat", FloatType(), nullable=True),
...     StructField("medage", FloatType(), nullable=True),
...     StructField("totrooms", FloatType(), nullable=True),
...     StructField("totbdrms", FloatType(), nullable=True),
...     StructField("pop", FloatType(), nullable=True),
...     StructField("houshlds", FloatType(), nullable=True),
...     StructField("medinc", FloatType(), nullable=True),
...     StructField("medhv", FloatType(), nullable=True)
... ])
>>> spark.read.csv(path=HOUSING_DATA, schema=schema, header=True).cache().show()
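To see why an explicit schema is required for local files, note that plain CSV parsing yields strings only; the schema tells the reader which type each column holds. A minimal plain-Python sketch (using only the standard library, with made-up sample data) of schema-driven type conversion:

```python
import csv
import io

# Two columns of the housing data, inlined as sample text.
data = "long,lat\n-122.2,37.8\n-121.9,37.3\n"

# A hypothetical schema mapping column name -> type, mirroring what
# StructType declares for the reader above.
schema = {"long": float, "lat": float}

rows = []
for record in csv.DictReader(io.StringIO(data)):
    # Without the schema, every value would stay a string.
    rows.append({col: schema[col](val) for col, val in record.items()})

print(rows[0])  # {'long': -122.2, 'lat': 37.8}
```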
Using format for Cloud Platform
spark.read.options(authorization = {"Access_ID": id, "Access_Key": key}).format("csv").load(path = "/connector/bucket.endpoint/[key_prefix]").show()
Using format for Local File System
spark.read.format("csv").load(path = "local_path", schema = schema, header=True).cache().show()