Read CSV File

Teradata® pyspark2teradataml User Guide

Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00
Published: December 2024
Product Category: Teradata Vantage

When reading a CSV file in teradatamlspk, the file can reside either in cloud storage or in the local file system.

PySpark

pyspark_session.read.options(header=True).csv(r'admissions_train.csv').show()

teradatamlspk

spark.read.options(authorization = {"Access_ID": id, "Access_Key": key}).csv(path = "/connector/bucket.endpoint/[key_prefix]").show()
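The authorization credentials and the object-store path are plain Python values, so they can be assembled before the read call. A minimal sketch (pure Python; the bucket, endpoint, and prefix names are hypothetical placeholders, not values from this guide) of building them from environment variables:

```python
import os

# Hypothetical bucket, endpoint, and key prefix; substitute your own values.
bucket = os.environ.get("S3_BUCKET", "my-bucket")
endpoint = os.environ.get("S3_ENDPOINT", "s3.amazonaws.com")
key_prefix = os.environ.get("S3_KEY_PREFIX", "data/")

# Credentials dict in the shape the authorization option expects.
authorization = {
    "Access_ID": os.environ.get("ACCESS_ID", ""),
    "Access_Key": os.environ.get("ACCESS_KEY", ""),
}

# Object-store path in the /connector/bucket.endpoint/[key_prefix] form.
path = f"/connector/{bucket}.{endpoint}/{key_prefix}"
print(path)
```

The resulting `authorization` dict and `path` string can then be passed to `spark.read.options(authorization=authorization).csv(path=path)` as shown above.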

If the CSV file is in the local file system, a schema is mandatory because teradatamlspk does not infer the schema. Load the file to Vantage as shown in the following example.

>>> from teradatamlspk.sql.types import StructType, StructField, FloatType
>>> schema = StructType([
...     StructField("long", FloatType(), nullable=True),
...     StructField("lat", FloatType(), nullable=True),
...     StructField("medage", FloatType(), nullable=True),
...     StructField("totrooms", FloatType(), nullable=True),
...     StructField("totbdrms", FloatType(), nullable=True),
...     StructField("pop", FloatType(), nullable=True),
...     StructField("houshlds", FloatType(), nullable=True),
...     StructField("medinc", FloatType(), nullable=True),
...     StructField("medhv", FloatType(), nullable=True)])
>>> spark.read.csv(path=HOUSING_DATA, schema=schema, header=True).cache().show()
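Because the schema must list every column in file order, it can help to peek at the CSV header first to see which columns the `StructType` must cover. A small stdlib-only sketch (the sample data below is a hypothetical stand-in for a local CSV file):

```python
import csv
import io

# Sample text standing in for a local CSV file (hypothetical columns).
sample = "long,lat,medage\n-122.05,37.37,27.0\n"

# Read only the header row: these are the columns your StructType must
# declare, in order, since teradatamlspk does not infer the schema.
header = next(csv.reader(io.StringIO(sample)))
print(header)  # ['long', 'lat', 'medage']
```

For a real file, replace the `io.StringIO(sample)` wrapper with `open("local_path", newline="")`.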

Using format for Cloud Platform

spark.read.options(authorization = {"Access_ID": id, "Access_Key": key}).format("csv").load(path = "/connector/bucket.endpoint/[key_prefix]").show()

Using format for Local File System

spark.read.format("csv").load(path = "local_path", schema = schema, header=True).cache().show()