read_csv | Teradata Package for Python

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.10
Published
May 2022
Language
English (United States)
Last Update
2022-08-18
dita:mapPath
rsu1641592952675.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
B700-4006
lifecycle
previous
Product Category
Teradata Vantage

The read_csv() function loads data from a CSV file into Teradata Vantage. This function is used to quickly load large amounts of data into a table on Vantage using the FastLoadCSV protocol.

When loading data from a CSV file, optional arguments can be used to identify fields in the file.
  • sep specifies the single-character string used to separate fields in the CSV file; the default value is ',' (comma).
  • quotechar specifies the single-character string used to quote fields in the CSV file; the default value is '"' (double quote).
Considerations when using a CSV file:
  • Each record is on a separate line of the CSV file. Records are delimited by line breaks (CRLF).
  • The last record in the file may or may not have an ending line break.
  • The first line in the CSV file must be a header line. The header line lists the column names separated by the field separator (for example, col1,col2,col3).
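
The considerations above can be illustrated with the plain Python csv module (not teradataml): a header line first, one record per line, and a custom separator and quote character that would be matched by the sep and quotechar arguments of read_csv. The column names here are illustrative only.

```python
import csv
import io

# Build a CSV in memory that satisfies the considerations above:
# a header line listing the column names, then one record per line.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=';', quotechar="'",
                    quoting=csv.QUOTE_MINIMAL)
writer.writerow(['id', 'fname', 'lname'])  # header line comes first
writer.writerow([101, 'abc', 'd;ef'])      # field containing the separator gets quoted

content = buffer.getvalue()
print(content)
```

A read_csv call loading such a file would pass the matching arguments, for example sep=';' and quotechar="'".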
Limitations when using a CSV file with FastLoad:
  • The read_csv function cannot load duplicate rows in a DataFrame if the table is a MULTISET table having a primary index.
  • The read_csv function does not support all Teradata Advanced SQL Engine data types.

    For example, a target table having BLOB or CLOB columns cannot be loaded.

  • If some rows are rejected due to constraint violations, data type conversion errors, and so on, the FastLoad protocol ignores those rows and inserts all valid rows.
  • Rows in the DataFrame that fail to be inserted are categorized into errors and warnings by the FastLoad protocol, and these errors and warnings are stored in the respective error and warning tables by the FastLoad API.

Teradata recommends using the FastLoad protocol when the number of rows to be loaded is at least 100,000. When the argument use_fastload is set to True, FastLoad opens multiple data transfer connections to the database. The number of data transfer sessions can be set using the argument open_sessions. If this argument is not set, the number of data transfer sessions opened by teradataml defaults to the smaller of 8 and the number of available AMPs in Vantage.
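
The default session count described above can be sketched as a one-liner; amp_count here is a hypothetical stand-in for the AMP count teradataml discovers from Vantage:

```python
def default_open_sessions(amp_count):
    """Default number of FastLoad data transfer sessions when open_sessions
    is not set: the smaller of 8 and the number of available AMPs."""
    return min(8, amp_count)

print(default_open_sessions(4))   # small system with 4 AMPs: uses 4 sessions
print(default_open_sessions(72))  # large system: capped at 8 sessions
```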

Based on the input arguments, the read_csv function returns different output:
  • If use_fastload is set to False, it returns a teradataml DataFrame.
  • If use_fastload is set to True and catch_errors_warnings is set to False, it returns only a teradataml DataFrame.
  • If use_fastload is set to True and catch_errors_warnings is set to True, it returns a tuple containing a teradataml DataFrame and a dict with the following attributes:
    • errors_dataframe: A Pandas DataFrame containing error messages thrown by FastLoad.

      The DataFrame is empty if there are no errors.

    • warnings_dataframe: A Pandas DataFrame containing warning messages thrown by FastLoad.

      The DataFrame is empty if there are no warnings.

    • errors_table: Name of the table containing errors.

      It is None if the argument save_errors is set to False.

    • warnings_table: Name of the table containing warnings.

      It is None if the argument save_errors is set to False.

    • fastloadcsv_error_tables: Names of the tables containing errors generated by FastLoadCSV.

      It is an empty list if the argument save_errors is set to False.

See the CSV BATCH INSERTS and FastLoad sections of https://pypi.org/project/teradatasql/ for more information.

Example Setup

>>> import csv
>>> from collections import OrderedDict
>>> from teradataml.dataframe.data_transfer import read_csv
>>> from teradatasqlalchemy.types import *
>>> csv_filename = "test_file.csv"
>>> table_name = "read_csv_table"
>>> record = [
        'id,fname,lname,marks',
        '101,abc,def,200',
        '102,lmn,pqr,105',
        '103,def,xyz,492',
        '101,abc,def,300'
    ]
>>> # Function to write list data into csv file.
>>> def write_data_into_csv(filename, record):
        with open(filename, 'w', newline='') as csvFile:
            csvWriter = csv.writer(csvFile, delimiter='\n')
            csvWriter.writerow(record)
>>>
>>> write_data_into_csv(csv_filename, record)

Example 1: Default execution with types passed as OrderedDict

This example loads the data from a CSV file into a table, with the argument types passed as an OrderedDict.

>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT)
>>> read_csv('test_file.csv', 'my_first_table', types)

Example 2: Catch all errors and warnings, and store in the table

This example loads the data from a CSV file into a table using the FastLoad CSV protocol, catches all errors and warnings, and stores them in tables.

>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT)
>>> read_csv(filepath='test_file.csv',
             table_name='my_first_table1', 
             types=types,
             save_errors=True, 
             catch_errors_warnings=True)

Example 3: Load data into an existing table, replace the table

This example loads the data from a CSV file into a table using the FastLoad CSV protocol. If the table exists, it is replaced.

>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT)
>>> read_csv(filepath='test_file.csv',
             table_name='my_first_table',
             types=types, 
             if_exists='replace')

Example 4: Load data into an existing table, append the table

This example loads the data from a CSV file into a table using the FastLoad CSV protocol. If the table exists in the specified schema, the data is appended to it.

>>> # Create new table in Vantage.
>>> read_csv(filepath='test_file.csv',
             table_name='my_first_table',
             types=types, 
             if_exists='fail')
>>> # Write different data into csv file.
>>> record = [
        'id,fname,lname,marks',
        '501,orl,mal,740'
    ]
>>> write_data_into_csv(csv_filename, record)
>>>
>>> # Append the existing table.
>>> read_csv(filepath='test_file.csv',
             table_name='my_first_table',
             if_exists='append')

Example 5: Load data into a SET table, catch all errors and warnings

This example loads the data from a CSV file into a SET table using the FastLoad CSV protocol, catches all errors and warnings, and stores them in tables.

>>> # Write duplicate rows in csv file.
>>> record = [
           'id,fname,lname,marks',
            '101,abc,def,200',
            '102,lmn,pqr,105',
            '103,def,xyz,492',
            '101,abc,def,200'
        ]
>>> write_data_into_csv(csv_filename, record)
>>> # Create SET table.
>>> read_csv(filepath='test_file.csv',
             table_name='my_first_table',
             types=types, 
             if_exists='replace',
             set_table=True, 
             primary_index='id',
             save_errors=True, 
             catch_errors_warnings=True)

Example 6: Load data into a table without FastLoad protocol

This example loads the data from a CSV file with DATE and TIMESTAMP columns into a table without the FastLoad protocol. If the table exists in the specified schema, the data is appended to it.

>>> # Write date and timestamp in csv file.
>>> record = [
            'id,fname,lname,marks,admission_date,admission_time',
            '101,abc,def,200,2009-07-29,2009-07-29 20:17:59',
            '102,lmn,pqr,105,2009-07-29,2009-07-29 20:17:59',
            '103,def,xyz,492,2009-07-29,2009-07-29 20:17:59',
            '101,abc,def,200,2009-07-29,2009-07-29 20:17:59'
        ]
>>> write_data_into_csv(csv_filename, record)
>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT, admission_date=DATE, admission_time=TIMESTAMP)
>>> read_csv(filepath='test_file.csv',
             table_name='my_first_table',
             types=types, 
             if_exists='replace',
             use_fastload=False)
>>> # Append the existing table.
>>> read_csv(filepath='test_file.csv',
             table_name='my_first_table',
             if_exists='append',
             use_fastload=False)

Example 7: Load data into a temporary table without FastLoad protocol

This example loads the data from a CSV file into a temporary table without the FastLoad protocol. If the table exists in the specified schema, the data is appended to it.

>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT, admission_date=DATE, admission_time=TIMESTAMP)
>>> read_csv(filepath='test_file.csv',
             table_name='my_first_table',
             types=types, 
             if_exists='replace',
             temporary=True)
>>> # Append the existing table.
>>> read_csv(filepath='test_file.csv',
             table_name='my_first_table',
             if_exists='append',
             temporary=True)

Example 8: Load data into a PTI table

This example loads data from a CSV file with TIMESTAMP columns into a PTI table. If the specified table exists, the data is appended to it; otherwise, a new table is created.

>>> types = OrderedDict(partition_id=INTEGER, adid=INTEGER, productid=INTEGER, event=VARCHAR, clicktime=TIMESTAMP)
>>> read_csv(filepath='test_file.csv',
             table_name='my_first_read_csv_pti_table',
             types=types, 
             if_exists='append',
             timecode_column='clicktime',
             columns_list='event',
             use_fastload=False)

Example 9: Load data into a SET PTI table

This example loads data from a CSV file with TIMESTAMP columns into a SET PTI table. If the specified table exists, the data is appended to it; otherwise, a new table is created.

>>> types = OrderedDict(partition_id=INTEGER, adid=INTEGER, productid=INTEGER, event=VARCHAR, clicktime=TIMESTAMP)
>>> read_csv(filepath='test_file.csv',
             table_name='my_first_read_csv_pti_table',
             types=types, 
             if_exists='append',
             timecode_column='clicktime',
             columns_list='event',
             set_table=True)

Example 10: Load data into a temporary PTI table

This example loads data from a CSV file with TIMESTAMP columns into a temporary PTI table. If the specified table exists, the data is appended to it; otherwise, a new table is created.

>>> types = OrderedDict(partition_id=INTEGER, adid=INTEGER, productid=INTEGER, event=VARCHAR, clicktime=TIMESTAMP)
>>> read_csv(filepath='test_file.csv',
             table_name='my_first_read_csv_pti_table',
             types=types, 
             if_exists='append',
             timecode_column='clicktime',
             columns_list='event',
             temporary=True)

Example 11: Load data into a table by opening given number of Teradata data transfer sessions

This example loads the data from a CSV file into a Vantage table by opening two Teradata data transfer sessions.

>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT)
>>> read_csv(filepath='test_file.csv',  
             table_name='my_first_table_with_open_sessions',
             types=types, 
             open_sessions=2)

Example 12: Load data into table and set primary index

This example loads data from a CSV file into a Vantage table and sets the primary index provided through the primary_index argument.

>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT)
>>> read_csv(filepath='test_file.csv',  
             table_name='my_first_table_with_primary_index',
             types=types, 
             primary_index=['fname'])