The read_csv() function loads data from a CSV file into Teradata Vantage. It can quickly load large amounts of data into a table on Vantage using the FastLoadCSV protocol.
- sep specifies the single-character string used to separate fields in the CSV file; the default is ',' (comma).
- quotechar specifies the single-character string used to quote fields in the CSV file; the default is '"' (double quote).
- Each record is on a separate line of the CSV file. Records are delimited by line breaks (CRLF).
- The last record in the file may or may not have an ending line break.
- The first line in the CSV file must be a header line. The header line lists the column names separated by the field separator (for example, col1,col2,col3).
- The read_csv() function cannot load duplicate rows, even if the target table is a MULTISET table having a primary index.
- The read_csv() function does not support all Teradata Advanced SQL Engine data types. For example, a target table with BLOB or CLOB columns cannot be loaded.
- If some rows are invalid due to constraint violations, data type conversion errors, and so on, the FastLoad protocol skips those rows and inserts all valid rows.
- Rows that fail to be inserted are categorized into errors and warnings by the FastLoad protocol, and these errors and warnings are stored in the respective error and warning tables by the FastLoad API.
Teradata recommends using the FastLoad protocol when the number of rows to be loaded is at least 100,000. When the argument use_fastload is set to True, FastLoad opens multiple data transfer connections to the database. The number of data transfer sessions can be set with the argument open_sessions; if it is not set, teradataml opens the smaller of 8 and the number of available AMPs in Vantage.
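The default session count described above can be sketched as follows; amp_count is a hypothetical stand-in for the number of AMPs reported by Vantage, not an actual teradataml API.

```python
# Sketch of the default FastLoad session rule stated above:
# the smaller of 8 and the number of available AMPs in Vantage.
def default_open_sessions(amp_count):
    # amp_count is a hypothetical placeholder for the system's AMP count.
    return min(8, amp_count)

print(default_open_sessions(4))   # small system: limited by AMP count
print(default_open_sessions(32))  # large system: capped at 8
```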
- If use_fastload is set to False, the function returns a teradataml DataFrame.
- If use_fastload is set to True and catch_errors_warnings is set to False, it returns only a teradataml DataFrame.
- If use_fastload is set to True and catch_errors_warnings is set to True, it returns a tuple containing a teradataml DataFrame and a dict with the following keys:
  - errors_dataframe: a pandas DataFrame containing error messages raised by FastLoad. It is empty if there are no errors.
  - warnings_dataframe: a pandas DataFrame containing warning messages raised by FastLoad. It is empty if there are no warnings.
  - errors_table: name of the table containing errors. It is None if the argument save_errors is set to False.
  - warnings_table: name of the table containing warnings. It is None if save_errors is set to False.
  - fastloadcsv_error_tables: names of the tables containing errors generated by FastLoadCSV. It is an empty list if save_errors is set to False.
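As a sketch of the shape of the return value when use_fastload=True and catch_errors_warnings=True, the dict below is built by hand with dummy values; it is not produced by an actual load, and only illustrates the keys listed above.

```python
import pandas as pd

# Hypothetical illustration of the dict returned alongside the teradataml
# DataFrame when catch_errors_warnings=True. All values are dummies.
err_warn = {
    "errors_dataframe": pd.DataFrame(columns=["error_message"]),    # empty: no errors
    "warnings_dataframe": pd.DataFrame(columns=["error_message"]),  # empty: no warnings
    "errors_table": None,            # None because save_errors=False
    "warnings_table": None,          # None because save_errors=False
    "fastloadcsv_error_tables": [],  # empty list because save_errors=False
}

# Typical inspection pattern after:
#   df, err_warn = read_csv(..., use_fastload=True, catch_errors_warnings=True)
if err_warn["errors_dataframe"].empty:
    print("no load errors")
```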
See the CSV BATCH INSERTS and FastLoad sections of https://pypi.org/project/teradatasql/ for more information.
Example Setup
>>> import csv
>>> from collections import OrderedDict
>>> from teradataml.dataframe.data_transfer import read_csv
>>> from teradatasqlalchemy.types import *
>>> csv_filename = "test_file.csv"
>>> table_name = "read_csv_table"
>>> record = ['id,fname,lname,marks',
...           '101,abc,def,200',
...           '102,lmn,pqr,105',
...           '103,def,xyz,492',
...           '101,abc,def,300']
>>> # Function to write list data into a csv file.
>>> def write_data_into_csv(filename, record):
...     with open(filename, 'w', newline='') as csvFile:
...         csvWriter = csv.writer(csvFile, delimiter='\n')
...         csvWriter.writerow(record)
>>>
>>> write_data_into_csv(csv_filename, record)
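The helper above leans on csv.writer with a newline delimiter to emit one record per line. A plainer equivalent, shown here as a sketch (write_lines_as_csv is an illustrative name, not part of teradataml), simply writes the pre-formed comma-separated lines directly:

```python
# Equivalent helper: the records are already comma-separated strings,
# so they can be written one per line without the csv module.
def write_lines_as_csv(filename, record):
    with open(filename, 'w', newline='') as f:
        f.write('\n'.join(record) + '\n')

record = ['id,fname,lname,marks', '101,abc,def,200']
write_lines_as_csv('sketch_file.csv', record)
with open('sketch_file.csv') as f:
    print(f.read().splitlines())  # ['id,fname,lname,marks', '101,abc,def,200']
```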
Example 1: Default execution with types passed as OrderedDict
This example loads the data from the CSV file into a table, with column types passed through the types argument as an OrderedDict.
>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT)
>>> read_csv('test_file.csv', 'my_first_table', types)
Example 2: Catch all errors and warnings, and store in the table
This example loads the data from the CSV file into a table using the FastLoadCSV protocol, catching all errors and warnings and storing them in tables.
>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT)
>>> read_csv(filepath='test_file.csv', table_name='my_first_table1', types=types, save_errors=True, catch_errors_warnings=True)
Example 3: Load data into an existing table, replace the table
This example loads the data from the CSV file into a table using the FastLoadCSV protocol. If the table exists, it is replaced.
>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT)
>>> read_csv(filepath='test_file.csv', table_name='my_first_table', types=types, if_exists='replace')
Example 4: Load data into an existing table, append the table
This example loads the data from the CSV file into a table using the FastLoadCSV protocol. If the table exists in the specified schema, data is appended to it.
>>> # Create new table in Vantage.
>>> read_csv(filepath='test_file.csv', table_name='my_first_table', types=types, if_exists='fail')
>>> # Write different data into the csv file.
>>> record = ['id,fname,lname,marks',
...           '501,orl,mal,740']
>>> write_data_into_csv(csv_filename, record)
>>>
>>> # Append to the existing table.
>>> read_csv(filepath='test_file.csv', table_name='my_first_table', if_exists='append')
Example 5: Load data into a SET table, catch all errors and warnings
This example loads the data from the CSV file into a SET table using the FastLoadCSV protocol, catching all errors and warnings and storing them in tables.
>>> # Write duplicate rows into the csv file.
>>> record = ['id,fname,lname,marks',
...           '101,abc,def,200',
...           '102,lmn,pqr,105',
...           '103,def,xyz,492',
...           '101,abc,def,200']
>>> write_data_into_csv(csv_filename, record)
>>> # Create SET table.
>>> read_csv(filepath='test_file.csv', table_name='my_first_table', types=types, if_exists='replace', set_table=True, primary_index='id', save_errors=True, catch_errors_warnings=True)
Example 6: Load data into a table without FastLoad protocol
This example loads the data from a CSV file with DATE and TIMESTAMP columns into a table without the FastLoad protocol. If the table exists in the specified schema, data is appended to it.
>>> # Write date and timestamp values into the csv file.
>>> record = ['id,fname,lname,marks,admission_date,admission_time',
...           '101,abc,def,200,2009-07-29,2009-07-29 20:17:59',
...           '102,lmn,pqr,105,2009-07-29,2009-07-29 20:17:59',
...           '103,def,xyz,492,2009-07-29,2009-07-29 20:17:59',
...           '101,abc,def,200,2009-07-29,2009-07-29 20:17:59']
>>> write_data_into_csv(csv_filename, record)
>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT, admission_date=DATE, admission_time=TIMESTAMP)
>>> read_csv(filepath='test_file.csv', table_name='my_first_table', types=types, if_exists='replace', use_fastload=False)
>>> # Append to the existing table.
>>> read_csv(filepath='test_file.csv', table_name='my_first_table', if_exists='append', use_fastload=False)
Example 7: Load data into a temporary table without FastLoad protocol
This example loads the data from the CSV file into a temporary table without the FastLoad protocol. If the table exists in the specified schema, data is appended to it.
>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT, admission_date=DATE, admission_time=TIMESTAMP)
>>> read_csv(filepath='test_file.csv', table_name='my_first_table', types=types, if_exists='replace', temporary=True)
>>> # Append to the existing table.
>>> read_csv(filepath='test_file.csv', table_name='my_first_table', if_exists='append', temporary=True)
Example 8: Load data into a PTI table
This example loads data from a CSV file with TIMESTAMP columns into a PTI table. If the specified table exists, data is appended to it; otherwise, a new table is created.
>>> types = OrderedDict(partition_id=INTEGER, adid=INTEGER, productid=INTEGER, event=VARCHAR, clicktime=TIMESTAMP)
>>> read_csv(filepath='test_file.csv', table_name='my_first_read_csv_pti_table', types=types, if_exists='append', timecode_column='clicktime', columns_list='event', use_fastload=False)
Example 9: Load data into a SET PTI table
This example loads data from a CSV file with TIMESTAMP columns into a SET PTI table. If the specified table exists, data is appended to it; otherwise, a new table is created.
>>> types = OrderedDict(partition_id=INTEGER, adid=INTEGER, productid=INTEGER, event=VARCHAR, clicktime=TIMESTAMP)
>>> read_csv(filepath='test_file.csv', table_name='my_first_read_csv_pti_table', types=types, if_exists='append', timecode_column='clicktime', columns_list='event', set_table=True)
Example 10: Load data into a temporary PTI table
This example loads data from a CSV file with TIMESTAMP columns into a temporary PTI table. If the specified table exists, data is appended to it; otherwise, a new table is created.
>>> types = OrderedDict(partition_id=INTEGER, adid=INTEGER, productid=INTEGER, event=VARCHAR, clicktime=TIMESTAMP)
>>> read_csv(filepath='test_file.csv', table_name='my_first_read_csv_pti_table', types=types, if_exists='append', timecode_column='clicktime', columns_list='event', temporary=True)
Example 11: Load data into a table by opening given number of Teradata data transfer sessions
This example loads the data from the CSV file into a Vantage table by opening two Teradata data transfer sessions.
>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT)
>>> read_csv(filepath='test_file.csv', table_name='my_first_table_with_open_sessions', types=types, open_sessions=2)
Example 12: Load data into table and set primary index
This example loads data from the CSV file into a Vantage table and sets the primary index provided through the primary_index argument.
>>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT)
>>> read_csv(filepath='test_file.csv', table_name='my_first_table_with_primary_index', types=types, primary_index=['fname'])