Teradata Package for Python Function Reference | 17.10 - read_csv - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

teradataml.dataframe.data_transfer.read_csv = read_csv(filepath, table_name, types=None, sep=',', quotechar='"', schema_name=None, if_exists='replace', primary_index=None, set_table=False, temporary=False, primary_time_index_name=None, timecode_column=None, timebucket_duration=None, timezero_date=None, columns_list=None, sequence_column=None, seq_max=None, catch_errors_warnings=False, save_errors=False, use_fastload=True, open_sessions=None): The read_csv() API loads data from CSV file into Teradata Vantage. Function can be used to quickly load large amounts of data in a table on Vantage using FastloadCSV protocol. Considerations when using a CSV file: * Each record is on a separate line of the CSV file. Records are delimited by line breaks (CRLF). The last record in the file may or may not have an ending line break. * First line in the CSV must be header line. The header line lists the column names separated by the field separator (e.g. col1,col2,col3). * Using a CSV file with FastLoad has limitations as follows: 1. read_csv API cannot load duplicate rows in the DataFrame if the table is a MULTISET table having primary index. 2. read_csv API does not support all Teradata Advanced SQL Engine data types. For example, target table having BLOB and CLOB data type columns cannot be loaded. 3. If there are any incorrect rows, i.e. due to constraint violations, data type conversion errors, etc., FastLoad protocol ignores those rows and inserts all valid rows. 4. Rows in the DataFrame that failed to get inserted are categorized into errors and warnings by FastLoad protocol and these errors and warnings are stored into respective error and warning tables by FastLoad API. 5. Teradata recommends to use Fastload protocol when number of rows to be loaded are at least 100,000. Fastload opens multiple data transfer connections to the database. For additional information about FastLoadCSV protocol through teradatasql driver, please refer the CSV BATCH INSERTS section of https://pypi.org/project/teradatasql/#CSVBatchInserts driver documentation for more information. PARAMETERS: filepath: Required Argument. Specifies the CSV filepath including name of the file to load the data from. Types: String table_name: Required Argument. Specifies the table name to load the data into. Types: String types: Optional Argument when if_exists=append and non-PTI table already exists, Required otherwise. Specifies the data types for requested columns to be saved in Vantage. Keys of this dictionary should be the name of the columns and values should be teradatasqlalchemy.types. Default Value: None Note: * If specified when "if_exists" is set to append and table exists, then argument is ignored. Types: OrderedDict sep: Optional Argument. Specifies a single character string used to separate fields in a CSV file. Default Value: "," Notes: * "sep" cannot be line feed ('\n') or carriage return ('\r'). * "sep" should not be same as "quotechar". * Length of "sep" argument should be 1. Types: String quotechar: Optional Argument. Specifies a single character string used to quote fields in a CSV file. Default Value: """ Notes: * "quotechar" cannot be line feed ('\n') or carriage return ('\r'). * "quotechar" should not be same as "sep". * Length of "quotechar" argument should be 1. Types: String schema_name: Optional Argument. Specifies the name of the database/schema in Vantage to write to. Default Value: None (Uses default database/schema). Types: String if_exists: Optional Argument. Specifies the action to take when table already exists in Vantage. Permitted Values: 'fail', 'replace', 'append' - fail: If table exists, raise TeradataMlException. - replace: If table exists, drop it, recreate it, and insert data. - append: If table exists, append the existing table. Default Value: replace Types: String primary_index: Optional Argument. Specifies which column(s) to use as primary index while creating table in Vantage. When set to None, No Primary Index (NoPI) tables are created. Default Value: None Types: String or list of strings Example: primary_index = 'my_primary_index' primary_index = ['my_primary_index1', 'my_primary_index2', 'my_primary_index3'] set_table: Optional Argument. Specifies a flag to determine whether to create a SET or a MULTISET table. When set to True, a SET table is created, otherwise MULTISET table is created. Default Value: False Notes: 1. Specifying set_table=True also requires specifying primary_index. 2. Creating SET table (set_table=True) results in a. loss of duplicate rows, if CSV contains any duplicate. 3. This argument has no effect if the table already exists and if_exists='append'. Types: Boolean temporary: Optional Argument. Specifies whether to create table as volatile. Default Value: False Notes: When set to True 1. FastloadCSV protocol is not used for loading the data. 2. "schema_name" is ignored. Types : Boolean primary_time_index_name: Optional Argument. Specifies the name for the Primary Time Index (PTI) when the table is to be created as PTI table. Note: This argument is not required or used when the table to be created is not a PTI table. It will be ignored if specified without the "timecode_column". Types: String timecode_column: Optional argument. Required when the CSV data must be saved as a PTI table. Specifies the column in the csv that reflects the form of the timestamp data in the time series. This column will be the TD_TIMECODE column in the table created. It should be of SQL type TIMESTAMP(n), TIMESTAMP(n) WITH TIMEZONE, or DATE, corresponding to Python types datetime.datetime or datetime.date. Note: When "timecode_column" argument is specified, an attempt to create a PTI table will be made. This argument is not required when the table to be created is not a PTI table. If this argument is specified, "primary_index" will be ignored. Types: String timezero_date: Optional Argument. Used when the CSV data must be saved as a PTI table. Specifies the earliest time series data that the PTI table will accept, a date that precedes the earliest date in the time series data. Value specified must be of the following format: DATE 'YYYY-MM-DD' Default Value: DATE '1970-01-01'. Note: This argument is not required or used when the table to be created is not a PTI table. It will be ignored if specified without the "timecode_column". Types: String timebucket_duration: Optional Argument. Required if "columns_list" is not specified or is None. Used when the CSV data must be saved as a PTI table. Specifies a duration that serves to break up the time continum in the time series data into discrete groups or buckets. Specified using the formal form time_unit(n), where n is a positive integer, and time_unit can be any of the following: CAL_YEARS, CAL_MONTHS, CAL_DAYS, WEEKS, DAYS, HOURS, MINUTES, SECONDS, MILLISECONDS, or MICROSECONDS. Note: This argument is not required or used when the table to be created is not a PTI table. It will be ignored if specified without the "timecode_column". Types: String columns_list: Optional Argument. Used when the CSV data must be saved as a PTI table. Required if "timebucket_duration" is not specified. A list of one or more PTI table column names. Note: This argument is not required or used when the table to be created is not a PTI table. It will be ignored if specified without the "timecode_column". Types: String or list of Strings sequence_column: Optional Argument. Used when the CSV data must be saved as a PTI table. Specifies the column of type Integer containing the unique identifier for time series data readings when they are not unique in time. * When specified, implies SEQUENCED, meaning more than one reading from the same sensor may have the same timestamp. This column will be the TD_SEQNO column in the table created. * When not specified, implies NONSEQUENCED, meaning there is only one sensor reading per timestamp. This is the default. Note: This argument is not required or used when the table to be created is not a PTI table. It will be ignored if specified without the "timecode_column". Types: String seq_max: Optional Argument. Used when the CSV data must be saved as a PTI table. Specifies the maximum number of data rows that can have the same timestamp. Can be used when 'sequenced' is True. Permitted range: 1 - 2147483647. Default Value: 20000. Note: This argument is not required or used when the table to be created is not a PTI table. It will be ignored if specified without the "timecode_column". Types: Integer save_errors: Optional Argument. Specifies whether to persist the errors/warnings(if any) information in Vantage or not. If "save_errors" is set to False: 1. Errors or warnings (in any) are not persisted into tables. 2. Errors table genarated by FastloadCSV are not persisted. If "save_errors" is set to True: 1. The errors or warnings information is persisted and names of error and warning tables are returned. Otherwise, the function returns None for the names of the tables. 2. The errors tables generated by FastloadCSV are persisted and name of error tables are returned. Otherwise, the function returns None for the names of the tables. Default Value: False Types: Boolean catch_errors_warnings: Optional Argument. Specifies whether to catch errors/warnings(if any) raised by fastload protocol while loading data into the Vantage table. When set to False, function does not catch any errors and warnings, otherwise catches errors and warnings, if any, and returns as a dictionary along with teradataml DataFrame. Please see 'RETURNS' section for more details. Default Value: False Types: Boolean use_fastload: Optional Argument. Specifies whether to use Fastload CSV protocol or not. Default Value: True Notes: 1. Teradata recommends to use Fastload when number of rows to be loaded are atleast 100,000. To load lesser rows set this argument to 'False'. Fastload opens multiple data transfer connections to the database. 2. When "use_fastload" is set to True, one can load the data into table using FastloadCSV protocol: a. Set table b. Multiset table 3. When "use_fastload" is set to False, one can load the data in following types of tables: a. Set table b. Multiset table c. PTI table d. Volatile table Types: Boolean open_sessions: Optional Argument. Specifies the number of Teradata data transfer sessions to be opened for fastload operation. Note : If "open_sessions" argument is not provided, the default value is the smaller of 8 or the number of AMPs available. For additional information about number of Teradata data-transfer sessions opened during fastload, please refer to: https://pypi.org/project/teradatasql/#FastLoad Default Value: None Types: int RETURNS: When "use_fastload" is set to False, returns teradataml dataframe. When "use_fastload" is set to True, read_csv() returns below: When "catch_errors_warnings" is set to False, returns only teradataml dataframe. When "catch_errors_warnings" is set to True, read_csv() returns a tuple containing: a. teradataml DataFrame. b. a dict containing the following attributes: a. errors_dataframe: It is a Pandas DataFrame containing error messages thrown by fastload. DataFrame is empty if there are no errors. b. warnings_dataframe: It is a Pandas DataFrame containing warning messages thrown by fastload. DataFrame is empty if there are no warnings. c. errors_table: Name of the table containing errors. It is None, if argument save_errors is False. d. warnings_table: Name of the table containing warnings. It is None, if argument save_errors is False. e. fastloadcsv_error_tables: Name of the tables containing errors generated by FastloadCSV. It is empty list, if argument "save_errors" is False. RAISES: TeradataMlException EXAMPLES: >>> from teradataml.dataframe.data_transfer import read_csv >>> from teradatasqlalchemy.types import * >>> from collections import OrderedDict # Example 1: Default execution with types argument is passed as OrderedDict. >>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT) >>> read_csv('test_file.csv', 'my_first_table', types) # Example 2: Load the data from CSV file into a table using fastload CSV protocol, # while doing so catch all errors and warnings as well as store those in the table. >>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT) >>> read_csv(filepath='test_file.csv', table_name='my_first_table1', types=types, save_errors=True, catch_errors_warnings=True) # Example 3: Load the data from CSV file into a table using fastload CSV protocol. # If table exists, then replace the same. Catch all errors and warnings as well as # store those in the table. >>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT) >>> read_csv(filepath='test_file.csv', table_name='my_first_table', types=types, if_exists='replace', save_errors=True, catch_errors_warnings=True) # Example 4: Load the data from CSV file into a table using fastload CSV protocol. # If table exists in specified schema, then append the same. Catch all # errors and warnings as well as store those in the table. >>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT) >>> read_csv(filepath='test_file.csv', table_name='my_first_table', types=types, if_exists='fail', save_errors=True, catch_errors_warnings=True) >>> read_csv(filepath='test_file.csv', table_name='my_first_table', if_exists='append', save_errors=True, catch_errors_warnings=True) # Example 5: Load the data from CSV file into a SET table using fastload CSV protocol. # Catch all errors and warnings as well as store those in the table. >>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT) >>> read_csv(filepath='test_file.csv', table_name='my_first_table', types=types, if_exists='replace', set_table=True, primary_index='id', save_errors=True, catch_errors_warnings=True) # Example 6: Load the data from CSV file into a temporary table without fastloadCSV protocol. # If table exists, then append to the same. >>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT) >>> read_csv(filepath='test_file.csv', table_name='my_first_table', types=types, if_exists='replace', temporary=True) >>> read_csv(filepath='test_file.csv', table_name='my_first_table', if_exists='append', temporary=True) # Example 7: Load the data from CSV file with DATE and TIMESTAMP columns into # a table without Fastload protocol. If table exists in specified # schema, then append to the table. >>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT, admission_date=DATE, admission_time=TIMESTAMP) >>> read_csv(filepath='test_file.csv', table_name='my_first_table', types=types, if_exists='fail', use_fastload=False) >>> read_csv(filepath='test_file.csv', table_name='my_first_table', if_exists='append', use_fastload=False) # Example 8: Load the data from CSV file with TIMESTAMP columns into # a PTI table. If specified table exists then append to the table, # otherwise creates new table. >>> types = OrderedDict(partition_id=INTEGER, adid=INTEGER, productid=INTEGER, event=VARCHAR, clicktime=TIMESTAMP) >>> read_csv(filepath='test_file.csv', table_name='my_first_read_csv_pti_table', types=types, if_exists='append', timecode_column='clicktime', columns_list='event', use_fastload=False) # Example 9: Load the data from CSV file with TIMESTAMP columns into # a SET PTI table. If specified table exists then append to the table, # otherwise creates new table. >>> types = OrderedDict(partition_id=INTEGER, adid=INTEGER, productid=INTEGER, event=VARCHAR, clicktime=TIMESTAMP) >>> read_csv(filepath='test_file.csv', table_name='my_first_read_csv_pti_table', types=types, if_exists='append', timecode_column='clicktime', columns_list='event', set_table=True) # Example 10: Load the data from CSV file with TIMESTAMP columns into # a temporary PTI table. If specified table exists then append to the table, # otherwise creates new table. >>> types = OrderedDict(partition_id=INTEGER, adid=INTEGER, productid=INTEGER, event=VARCHAR, clicktime=TIMESTAMP) >>> read_csv(filepath='test_file.csv', table_name='my_first_read_csv_pti_table', types=types, if_exists='append', timecode_column='clicktime', columns_list='event', temporary=True) # Example 11: Load the data from CSV file into Vantage table by opening specified # number of Teradata data transfer sesions. >>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT) >>> read_csv(filepath='test_file.csv', table_name='my_first_table_with_open_sessions', types=types, open_sessions=2) # Example 12: Load the data from CSV file into Vantage table and set primary index provided # through primary_index argument. >>> types = OrderedDict(id=BIGINT, fname=VARCHAR, lname=VARCHAR, marks=FLOAT) >>> read_csv(filepath='test_file.csv', table_name='my_first_table_with_primary_index', types=types, primary_index = ['fname'])