Teradata Python Package Function Reference - fastload - Teradata Python Package - Look here for syntax, methods and examples for the functions included in the Teradata Python Package.

teradataml.dataframe.fastload.fastload = fastload(df, table_name, schema_name=None, if_exists='replace', index=False, index_label=None, primary_index=None, types=None, batch_size=None, save_errors=False): The fastload() API writes records from a Pandas DataFrame to Teradata Vantage using Fastload. FastLoad API can be used to quickly load large amounts of data in an empty table on Vantage. 1. Teradata recommends to use this API when number rows in the Pandas DataFrame is greater than 100,000 to have better performance. To insert lesser rows, please use copy_to_sql for optimized performance. The data is loaded in batches. 2. FastLoad API cannot load duplicate rows in the DataFrame if the table is a MULTISET Indexed table. 3. FastLoad API does not support all Teradata Advanced SQL Engine data types. For example, target table having BLOB and CLOB data type columns cannot be loaded. 4. If there are any incorrect rows i.e. due to constraint violations, data type conversion errors, etc., FastLoad protocol ignores those rows and inserts all valid rows. 5. Rows in the DataFrame that failed to get inserted are categorized into errors and warnings by FastLoad protocol and these errors and warnings are stored into respective error and warning tables by FastLoad API. 6. If save_errors argument is True, the names of error and warning tables are shown once the fastload operation is complete. These tables will be persisted using copy_to_sql. For additional information about FastLoad protocol through teradatasql driver, please refer the FASTLOAD section of https://pypi.org/project/teradatasql/#FastLoad driver documentation for more information. PARAMETERS: df: Required Argument. Specifies the Pandas DataFrame object to be saved in Vantage. Types: pandas.DataFrame table_name: Required Argument. Specifies the name of the table to be created in Vantage. Types: String schema_name: Optional Argument. Specifies the name of the database schema in Vantage to write to. Types: String Default: None (Uses default database schema). if_exists: Optional Argument. Specifies the action to take when table already exists in Vantage. Types: String Possible values: {'fail', 'replace', 'append'} - fail: If table exists, raise TeradataMlException. - replace: If table exists, drop it, recreate it, and insert data. - append: If table exists, insert data. Create if does not exist. Default: replace index: Optional Argument. Specifies whether to save Pandas DataFrame index as a column or not. Types: Boolean (True or False) Default: False index_label: Optional Argument. Specifies the column label(s) for Pandas DataFrame index column(s). Types: String or list of strings Default: None primary_index: Optional Argument. Specifies which column(s) to use as primary index while creating table in Vantage. When set to None, No Primary Index (NoPI) tables are created. Types: String or list of strings Default: None Example: primary_index = 'my_primary_index' primary_index = ['my_primary_index1', 'my_primary_index2', 'my_primary_index3'] types: Optional Argument. Specifies the data types for requested columns to be saved in Vantage. Types: Python dictionary ({column_name1: type_value1, ... column_nameN: type_valueN}) Default: None Note: 1. This argument accepts a dictionary of columns names and their required teradatasqlalchemy types as key-value pairs, allowing to specify a subset of the columns of a specific type. i) When only a subset of all columns are provided, the column types for the rest are assigned appropriately. ii) When types argument is not provided, the column types are assigned as listed in the following table: +---------------------------+-----------------------------------------+ | Pandas/Numpy Type | teradatasqlalchemy Type | +---------------------------+-----------------------------------------+ | int32 | INTEGER | +---------------------------+-----------------------------------------+ | int64 | BIGINT | +---------------------------+-----------------------------------------+ | bool | BYTEINT | +---------------------------+-----------------------------------------+ | float32/float64 | FLOAT | +---------------------------+-----------------------------------------+ | datetime64/datetime64[ns] | TIMESTAMP | +---------------------------+-----------------------------------------+ | Any other data type | VARCHAR(configure.default_varchar_size) | +---------------------------+-----------------------------------------+ 2. This argument does not have any effect when the table specified using table_name and schema_name exists and if_exists = 'append'. batch_size: Optional Argument. Specifies the number of rows to be loaded in a batch. For better performance, recommended batch size is at least 100,000. batch_size must be a positive integer. If this argument is None, there are two cases based on the number of rows, say N in the dataframe 'df' as explained below: If N is greater than 100,000, the rows are divided into batches of equal size with each batch having at least 100,000 rows (except the last batch which might have more rows). If N is less than 100,000, the rows are inserted in one batch after notifying the user that insertion happens with degradation of performance. If this argument is not None, the rows are inserted in batches of size given in the argument, irrespective of the recommended batch size. The last batch will have rows less than the batch size specified, if the number of rows is not an integral multiples of the argument batch_size. Default Value: None Types: int save_errors: Optional Argument. Specifies whether to persist the error/warning information in Vantage or not. If save_errors is set to False, error/warnings are not persisted as tables. If argument is set to True, the error and warnings information are presisted and names of error and warning tables are returned. Otherwise, the function returns None for the names of the tables. Default Value: False Types: bool RETURNS: A dict containing the following attributes: 1. errors_dataframe: It is a Pandas DataFrame containing error messages thrown by fastload. DataFrame is empty if there are no errors. 2. warnings_dataframe: It is a Pandas DataFrame containing warning messages thrown by fastload. DataFrame is empty if there are no warnings. 3. errors_table: Name of the table containing errors. It is None, if argument save_errors is False. 4. warnings_table: Name of the table containing warnings. It is None, if argument save_errors is False. RAISES: TeradataMlException EXAMPLES: Saving a Pandas DataFrame using Fastload: >>> from teradataml.dataframe.fastload import fastload >>> from teradatasqlalchemy.types import * >>> df = {'emp_name': ['A1', 'A2', 'A3', 'A4'], 'emp_sage': [100, 200, 300, 400], 'emp_id': [133, 144, 155, 177], 'marks': [99.99, 97.32, 94.67, 91.00] } >>> pandas_df = pd.DataFrame(df) # a) Default execution >>> fastload(df = pandas_df, table_name = 'my_table') # b) Save a Pandas DataFrame with primary_index >>> pandas_df = pandas_df.set_index(['emp_id']) >>> fastload(df = pandas_df, table_name = 'my_table_1', primary_index='emp_id') # c) Save a Pandas DataFrame using fastload() with index and primary_index >>> fastload(df = pandas_df, table_name = 'my_table_2', index=True, primary_index='index_label') # d) Save a Pandas DataFrame using types, appending to the table if it already exists >>> fastload(df = pandas_df, table_name = 'my_table_3', schema_name = 'alice', index = True, index_label = 'my_index_label', primary_index = ['emp_id'], if_exists = 'append', types = {'emp_name': VARCHAR, 'emp_sage':INTEGER, 'emp_id': BIGINT, 'marks': DECIMAL}) # e) Save a Pandas DataFrame using levels in index of type MultiIndex >>> pandas_df = pandas_df.set_index(['emp_id', 'emp_name']) >>> fastload(df = pandas_df, table_name = 'my_table_4', schema_name = 'alice', index = True, index_label = ['index1', 'index2'], primary_index = ['index1'], if_exists = 'replace')