copy_to_sql() | Teradata Package for Python

Teradata® Package for Python User Guide

Deployment
VantageCloud
VantageCore
Edition
VMware
Enterprise
IntelliFlex
Product
Teradata Package for Python
Release Number
20.00
Published
March 2025
Last Edition
2025-12-05
Product Category
Teradata Vantage

Use the copy_to_sql() function to create a table in Vantage based on a teradataml DataFrame or a pandas DataFrame.

The function takes a teradataml DataFrame or a pandas DataFrame and a table name as arguments, and generates the DDL and DML commands that create a table in Vantage. You can also specify the name of the schema in the database to write the table to. If no schema name is specified, the default database schema is used.

Required Parameters

df
Specifies the pandas or teradataml DataFrame object to be saved.
table_name
Specifies the name of the table to be created in Vantage.

Optional Parameters

schema_name
Specifies the name of the SQL schema in Vantage to write to.
if_exists
Specifies the action to take when table already exists in the database.
Possible values are:
  • 'fail': If the table exists, do nothing.
  • 'replace': If the table exists, drop it, recreate it, and insert data.
  • 'append': If the table exists, insert data; otherwise, create the table and insert data.

The default value is 'append'.

Replacing a table with the contents of a teradataml DataFrame based on the same underlying table is not supported.
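The three if_exists behaviors can be pictured as a plain-Python decision helper. This is only an illustrative sketch of the rules described above; the function name and return values are hypothetical, not part of the teradataml API.

```python
def resolve_if_exists(table_exists, if_exists="append"):
    """Illustrative sketch: the operations copy_to_sql() performs for each
    if_exists value, returned as a list of step names."""
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError("if_exists must be 'fail', 'replace', or 'append'")
    if not table_exists:
        return ["create", "insert"]          # table is created in every mode
    if if_exists == "fail":
        return []                            # table exists: do nothing
    if if_exists == "replace":
        return ["drop", "create", "insert"]  # drop, recreate, then insert
    return ["insert"]                        # 'append': insert into existing table
```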
index
Specifies whether to save the pandas DataFrame index as a column. Possible values are True and False.
The default value is False.
Set this argument to True only when saving pandas DataFrames (not teradataml DataFrames).
index_label
Specifies the column labels for pandas DataFrame index columns.
The default value is None.
Use this argument only when saving pandas DataFrames (not teradataml DataFrames).
If this argument is not specified (defaulted to None or empty) and the argument index is set to True, the names property of the DataFrame's index is used as the label(s); if that too is None or empty, then:
  • A default label 'index_label', or 'level_0' when 'index_label' is already taken, is used for a standard index.
  • Default labels 'level_0', 'level_1', and so on are used for a multi-level index.
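The label-defaulting rules above can be sketched in plain Python. The helper name is hypothetical and the multi-level case is simplified; this only illustrates the documented behavior.

```python
def resolve_index_labels(index_names, existing_columns):
    """Illustrative sketch of the index_label defaulting rules.

    index_names: the `names` property of the pandas index (entries may be None).
    existing_columns: column labels already present in the DataFrame.
    """
    if len(index_names) == 1:                      # standard (single-level) index
        if index_names[0]:
            return [index_names[0]]
        # fall back to 'index_label', or 'level_0' when that label is taken
        return ["index_label" if "index_label" not in existing_columns
                else "level_0"]
    # multi-level index: use the names where available, else 'level_<i>'
    return [n if n else "level_%d" % i for i, n in enumerate(index_names)]
```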
primary_index
Specifies the columns to use as primary index while creating tables in Vantage. When None (default value), No Primary Index (NOPI) tables are created.
For example:
  • primary_index = 'my_primary_index'
  • primary_index = ['my_primary_index1', 'my_primary_index2', 'my_primary_index3']
temporary
Specifies whether to create Vantage tables as permanent or volatile.
Possible values are:
  • True: Creates volatile tables, and schema_name is ignored.
  • False: Creates permanent tables.

The default value is False.

types
Specifies required data types for requested columns to be saved in Vantage.

The default value is None.

This argument accepts a dictionary of column names and their required teradatasqlalchemy types as key-value pairs, allowing you to specify types for a subset of the columns.

  • When only a subset of all columns is provided, the rest default to appropriate types.
  • When the types argument is not provided, all column types are assigned and defaulted appropriately, as listed in the following table.

    pandas/NumPy Type           teradatasqlalchemy Type
    int32                       INTEGER
    int64                       BIGINT
    bool                        BYTEINT
    float32/float64             FLOAT
    datetime64/datetime64[ns]   TIMESTAMP
    datetime64[ns, time_zone]   TIMESTAMP(time_zone=True)
    Any other data type         VARCHAR(configure.default_varchar_size)
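The default mapping above can be captured as a small lookup. Type names are given as strings here purely for illustration; in real code you pass the actual teradatasqlalchemy classes in the types dictionary, and the varchar size constant stands in for configure.default_varchar_size.

```python
# Illustrative sketch of the default dtype -> Vantage type mapping shown above.
DEFAULT_VARCHAR_SIZE = 1024  # stands in for configure.default_varchar_size

DTYPE_TO_TD_TYPE = {
    "int32": "INTEGER",
    "int64": "BIGINT",
    "bool": "BYTEINT",
    "float32": "FLOAT",
    "float64": "FLOAT",
    "datetime64": "TIMESTAMP",
    "datetime64[ns]": "TIMESTAMP",
}

def default_td_type(dtype_name):
    """Return the default Vantage type name for a pandas/NumPy dtype name."""
    if dtype_name.startswith("datetime64[ns,"):   # timezone-aware timestamps
        return "TIMESTAMP(time_zone=True)"
    return DTYPE_TO_TD_TYPE.get(
        dtype_name, "VARCHAR(%d)" % DEFAULT_VARCHAR_SIZE)
```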
primary_time_index_name
Specifies a name for the Primary Time Index (PTI) when the table to be created must be a PTI table.
This argument is not required or used when the table to be created is not a PTI table. It is ignored if specified without the argument timecode_column.
timecode_column
Specifies the column in the DataFrame that reflects the form of the timestamp data in the time series.
This argument is required when the DataFrame must be saved as a PTI table.

This column is the TD_TIMECODE column in the created table. It must be of the SQL type TIMESTAMP(n), TIMESTAMP(n) WITH TIME ZONE, or DATE, corresponding to the Python types datetime.datetime or datetime.date, or the pandas dtype datetime64[ns].

This argument is not required when the table to be created is not a PTI table. When specifying this argument, an attempt to create a PTI table is made. If this argument is specified, primary_index is ignored.
timezero_date
Specifies the earliest time series data that the PTI accepts; a date that precedes the earliest date in the time series data. Value specified must be of the following format: DATE 'YYYY-MM-DD'.

The default value is: DATE '1970-01-01'.

  • This argument is used when the DataFrame must be saved as a PTI table.
  • This argument is not required or used when the table to be created is not a PTI table.
  • It is ignored if specified without the timecode_column argument.
timebucket_duration
Specifies a duration that serves to break up the time continuum in the time series data into discrete groups or buckets.

Specified using the formal form time_unit(n), where n is a positive integer, and time_unit can be any of the following: CAL_YEARS, CAL_MONTHS, CAL_DAYS, WEEKS, DAYS, HOURS, MINUTES, SECONDS, MILLISECONDS, or MICROSECONDS.

  • This argument is required if columns_list is not specified or is empty.
  • This argument is used when the DataFrame must be saved as a PTI table.
  • This argument is not required or used when the table to be created is not a PTI table.
  • It is ignored if specified without the timecode_column argument.
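A quick way to sanity-check the time_unit(n) form before calling copy_to_sql() is a small validator like the following. This is a hedged sketch against the formal form listed above, not part of teradataml, which performs its own validation.

```python
import re

# The time units listed in the formal form time_unit(n) above.
_TIME_UNITS = ("CAL_YEARS", "CAL_MONTHS", "CAL_DAYS", "WEEKS", "DAYS",
               "HOURS", "MINUTES", "SECONDS", "MILLISECONDS", "MICROSECONDS")
_PATTERN = re.compile(r"^(%s)\((\d+)\)$" % "|".join(_TIME_UNITS))

def is_valid_timebucket_duration(value):
    """Return True when value matches time_unit(n) with a positive integer n."""
    m = _PATTERN.match(value)
    return bool(m) and int(m.group(2)) > 0
```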
columns_list
Specifies a list of one or more PTI table column names.
  • This argument is required if timebucket_duration is not specified or is empty.
  • This argument is used when the DataFrame must be saved as a PTI table.
  • This argument is not required or used when the table to be created is not a PTI table.
  • It is ignored if specified without the timecode_column argument.
sequence_column
Specifies the column of type Integer containing the unique identifier for time series data readings that are not unique in time.
  • When specified, implies SEQUENCED, meaning more than one reading from the same sensor may have the same timestamp. This column is the TD_SEQNO column in the table created.
  • When not specified, implies NONSEQUENCED, meaning there is only one sensor reading per timestamp. This is the default.
  • This argument is used when the DataFrame must be saved as a PTI table.
  • This argument is not required or used when the table to be created is not a PTI table.
  • It is ignored if specified without the timecode_column argument.
seq_max
Specifies the maximum number of sensor data rows that can have the same timestamp. Can be used when sequence_column is specified.

Accepted values range from 1 to 2147483647, with default value 20000.

  • This argument is used when the DataFrame must be saved as a PTI table.
  • This argument is not required or used when the table to be created is not a PTI table.
  • It is ignored if specified without the timecode_column argument.
set_table
Specifies a flag to determine whether to create a SET or a MULTISET table.
  • When True, a SET table is created.
  • When False, a MULTISET table is created. This is the default value.
  • Specifying set_table=True also requires specifying primary_index or timecode_column.
  • Creating SET table (set_table=True) may result in
    • an error if the source is a Pandas DataFrame having duplicate rows;
    • loss of duplicate rows if the source is a teradataml DataFrame.
  • This argument has no effect if the table already exists and if_exists='append'.
match_column_order
Specifies whether the order of the columns in the existing table matches the order of the columns in "df".

When set to False, the DataFrame to be loaded can have any order and number of columns.

The default value is True.

chunksize
Specifies the number of rows to be loaded in a batch when "df" is a pandas DataFrame.

The default value is 16383.
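The batching that chunksize controls can be pictured as plain slicing. This is an illustrative sketch of batch loading, not teradataml internals; the helper name is hypothetical.

```python
def iter_chunks(rows, chunksize=16383):
    """Yield successive batches of at most chunksize rows."""
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]

# For example, 40000 rows load in three batches: 16383, 16383, and 7234 rows.
sizes = [len(batch) for batch in iter_chunks(list(range(40000)))]
```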

partition_by
Specifies the columns on which partition should be created while creating the table.
  • "partition_by", "partition_by_case" and "partition_by_range" are mutually exclusive.
  • "primary_index" should be specified when "partition_by" is used.
  • Not applicable for PTI tables.
partition_by_case
Specifies different cases to partition the index while creating table.
  • "partition_by", "partition_by_case" and "partition_by_range" are mutually exclusive.
  • "primary_index" should be specified when "partition_by_case" is used.
  • Not applicable for PTI tables.
partition_by_range
Specifies the range of values on which partition should be created while creating a table.
  • "partition_by", "partition_by_case" and "partition_by_range" are mutually exclusive.
  • "primary_index" should be specified when "partition_by_range" is used.
  • Not applicable for PTI tables.
sub_partition
Specifies the details to subpartition the main partition according to the value provided while creating the table.
  • "sub_partition" is applicable only when "partition_by_range" is specified.
  • Not applicable for PTI tables.
kwargs
Optional keyword arguments.
  • valid_time_columns

    Specifies the names of the valid time columns in "df".

    When "valid_time_columns" is specified, the function treats these columns as valid time dimension columns and creates a valid time dimension temporal table if the table does not exist.

    If a string is provided, the column must be of PERIOD type.

  • derived_column
    Specifies the name of the derived column to be kept in the temporal table.
    • Argument is ignored if "valid_time_columns" are not specified.
    • Argument is considered only if copy_to_sql() is creating a table.
    • If "valid_time_columns" is specified and "derived_column" is not specified, then copy_to_sql() automatically creates a derived column by adding "_" between the columns mentioned in "valid_time_columns". For example, if "valid_time_columns" is ('col1', 'col2') and "derived_column" is not specified, then copy_to_sql() creates table with derived column name as 'col1_col2'.
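The default derived-column naming rule described above can be sketched as a one-line helper (the function name is illustrative, not part of the teradataml API):

```python
def default_derived_column(valid_time_columns):
    """Join the valid-time column names with '_' when derived_column is
    omitted, per the naming rule described above."""
    return "_".join(valid_time_columns)
```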

Example 1: Create a new table in Vantage based on a pandas DataFrame

You must import pandas and copy_to_sql first.

>>> import pandas as pd
>>> from teradataml.dataframe.copy_to import copy_to_sql

Assume you create the "sales" table as follows:

>>> sales = [{'accounts': 'Jones LLC', 'Jan': 150, 'Feb': 200.0, 'Mar': 140, 'datetime': '2017-04-01'},
...          {'accounts': 'Alpha Co',  'Jan': 200, 'Feb': 210.0, 'Mar': 215, 'datetime': '2017-04-01'},
...          {'accounts': 'Blue Inc',  'Jan': 50,  'Feb': 90.0,  'Mar': 95,  'datetime': '2017-04-01'}]

Create a pandas DataFrame "pdf" from "sales":

>>> pdf = pd.DataFrame(sales)

Enter pdf to display the pandas DataFrame:

>>> pdf
     Feb  Jan  Mar   accounts    datetime
0  200.0  150  140  Jones LLC  2017-04-01
1  210.0  200  215   Alpha Co  2017-04-01
2   90.0   50   95   Blue Inc  2017-04-01

Use the copy_to_sql() function to create a new Vantage table "pandas_sales" based on the pandas DataFrame "pdf" that is created in the prerequisite.

>>> copy_to_sql(df = pdf, table_name = "pandas_sales", primary_index="accounts", if_exists="replace")

Verify that the new table "pandas_sales" exists in Vantage using the DataFrame() function, which creates a DataFrame based on an existing table.

>>> df = DataFrame("pandas_sales")
>>> df
             Feb  Jan  Mar                    datetime
accounts
Jones LLC  200.0  150  140  2017-04-01 00:00:00.000000
Alpha Co   210.0  200  215  2017-04-01 00:00:00.000000
Blue Inc    90.0   50   95  2017-04-01 00:00:00.000000

Example 2: Use the optional index and index_label parameters, and use the index as Primary Index

>>> copy_to_sql(df = pdf, table_name = "pandas_sales", index=True, index_label="idx", primary_index="idx", if_exists="replace")
>>> df = DataFrame("pandas_sales")
>>> df
       Feb  Jan  Mar   accounts                    datetime
idx
0    200.0  150  140  Jones LLC  2017-04-01 00:00:00.000000
2     90.0   50   95   Blue Inc  2017-04-01 00:00:00.000000
1    210.0  200  215   Alpha Co  2017-04-01 00:00:00.000000

Example 3: Create a new table in Vantage based on a teradataml DataFrame

>>> df = DataFrame("sales")
>>> df
              Feb   Jan   Mar   Apr    datetime
accounts                                      
Blue Inc     90.0    50    95   101  04/01/2017
Alpha Co    210.0   200   215   250  04/01/2017
Jones LLC   200.0   150   140   180  04/01/2017
Yellow Inc   90.0  None  None  None  04/01/2017
Orange Inc  210.0  None  None   250  04/01/2017
Red Inc     200.0   150   140  None  04/01/2017
>>> copy_to_sql(df = df, table_name = "sales_df1", primary_index="accounts", if_exists="replace")
>>> df1 = DataFrame("sales_df1")
>>> df1
              Feb   Jan   Mar   Apr  datetime
accounts                                    
Blue Inc     90.0    50    95   101  17/01/04
Orange Inc  210.0  None  None   250  17/01/04
Red Inc     200.0   150   140  None  17/01/04
Yellow Inc   90.0  None  None  None  17/01/04
Jones LLC   200.0   150   140   180  17/01/04
Alpha Co    210.0   200   215   250  17/01/04

Example 4: Create a new Primary Time Index Table in Vantage from a Pandas DataFrame

>>> copy_to_sql(df = pdf, table_name = "pandas_sales_pti", timecode_column="datetime", columns_list="accounts")
>>> pandas_sales_pti = DataFrame('pandas_sales_pti')
>>> pandas_sales_pti
                          TD_TIMECODE    Feb  Jan  Mar
accounts
Jones LLC  2017-04-01 00:00:00.000000  200.0  150  140
Alpha Co   2017-04-01 00:00:00.000000  210.0  200  215
Blue Inc   2017-04-01 00:00:00.000000   90.0   50   95

Example 5: Save a Pandas DataFrame with VECTOR datatype

>>> import pandas as pd
>>> VECTOR_data = {
... 'id': [10, 11, 12, 13],
... 'array_col': ['1,1', '2,2', '3,3', '4,4']
... }
>>> df = pd.DataFrame(VECTOR_data)
>>> from teradatasqlalchemy import VECTOR
>>> copy_to_sql(df=df, table_name='my_vector_table', types={'array_col': VECTOR})
>>>

Example 6: Save a pandas DataFrame with partition_by

>>> df = {'emp_name': ['A1', 'A2', 'A3', 'A4'],
...       'emp_sage': [100, 200, 300, 400],
...       'emp_id': [133, 144, 155, 177],
...       'marks': [99.99, 97.32, 94.67, 91.00]
...    }
>>> pandas_df = pd.DataFrame(df)
>>> copy_to_sql(df=pandas_df, table_name='my_table_5', if_exists='replace',
...             primary_index=['emp_id'],
...             partition_by='emp_id')

Example 7: Save a pandas DataFrame with partition_by_case

>>> copy_to_sql(df=pandas_df, table_name='my_table_6', if_exists='replace',
...             primary_index=['emp_id'],
...             partition_by_case='emp_id > 100, emp_id < 500')

Example 8: Save a pandas DataFrame with partition_by_range

>>> copy_to_sql(df=pandas_df, table_name='my_table_7', if_exists='replace',
...             primary_index=['emp_id'],
...             partition_by_range='emp_id BETWEEN 100 AND 500')

Example 9: Save a Pandas DataFrame with valid time columns of DATE type to a temporal table

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'id': [1, 2, 3],
...     'start_date': pd.to_datetime(['2024-01-01', '2024-02-01', '2024-03-01']).date,
...     'end_date': pd.to_datetime(['2024-01-10', '2024-02-10', '2024-03-10']).date,
...     'description': ['a', 'b', 'c']
... })
>>> copy_to_sql(
...     df=df,
...     table_name='temporal_table_pandas_date',
...     valid_time_columns=('start_date', 'end_date')
... )

Example 10: Save a Pandas DataFrame with valid time columns of TIMESTAMP type to a temporal table. Name the derived column as valid_time

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'id': [1, 2, 3],
...     'start_time': pd.to_datetime(['2024-01-01 10:00:00', '2024-02-01 11:00:00', '2024-03-01 12:00:00']),
...     'end_time': pd.to_datetime(['2024-01-01 12:00:00', '2024-02-01 13:00:00', '2024-03-01 14:00:00']),
...     'description': ['a', 'b', 'c']
... })
>>> copy_to_sql(
...     df=df,
...     table_name='temporal_table_pandas_timestamp',
...     valid_time_columns=('start_time', 'end_time'),
...     derived_column='valid_time'
... )

Example 11: Save a teradataml DataFrame with valid time column of PERIOD type to a temporal table

>>> from teradatasqlalchemy.types import PERIOD_DATE
>>> load_example_data("teradataml", "Employee_roles")
>>> df = DataFrame('Employee_roles')
>>> copy_to_sql(
...     df,
...     table_name = 'employee_roles_temporal',
...     valid_time_columns='role_validity_period',
...     types={'role_validity_period': PERIOD_DATE}
... )