Teradata Package for Python Function Reference on VantageCloud Lake - copy_to_sql - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference on VantageCloud Lake
- Deployment
- VantageCloud
- Edition
- Lake
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Lake_2000
- Product Category
- Teradata Vantage
- teradataml.dataframe.copy_to.copy_to_sql = copy_to_sql(df, table_name, schema_name=None, if_exists='append', index=False, index_label=None, primary_index=None, temporary=False, types=None, primary_time_index_name=None, timecode_column=None, timebucket_duration=None, timezero_date=None, columns_list=None, sequence_column=None, seq_max=None, set_table=False, chunksize=16383, match_column_order=True)
- Writes records stored in a Pandas DataFrame or a teradataml DataFrame to Teradata Vantage.
PARAMETERS:
df:
Required Argument.
Specifies the Pandas or teradataml DataFrame object to be saved.
Types: pandas.DataFrame or teradataml.dataframe.dataframe.DataFrame
table_name:
Required Argument.
Specifies the name of the table to be created in Vantage.
Types : String
schema_name:
Optional Argument.
Specifies the name of the SQL schema in Teradata Vantage to write to.
Types: String
Default: None (Uses default database schema).
Note: schema_name will be ignored when temporary=True.
if_exists:
Optional Argument.
Specifies the action to take when table already exists in Vantage.
Types: String
Possible values: {'fail', 'replace', 'append'}
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
Default : append
Note: Replacing a table with the contents of a teradataml DataFrame based on
the same underlying table is not supported.
index:
Optional Argument.
Specifies whether to save Pandas DataFrame index as a column or not.
Types : Boolean (True or False)
Default : False
Note: Only use as True when attempting to save Pandas DataFrames (and not with teradataml DataFrames).
index_label:
Optional Argument.
Specifies the column label(s) for Pandas DataFrame index column(s).
Types : String or list of strings
Default : None
Note: If index_label is not specified (defaulted to None or is empty) and `index` is True, then
the 'names' property of the DataFrames index is used as the label(s),
and if that too is None or empty, then:
1) a default label 'index_label' or 'level_0' (when 'index_label' is already taken) is used
when index is standard.
2) default labels 'level_0', 'level_1', etc. are used when index is multi-level index.
Only use as True when attempting to save Pandas DataFrames (and not on teradataml DataFrames).
primary_index:
Optional Argument.
Specifies which column(s) to use as primary index while creating Teradata table(s) in Vantage.
When None, No Primary Index Teradata tables are created.
Types : String or list of strings
Default : None
Example:
primary_index = 'my_primary_index'
primary_index = ['my_primary_index1', 'my_primary_index2', 'my_primary_index3']
temporary:
Optional Argument.
Specifies whether to creates Vantage tables as permanent or volatile.
Types : Boolean (True or False)
Default : False
Note: When True:
1. volatile Tables are created, and
2. schema_name is ignored.
When False, permanent tables are created.
types
Optional Argument.
Specifies required data-types for requested columns to be saved in Vantage.
Types: Python dictionary ({column_name1: type_value1, ... column_nameN: type_valueN})
Default: None
Note:
1. This argument accepts a dictionary of columns names and their required teradatasqlalchemy types
as key-value pairs, allowing to specify a subset of the columns of a specific type.
i) When the input is a Pandas DataFrame:
- When only a subset of all columns are provided, the column types for the rest are assigned
appropriately.
- When types argument is not provided, the column types are assigned
as listed in the following table:
+---------------------------+-----------------------------------------+
| Pandas/Numpy Type | teradatasqlalchemy Type |
+---------------------------+-----------------------------------------+
| int32 | INTEGER |
+---------------------------+-----------------------------------------+
| int64 | BIGINT |
+---------------------------+-----------------------------------------+
| bool | BYTEINT |
+---------------------------+-----------------------------------------+
| float32/float64 | FLOAT |
+---------------------------+-----------------------------------------+
| datetime64/datetime64[ns] | TIMESTAMP |
+---------------------------+-----------------------------------------+
| datetime64[ns,<time_zone>]| TIMESTAMP(timezone=True) |
+---------------------------+-----------------------------------------+
| Any other data type | VARCHAR(configure.default_varchar_size) |
+---------------------------+-----------------------------------------+
ii) When the input is a teradataml DataFrame:
- When only a subset of all columns are provided, the column types for the rest are retained.
- When types argument is not provided, the column types are retained.
2. This argument does not have any effect when the table specified using table_name and schema_name
exists and if_exists = 'append'.
primary_time_index_name:
Optional Argument.
Specifies a name for the Primary Time Index (PTI) when the table
to be created must be a PTI table.
Type: String
Note: This argument is not required or used when the table to be created
is not a PTI table. It will be ignored if specified without the timecode_column.
timecode_column:
Optional argument.
Required when the DataFrame must be saved as a PTI table.
Specifies the column in the DataFrame that reflects the form
of the timestamp data in the time series.
This column will be the TD_TIMECODE column in the table created.
It should be of SQL type TIMESTAMP(n), TIMESTAMP(n) WITH TIMEZONE, or DATE,
corresponding to Python types datetime.datetime or datetime.date, or Pandas dtype datetime64[ns].
Type: String
Note: When you specify this parameter, an attempt to create a PTI table
will be made. This argument is not required when the table to be created
is not a PTI table. If this argument is specified, primary_index will be ignored.
timezero_date:
Optional Argument.
Used when the DataFrame must be saved as a PTI table.
Specifies the earliest time series data that the PTI table will accept;
a date that precedes the earliest date in the time series data.
Value specified must be of the following format: DATE 'YYYY-MM-DD'
Default Value: DATE '1970-01-01'.
Type: String
Note: This argument is not required or used when the table to be created
is not a PTI table. It will be ignored if specified without the timecode_column.
timebucket_duration:
Optional Argument.
Required if columns_list is not specified or is None.
Used when the DataFrame must be saved as a PTI table.
Specifies a duration that serves to break up the time continuum in
the time series data into discrete groups or buckets.
Specified using the formal form time_unit(n), where n is a positive
integer, and time_unit can be any of the following:
CAL_YEARS, CAL_MONTHS, CAL_DAYS, WEEKS, DAYS, HOURS, MINUTES,
SECONDS, MILLISECONDS, or MICROSECONDS.
Type: String
Note: This argument is not required or used when the table to be created
is not a PTI table. It will be ignored if specified without the timecode_column.
columns_list:
Optional Argument.
Used when the DataFrame must be saved as a PTI table.
Required if timebucket_duration is not specified.
A list of one or more PTI table column names.
Type: String or list of Strings
Note: This argument is not required or used when the table to be created
is not a PTI table. It will be ignored if specified without the timecode_column.
sequence_column:
Optional Argument.
Used when the DataFrame must be saved as a PTI table.
Specifies the column of type Integer containing the unique identifier for
time series data readings when they are not unique in time.
* When specified, implies SEQUENCED, meaning more than one reading from the same
sensor may have the same timestamp.
This column will be the TD_SEQNO column in the table created.
* When not specified, implies NONSEQUENCED, meaning there is only one sensor reading
per timestamp.
This is the default.
Type: str
Note: This argument is not required or used when the table to be created
is not a PTI table. It will be ignored if specified without the timecode_column.
seq_max:
Optional Argument.
Used when the DataFrame must be saved as a PTI table.
Specifies the maximum number of sensor data rows that can have the
same timestamp. Can be used when 'sequenced' is True.
Accepted range: 1 - 2147483647.
Default Value: 20000.
Type: int
Note: This argument is not required or used when the table to be created
is not a PTI table. It will be ignored if specified without the timecode_column.
set_table:
Optional Argument.
Specifies a flag to determine whether to create a SET or a MULTISET table.
When True, a SET table is created.
When False, a MULTISET table is created.
Default Value: False
Type: boolean
Note: 1. Specifying set_table=True also requires specifying primary_index or timecode_column.
2. Creating SET table (set_table=True) may result in
a. an error if the source is a Pandas DataFrame having duplicate rows.
b. loss of duplicate rows if the source is a teradataml DataFrame.
3. This argument has no effect if the table already exists and if_exists='append'.
chunksize:
Optional Argument.
Specifies the number of rows to be loaded in a batch.
Note:
This is argument is used only when argument "df" is pandas DataFrame.
Default Value: 16383
Types: int
match_column_order:
Optional Argument.
Specifies whether the order of the columns in existing table matches the order of
the columns in the "df" or not. When set to False, the dataframe to be loaded can
have any order and number of columns.
Default Value: True
Types: bool
RETURNS:
None
RAISES:
TeradataMlException
EXAMPLES:
1. Saving a Pandas DataFrame:
>>> from teradataml.dataframe.copy_to import copy_to_sql
>>> from teradatasqlalchemy.types import *
>>> df = {'emp_name': ['A1', 'A2', 'A3', 'A4'],
'emp_sage': [100, 200, 300, 400],
'emp_id': [133, 144, 155, 177],
'marks': [99.99, 97.32, 94.67, 91.00]
}
>>> pandas_df = pd.DataFrame(df)
a) Save a Pandas DataFrame using a dataframe & table name only:
>>> copy_to_sql(df = pandas_df, table_name = 'my_table')
b) Saving as a SET table
>>> copy_to_sql(df = pandas_df, table_name = 'my_set_table', index=True,
primary_index='index_label', set_table=True)
c) Save a Pandas DataFrame by specifying additional parameters:
>>> copy_to_sql(df = pandas_df, table_name = 'my_table_2', schema_name = 'alice',
index = True, index_label = 'my_index_label', temporary = False,
primary_index = ['emp_id'], if_exists = 'append',
types = {'emp_name': VARCHAR, 'emp_sage':INTEGER,
'emp_id': BIGINT, 'marks': DECIMAL})
d) Saving with additional parameters as a SET table
>>> copy_to_sql(df = pandas_df, table_name = 'my_table_3', schema_name = 'alice',
index = True, index_label = 'my_index_label', temporary = False,
primary_index = ['emp_id'], if_exists = 'append',
types = {'emp_name': VARCHAR, 'emp_sage':INTEGER,
'emp_id': BIGINT, 'marks': DECIMAL},
set_table=True)
e) Saving levels in index of type MultiIndex
>>> pandas_df = pandas_df.set_index(['emp_id', 'emp_name'])
>>> copy_to_sql(df = pandas_df, table_name = 'my_table_4', schema_name = 'alice',
index = True, index_label = ['index1', 'index2'], temporary = False,
primary_index = ['index1'], if_exists = 'replace')
2. Saving a teradataml DataFrame:
>>> from teradataml.dataframe.dataframe import DataFrame
>>> from teradataml.dataframe.copy_to import copy_to_sql
>>> from teradatasqlalchemy.types import *
>>> from teradataml.data.load_example_data import load_example_data
>>> # Load the data to run the example.
>>> load_example_data("glm", "admissions_train")
>>> # Create teradataml DataFrame(s)
>>> df = DataFrame('admissions_train')
>>> df2 = df.select(['gpa', 'masters'])
a) Save a teradataml DataFrame by using only a table name:
>>> df2.to_sql('my_tdml_table')
b) Save a teradataml DataFrame by using additional parameters:
>>> df2.to_sql(table_name = 'my_tdml_table', if_exists='append',
primary_index = ['gpa'], temporary=False, schema_name='alice')
c) Alternatively, save a teradataml DataFrame by using copy_to_sql:
>>> copy_to_sql(df2, 'my_tdml_table_2')
d) Save a teradataml DataFrame by using copy_to_sql with additional parameters:
>>> copy_to_sql(df = df2, table_name = 'my_tdml_table_3', schema_name = 'alice',
temporary = False, primary_index = None, if_exists = 'append',
types = {'masters': VARCHAR, 'gpa':INTEGER})
e) Saving as a SET table
>>> copy_to_sql(df = df2, table_name = 'my_tdml_set_table', schema_name = 'alice',
temporary = False, primary_index = ['gpa'], if_exists = 'append',
types = {'masters': VARCHAR, 'gpa':INTEGER}, set_table = True)
3. Saving a teradataml DataFrame as a PTI table:
>>> from teradataml.dataframe.dataframe import DataFrame
>>> from teradataml.dataframe.copy_to import copy_to_sql
>>> from teradataml.data.load_example_data import load_example_data
>>> load_example_data("sessionize", "sessionize_table")
>>> df3 = DataFrame('sessionize_table')
a) Using copy_to_sql
>>> copy_to_sql(df3, "test_copyto_pti",
timecode_column='clicktime',
columns_list='event')
b) Alternatively, using DataFrame.to_sql
>>> df3.to_sql(table_name = "test_copyto_pti_1",
timecode_column='clicktime',
columns_list='event')
c) Saving as a SET table
>>> copy_to_sql(df3, "test_copyto_pti_2",
timecode_column='clicktime',
columns_list='event',
set_table=True)