Teradata Package for Python Function Reference | 17.10 - copy_to_sql - Teradata Package for Python
Teradata® Package for Python Function Reference
- Product
- Teradata Package for Python
- Release Number
- 17.10
- Published
- April 2022
- Language
- English (United States)
- Last Update
- 2022-08-19
- Product Category
- Teradata Vantage
- teradataml.dataframe.copy_to.copy_to_sql = copy_to_sql(df, table_name, schema_name=None, if_exists='append', index=False, index_label=None, primary_index=None, temporary=False, types=None, primary_time_index_name=None, timecode_column=None, timebucket_duration=None, timezero_date=None, columns_list=None, sequence_column=None, seq_max=None, set_table=False, chunksize=16383)
- Writes records stored in a Pandas DataFrame or a teradataml DataFrame to Teradata Vantage.
PARAMETERS:
df:
Required Argument.
Specifies the Pandas or teradataml DataFrame object to be saved.
Types: pandas.DataFrame or teradataml.dataframe.dataframe.DataFrame
table_name:
Required Argument.
Specifies the name of the table to be created in Vantage.
Types : String
schema_name:
Optional Argument.
Specifies the name of the SQL schema in Teradata Vantage to write to.
Types: String
Default: None (Uses default database schema).
Note: schema_name will be ignored when temporary=True.
if_exists:
Optional Argument.
Specifies the action to take when table already exists in Vantage.
Types: String
Possible values: {'fail', 'replace', 'append'}
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
Default : append
Note: Replacing a table with the contents of a teradataml DataFrame based on
the same underlying table is not supported.
index:
Optional Argument.
Specifies whether to save Pandas DataFrame index as a column or not.
Types : Boolean (True or False)
Default : False
Note: Only use as True when attempting to save Pandas DataFrames (and not with teradataml DataFrames).
index_label:
Optional Argument.
Specifies the column label(s) for Pandas DataFrame index column(s).
Types : String or list of strings
Default : None
Note: If index_label is not specified (defaulted to None or is empty) and `index` is True, then
the 'names' property of the DataFrames index is used as the label(s),
and if that too is None or empty, then:
1) a default label 'index_label' or 'level_0' (when 'index_label' is already taken) is used
when index is standard.
2) default labels 'level_0', 'level_1', etc. are used when index is multi-level index.
Only use as True when attempting to save Pandas DataFrames (and not on teradataml DataFrames).
primary_index:
Optional Argument.
Specifies which column(s) to use as primary index while creating Teradata table(s) in Vantage.
When None, No Primary Index Teradata tables are created.
Types : String or list of strings
Default : None
Example:
primary_index = 'my_primary_index'
primary_index = ['my_primary_index1', 'my_primary_index2', 'my_primary_index3']
temporary:
Optional Argument.
Specifies whether to creates Vantage tables as permanent or volatile.
Types : Boolean (True or False)
Default : False
Note: When True:
1. volatile Tables are created, and
2. schema_name is ignored.
When False, permanent tables are created.
types
Optional Argument.
Specifies required data-types for requested columns to be saved in Vantage.
Types: Python dictionary ({column_name1: type_value1, ... column_nameN: type_valueN})
Default: None
Note:
1. This argument accepts a dictionary of columns names and their required teradatasqlalchemy types
as key-value pairs, allowing to specify a subset of the columns of a specific type.
i) When the input is a Pandas DataFrame:
- When only a subset of all columns are provided, the column types for the rest are assigned
appropriately.
- When types argument is not provided, the column types are assigned
as listed in the following table:
+---------------------------+-----------------------------------------+
| Pandas/Numpy Type | teradatasqlalchemy Type |
+---------------------------+-----------------------------------------+
| int32 | INTEGER |
+---------------------------+-----------------------------------------+
| int64 | BIGINT |
+---------------------------+-----------------------------------------+
| bool | BYTEINT |
+---------------------------+-----------------------------------------+
| float32/float64 | FLOAT |
+---------------------------+-----------------------------------------+
| datetime64/datetime64[ns] | TIMESTAMP |
+---------------------------+-----------------------------------------+
| Any other data type | VARCHAR(configure.default_varchar_size) |
+---------------------------+-----------------------------------------+
ii) When the input is a teradataml DataFrame:
- When only a subset of all columns are provided, the column types for the rest are retained.
- When types argument is not provided, the column types are retained.
2. This argument does not have any effect when the table specified using table_name and schema_name
exists and if_exists = 'append'.
primary_time_index_name:
Optional Argument.
Specifies a name for the Primary Time Index (PTI) when the table
to be created must be a PTI table.
Type: String
Note: This argument is not required or used when the table to be created
is not a PTI table. It will be ignored if specified without the timecode_column.
timecode_column:
Optional argument.
Required when the DataFrame must be saved as a PTI table.
Specifies the column in the DataFrame that reflects the form
of the timestamp data in the time series.
This column will be the TD_TIMECODE column in the table created.
It should be of SQL type TIMESTAMP(n), TIMESTAMP(n) WITH TIMEZONE, or DATE,
corresponding to Python types datetime.datetime or datetime.date, or Pandas dtype datetime64[ns].
Type: String
Note: When you specify this parameter, an attempt to create a PTI table
will be made. This argument is not required when the table to be created
is not a PTI table. If this argument is specified, primary_index will be ignored.
timezero_date:
Optional Argument.
Used when the DataFrame must be saved as a PTI table.
Specifies the earliest time series data that the PTI table will accept;
a date that precedes the earliest date in the time series data.
Value specified must be of the following format: DATE 'YYYY-MM-DD'
Default Value: DATE '1970-01-01'.
Type: String
Note: This argument is not required or used when the table to be created
is not a PTI table. It will be ignored if specified without the timecode_column.
timebucket_duration:
Optional Argument.
Required if columns_list is not specified or is None.
Used when the DataFrame must be saved as a PTI table.
Specifies a duration that serves to break up the time continuum in
the time series data into discrete groups or buckets.
Specified using the formal form time_unit(n), where n is a positive
integer, and time_unit can be any of the following:
CAL_YEARS, CAL_MONTHS, CAL_DAYS, WEEKS, DAYS, HOURS, MINUTES,
SECONDS, MILLISECONDS, or MICROSECONDS.
Type: String
Note: This argument is not required or used when the table to be created
is not a PTI table. It will be ignored if specified without the timecode_column.
columns_list:
Optional Argument.
Used when the DataFrame must be saved as a PTI table.
Required if timebucket_duration is not specified.
A list of one or more PTI table column names.
Type: String or list of Strings
Note: This argument is not required or used when the table to be created
is not a PTI table. It will be ignored if specified without the timecode_column.
sequence_column:
Optional Argument.
Used when the DataFrame must be saved as a PTI table.
Specifies the column of type Integer containing the unique identifier for
time series data readings when they are not unique in time.
* When specified, implies SEQUENCED, meaning more than one reading from the same
sensor may have the same timestamp.
This column will be the TD_SEQNO column in the table created.
* When not specified, implies NONSEQUENCED, meaning there is only one sensor reading
per timestamp.
This is the default.
Type: str
Note: This argument is not required or used when the table to be created
is not a PTI table. It will be ignored if specified without the timecode_column.
seq_max:
Optional Argument.
Used when the DataFrame must be saved as a PTI table.
Specifies the maximum number of sensor data rows that can have the
same timestamp. Can be used when 'sequenced' is True.
Accepted range: 1 - 2147483647.
Default Value: 20000.
Type: int
Note: This argument is not required or used when the table to be created
is not a PTI table. It will be ignored if specified without the timecode_column.
set_table:
Optional Argument.
Specifies a flag to determine whether to create a SET or a MULTISET table.
When True, a SET table is created.
When False, a MULTISET table is created.
Default Value: False
Type: boolean
Note: 1. Specifying set_table=True also requires specifying primary_index or timecode_column.
2. Creating SET table (set_table=True) may result in
a. an error if the source is a Pandas DataFrame having duplicate rows.
b. loss of duplicate rows if the source is a teradataml DataFrame.
3. This argument has no effect if the table already exists and if_exists='append'.
chunksize:
Optional Argument.
Specifies the number of rows to be loaded in a batch.
Note:
This is argument is used only when argument "df" is pandas DataFrame.
Default Value: 16383
Types: int
RETURNS:
None
RAISES:
TeradataMlException
EXAMPLES:
1. Saving a Pandas DataFrame:
>>> from teradataml.dataframe.copy_to import copy_to_sql
>>> from teradatasqlalchemy.types import *
>>> df = {'emp_name': ['A1', 'A2', 'A3', 'A4'],
'emp_sage': [100, 200, 300, 400],
'emp_id': [133, 144, 155, 177],
'marks': [99.99, 97.32, 94.67, 91.00]
}
>>> pandas_df = pd.DataFrame(df)
a) Save a Pandas DataFrame using a dataframe & table name only:
>>> copy_to_sql(df = pandas_df, table_name = 'my_table')
b) Saving as a SET table
>>> copy_to_sql(df = pandas_df, table_name = 'my_set_table', index=True,
primary_index='index_label', set_table=True)
c) Save a Pandas DataFrame by specifying additional parameters:
>>> copy_to_sql(df = pandas_df, table_name = 'my_table_2', schema_name = 'alice',
index = True, index_label = 'my_index_label', temporary = False,
primary_index = ['emp_id'], if_exists = 'append',
types = {'emp_name': VARCHAR, 'emp_sage':INTEGER,
'emp_id': BIGINT, 'marks': DECIMAL})
d) Saving with additional parameters as a SET table
>>> copy_to_sql(df = pandas_df, table_name = 'my_table_3', schema_name = 'alice',
index = True, index_label = 'my_index_label', temporary = False,
primary_index = ['emp_id'], if_exists = 'append',
types = {'emp_name': VARCHAR, 'emp_sage':INTEGER,
'emp_id': BIGINT, 'marks': DECIMAL},
set_table=True)
e) Saving levels in index of type MultiIndex
>>> pandas_df = pandas_df.set_index(['emp_id', 'emp_name'])
>>> copy_to_sql(df = pandas_df, table_name = 'my_table_4', schema_name = 'alice',
index = True, index_label = ['index1', 'index2'], temporary = False,
primary_index = ['index1'], if_exists = 'replace')
2. Saving a teradataml DataFrame:
>>> from teradataml.dataframe.dataframe import DataFrame
>>> from teradataml.dataframe.copy_to import copy_to_sql
>>> from teradatasqlalchemy.types import *
>>> from teradataml.data.load_example_data import load_example_data
>>> # Load the data to run the example.
>>> load_example_data("glm", "admissions_train")
>>> # Create teradataml DataFrame(s)
>>> df = DataFrame('admissions_train')
>>> df2 = df.select(['gpa', 'masters'])
a) Save a teradataml DataFrame by using only a table name:
>>> df2.to_sql('my_tdml_table')
b) Save a teradataml DataFrame by using additional parameters:
>>> df2.to_sql(table_name = 'my_tdml_table', if_exists='append',
primary_index = ['gpa'], temporary=False, schema_name='alice')
c) Alternatively, save a teradataml DataFrame by using copy_to_sql:
>>> copy_to_sql(df2, 'my_tdml_table_2')
d) Save a teradataml DataFrame by using copy_to_sql with additional parameters:
>>> copy_to_sql(df = df2, table_name = 'my_tdml_table_3', schema_name = 'alice',
temporary = False, primary_index = None, if_exists = 'append',
types = {'masters': VARCHAR, 'gpa':INTEGER})
e) Saving as a SET table
>>> copy_to_sql(df = df2, table_name = 'my_tdml_set_table', schema_name = 'alice',
temporary = False, primary_index = ['gpa'], if_exists = 'append',
types = {'masters': VARCHAR, 'gpa':INTEGER}, set_table = True)
3. Saving a teradataml DataFrame as a PTI table:
>>> from teradataml.dataframe.dataframe import DataFrame
>>> from teradataml.dataframe.copy_to import copy_to_sql
>>> from teradataml.data.load_example_data import load_example_data
>>> load_example_data("sessionize", "sessionize_table")
>>> df3 = DataFrame('sessionize_table')
a) Using copy_to_sql
>>> copy_to_sql(df3, "test_copyto_pti",
timecode_column='clicktime',
columns_list='event')
b) Alternatively, using DataFrame.to_sql
>>> df3.to_sql(table_name = "test_copyto_pti_1",
timecode_column='clicktime',
columns_list='event')
c) Saving as a SET table
>>> copy_to_sql(df3, "test_copyto_pti_2",
timecode_column='clicktime',
columns_list='event',
set_table=True)