Use the copy_to_sql() function to create a table in Vantage based on a teradataml DataFrame or a pandas DataFrame.
The function takes a teradataml DataFrame or a pandas DataFrame and a table name as arguments, and generates DDL and DML commands that creates a table in Vantage. You can also specify the name of the schema in the database to write the table to. If no schema name is specified, the default database schema is used.
- df: Specifies the Pandas or teradataml DataFrame object to be saved.
- table_name: Specifies the name of the table to be created in Vantage.
- schema_name: Specifies the name of the SQL schema in Vantage to write to.
- if_exists:
Specifies the action to take when table already exists in
the database. Possible values are:
- 'fail': If table exists, do nothing;
- 'replace': If table exists, drop it, recreate it, and insert data;
- 'append': If table exists, insert data. Create if does not exist.
The default value is 'append'.
Replacing a table with the contents of a teradataml DataFrame based on the same underlying table is not supported. - index:
Specifies whether to save pandas DataFrame index as a
column or not. Possible values are True or False. The default value is False.Only use as True when attempting to save Pandas DataFrames (and not on teradataml DataFrames).
- index_label: Specifies the column labels
for pandas DataFrame index columns. If no value is given
and index is True, then a default label 'index_label' is
used.The default value is None.Only use this argument when attempting to save Pandas DataFrames (and not on teradataml DataFrames).If this argument is not specified (defaulted to None or is empty) and argument index is set to True, then the names property of the DataFrames index is used as the label(s), and if that too is None or empty, then:
- A default label 'index_label' or 'level_0' (when 'index_label' is already taken) is used when index is standard.
- Default labels 'level_0', 'level_1', etc. are used when index is multi-level index.
- primary_index: Specifies the column(s)
to use as primary index while creating tables in Vantage. When None (default value), No
Primary Index (NOPI) tables are created. For example:
- primary_index = 'my_primary_index'
- primary_index = ['my_primary_index1', 'my_primary_index2', 'my_primary_index3']
- temporary:
Specifies whether to create Vantage tables as permanent or
volatile.Possible values are:
- True: Creates volatile tables, and schema_name is ignored.
- False: Creates permanent tables.
The default value is False.
- types:
Specifies required data types for requested columns to be
saved in Vantage.
The default value is None.
This argument accepts a dictionary of column names and their required teradatasqlalchemy types as key-value pairs, allowing to specify a subset of the columns of a specific type.
- When only a subset of all columns are provided, the rest are defaulted to appropriate types.
- When types argument is not
provided, all column types are appropriately
assigned and defaulted. The column types are
assigned as listed in the following table.
pandas/NumPy Type teradatasqlalchemy Type int32 INTEGER int64 BIGINT bool BYTEINT float32/float64 FLOAT datetime64/datatime64[ns] TIMESTAMP datatime64[ns,time_zone] TIMESTAMP(time_zone=True) Any other data type VARCHAR(configure.default_varchar_size)
- primary_time_index_name: Specifies a name for the Primary Time Index (PTI) when the table to be created must be a PTI table.This argument is not required or used when the table to be created is not a PTI table. It is ignored if specified without the argument timecode_column.
- timecode_column: Specifies the column in the DataFrame that reflects the form of the timestamp data in the time series. This argument is required when the DataFrame must be saved as a PTI table.
This column is the TD_TIMECODE column in the table created. It must be of the SQL type TIMESTAMP(n), TIMESTAMP(n) WITH TIME ZONE, or DATE, corresponding to the Python types datetime.datetime or datetime.date, or Pandas dtype datetime64[ns].
This argument is not required when the table to be created is not a PTI table. When specifying this argument, an attempt to create a PTI table is made. If this argument is specified, primary_index is ignored. - timezero_date: Specifies the earliest time series data that the PTI accepts; a date that precedes the earliest date in the time series data. Value specified must be of the following format: DATE 'YYYY-MM-DD'.
The default value is: DATE '1970-01-01'.
- This argument is used when the DataFrame must be saved as a PTI table.
- This argument is not required or used when the table to be created is not a PTI table.
- It is ignored if specified without the timecode_column argument.
- timebucket_duration: Specifies a
duration that serves to break up the time continuum in the
time series data into discrete groups or buckets.
Specified using the formal form time_unit(n), where n is a positive integer, and time_unit can be any of the following: CAL_YEARS, CAL_MONTHS, CAL_DAYS, WEEKS, DAYS, HOURS, MINUTES, SECONDS, MILLISECONDS, OR MICROSECONDS.
- This argument is required if columns_list is not specified or is empty.
- This argument is used when the DataFrame must be saved as a PTI table.
- This argument is not required or used when the table to be created is not a PTI table.
- It is ignored if specified without the timecode_column argument.
- columns_list: Specifies a list of one or
more PTI table column names.
- This argument is required if timebucket_duration is not specified or is empty.
- This argument is used when the DataFrame must be saved as a PTI table.
- This argument is not required or used when the table to be created is not a PTI table.
- It is ignored if specified without the timecode_column argument.
- sequence_column: Specifies the column of
type Integer containing the unique identifier for time
series data reading when they are not unique in time.
- When specified, implies SEQUENCED, meaning more than one reading from the same sensor may have the same timestamp. This column is the TD_SEQNO column in the table created.
- When not specified, implies NONSEQUENCED, meaning there is only one sensor reading per timestamp. This is the default.
- This argument is used when the DataFrame must be saved as a PTI table.
- This argument is not required or used when the table to be created is not a PTI table.
- It is ignored if specified without the timecode_column argument.
- seq_max:
Specifies the maximum number of sensor data rows that can
have the same timestamp. Can be used when 'sequenced' is
True.
Accepted values range from 1 to 2147483647, with default value 20000.
- This argument is used when the DataFrame must be saved as a PTI table.
- This argument is not required or used when the table to be created is not a PTI table.
- It is ignored if specified without the timecode_column argument.
- set_table: Specifies a flag to determine
whether to create a SET or a MULTISET table.
- When True, a SET table is created.
- When False, a MULTISET table is created. This is the default value.
- Specifying set_table=True also requires specifying primary_index or timecode_column.
- Creating SET table
(set_table=True) may result in
- an error if the source is a Pandas DataFrame having duplicate rows;
- loss of duplicate rows if the source is a teradataml DataFrame.
- This argument has no effect if the table already exists and if_exists='append'.
- match_column_order: Specifies whether
the order of the columns in existing table matches the
order of the columns in the "df" or not.
When set to False, the DataFrame to be loaded can have any order and number of columns.
The default value is True.
- chunksize: Specifies the number of rows
to be loaded in a batch, if DataFrame is pandas
DataFrame.
The default value is 16383.
Example 1: Create a new table in Vantage based on a pandas DataFrame
>>>import pandas as pd >>>from teradataml.dataframe.copy_to import copy_to_sql
>>>sales = [{'accounts': 'Jones LLC', 'Jan': 150, 'Feb': 200.0, 'Mar': 140, 'datetime':'2017-04-01'}, {'accounts': 'Alpha Co', 'Jan': 200, 'Feb': 210.0, 'Mar': 215, 'datetime': '2017-04-01'}, {'accounts': 'Blue Inc', 'Jan': 50, 'Feb': 90.0, 'Mar': 95, 'datetime': '2017-04-01' }]
pdf = pd.DataFrame(sales)
>>>pdf Feb Jan Mar accounts datetime 0 200.0 150 140 Jones LLC 2017-04-01 1 210.0 200 215 Alpha Co 2017-04-01 2 90.0 50 95 Blue Inc 2017-04-01
>>>copy_to_sql(df = pdf, table_name = "pandas_sales", primary_index="accounts", if_exists="replace")
>>>df = DataFrame("pandas_sales")
>>> df Feb Jan Mar datetime accounts Jones LLC 200.0 150 140 2017-04-01 00:00:00.000000 Alpha Co 210.0 200 215 2017-04-01 00:00:00.000000 Blue Inc 90.0 50 95 2017-04-01 00:00:00.000000
Example 2: Use the optional index and index_label parameters, and use the index as Primary Index.
>>> copy_to_sql(df = pdf, table_name = "pandas_sales", index=True, index_label="idx", primary_index="idx", if_exists="replace")
>>> df = DataFrame("pandas_sales") >>> df Feb Jan Mar accounts datetime idx 0 200.0 150 140 Jones LLC 2017-04-01 00:00:00.000000 2 90.0 50 95 Blue Inc 2017-04-01 00:00:00.000000 1 210.0 200 215 Alpha Co 2017-04-01 00:00:00.000000
Example 3: Create a new table in Vantage based on a teradataml DataFrame
>>> df = DataFrame("sales") >>> df Feb Jan Mar Apr datetime accounts Blue Inc 90.0 50 95 101 04/01/2017 Alpha Co 210.0 200 215 250 04/01/2017 Jones LLC 200.0 150 140 180 04/01/2017 Yellow Inc 90.0 None None None 04/01/2017 Orange Inc 210.0 None None 250 04/01/2017 Red Inc 200.0 150 140 None 04/01/2017
>>> copy_to_sql(df = df, table_name = "sales_df1", primary_index="accounts", if_exists="replace")
>>> df1 = DataFrame("sales_df1") >>> df1 Feb Jan Mar Apr datetime accounts Blue Inc 90.0 50 95 101 17/01/04 Orange Inc 210.0 None None 250 17/01/04 Red Inc 200.0 150 140 None 17/01/04 Yellow Inc 90.0 None None None 17/01/04 Jones LLC 200.0 150 140 180 17/01/04 Alpha Co 210.0 200 215 250 17/01/04
Example 4: Create a new Primary Time Index Table in Vantage from a Pandas DataFrame
>>> copy_to_sql(df = pdf, table_name = "pandas_sales_pti", timecode_column="datetime", columns_list="accounts")
>>> pandas_sales_pti = DataFrame('pandas_sales_pti') >>> pandas_sales_pti TD_TIMECODE Feb Jan Mar accounts Jones LLC 2017-04-01 00:00:00.000000 200.0 150 140 Alpha Co 2017-04-01 00:00:00.000000 210.0 200 215 Blue Inc 2017-04-01 00:00:00.000000 90.0 50 95