to_sql() DataFrame Method | Teradata Python Package - 17.00 - to_sql() DataFrame Method - Teradata Package for Python

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.00
Release Date
November 2021
Content Type
User Guide
Publication ID
B700-4006-070K
Language
English (United States)

Use the to_sql() DataFrame method to create a table in Vantage based on a teradataml DataFrame.

The function takes a table name as an argument, and generates DDL and DML queries that creates a table in Vantage. You can also specify the name of the schema in the database to write the table to. If no schema name is specified, the default database schema is used.

Required argument:
  • table_name: Specifies the name of the table to be created in Vantage.

Optional arguments:

  • schema_name: Specifies the name of the SQL schema in Vantageto write to. The default value is None, which means the default Vantage schema.
    schema_name is ignored when temporary is True.
  • if_exists: Specifies the action to take when table already exists in the database. The possible values are:
    • 'fail': If table exists, do nothing;
    • 'replace': If table exists, drop it, recreate it, and insert data;
    • 'append': If table exists, insert data. Create if does not exist. This is the default value.
    Replacing a table with the content of a teradataml DataFrame based on the same underlying table is not supported.
  • primary_index: Creates Vantage tables with Primary index column. The value can be a single column name or a list of columns names. when primary_index is not specified, "No Primary Index Vantage tables are created.
  • temporary: Specifies whether to create a permanent table or volatile table. Volatile tables only exist for the lifetime of the session. Volatile tables automatically go away when the session ends. The possible values are:
    • True: Creates a volatile table;
    • False: Creates a permanent table. This is the default value.
  • types: Specifies required data types for requested columns to be saved in Vantage.
  • primary_time_index_name: Specifies a name for the Primary Time Index (PTI) when the table to be created must be a PTI table.
    This argument is not required or used when the table to be created is not a PTI table. It is ignored if specified without the timecode_column.
  • timecode_column: Specifies the column in the DataFrame that reflects the form of the timestamp data in the time series.
    This argument is required when the DataFrame must be saved as a PTI table. This argument is not required or used when the table to be created is not a PTI table. If this argument is specified, primary_index is ignored.
  • timezone_date: Specifies the earliest time series data that the PTI will accept; a date that precedes the earliest date in the time series data. Default value is DATE '1970-01-01'.
    This argument is used when the DataFrame must be saved as a PTI table. This argument is not required or used when the table to be created is not a PTI table. It is ignored if specified without the timecode_column.
  • timebucket_duration: Specifies a duration that serves to break up the time continuum in the time series data into discrete groups or buckets.
    This argument is required if columns_list is not specified or is empty. It is used when the DataFrame must be saved as a PTI table. This argument is not required or used when the table to be created is not a PTI table. It is ignored if specified without the timecode_column.
  • column_list: Specifies a list of one or more PTI table column names.
    This argument is required if timebucket_duration not specified. It is used when the DataFrame must be saved as a PTI table. This argument is not required or used when the table to be created is not a PTI table. It is ignored if specified without the timecode_column.
  • sequence_column: Specifies the column of type Integer containing the unique identifier for time series data reading when they are not unique in time.
    • When specified, implies SEQUENCED, meaning more than one reading from the same sensor may have the same timestamp. This column will be the TD_SEQNO column in the table created.
    • When not specified, implies NONSEQUENCED, meaning there is only one sensor reading per timestamp. This is the default.
    This argument is used when the DataFrame must be saved as a PTI table. This argument is not required or used when the table to be created is not a PTI table. It will be ignored if specified without the timecode_column.
  • seq_max: Specifies the maximum number of sensor data rows that can have the same timestamp. Can be used when 'sequenced' is True.
    This argument is used when the DataFrame must be saved as a PTI table. This argument is not required or used when the table to be created is not a PTI table. It will be ignored if specified without the timecode_column.
  • set_table: Specifies a flag to determine whether to create a SET or a MULTISET table.
    • When True, a SET table is created.
    • When False, a MULTISET table is created. This is the default value.
    • Specifying set_table=True also requires specifying primary_index or timecode_column.
    • Creating SET table (set_table=True) may result in loss of duplicate rows.
    • This argument has no effect if the table already exists and if_exists='append'.

Example 1: Create a NOPI table with selected columns from existing table

This example uses the to_sql() function to create a new table "sales_df1" based on the existing table "sales" in Teradata Vantage. The new table "sales_df1" includes only the columns "accounts", "Jan", and "Feb.

  • Create a teradataml DataFrame "df" from the existing table "sales".
    df = DataFrame("sales")
    >>> df
                  Feb    Jan    Mar    Apr    datetime
    accounts
    Blue Inc     90.0   50.0   95.0  101.0  04/01/2017
    Alpha Co    210.0  200.0  215.0  250.0  04/01/2017
    Jones LLC   200.0  150.0  140.0  180.0  04/01/2017
    Yellow Inc   90.0    NaN    NaN    NaN  04/01/2017
    Orange Inc  210.0    NaN    NaN  250.0  04/01/2017
    Red Inc     200.0  150.0  140.0    NaN  04/01/2017
  • Create a new dataframe with only the columns "accounts", "Jan", "Feb".
    >>> df = df.select(["accounts", "Jan", "Feb"])
    >>> df
         accounts    Jan    Feb
    0   Jones LLC  150.0  200.0
    1    Blue Inc   50.0   90.0
    2  Yellow Inc    NaN   90.0
    3  Orange Inc    NaN  210.0
    4    Alpha Co  200.0  210.0
    5     Red Inc  150.0  200.0
  • Creates a new table "sales_df1" in Vantage based on the new DataFrame. The table "sales_df1" is a NOPI table because the primary index was not specified.
    >>> df.to_sql("sales_df1")
  • Verify that the new table "sales_df1" exists in Vantage using the DataFrame() function which creates a DataFrame based on an existing table.
    >>> df1 = DataFrame("sales_df1")
    
    >>> df1
         accounts   Jan    Feb
    0    Blue Inc    50   90.0
    1  Orange Inc  None  210.0
    2     Red Inc   150  200.0
    3  Yellow Inc  None   90.0
    4   Jones LLC   150  200.0
    5    Alpha Co   200  210.0

Example 2: Create a table using the optional parameter primary_index

This example uses the to_sql() function with the optional parameter "primary_index" to create a new table "sales_df2" in Vantage with the primary index "accounts". Use the optional parameter "if_exists" to replace the existing table "sales_df2" in Vantage

  • Create a teradataml DataFrame "df" from the existing table "sales".
    df = DataFrame("sales")
  • Create a new dataframe with only the columns "accounts", "datetime".
    >>> df = df.select(["accounts", "datetime"])
  • Create a new table "sales_df2" with the primary index "accounts".
    >>> df.to_sql("sales_df2", if_exists="replace", primary_index="accounts")
  • Verify the new table by creating a new DataFrame from the new table.
    >>> df2 = DataFrame("sales_df2")
    >>> df2
                datetime
    accounts           
    Blue Inc    17/01/04
    Orange Inc  17/01/04
    Red Inc     17/01/04
    Yellow Inc  17/01/04
    Jones LLC   17/01/04
    Alpha Co    17/01/04

Example 3: Use optional parameter "set_table" to create a SET table "sales_subset_set"

  • Create a teradataml DataFrame "df" from the existing table "sales".
    df = DataFrame("sales")
  • Create a new dataframe with the columns "Apr", "datetime".
    >>> df_for_set = df.select(['Apr','datetime'])
    >>> df_for_set
        Apr    datetime
    0   101  04/01/2017
    1   250  04/01/2017
    2   180  04/01/2017
    3  None  04/01/2017
    4   250  04/01/2017
    5  None  04/01/2017
    
  • Create a SET table "sales_subset_set" using the optional parameter "set_table".
    >>> df_for_set.to_sql('sales_subset_set', primary_index='Apr', set_table=True)
  • Verify the new table by creating a new DataFrame from the new table.
    >>> set_df = DataFrame('sales_subset_set')
    >>> set_df
         datetime
    Apr
    180  17/01/04
    101  17/01/04
    250  17/01/04
    NaN  17/01/04

Example 4: Create a new Primary Time Index table "sales_pti".

  • Create a teradataml DataFrame "df" from the existing table "sales".
    df = DataFrame("sales")
  • Create a new Primary Time Index table "sales_pti".
    >>> df.to_sql('sales_pti', timecode_column='datetime', columns_list='accounts')
  • Verify the new table by creating a new DataFrame from the new table.
    >>> sales_pti= DataFrame('sales_pti')
    >>> sales_pti
               TD_TIMECODE    Feb   Jan   Mar   Apr
    accounts
    Blue Inc      17/01/04   90.0    50    95   101
    Orange Inc    17/01/04  210.0  None  None   250
    Red Inc       17/01/04  200.0   150   140  None
    Yellow Inc    17/01/04   90.0  None  None  None
    Jones LLC     17/01/04  200.0   150   140   180
    Alpha Co      17/01/04  210.0   200   215   250
See also copy_to_sql().