Use the to_sql() DataFrame method to create a table in Vantage based on a teradataml DataFrame.
The function takes a table name as an argument, and generates DDL and DML queries that creates a table in Vantage. You can also specify the name of the schema in the database to write the table to. If no schema name is specified, the default database schema is used.
- table_name: Specifies the name of the table to be created in Vantage.
Optional arguments:
- schema_name: Specifies the name of the SQL schema in Vantageto write to. The default value is None, which means the default Vantage schema.schema_name is ignored when temporary is True.
- if_exists: Specifies the action to take when table already exists in the database. The possible values are:
- 'fail': If table exists, do nothing;
- 'replace': If table exists, drop it, recreate it, and insert data;
- 'append': If table exists, insert data. Create if does not exist. This is the default value.
Replacing a table with the content of a teradataml DataFrame based on the same underlying table is not supported. - primary_index: Creates Vantage tables with Primary index column. The value can be a single column name or a list of columns names. when primary_index is not specified, "No Primary Index Vantage tables are created.
- temporary: Specifies whether to create a permanent table or volatile table. Volatile tables only exist for the lifetime of the session. Volatile tables automatically go away when the session ends. The possible values are:
- True: Creates a volatile table;
- False: Creates a permanent table. This is the default value.
- types: Specifies required data types for requested columns to be saved in Vantage.
- primary_time_index_name: Specifies a name for the Primary Time Index (PTI) when the table to be created must be a PTI table.This argument is not required or used when the table to be created is not a PTI table. It is ignored if specified without the timecode_column.
- timecode_column: Specifies the column in the DataFrame that reflects the form of the timestamp data in the time series.This argument is required when the DataFrame must be saved as a PTI table. This argument is not required or used when the table to be created is not a PTI table. If this argument is specified, primary_index is ignored.
- timezero_date: Specifies the earliest time series data that the PTI will accept; a date that precedes the earliest date in the time series data. Default value is DATE '1970-01-01'.This argument is used when the DataFrame must be saved as a PTI table. This argument is not required or used when the table to be created is not a PTI table. It is ignored if specified without the timecode_column.
- timebucket_duration: Specifies a duration that serves to break up the time continuum in the time series data into discrete groups or buckets.This argument is required if columns_list is not specified or is empty. It is used when the DataFrame must be saved as a PTI table. This argument is not required or used when the table to be created is not a PTI table. It is ignored if specified without the timecode_column.
- column_list: Specifies a list of one or more PTI table column names.This argument is required if timebucket_duration not specified. It is used when the DataFrame must be saved as a PTI table. This argument is not required or used when the table to be created is not a PTI table. It is ignored if specified without the timecode_column.
- sequence_column: Specifies the column of type Integer containing the unique identifier for time series data reading when they are not unique in time.
- When specified, implies SEQUENCED, meaning more than one reading from the same sensor may have the same timestamp. This column will be the TD_SEQNO column in the table created.
- When not specified, implies NONSEQUENCED, meaning there is only one sensor reading per timestamp. This is the default.
This argument is used when the DataFrame must be saved as a PTI table. This argument is not required or used when the table to be created is not a PTI table. It will be ignored if specified without the timecode_column. - seq_max: Specifies the maximum number of sensor data rows that can have the same timestamp. Can be used when 'sequenced' is True.This argument is used when the DataFrame must be saved as a PTI table. This argument is not required or used when the table to be created is not a PTI table. It will be ignored if specified without the timecode_column.
- set_table: Specifies a flag to determine whether to create a SET or a MULTISET table.
- When True, a SET table is created.
- When False, a MULTISET table is created. This is the default value.
- Specifying set_table=True also requires specifying primary_index or timecode_column.
- Creating SET table (set_table=True) may result in loss of duplicate rows.
- This argument has no effect if the table already exists and if_exists='append'.
Example 1: Create a NOPI table with selected columns from existing table
This example uses the to_sql() function to create a new table "sales_df1" based on the existing table "sales" in Teradata Vantage. The new table "sales_df1" includes only the columns "accounts", "Jan", and "Feb.
- Create a teradataml DataFrame "df" from the existing table "sales".
df = DataFrame("sales")
>>> df Feb Jan Mar Apr datetime accounts Blue Inc 90.0 50.0 95.0 101.0 04/01/2017 Alpha Co 210.0 200.0 215.0 250.0 04/01/2017 Jones LLC 200.0 150.0 140.0 180.0 04/01/2017 Yellow Inc 90.0 NaN NaN NaN 04/01/2017 Orange Inc 210.0 NaN NaN 250.0 04/01/2017 Red Inc 200.0 150.0 140.0 NaN 04/01/2017
- Create a new dataframe with only the columns "accounts", "Jan", "Feb".
>>> df = df.select(["accounts", "Jan", "Feb"])
>>> df accounts Jan Feb 0 Jones LLC 150.0 200.0 1 Blue Inc 50.0 90.0 2 Yellow Inc NaN 90.0 3 Orange Inc NaN 210.0 4 Alpha Co 200.0 210.0 5 Red Inc 150.0 200.0
- Creates a new table "sales_df1" in Vantage based on the new DataFrame. The table "sales_df1" is a NOPI table because the primary index was not specified.
>>> df.to_sql("sales_df1")
- Verify that the new table "sales_df1" exists in Vantage using the DataFrame() function which creates a DataFrame based on an existing table.
>>> df1 = DataFrame("sales_df1")
>>> df1 accounts Jan Feb 0 Blue Inc 50 90.0 1 Orange Inc None 210.0 2 Red Inc 150 200.0 3 Yellow Inc None 90.0 4 Jones LLC 150 200.0 5 Alpha Co 200 210.0
Example 2: Create a table using the optional parameter primary_index
This example uses the to_sql() function with the optional parameter "primary_index" to create a new table "sales_df2" in Vantage with the primary index "accounts". Use the optional parameter "if_exists" to replace the existing table "sales_df2" in Vantage
- Create a teradataml DataFrame "df" from the existing table "sales".
df = DataFrame("sales")
- Create a new dataframe with only the columns "accounts", "datetime".
>>> df = df.select(["accounts", "datetime"])
- Create a new table "sales_df2" with the primary index "accounts".
>>> df.to_sql("sales_df2", if_exists="replace", primary_index="accounts")
- Verify the new table by creating a new DataFrame from the new table.
>>> df2 = DataFrame("sales_df2")
>>> df2 datetime accounts Blue Inc 17/01/04 Orange Inc 17/01/04 Red Inc 17/01/04 Yellow Inc 17/01/04 Jones LLC 17/01/04 Alpha Co 17/01/04
Example 3: Use optional parameter "set_table" to create a SET table "sales_subset_set"
- Create a teradataml DataFrame "df" from the existing table "sales".
df = DataFrame("sales")
- Create a new dataframe with the columns "Apr", "datetime".
>>> df_for_set = df.select(['Apr','datetime'])
>>> df_for_set Apr datetime 0 101 04/01/2017 1 250 04/01/2017 2 180 04/01/2017 3 None 04/01/2017 4 250 04/01/2017 5 None 04/01/2017
- Create a SET table "sales_subset_set" using the optional parameter "set_table".
>>> df_for_set.to_sql('sales_subset_set', primary_index='Apr', set_table=True)
- Verify the new table by creating a new DataFrame from the new table.
>>> set_df = DataFrame('sales_subset_set')
>>> set_df datetime Apr 180 17/01/04 101 17/01/04 250 17/01/04 NaN 17/01/04
Example 4: Create a new Primary Time Index table "sales_pti".
- Create a teradataml DataFrame "df" from the existing table "sales".
df = DataFrame("sales")
- Create a new Primary Time Index table "sales_pti".
>>> df.to_sql('sales_pti', timecode_column='datetime', columns_list='accounts')
- Verify the new table by creating a new DataFrame from the new table.
>>> sales_pti= DataFrame('sales_pti')
>>> sales_pti TD_TIMECODE Feb Jan Mar Apr accounts Blue Inc 17/01/04 90.0 50 95 101 Orange Inc 17/01/04 210.0 None None 250 Red Inc 17/01/04 200.0 150 140 None Yellow Inc 17/01/04 90.0 None None None Jones LLC 17/01/04 200.0 150 140 180 Alpha Co 17/01/04 210.0 200 215 250