DataFrame Constructor

Teradata® Package for Python User Guide

Deployment: VantageCloud, VantageCore
Edition: VMware, Enterprise, IntelliFlex
Product: Teradata Package for Python
Release Number: 20.00
Published: March 2025
Last Edition: 2026-02-20
Product Category: Teradata Vantage

Use the DataFrame() function to create a teradataml DataFrame from the input data.

Optional Parameters

data
Specifies the input data to create a teradataml DataFrame.
  • If a dictionary is provided, it must meet the following requirements:
    • Keys must be strings (column names).
    • Values must be lists of equal length (column data).
    • Nested dictionaries are not supported.
  • If the first row of an array column contains an empty list, the column type defaults to ARRAY_VARCHAR with scope 100.
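
The dictionary requirements above can be checked before calling DataFrame(). The following is a minimal sketch in plain Python; check_dict_input is a hypothetical helper written for illustration and is not part of teradataml.

```python
def check_dict_input(data):
    """Raise ValueError if `data` breaks the dictionary rules listed above.

    Hypothetical helper for illustration only; not a teradataml function.
    """
    if not isinstance(data, dict):
        raise ValueError("data must be a dictionary")
    lengths = set()
    for key, values in data.items():
        # Keys must be strings (column names).
        if not isinstance(key, str):
            raise ValueError(f"column name {key!r} is not a string")
        # Nested dictionaries are not supported.
        if isinstance(values, dict):
            raise ValueError(f"column {key!r} holds a nested dictionary")
        # Values must be lists (column data).
        if not isinstance(values, list):
            raise ValueError(f"column {key!r} is not a list")
        lengths.add(len(values))
    # All columns must have the same number of rows.
    if len(lengths) > 1:
        raise ValueError("columns have different lengths")
    return True


print(check_dict_input({"col1": [1, 2], "col2": [3, 4]}))  # True
```

Running a check like this first reports a malformed dictionary locally, before any data is sent to Vantage.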
index
If "data" is a string, this argument specifies whether to use the index column for sorting.

If "data" is a pandas DataFrame, this argument specifies whether to save the pandas DataFrame index as a column.

index_label
If "data" is a string, then the argument specifies column(s) used for sorting.

If "data" is a pandas DataFrame, then the default behavior is applied.

  • If the index label is not specified for a table name, the primary index of the base table is used as the index label.
  • If the index label is not specified for a view name, the index label is set to None.
  • Refer to the "index_label" parameter of copy_to_sql() for details on the default behavior.
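
The default rules for a table or view name can be summarized as a small decision function. This is a sketch of the documented rules only, not teradataml's implementation; resolve_index_label and its parameters are hypothetical names.

```python
def resolve_index_label(index_label, is_view, primary_index):
    """Return the index label that applies under the rules listed above.

    Hypothetical helper for illustration only; not a teradataml function.
    """
    # An explicitly specified index label is used as given.
    if index_label is not None:
        return index_label
    # View name without an index label: the index label is None.
    if is_view:
        return None
    # Table name without an index label: the primary index of the base table.
    return primary_index


print(resolve_index_label(None, is_view=False, primary_index="accounts"))  # accounts
print(resolve_index_label(None, is_view=True, primary_index="accounts"))   # None
print(resolve_index_label("Feb", is_view=False, primary_index="accounts")) # Feb
```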
query
Specifies the SQL query for this DataFrame.

Used by class method from_query.

materialize
Specifies whether to materialize the DataFrame when it is created.

Used by the class method from_query().

Use materialization when the query passed to from_query() is expected to produce non-deterministic results across repeated executions. Materializing the DataFrame keeps the results in the resulting teradataml DataFrame deterministic.

Default value: False (No materialization)
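
The effect of materialization can be pictured with a plain-Python analogy. The sketch below does not call Vantage or teradataml; LazyResult, MaterializedResult, and run_query are hypothetical stand-ins for a query that is re-executed on each read versus one whose result is stored once.

```python
import itertools

# Counter used so that every execution of the stand-in query returns a new value.
_execution = itertools.count(1)


def run_query():
    # Stand-in for a query whose result changes between executions.
    return [("execution", next(_execution))]


class LazyResult:
    """Re-runs the query on every read (analogy for no materialization)."""

    def __init__(self, query):
        self._query = query

    def rows(self):
        return self._query()


class MaterializedResult:
    """Runs the query once and stores the rows (analogy for materialization)."""

    def __init__(self, query):
        self._rows = query()

    def rows(self):
        return list(self._rows)


lazy = LazyResult(run_query)
print(lazy.rows() == lazy.rows())          # False: each read re-executes the query

snapshot = MaterializedResult(run_query)
print(snapshot.rows() == snapshot.rows())  # True: the stored rows are reused
```

The trade-off in this analogy is that the stored copy stays stable but does not reflect later changes in the source.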

kwargs
table_name
Specifies the table name or view name in Vantage referenced by this DataFrame.
If "data" and "table_name" are both specified, then the "table_name" argument is ignored.
primary_index
Specifies which columns to use as primary index for the teradataml DataFrame.
types
Specifies required data types for requested columns to be saved in Vantage.
  • This argument is not applicable when the "data" argument is of type str or in_schema.
  • Refer to the "types" parameter of copy_to_sql() for more details.
columns
Specifies the names of the columns to be used in the DataFrame.
  • This argument is not applicable when the "data" argument is of type str or in_schema.
  • If "data" is a dictionary and this argument is specified, only the specified columns will be included in the DataFrame if the dictionary contains those keys. If the dictionary does not contain the specified keys, those columns will be added with NaN values.
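
The column selection described for dictionary input can be sketched in plain Python. This mirrors the documented behavior only and is not teradataml's implementation; select_columns is a hypothetical helper.

```python
import math


def select_columns(data, columns):
    """Keep only the requested columns; fill requested-but-missing ones with NaN.

    Hypothetical helper for illustration only; not a teradataml function.
    """
    # Number of rows, taken from the first column (0 for an empty dictionary).
    n_rows = len(next(iter(data.values()), []))
    return {
        name: list(data[name]) if name in data else [math.nan] * n_rows
        for name in columns
    }


result = select_columns({"col1": [1, 2], "col2": [3, 4]}, ["col2", "col3"])
print(result)  # {'col2': [3, 4], 'col3': [nan, nan]}
```

Here "col1" is dropped because it is not requested, and "col3" is added with NaN values because the dictionary has no such key.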
persist
Specifies whether to persist the DataFrame.
This argument is only applicable when the "data" argument is of type dict, list or pandas DataFrame.

Default value: False

Example setup

The teradataml DataFrame examples in this guide require certain datasets to be loaded. Use the following commands to load them. For details about the load_example_data() utility, see load_example_data().

Create a context using your database host and user credentials.

>>> from teradataml.context.context import *
>>> from teradataml.dataframe.dataframe import DataFrame
>>> create_context(host="myhostname", username="myusername", password="mypassword")

Load the example datasets for the examples. This step requires the connection created by create_context().

>>> from teradataml import load_example_data
>>> load_example_data("dataframe", ["scale_housing_test", "employee_info", "sales", "admissions_train", "join_table1", "join_table2", "iris_test"])

Example 1: Create a DataFrame from an existing table "sales" in the database

Because the index label is not specified, the primary index "accounts" is used.

>>> df = DataFrame("sales")
>>> df
              Feb   Jan   Mar   Apr  datetime
accounts                                    
Alpha Co    210.0   200   215   250  04/01/2017
Red Inc     200.0   150   140  None  04/01/2017
Orange Inc  210.0  None  None   250  04/01/2017
Jones LLC   200.0   150   140   180  04/01/2017
Yellow Inc   90.0  None  None  None  04/01/2017
Blue Inc     90.0    50    95   101  04/01/2017
>>>

Example 2: Create a DataFrame from an existing table "sales" in the database with an index label "Feb"

>>> df = DataFrame("sales", index_label="Feb")
>>> df
         accounts   Jan   Mar   Apr  datetime
Feb                                         
210.0    Alpha Co   200   215   250  04/01/2017
200.0     Red Inc   150   140  None  04/01/2017
210.0  Orange Inc  None  None   250  04/01/2017
200.0   Jones LLC   150   140   180  04/01/2017
90.0   Yellow Inc  None  None  None  04/01/2017
90.0     Blue Inc    50    95   101  04/01/2017

Example 3: Create a DataFrame from an existing table "sales" in the database with index labels "Jan" and "Feb"

>>> df = DataFrame("sales", index_label=["Jan", "Feb"])
>>> df
             accounts   Mar   Apr  datetime
Jan Feb                                   
200 210.0    Alpha Co   215   250  04/01/2017
150 200.0     Red Inc   140  None  04/01/2017
NaN 210.0  Orange Inc  None   250  04/01/2017
150 200.0   Jones LLC   140   180  04/01/2017
NaN 90.0   Yellow Inc  None  None  04/01/2017
50  90.0     Blue Inc    95   101  04/01/2017

Example 4: Create a DataFrame from an existing view "salesv" in the database with an index label "Mar"

To use this example, first create a view named "salesv" on the "sales" table in the database.

>>> get_context().execute("CREATE VIEW salesv AS SELECT * FROM sales")
<sqlalchemy.engine.result.ResultProxy object at 0x11bbc3668>
>>> df = DataFrame("salesv", index_label="Mar")
>>> df
       accounts    Feb   Jan   Apr  datetime
Mar                                        
95     Blue Inc   90.0    50   101  04/01/2017
NaN  Orange Inc  210.0  None   250  04/01/2017
140     Red Inc  200.0   150  None  04/01/2017
NaN  Yellow Inc   90.0  None  None  04/01/2017
140   Jones LLC  200.0   150   180  04/01/2017
215    Alpha Co  210.0   200   250  04/01/2017

Example 5: Create a teradataml DataFrame from a pandas DataFrame

>>> import pandas as pd
>>> pdf = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
>>> df = DataFrame(pdf)
>>> df
    col1 col2 index_label
0      3    6           2
1      2    5           1
2      1    4           0

Example 6: Create a teradataml DataFrame from a pandas DataFrame without index column

>>> import pandas as pd
>>> pdf = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
>>> df = DataFrame(data=pdf, index=False)
>>> df
    col1 col2
0      3    6
1      2    5
2      1    4

Example 7: Create a teradataml DataFrame from a pandas DataFrame with index label and primary index as 'id'

>>> import pandas as pd
>>> pdf = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
>>> df = DataFrame(pdf, index=True, index_label='id', primary_index='id')
>>> df
    col1 col2
id
2      3    6
1      2    5
0      1    4

Example 9: Create a teradataml DataFrame from a list of lists

>>> df = DataFrame([[1, 2], [3, 4]])
>>> df
    col_0 col_1 index_label
0       3     4           1
1       1     2           0
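
The output above suggests that columns of unlabeled input are named by position. The following sketch reproduces that naming pattern; it is inferred from the example output rather than taken from teradataml, and default_column_names is a hypothetical helper.

```python
def default_column_names(rows):
    """Return positional names col_0, col_1, ... for the widest row.

    Hypothetical helper for illustration only; not a teradataml function.
    """
    width = max((len(row) for row in rows), default=0)
    return [f"col_{i}" for i in range(width)]


print(default_column_names([[1, 2], [3, 4]]))  # ['col_0', 'col_1']
```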

Example 10: Create a teradataml DataFrame from a NumPy array

>>> import numpy as np
>>> df = DataFrame(np.array([[1, 2], [3, 4]]), index=True, index_label="id")
>>> df
    col_0 col_1
id
1       3     4
0       1     2

Example 11: Create a teradataml DataFrame from a dictionary

>>> df = DataFrame({"col1": [1, 2], "col2": [3, 4]}, index=True, index_label="id")
>>> df
    col1 col2
id
1      2    4
0      1    3

Example 12: Create a teradataml DataFrame from a list of dictionaries

>>> df = DataFrame([{"col1": 1, "col2": 3}, {"col1": 2, "col2": 4}], index=True, index_label="id")
>>> df
    col1 col2
id
1      2    4
0      1    3

Example 13: Create a teradataml DataFrame from a list of tuples

>>> df = DataFrame([("Alice", 1), ("Bob", 2)])
>>> df
    col_0 col_1 index_label
0     Bob     2           1
1   Alice     1           0

Example 14: Create a teradataml DataFrame from a pandas DataFrame with NumPy array columns

>>> import numpy as np
>>> import pandas as pd
>>> pdf = pd.DataFrame({
...     'id': [1, 2],
...     'values': [np.array([1, 2, 3]), np.array([4, 5, 6])],
...     'tags': [np.array(['a', 'b', 'c']), np.array(['x', 'y', 'z'])]
... })
>>> df = DataFrame(pdf)
>>> df
    id   values           tags  index_label
0    2  (4,5,6)  ('x','y','z')            1
1    1  (1,2,3)  ('a','b','c')            0