Use the DataFrame() function to create a teradataml DataFrame from the input data.
Optional Parameters
- data
- Specifies the input data to create a teradataml DataFrame.
- If a dictionary is provided, it must meet the following requirements:
- Keys must be strings (column names).
- Values must be lists of equal length (column data).
- Nested dictionaries are not supported.
- If the first row of an array column contains an empty list, the column type defaults to ARRAY_VARCHAR with scope 100.
- index
- If "data" is a string, then the argument specifies whether to use the index column for sorting or not.
If "data" is a pandas DataFrame, then this argument specifies whether to save the pandas DataFrame index as a column or not.
- index_label
- If "data" is a string, then the argument specifies column(s) used for sorting.
If "data" is a pandas DataFrame, then the default behavior is applied.
- If the index label is not specified for a table name, the primary index of the base table is used as the index label.
- If the index label is not specified for a view name, the index label is set to None.
- Refer to the "index_label" parameter of copy_to_sql() for details on the default behavior.
- query
- Specifies the SQL query for this DataFrame.
Used by class method from_query.
- materialize
- Specifies whether to materialize DataFrame or not when created.
Used by class method from_query.
Use materialization when the query passed to from_query() is expected to produce non-deterministic results across repeated executions. Materializing the query result keeps the contents of the resulting teradataml DataFrame deterministic.
Default value: False (No materialization)
- kwargs
- table_name
- Specifies the table name or view name in Vantage referenced by this DataFrame. If "data" and "table_name" are both specified, then the "table_name" argument is ignored.
- primary_index
- Specifies which columns to use as primary index for the teradataml DataFrame. If "data" and "table_name" are both specified, then the "table_name" argument is ignored.
- types
- Specifies required data types for requested columns to be saved in Vantage.
- This argument is not applicable when "data" argument is of type str or in_schema.
- Refer to the "types" parameter of copy_to_sql() for more details.
- columns
- Specifies the names of the columns to be used in the DataFrame.
- This argument is not applicable when "data" argument is of type str or in_schema.
- If "data" is a dictionary and this argument is specified, only the specified columns will be included in the DataFrame if the dictionary contains those keys. If the dictionary does not contain the specified keys, those columns will be added with NaN values.
- persist
- Specifies whether to persist the DataFrame. This argument is only applicable when the "data" argument is of type dict, list or pandas DataFrame.
Default value: False
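The dictionary requirements listed for the "data" parameter can be checked before calling DataFrame(). The following is a minimal sketch in plain Python; check_column_dict is a hypothetical helper written for illustration, not a teradataml function, and it models only the three rules stated above (string keys, list values of equal length, no nested dictionaries).

```python
from typing import Any, Dict, List


def check_column_dict(data: Dict[Any, Any]) -> List[str]:
    """Return a list of problems found in a column dictionary (empty if none).

    Hypothetical helper: it mirrors the documented requirements only and
    is not part of teradataml.
    """
    problems = []
    lengths = set()
    for key, values in data.items():
        # Keys are column names, so they must be strings.
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a string")
        # Values must be plain lists; nested dictionaries are rejected.
        if isinstance(values, dict):
            problems.append(f"column {key!r} is a nested dictionary")
        elif not isinstance(values, list):
            problems.append(f"column {key!r} is not a list")
        else:
            lengths.add(len(values))
    # All list-valued columns must have the same number of rows.
    if len(lengths) > 1:
        problems.append(f"columns have unequal lengths: {sorted(lengths)}")
    return problems


good = {"col1": [1, 2], "col2": [3, 4]}
bad = {"col1": [1, 2], "col2": [3], 3: [5, 6], "col4": {"a": 1}}

print(check_column_dict(good))
print(check_column_dict(bad))
```

An empty result means the dictionary satisfies the stated rules; the actual validation performed by teradataml may be stricter or differ in detail.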
Example setup
Load the example datasets for the examples.
>>> from teradataml import load_example_data
>>> load_example_data("dataframe", ["scale_housing_test", "employee_info", "sales", "admissions_train", "join_table1", "join_table2", "iris_test"])
Create a context using your database host and user credentials.
>>> from teradataml.context.context import *
>>> from teradataml.dataframe.dataframe import DataFrame
>>> create_context(host = "myhostname", username="myusername", password = "mypassword")
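The call above hardcodes credentials for brevity. As a sketch of one alternative, the connection arguments can be gathered from environment variables, with a password prompt as fallback. The variable names TD_HOST, TD_USER and TD_PASSWORD and the helpers connection_settings and connect are assumptions made for this illustration; only create_context and its host, username and password arguments come from the example above.

```python
import os
from getpass import getpass


def connection_settings(env=os.environ):
    """Collect create_context() keyword arguments from environment variables.

    TD_HOST, TD_USER and TD_PASSWORD are hypothetical variable names.
    """
    settings = {
        "host": env.get("TD_HOST", "myhostname"),
        "username": env.get("TD_USER", "myusername"),
    }
    password = env.get("TD_PASSWORD")
    if password is not None:
        settings["password"] = password
    return settings


def connect():
    """Create the context; needs teradataml and a reachable Vantage system."""
    # Imported lazily so the settings helper works without teradataml.
    from teradataml.context.context import create_context

    settings = connection_settings()
    if "password" not in settings:
        # Prompt instead of storing the password in a script.
        settings["password"] = getpass("Vantage password: ")
    return create_context(**settings)
```

Calling connect() then replaces the explicit create_context(...) line, and no password appears in the script or the interpreter history.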
Example 1: Create a DataFrame from an existing table "sales" in the database
Because the index label is not specified, the primary index "accounts" is used.
>>> df = DataFrame("sales")
>>> df
Feb Jan Mar Apr datetime
accounts
Alpha Co 210.0 200 215 250 04/01/2017
Red Inc 200.0 150 140 None 04/01/2017
Orange Inc 210.0 None None 250 04/01/2017
Jones LLC 200.0 150 140 180 04/01/2017
Yellow Inc 90.0 None None None 04/01/2017
Blue Inc 90.0 50 95 101 04/01/2017
>>>
Example 2: Create a DataFrame from an existing table "sales" in the database with an index label "Feb"
>>> df = DataFrame("sales", index_label="Feb")
>>> df
accounts Jan Mar Apr datetime
Feb
210.0 Alpha Co 200 215 250 04/01/2017
200.0 Red Inc 150 140 None 04/01/2017
210.0 Orange Inc None None 250 04/01/2017
200.0 Jones LLC 150 140 180 04/01/2017
90.0 Yellow Inc None None None 04/01/2017
90.0 Blue Inc 50 95 101 04/01/2017
Example 3: Create a DataFrame from an existing table "sales" in the database with an index label "Jan" and "Feb"
>>> df = DataFrame("sales", index_label=["Jan", "Feb"])
>>> df
accounts Mar Apr datetime
Jan Feb
200 210.0 Alpha Co 215 250 04/01/2017
150 200.0 Red Inc 140 None 04/01/2017
NaN 210.0 Orange Inc None 250 04/01/2017
150 200.0 Jones LLC 140 180 04/01/2017
NaN 90.0 Yellow Inc None None 04/01/2017
50 90.0 Blue Inc 95 101 04/01/2017
Example 4: Create a DataFrame from an existing view "salesv" in the database with an index label "Mar"
To use this example, first create a view named 'salesv' on the sales table.
>>> get_context().execute("CREATE VIEW salesv AS SELECT * FROM sales")
<sqlalchemy.engine.result.ResultProxy object at 0x11bbc3668>
>>> df = DataFrame("salesv", index_label="Mar")
>>> df
accounts Feb Jan Apr datetime
Mar
95 Blue Inc 90.0 50 101 04/01/2017
NaN Orange Inc 210.0 None 250 04/01/2017
140 Red Inc 200.0 150 None 04/01/2017
NaN Yellow Inc 90.0 None None 04/01/2017
140 Jones LLC 200.0 150 180 04/01/2017
215 Alpha Co 210.0 200 250 04/01/2017
Example 5: Create a teradataml DataFrame from a pandas DataFrame
>>> import pandas as pd
>>> pdf = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
>>> df = DataFrame(pdf)
>>> df
col1 col2 index_label
0 3 6 2
1 2 5 1
2 1 4 0
Example 6: Create a teradataml DataFrame from a pandas DataFrame without index column
>>> import pandas as pd
>>> pdf = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
>>> df = DataFrame(data=pdf, index=False)
>>> df
col1 col2
0 3 6
1 2 5
2 1 4
Example 7: Create a teradataml DataFrame from a pandas DataFrame with index label and primary index as 'id'
>>> import pandas as pd
>>> df = DataFrame(pdf, index=True, index_label='id', primary_index='id')
>>> df
col1 col2
id
2 3 6
1 2 5
0 1 4
Example 8: Create a teradataml DataFrame from a pandas DataFrame with index label and primary index as 'id'
>>> pdf = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
>>> df = DataFrame(pdf, index=True, index_label='id', primary_index='id')
>>> df
col1 col2
id
2 3 6
1 2 5
0 1 4
Example 9: Create a teradataml DataFrame from a list of lists
>>> df = DataFrame([[1, 2], [3, 4]])
>>> df
col_0 col_1 index_label
0 3 4 1
1 1 2 0
Example 10: Create a teradataml DataFrame from a numpy array
>>> import numpy as np
>>> df = DataFrame(np.array([[1, 2], [3, 4]]), index=True, index_label="id")
>>> df
col_0 col_1
id
1 3 4
0 1 2
Example 11: Create a teradataml DataFrame from a dictionary
>>> df = DataFrame({"col1": [1, 2], "col2": [3, 4]}, index=True, index_label="id")
>>> df
col1 col2
id
1 2 4
0 1 3
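When "data" is a dictionary, the "columns" argument described in the parameter list selects which keys become columns and adds NaN-filled columns for names the dictionary lacks. The sketch below models that rule in plain Python so the effect can be seen without a Vantage connection; select_columns is a hypothetical helper and not the teradataml implementation.

```python
import math


def select_columns(data, columns):
    """Keep only the requested columns; fill unknown ones with NaN.

    Hypothetical pure-Python model of the documented "columns" rule,
    not the teradataml implementation.
    """
    # Row count is taken from the first column (all columns share one length).
    n_rows = len(next(iter(data.values()))) if data else 0
    return {
        name: list(data[name]) if name in data else [math.nan] * n_rows
        for name in columns
    }


result = select_columns({"col1": [1, 2], "col2": [3, 4]}, ["col2", "col3"])
print(list(result))
print(result["col2"])
print(all(math.isnan(v) for v in result["col3"]))
```

Here "col1" is dropped because it was not requested, "col2" is kept, and "col3" is added with one NaN per row.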
Example 12: Create a teradataml DataFrame from a list of dictionaries
>>> df = DataFrame([{"col1": 1, "col2": 3}, {"col1": 2, "col2": 4}], index=True, index_label="id")
>>> df
col1 col2
id
1 2 4
0 1 3
Example 13: Create a teradataml DataFrame from a list of tuples
>>> df = DataFrame([("Alice", 1), ("Bob", 2)])
>>> df
col_0 col_1 index_label
0 Alice 1 1
1 Bob 2 0
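Row-oriented input such as the list of tuples above receives positional column names (col_0, col_1). If named columns are preferred, one option is to pivot the rows into a dictionary of columns first, since a dictionary carries the column names as keys. The helper rows_to_columns below is a hypothetical illustration in plain Python, not part of teradataml.

```python
def rows_to_columns(rows, names=None):
    """Turn a list of equal-length rows into a {column name: values} dictionary.

    When "names" is omitted, positional names col_0, col_1, ... are used,
    following the pattern visible in the example output above.
    """
    if not rows:
        return {}
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of fields")
    if names is None:
        names = [f"col_{i}" for i in range(width)]
    if len(names) != width:
        raise ValueError("one name is required per field")
    # Collect the i-th field of every row into the i-th column.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


rows = [("Alice", 1), ("Bob", 2)]
print(rows_to_columns(rows))
# {'col_0': ['Alice', 'Bob'], 'col_1': [1, 2]}
print(rows_to_columns(rows, names=["name", "id"]))
# {'name': ['Alice', 'Bob'], 'id': [1, 2]}
```

The resulting dictionary has string keys and equal-length list values, so it meets the dictionary requirements stated for the "data" parameter.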
Example 14: Create a teradataml DataFrame from a pandas DataFrame containing numpy arrays
>>> import numpy as np
>>> import pandas as pd
>>> pdf = pd.DataFrame({
... 'id': [1, 2],
... 'values': [np.array([1, 2, 3]), np.array([4, 5, 6])],
... 'tags': [np.array(['a', 'b', 'c']), np.array(['x', 'y', 'z'])]
... })
>>> df = DataFrame(pdf)
>>> df
id values tags index_label
0 2 (4,5,6) ('x','y','z') 1
1 1 (1,2,3) ('a','b','c') 0
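None of the examples above exercises the "query" and "materialize" parameters, which the parameter list says are used by the class method from_query. The following sketch shows one case where materialization is useful: a Teradata SAMPLE query, which can return different rows each time it runs. sample_query and load_sample are hypothetical helpers; the from_query call assumes the parameter names given in the parameter list and requires an active Vantage context, so it is wrapped in a function that is not executed here.

```python
def sample_query(table_name, n_rows):
    """Build a Teradata SAMPLE query, which may return different rows per run."""
    # Guard against building SQL from unexpected input.
    if not table_name.isidentifier():
        raise ValueError(f"unexpected table name: {table_name!r}")
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    return f"SELECT * FROM {table_name} SAMPLE {n_rows}"


def load_sample(table_name, n_rows):
    """Run the query once and keep its result stable (needs a Vantage context)."""
    # Imported lazily so sample_query() works without teradataml installed.
    from teradataml.dataframe.dataframe import DataFrame

    # materialize=True asks for the query result to be stored when the
    # DataFrame is created, instead of re-running the query on each use.
    return DataFrame.from_query(sample_query(table_name, n_rows), materialize=True)


print(sample_query("sales", 3))
# SELECT * FROM sales SAMPLE 3
```

With the default materialize=False, repeated operations on the DataFrame could each draw a different sample from "sales".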