Teradata® Package for Python Function Reference - 20.00
Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Release Number: 20.00.00.03
Published: December 2024
Language: English (United States)
Last Update: 2024-12-19
dita:id: TeradataPython_FxRef_Enterprise_2000
Product Category: Teradata Vantage
teradataml.dataframe.dataframe.DataFrame.to_pandas = to_pandas(self, index_column=None, num_rows=99999, all_rows=False, fastexport=False, catch_errors_warnings=False, **kwargs)
DESCRIPTION:
Returns a Pandas DataFrame for the corresponding teradataml
DataFrame Object.
PARAMETERS:
index_column:
Optional Argument.
Specifies the column(s) to be used as the Pandas index.
When the argument is provided, the specified column(s) are used as
the Pandas index. Otherwise, the teradataml DataFrame's index
(if one exists) or the primary index of the table on Vantage is
used as the Pandas index. The default integer index is used if
none of these indexes exist.
Default Value: Integer index
Types: str OR list of Strings (str)
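The fallback order described for "index_column" can be sketched in plain Python. This is only an illustration, not teradataml's implementation: the helper name resolve_pandas_index and its arguments are hypothetical, and it assumes the teradataml DataFrame's index is checked before the table's primary index, which is simply the order in which the text lists them.

```python
from typing import List, Optional, Union

ColumnSpec = Union[str, List[str]]


def resolve_pandas_index(
    index_column: Optional[ColumnSpec] = None,
    dataframe_index: Optional[ColumnSpec] = None,
    primary_index: Optional[ColumnSpec] = None,
) -> Optional[List[str]]:
    """Return the column names to use as the Pandas index, or None for
    the default integer index (hypothetical helper, for illustration)."""
    # Assumed precedence: explicit argument, then DataFrame index,
    # then the primary index of the table.
    for candidate in (index_column, dataframe_index, primary_index):
        if candidate:  # skips None, "" and []
            if isinstance(candidate, str):
                return [candidate]
            return list(candidate)
    return None  # no index available: use the default integer index
```

For example, resolve_pandas_index(None, "id", None) yields ["id"], while calling it with no arguments yields None.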
num_rows:
Optional Argument.
Specifies the number of rows to retrieve, selected at random, from the
teradataml DataFrame when creating the Pandas DataFrame.
Default Value: 99999
Types: int
Note:
This argument is ignored if "all_rows" is set to True.
all_rows:
Optional Argument.
Specifies whether all rows from teradataml DataFrame should be
retrieved while creating Pandas DataFrame.
Default Value: False
Types: bool
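How "num_rows" and "all_rows" interact can be modelled with a small helper. The function name rows_to_retrieve is hypothetical, and returning None to mean "no limit" is an assumption made for this sketch rather than something teradataml exposes.

```python
from typing import Optional


def rows_to_retrieve(num_rows: int = 99999, all_rows: bool = False) -> Optional[int]:
    """Return the row limit to apply, or None when every row is requested
    (hypothetical helper, for illustration)."""
    if all_rows:
        # "num_rows" is ignored when "all_rows" is set to True.
        return None
    return num_rows
```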
fastexport:
Optional Argument.
Specifies whether the fastexport protocol should be used when
converting a teradataml DataFrame to a Pandas DataFrame. If the
argument is set to True, the fastexport wire protocol is used
internally for data transfer. By default, the fastexport protocol is
not used.
When set to None, the approach is decided based on the number of rows
requested for extraction: if the requested number of rows is greater
than or equal to 100000, fastexport is used; otherwise, regular mode
is used.
Note:
1. Teradata recommends using FastExport when the number of rows
in the teradataml DataFrame is at least 100,000. To extract
fewer rows, ignore this option and use the regular
approach. FastExport opens multiple data transfer connections
to the database.
2. FastExport does not support all Teradata Database data types.
For example, tables with BLOB and CLOB type columns cannot
be extracted.
3. FastExport cannot be used to extract data from a
volatile or temporary table.
4. For best efficiency, do not use DataFrame.groupby() and
DataFrame.sort() with FastExport.
For additional information about the FastExport protocol in the
teradatasql driver, refer to the FastExport section of the driver
documentation: https://pypi.org/project/teradatasql/#FastExport
Default Value: False
Types: bool
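The selection rule described for "fastexport" (True, False, or None with a 100000-row threshold) can be written as a short decision function. This is a sketch of the documented rule only; the name should_use_fastexport and the constant are hypothetical and do not come from teradataml.

```python
from typing import Optional

# Threshold stated in the argument description above.
FASTEXPORT_ROW_THRESHOLD = 100000


def should_use_fastexport(fastexport: Optional[bool], requested_rows: int) -> bool:
    """Model the documented choice between fastexport and regular mode
    (hypothetical helper, for illustration)."""
    if fastexport is None:
        # Decide from the number of rows requested for extraction.
        return requested_rows >= FASTEXPORT_ROW_THRESHOLD
    # An explicit True or False is honoured as given.
    return bool(fastexport)
```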
catch_errors_warnings:
Optional Argument.
Specifies whether to catch errors and warnings (if any) raised by
the fastexport protocol while converting the teradataml DataFrame to a
Pandas DataFrame. When this is set to True and fastexport is used,
to_pandas() returns a tuple containing:
a. Pandas DataFrame.
b. A list of errors (if any) raised by fastexport.
c. A list of warnings (if any) raised by fastexport.
When set to False and fastexport is used, any fastexport
errors and warnings are printed to the standard output.
Note:
This argument is ignored if "fastexport" is set to False.
Default Value: False
Types: bool
kwargs:
Optional Argument.
Specifies keyword arguments. Arguments "coerce_float",
"parse_dates" and "open_sessions" can be passed as keyword arguments.
* "coerce_float" specifies a boolean that controls whether to attempt to
convert non-string, non-numeric objects to floating point.
* "parse_dates" specifies columns to parse as dates.
* "open_sessions" specifies the number of Teradata data transfer
sessions to be opened for fastexport. This argument is only applicable
in fastexport mode.
* The function returns the Pandas DataFrame with Decimal column types as float instead of object.
To keep the data type as object, set argument "coerce_float" to False.
Notes:
1. For additional information about the "coerce_float" and
"parse_dates" arguments, refer to:
https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html
2. If the "open_sessions" argument is not provided, the default value
is the smaller of 8 or the number of AMPs available.
For additional information about number of Teradata data-transfer
sessions opened during fastexport, please refer to:
https://pypi.org/project/teradatasql/#FastExport
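Two of the behaviours described for the keyword arguments can be illustrated with the standard library alone: the default for "open_sessions", and the effect of "coerce_float" on a Decimal value. Both helper names below are hypothetical, and the second function only models the documented outcome (Decimal becomes float unless "coerce_float" is False); it is not how pandas or teradataml performs the conversion.

```python
from decimal import Decimal
from typing import Any


def default_open_sessions(num_amps: int) -> int:
    """Default session count: the smaller of 8 or the number of AMPs
    (hypothetical helper, for illustration)."""
    return min(8, num_amps)


def coerce_decimal(value: Any, coerce_float: bool = True) -> Any:
    """Model the documented effect of "coerce_float" on one value."""
    if coerce_float and isinstance(value, Decimal):
        return float(value)  # Decimal is returned as float
    return value  # left unchanged, so a Decimal stays an object
```

With 24 AMPs the sketch gives 8 sessions, and with 4 AMPs it gives 4.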
RETURNS:
When "catch_errors_warnings" is set to True and the protocol used for
data transfer is fastexport, the function returns a tuple
containing:
a. Pandas DataFrame.
b. Errors, if any, raised by fastexport, as a list of strings.
c. Warnings, if any, raised by fastexport, as a list of strings.
Otherwise, only the Pandas DataFrame is returned.
Note:
The column types of the resulting Pandas DataFrame depend on
pandas.DataFrame.from_records().
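Because the return value is either a bare Pandas DataFrame or a three-element tuple, calling code may want to normalize it. The wrapper below is a minimal sketch under that assumption; the name unpack_to_pandas_result is hypothetical and is not part of teradataml.

```python
from typing import Any, List, Tuple


def unpack_to_pandas_result(result: Any) -> Tuple[Any, List[str], List[str]]:
    """Normalize a to_pandas() return value to (dataframe, errors, warnings)
    (hypothetical helper, for illustration)."""
    if isinstance(result, tuple) and len(result) == 3:
        # Tuple form: returned when "catch_errors_warnings" is True
        # and fastexport is used.
        pandas_df, errors, warnings = result
        return pandas_df, list(errors), list(warnings)
    # Plain form: only the Pandas DataFrame was returned.
    return result, [], []
```

A caller could then write pandas_df, err, warn = unpack_to_pandas_result(df.to_pandas(fastexport = True, catch_errors_warnings = True)) and handle both return shapes the same way.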
RAISES:
TeradataMlException
EXAMPLES:
Teradata supports the following formats:
A] No parameter(s): df.to_pandas()
B] Single index_column parameter: df.to_pandas(index_column = "col1")
C] Multiple index_column (list) parameters:
df.to_pandas(index_column = ['col1', 'col2'])
D] Only num_rows parameter specified: df.to_pandas(num_rows = 100)
E] Both index_column & num_rows specified:
df.to_pandas(index_column = 'col1', num_rows = 100)
F] Only all_rows parameter specified: df.to_pandas(all_rows = True)
Column names ("col1", "col2", ...) are strings representing Teradata
Vantage table columns. All standard Teradata data types are supported
for columns: INTEGER, VARCHAR(5), FLOAT, etc.
df is a teradataml DataFrame object: df = DataFrame.from_table('admissions_train')
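# Assumes a connection to Vantage already exists (for example, one
# created with create_context()).
>>> from teradataml import DataFrame, load_example_data
>>> load_example_data("dataframe","admissions_train")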
>>> df = DataFrame("admissions_train")
>>> df
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
37 no 3.52 Novice Novice 1
35 no 3.68 Novice Beginner 1
12 no 3.65 Novice Novice 1
4 yes 3.50 Beginner Novice 1
38 yes 2.65 Advanced Beginner 1
27 yes 3.96 Advanced Advanced 0
39 yes 3.75 Advanced Beginner 0
7 yes 2.33 Novice Novice 1
40 yes 3.95 Novice Beginner 0
>>> pandas_df = df.to_pandas()
>>> pandas_df
masters gpa stats programming admitted
id
15 yes 4.00 Advanced Advanced 1
14 yes 3.45 Advanced Advanced 0
31 yes 3.50 Advanced Beginner 1
29 yes 4.00 Novice Beginner 0
23 yes 3.59 Advanced Novice 1
21 no 3.87 Novice Beginner 1
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
32 yes 3.46 Advanced Beginner 0
11 no 3.13 Advanced Advanced 1
...
>>> pandas_df = df.to_pandas(index_column = 'id')
>>> pandas_df
masters gpa stats programming admitted
id
15 yes 4.00 Advanced Advanced 1
14 yes 3.45 Advanced Advanced 0
31 yes 3.50 Advanced Beginner 1
29 yes 4.00 Novice Beginner 0
23 yes 3.59 Advanced Novice 1
21 no 3.87 Novice Beginner 1
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
32 yes 3.46 Advanced Beginner 0
11 no 3.13 Advanced Advanced 1
28 no 3.93 Advanced Advanced 1
...
>>> pandas_df = df.to_pandas(index_column = 'gpa')
>>> pandas_df
id masters stats programming admitted
gpa
4.00 15 yes Advanced Advanced 1
3.45 14 yes Advanced Advanced 0
3.50 31 yes Advanced Beginner 1
4.00 29 yes Novice Beginner 0
3.59 23 yes Advanced Novice 1
3.87 21 no Novice Beginner 1
3.83 17 no Advanced Advanced 1
3.85 34 yes Advanced Beginner 0
4.00 13 no Advanced Novice 1
3.46 32 yes Advanced Beginner 0
3.13 11 no Advanced Advanced 1
3.93 28 no Advanced Advanced 1
...
>>> pandas_df = df.to_pandas(index_column = ['masters', 'gpa'])
>>> pandas_df
id stats programming admitted
masters gpa
yes 4.00 15 Advanced Advanced 1
3.45 14 Advanced Advanced 0
3.50 31 Advanced Beginner 1
4.00 29 Novice Beginner 0
3.59 23 Advanced Novice 1
no 3.87 21 Novice Beginner 1
3.83 17 Advanced Advanced 1
yes 3.85 34 Advanced Beginner 0
no 4.00 13 Advanced Novice 1
yes 3.46 32 Advanced Beginner 0
no 3.13 11 Advanced Advanced 1
3.93 28 Advanced Advanced 1
...
>>> pandas_df = df.to_pandas(index_column = 'gpa', num_rows = 3)
>>> pandas_df
id masters stats programming admitted
gpa
3.46 22 yes Novice Beginner 0
2.33 7 yes Novice Novice 1
3.95 40 yes Novice Beginner 0
>>> pandas_df = df.to_pandas(all_rows = True)
>>> pandas_df
masters gpa stats programming admitted
id
15 yes 4.00 Advanced Advanced 1
14 yes 3.45 Advanced Advanced 0
31 yes 3.50 Advanced Beginner 1
29 yes 4.00 Novice Beginner 0
23 yes 3.59 Advanced Novice 1
21 no 3.87 Novice Beginner 1
17 no 3.83 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
32 yes 3.46 Advanced Beginner 0
11 no 3.13 Advanced Advanced 1
...
# Convert teradataml DataFrame to pandas DataFrame using fastexport.
# Errors/warnings, if any, are printed to the screen because the
# catch_errors_warnings argument is not set.
>>> pandas_df = df.to_pandas(fastexport = True)
Errors: []
Warnings: []
>>> pandas_df
masters gpa stats programming admitted
id
38 yes 2.65 Advanced Beginner 1
26 yes 3.57 Advanced Advanced 1
5 no 3.44 Novice Novice 0
24 no 1.87 Advanced Novice 1
3 no 3.70 Novice Beginner 1
1 yes 3.95 Beginner Beginner 0
20 yes 3.90 Advanced Advanced 1
18 yes 3.81 Advanced Advanced 1
8 no 3.60 Beginner Advanced 1
25 no 3.96 Advanced Advanced 1
2 yes 3.76 Beginner Beginner 0
...
# Convert teradataml DataFrame to pandas DataFrame using fastexport
# and catch any warnings/errors raised by fastexport. Returns
# a tuple.
>>> pandas_df, err, warn = df.to_pandas(fastexport = True,
...                                     catch_errors_warnings = True)
# Print pandas df.
>>> pandas_df
masters gpa stats programming admitted
id
38 yes 2.65 Advanced Beginner 1
26 yes 3.57 Advanced Advanced 1
5 no 3.44 Novice Novice 0
24 no 1.87 Advanced Novice 1
3 no 3.70 Novice Beginner 1
1 yes 3.95 Beginner Beginner 0
20 yes 3.90 Advanced Advanced 1
18 yes 3.81 Advanced Advanced 1
8 no 3.60 Beginner Advanced 1
25 no 3.96 Advanced Advanced 1
2 yes 3.76 Beginner Beginner 0
17 no 3.83 Advanced Advanced 1
...
# Print errors list.
>>> err
[]
# Print warnings list.
>>> warn
[]
# Convert teradataml DataFrame to pandas DataFrame without
# fastexport.
>>> pandas_df = df.to_pandas(fastexport = False)
>>> pandas_df
masters gpa stats programming admitted
id
38 yes 2.65 Advanced Beginner 1
26 yes 3.57 Advanced Advanced 1
5 no 3.44 Novice Novice 0
24 no 1.87 Advanced Novice 1
3 no 3.70 Novice Beginner 1
1 yes 3.95 Beginner Beginner 0
20 yes 3.90 Advanced Advanced 1
18 yes 3.81 Advanced Advanced 1
8 no 3.60 Beginner Advanced 1
25 no 3.96 Advanced Advanced 1
2 yes 3.76 Beginner Beginner 0
...
# Convert teradataml DataFrame to pandas DataFrame using fastexport
# by opening 2 Teradata data transfer sessions.
>>> pandas_df = df.to_pandas(fastexport = True, open_sessions = 2)
Errors: []
Warnings: []
>>> pandas_df
masters gpa stats programming admitted
id
38 yes 2.65 Advanced Beginner 1
26 yes 3.57 Advanced Advanced 1
5 no 3.44 Novice Novice 0
24 no 1.87 Advanced Novice 1
3 no 3.70 Novice Beginner 1
1 yes 3.95 Beginner Beginner 0
20 yes 3.90 Advanced Advanced 1
18 yes 3.81 Advanced Advanced 1
8 no 3.60 Beginner Advanced 1
25 no 3.96 Advanced Advanced 1
2 yes 3.76 Beginner Beginner 0
...