to_pandas() Method | Teradata Python Package - to_pandas() Method - Teradata Package for Python

Teradata® Package for Python User Guide

Product
Teradata Package for Python
Release Number
17.00
Published
November 2021
Language
English (United States)
Last Update
2022-01-14
dita:mapPath
bol1585763678431.ditamap
dita:ditavalPath
ayr1485454803741.ditaval
dita:id
B700-4006
lifecycle
previous
Product Category
Teradata Vantage

The to_pandas() function creates a pandas DataFrame from a teradataml DataFrame.

Column types of the resulting Pandas DataFrame depends on pandas.read_sql_query().

Optional arguments:

  • index_column specifies column name or List of column names representing columns to be used as Pandas index.
    When the optional parameter index_column is provided, the specified column is used as the Pandas index. Otherwise, the index (if exists) of the teradataml DataFrame or the primary index of the database table is used as the Pandas index. The default integer index is used if none of the above indexes exists.
  • all_rows specifies whether all rows from the teradataml DataFrame is retrieved while creating a pandas DataFrame. The default is False.
  • num_rows specifies the number of rows to retrieve from DataFrame while creating Pandas DataFrame. The default is 99999.
    This argument is ignored if all_rows is set to True.
  • fastexport specifies whether FastExport protocol should be used while converting a teradataml DataFrame to a Pandas DataFrame.

    If the argument is set to True, FastExport wire protocol is used internally for data transfer.

    By default, approach is decided based on the number of rows requested by the user for extraction:
    • If requested number of rows is greater than or equal to 100,000, then FastExport is used.
    • If requested number of rows is less than 100,000, regular mode is used for data extraction.
    • Teradata recommends using FastExport when number of rows in teradataml DataFrame is at least 100,000.

      To extract lesser number of rows, ignore this option and go with regular approach. Because FastExport opens multiple data transfer connections to the database, which is time consuming.

    • FastExport does not support all data types in Vantage.

      For example, tables with BLOB and CLOB type columns cannot be extracted.

    • FastExport cannot be used to extract data from a volatile or temporary table.
    • For best efficiency, do not use DataFrame.groupby() and DataFrame.sort() with FastExport.

    See the FastExport section of https://pypi.org/project/teradatasql/ for more information about FastExport protocol through teradatasql driver.

  • catch_errors_warnings specifies whether to catch errors and warnings, if any, raised by FastExport protocol while converting a teradataml DataFrame to a Pandas DataFrame.
    This argument is ignored if fastexport argument is set to FALSE.
    When FastExport is used and this argument is set to True, to_pandas() returns a tuple containing:
    • Pandas DataFrame
    • Errors, if any, in a list thrown by FastExport
    • Warnings, if any, in a list thrown by FastExport

    When FastExport is used and this argument is set to False (the default value), to_pandas() prints the FastExport errors and warnings, if any, to the standard output.

  • kwargs specifies keyword arguments.
    Arguments coerce_float and parse_dates can be passed as keyword arguments.
    • coerce_float specifies a boolean to attempt to convert non-string, non-numeric objects to floating point.
    • parse_dates specifies columns to parse as dates.
    See https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html for more information about the coerce_float and parse_dates arguments.

Examples Prerequisite

Assume a teradataml DataFrame "df" is created from a Vantage table "sales", using command:
df = DataFrame("sales")

Example 1: Create a pandas DataFrame without specifying index

>>> pandas_df = df.to_pandas()
>>> pandas_df
            Feb   Jan   Mar   Apr    datetime
accounts
Alpha Co    210   200   215   250  2017-04-01
Blue Inc     90    50    95   101  2017-04-01
Yellow Inc   90  None  None  None  2017-04-01
Jones LLC   200   150   140   180  2017-04-01
Red Inc     200   150   140  None  2017-04-01
Orange Inc  210  None  None   250  2017-04-01

Example 2: Create a pandas DataFrame using index_column to set the index to "Feb"

>>> pandas_df = df.to_pandas(index_column = 'Feb')
>>> pandas_df
       accounts   Jan   Mar   Apr    datetime
Feb
210    Alpha Co   200   215   250  2017-04-01
90     Blue Inc    50    95   101  2017-04-01
90   Yellow Inc  None  None  None  2017-04-01
200   Jones LLC   150   140   180  2017-04-01
200     Red Inc   150   140  None  2017-04-01
210  Orange Inc  None  None   250  2017-04-01

Example 3: Create a pandas DataFrame using a list of column names for a multicolumn index

>>> pandas_df = df.to_pandas(index_column = ['accounts', 'Feb'])
>>> pandas_df
                 Jan   Mar   Apr    datetime
accounts   Feb
Yellow Inc 90   None  None  None  2017-04-01
Alpha Co   210   200   215   250  2017-04-01
Jones LLC  200   150   140   180  2017-04-01
Orange Inc 210  None  None   250  2017-04-01
Blue Inc   90     50    95   101  2017-04-01
Red Inc    200   150   140  None  2017-04-01

Example 4: Create a pandas DataFrame using num_rows to limit the number of rows to 3

>>> pandas_df = df.to_pandas(index_column = 'Feb', num_rows = 3)
>>> pandas_df
         accounts    Jan    Mar    Apr    datetime
Feb                                              
90.0   Yellow Inc    NaN    NaN    NaN  2017-01-04
90.0     Blue Inc   50.0   95.0  101.0  2017-01-04
200.0     Red Inc  150.0  140.0    NaN  2017-01-04

Example 5: Create a pandas DataFrame using all_rows

>>> pandas_df = df.to_pandas(all_rows = True)
>>> pandas_df
              Feb    Jan    Mar    Apr    datetime
accounts
Red Inc     200.0  150.0  140.0    NaN  2017-01-04
Orange Inc  210.0    NaN    NaN  250.0  2017-01-04
Blue Inc     90.0   50.0   95.0  101.0  2017-01-04
Alpha Co    210.0  200.0  215.0  250.0  2017-01-04
Yellow Inc   90.0    NaN    NaN    NaN  2017-01-04
Jones LLC   200.0  150.0  140.0  180.0  2017-01-04

Example 6: Convert teradataml DataFrame to pandas DataFrame using fastexport, printing errors, if any, on screen

This example prints errors and warnings, if any, on to the screen, as catch_errors_warnings argument is not set.

>>> pandas_df = df.to_pandas(fastexport = True)
Errors: []
Warnings: []
>>> pandas_df
              Feb    Jan    Mar    Apr    datetime
accounts
Red Inc     200.0  150.0  140.0    NaN  2017-01-04
Orange Inc  210.0    NaN    NaN  250.0  2017-01-04
Blue Inc     90.0   50.0   95.0  101.0  2017-01-04
Yellow Inc   90.0    NaN    NaN    NaN  2017-01-04
Alpha Co    210.0  200.0  215.0  250.0  2017-01-04
Jones LLC   200.0  150.0  140.0  180.0  2017-01-04

Example 7: Convert teradataml DataFrame to pandas DataFrame using fastexport, catching errors, if any

This example catches errors and warnings, if any, raised by fastexport, and returns a tuple.

>>> pandas_df, err, warn = df.to_pandas(fastexport = True, catch_errors_warnings = True)
# Print pandas df.
>>> pandas_df
              Feb    Jan    Mar    Apr    datetime
accounts
Red Inc     200.0  150.0  140.0    NaN  2017-01-04
Orange Inc  210.0    NaN    NaN  250.0  2017-01-04
Blue Inc     90.0   50.0   95.0  101.0  2017-01-04
Yellow Inc   90.0    NaN    NaN    NaN  2017-01-04
Alpha Co    210.0  200.0  215.0  250.0  2017-01-04
Jones LLC   200.0  150.0  140.0  180.0  2017-01-04
# Print errors list.
>>> err
[]
# Print warnings list.
>>> warn
[]