Use the to_pandas() function to creates pandas DataFrame from a teradataml DataFrame.
Returns the pandas dataframe with Decimal column types as float instead of object. If user want to datatype to be object set argument "coerce_float" to False.
Optional arguments:
- index_column specifies column name or List of column names representing columns to be used as Pandas index.When the optional parameter index_column is provided, the specified column is used as the Pandas index. Otherwise, the index (if exists) of the teradataml DataFrame or the primary index of the database table is used as the Pandas index. The default integer index is used if none of these indexes exists.
- all_rows specifies whether all rows from the teradataml DataFrame is retrieved while creating a pandas DataFrame. The default is False.
- num_rows specifies the number of rows to retrieve randomly from the teradataml DataFrame while creating pandas DataFrame. The default is 99999.This argument is ignored if all_rows is set to True.
- fastexport specifies whether FastExport protocol must be used while converting a teradataml DataFrame to a Pandas DataFrame. Possible values for this argument:
- The default value is False. Which means, by default, FastExport protocol is not used while converting teradataml DataFrame to a Pandas DataFrame.
- If the argument is set to True, FastExport protocol is used internally for data transfer.
- If the argument is set to None, the approach is decided based on the number of rows requested by the user for extraction:
- If requested number of rows is greater than or equal to 100,000, then FastExport is used.
- If requested number of rows is less than 100,000, regular mode is used for data extraction.
- Teradata recommends using FastExport when number of rows in teradataml DataFrame is at least 100,000.
To extract lesser number of rows, ignore this option and go with regular approach. Because FastExport opens multiple data transfer connections to the database, which is time consuming.
- FastExport does not support all data types in Vantage.
For example, tables with BLOB and CLOB type columns cannot be extracted.
- FastExport cannot be used to extract data from a volatile or temporary table.
- For best efficiency, do not use DataFrame.groupby() and DataFrame.sort() with FastExport.
See the FastExport section of https://pypi.org/project/teradatasql/ for more information about FastExport protocol through teradatasql driver.
- catch_errors_warnings specifies whether to catch errors and warnings, if any, raised by FastExport protocol while converting a teradataml DataFrame to a Pandas DataFrame. This argument is ignored if fastexport argument is set to FALSE.When FastExport is used and this argument is set to True, to_pandas() returns a tuple containing:
- Pandas DataFrame
- Errors, if any, in a list thrown by FastExport
- Warnings, if any, in a list thrown by FastExport
When FastExport is used and this argument is set to False (the default value), to_pandas() prints the FastExport errors and warnings, if any, to the standard output.
- kwargs specifies keyword arguments. Arguments coerce_float, parse_dates, and open_sessions can be passed as keyword arguments.
- coerce_float specifies a boolean to attempt to convert non-string, non-numeric objects to floating point.
- parse_dates specifies columns to parse as dates.
- open_sessions specifies the number of Teradata data transfer sessions to be opened for fastexport. This argument is only applicable in fastexport mode.If open_sessions argument is not set, by default, the number of data transfer sessions opened by teradataml is the smaller of 8 and the number of available AMPs in Vantage.
See the FastExport section of https://pypi.org/project/teradatasql/ for more information about number of data transfer session opened during fastexport.
See https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html for more information about the coerce_float and parse_dates arguments.
Example Setup
df = DataFrame("sales")
Example 1: Create a pandas DataFrame without specifying index
>>> pandas_df = df.to_pandas()
>>> pandas_df Feb Jan Mar Apr datetime accounts Alpha Co 210 200 215 250 2017-04-01 Blue Inc 90 50 95 101 2017-04-01 Yellow Inc 90 None None None 2017-04-01 Jones LLC 200 150 140 180 2017-04-01 Red Inc 200 150 140 None 2017-04-01 Orange Inc 210 None None 250 2017-04-01
Example 2: Create a pandas DataFrame using index_column to set the index to "Feb"
>>> pandas_df = df.to_pandas(index_column = 'Feb')
>>> pandas_df accounts Jan Mar Apr datetime Feb 210 Alpha Co 200 215 250 2017-04-01 90 Blue Inc 50 95 101 2017-04-01 90 Yellow Inc None None None 2017-04-01 200 Jones LLC 150 140 180 2017-04-01 200 Red Inc 150 140 None 2017-04-01 210 Orange Inc None None 250 2017-04-01
Example 3: Create a pandas DataFrame using a list of column names for a multicolumn index
>>> pandas_df = df.to_pandas(index_column = ['accounts', 'Feb'])
>>> pandas_df Jan Mar Apr datetime accounts Feb Yellow Inc 90 None None None 2017-04-01 Alpha Co 210 200 215 250 2017-04-01 Jones LLC 200 150 140 180 2017-04-01 Orange Inc 210 None None 250 2017-04-01 Blue Inc 90 50 95 101 2017-04-01 Red Inc 200 150 140 None 2017-04-01
Example 4: Create a pandas DataFrame using num_rows to limit the number of rows to 3
>>> pandas_df = df.to_pandas(index_column = 'Feb', num_rows = 3)
>>> pandas_df accounts Jan Mar Apr datetime Feb 90.0 Yellow Inc NaN NaN NaN 2017-01-04 90.0 Blue Inc 50.0 95.0 101.0 2017-01-04 200.0 Red Inc 150.0 140.0 NaN 2017-01-04
Example 5: Create a pandas DataFrame using all_rows
>>> pandas_df = df.to_pandas(all_rows = True)
>>> pandas_df Feb Jan Mar Apr datetime accounts Red Inc 200.0 150.0 140.0 NaN 2017-01-04 Orange Inc 210.0 NaN NaN 250.0 2017-01-04 Blue Inc 90.0 50.0 95.0 101.0 2017-01-04 Alpha Co 210.0 200.0 215.0 250.0 2017-01-04 Yellow Inc 90.0 NaN NaN NaN 2017-01-04 Jones LLC 200.0 150.0 140.0 180.0 2017-01-04
Example 6: Convert teradataml DataFrame to pandas DataFrame using fastexport, printing errors, if any, on screen
This example prints errors and warnings, if any, on to the screen, as catch_errors_warnings argument is not set.
>>> pandas_df = df.to_pandas(fastexport = True) Errors: [] Warnings: []
>>> pandas_df Feb Jan Mar Apr datetime accounts Red Inc 200.0 150.0 140.0 NaN 2017-01-04 Orange Inc 210.0 NaN NaN 250.0 2017-01-04 Blue Inc 90.0 50.0 95.0 101.0 2017-01-04 Yellow Inc 90.0 NaN NaN NaN 2017-01-04 Alpha Co 210.0 200.0 215.0 250.0 2017-01-04 Jones LLC 200.0 150.0 140.0 180.0 2017-01-04
Example 7: Convert teradataml DataFrame to pandas DataFrame using fastexport, catching errors, if any
This example catches errors and warnings, if any, raised by fastexport, and returns a tuple.
>>> pandas_df, err, warn = df.to_pandas(fastexport = True, catch_errors_warnings = True)
# Print pandas df. >>> pandas_df Feb Jan Mar Apr datetime accounts Red Inc 200.0 150.0 140.0 NaN 2017-01-04 Orange Inc 210.0 NaN NaN 250.0 2017-01-04 Blue Inc 90.0 50.0 95.0 101.0 2017-01-04 Yellow Inc 90.0 NaN NaN NaN 2017-01-04 Alpha Co 210.0 200.0 215.0 250.0 2017-01-04 Jones LLC 200.0 150.0 140.0 180.0 2017-01-04
# Print errors list. >>> err []
# Print warnings list. >>> warn []
Example 8: Convert teradataml DataFrame to pandas DataFrame using fastexport by opening specific number of sessions
>>> pandas_df = df.to_pandas(fastexport = True, open_sessions = 2) Errors: [] Warnings: [] # Print pandas df. >>> pandas_df Feb Jan Mar Apr datetime accounts Red Inc 200.0 150.0 140.0 NaN 2017-01-04 Orange Inc 210.0 NaN NaN 250.0 2017-01-04 Blue Inc 90.0 50.0 95.0 101.0 2017-01-04 Yellow Inc 90.0 NaN NaN NaN 2017-01-04 Alpha Co 210.0 200.0 215.0 250.0 2017-01-04 Jones LLC 200.0 150.0 140.0 180.0 2017-01-04