DataFrame Manipulation - Teradata Python Package

Teradata® Python Package User Guide

Product
Teradata Python Package
Release Number
16.20
Published
February 2020
Language
English (United States)
Last Update
2020-02-29
dita:id
B700-4006
Product Category
Teradata Vantage

You can manipulate a DataFrame with methods and operators. DataFrames created with the DataFrame() constructor or with the DataFrame.from_table() and DataFrame.from_query() functions have the same methods and operators.

DataFrame Methods

A DataFrame method has the basic syntax DataFrame_instance.method(arguments). Unless noted otherwise, the method uses the specified DataFrame and arguments to return a new DataFrame. The specified DataFrame remains unchanged.
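The contract described above (return a new object, leave the original untouched) can be sketched in plain Python. MiniFrame below is a hypothetical stand-in that only tracks column names; it is not the teradataml DataFrame class, and it runs without a Vantage connection. In teradataml itself, the corresponding call would presumably look like df.select(["id", "sales"]).

```python
from dataclasses import dataclass, replace
from typing import Sequence, Tuple


@dataclass(frozen=True)
class MiniFrame:
    """Hypothetical stand-in for a DataFrame that only tracks column names."""

    columns: Tuple[str, ...]

    def select(self, names: Sequence[str]) -> "MiniFrame":
        # Return a new MiniFrame with only the named columns; self is not modified.
        missing = [n for n in names if n not in self.columns]
        if missing:
            raise KeyError(f"unknown columns: {missing}")
        return replace(self, columns=tuple(names))

    def drop(self, names: Sequence[str]) -> "MiniFrame":
        # Return a new MiniFrame without the named columns.
        return replace(self, columns=tuple(c for c in self.columns if c not in names))


df = MiniFrame(("id", "name", "sales"))
subset = df.select(["id", "sales"])
print(df.columns)      # ('id', 'name', 'sales'), unchanged
print(subset.columns)  # ('id', 'sales')
```

Because each call hands back a new object, calls can be chained without affecting the source.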

DataFrame Method | Description
--- | ---
assign() | Assigns new column expressions in DataFrame_instance.
concat() | Concatenates two teradataml DataFrame objects along the index axis.
describe() in Regular Aggregate Mode | Generates statistics for numeric columns: the count, mean, std, min, percentiles, and max.
drop() | Drops the specified labels from rows or columns of DataFrame_instance.
dropna() | Removes rows with null values from DataFrame_instance.
filter() | Returns only the filtered columns or rows (based on the index) of DataFrame_instance. The filter is specified with items, like, or regex. Other filters are the index[], loc[], and iloc[] operators.
get() | Retrieves the required columns from DataFrame_instance, using column names as keys.
get_values() | Retrieves only the values present in a teradataml DataFrame.
groupby() | Returns all columns of DataFrame_instance, grouped as specified.
head() | Returns the first n rows of DataFrame_instance.
join() | Joins two teradataml DataFrames.
merge() | Merges two teradataml DataFrames.
sample() | Samples rows from a DataFrame, either directly or based on conditions.
select() | Returns only the selected columns of DataFrame_instance.
set_index() | Assigns one or more existing columns as the new index of a teradataml DataFrame.
sort() | Returns all columns of DataFrame_instance, sorted as specified.
sort_index() | Returns a teradataml DataFrame sorted by labels along an axis, in either ascending or descending order.
squeeze() | Squeezes one-dimensional axis objects into a scalar for a teradataml DataFrame with a single element, or into a Series object for a teradataml DataFrame with a single column.
tail() | Returns the last n rows of the sorted teradataml DataFrame.
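The three filter() modes are assumed here to behave like the identically named arguments of pandas.DataFrame.filter: items keeps the listed labels, like keeps labels containing a substring, and regex keeps labels matching a regular expression. The helper filter_labels below is hypothetical and works only on a list of column names; it illustrates the selection rule, not the teradataml implementation.

```python
import re


def filter_labels(labels, items=None, like=None, regex=None):
    """Return the labels kept by a pandas-style filter (hypothetical helper).

    Exactly one of items, like, or regex must be given.
    """
    if sum(arg is not None for arg in (items, like, regex)) != 1:
        raise TypeError("specify exactly one of items, like, or regex")
    if items is not None:
        # Keep requested labels that exist, in the order requested.
        return [item for item in items if item in labels]
    if like is not None:
        # Keep labels that contain the substring.
        return [label for label in labels if like in label]
    # Keep labels where the pattern matches anywhere in the label.
    pattern = re.compile(regex)
    return [label for label in labels if pattern.search(label)]


cols = ["accounts", "sales_jan", "sales_feb", "region"]
print(filter_labels(cols, items=["region", "accounts"]))  # ['region', 'accounts']
print(filter_labels(cols, like="sales"))                  # ['sales_jan', 'sales_feb']
print(filter_labels(cols, regex=r"^r"))                   # ['region']
```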
Aggregate Methods
DataFrame Method | Description
--- | ---
agg() | Applies the specified aggregate methods to the specified columns of DataFrame_instance.
count() in Regular Aggregate Mode | Returns the column-wise count of DataFrame_instance.
max() | Returns the column-wise maximum value of DataFrame_instance.
mean() | Returns the column-wise mean value of DataFrame_instance.
median() in Regular Aggregate Mode | Returns the column-wise median value of DataFrame_instance.
min() | Returns the column-wise minimum value of DataFrame_instance.
std() | Returns the column-wise standard deviation of DataFrame_instance.
sum() | Returns the column-wise sum of DataFrame_instance.
var() | Returns the column-wise unbiased variance of DataFrame_instance.
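The unbiased variance returned by var() divides the sum of squared deviations from the mean by n - 1 rather than by n. The sketch below shows the difference with Python's statistics module on hypothetical column data; column_wise is an illustrative helper that mimics applying one aggregate to every column, not a teradataml function.

```python
import statistics

# Hypothetical numeric columns; each list plays the role of one column.
columns = {
    "x": [2, 4, 4, 4, 5, 5, 7, 9],
    "y": [1, 2, 3, 4],
}


def column_wise(func, data):
    """Apply one aggregate to every column (illustrative helper)."""
    return {name: func(values) for name, values in data.items()}


# Unbiased (sample) variance: sum of squared deviations / (n - 1).
var_unbiased = column_wise(statistics.variance, columns)
# Population variance, for comparison: sum of squared deviations / n.
var_population = column_wise(statistics.pvariance, columns)

for name in columns:
    print(name, round(var_unbiased[name], 4), round(var_population[name], 4))
```

For column x, the squared deviations from the mean 5 sum to 32, so the unbiased variance is 32/7 (about 4.571) while the population variance is 32/8 = 4.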
Time Series Aggregate Methods
DataFrame Method | Description
--- | ---
bottom() | Returns the specified number of smallest values in the columns for each group, with or without ties.
count() in Time Series Aggregate Mode | Returns the column-wise count of DataFrame_instance.
delta_t() | Calculates the time difference (DELTA_T) between a starting event and an ending event.
describe() in Time Series Aggregate Mode | Generates statistics for numeric columns.
first() | Returns the oldest value, determined by the timecode, for each group.
groupby_time() | Resamples time series data, grouping it by time on a datetime column of a DataFrame.
last() | Returns the newest value, determined by the timecode, for each group.
mad() | Returns the median absolute deviation for each group: the median of the absolute differences between each value and the median of all values in the group.
median() in Time Series Aggregate Mode | Returns the column-wise median value of DataFrame_instance.
mode() | Returns the column-wise mode of all values in each group.
resample() | Resamples time series data, grouping it by time on a datetime column of a DataFrame.
top() | Returns the specified number of largest values in the columns for each group, with or without ties.
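The per-group definitions of mad(), top(), and bottom() can be illustrated in plain Python. In teradataml these aggregates run in Vantage on a grouped time series DataFrame; the functions below are hypothetical stand-ins operating on lists, and the tie rule used here (also keep every value equal to the n-th one) is an assumption about what "with ties" means.

```python
import statistics


def mad(values):
    """Median absolute deviation: median of |v - median(values)| for one group."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)


def _extreme(values, n, with_ties, largest):
    # Sort so the most extreme values come first.
    ordered = sorted(values, reverse=largest)
    if n <= 0:
        return []
    if not with_ties or n >= len(ordered):
        return ordered[:n]
    cutoff = ordered[n - 1]
    # With ties, also keep later values equal to the n-th one.
    return [v for v in ordered if (v >= cutoff if largest else v <= cutoff)]


def top(values, n, with_ties=False):
    """n largest values of one group (hypothetical stand-in)."""
    return _extreme(values, n, with_ties, largest=True)


def bottom(values, n, with_ties=False):
    """n smallest values of one group (hypothetical stand-in)."""
    return _extreme(values, n, with_ties, largest=False)


# Hypothetical groups of readings, keyed by group name.
groups = {
    "sensor_a": [5, 3, 9, 3, 7, 9, 1],
    "sensor_b": [1, 1, 2, 2, 4, 6, 9],
}

for name, vals in groups.items():
    print(name, mad(vals), top(vals, 2), bottom(vals, 2, with_ties=True))
```

For sensor_a, bottom(..., 2) without ties gives [1, 3], while with ties it gives [1, 3, 3] because the second-smallest value, 3, occurs twice.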
The describe() and get_values() methods do not return a teradataml DataFrame.
The groupby() method returns an instance of teradataml DataFrameGroupBy, which inherits from teradataml DataFrame.
The groupby_time() and resample() methods return an instance of teradataml DataFrameGroupByTime, which inherits from teradataml DataFrame.
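groupby_time() and resample() group rows into time buckets before aggregates such as first() and last() are applied. The sketch below assumes fixed-width buckets aligned to an arbitrary origin and models a row as a (timestamp, value) pair; groupby_time, first, and last here are hypothetical plain-Python stand-ins, not the teradataml methods, whose bucketing options are richer.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_start(ts, width, origin):
    # Floor a timestamp to the start of its fixed-width bucket.
    return origin + ((ts - origin) // width) * width


def groupby_time(rows, width, origin):
    """Group (timestamp, value) rows into buckets, ordered by bucket start."""
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[bucket_start(ts, width, origin)].append((ts, value))
    return dict(sorted(buckets.items()))


def first(group):
    # Oldest value in the bucket, determined by the timestamp.
    return min(group, key=lambda row: row[0])[1]


def last(group):
    # Newest value in the bucket, determined by the timestamp.
    return max(group, key=lambda row: row[0])[1]


origin = datetime(2020, 1, 1)
rows = [
    (datetime(2020, 1, 1, 0, 2), 10.0),
    (datetime(2020, 1, 1, 0, 7), 12.5),
    (datetime(2020, 1, 1, 0, 4), 11.0),
    (datetime(2020, 1, 1, 0, 13), 9.0),
    (datetime(2020, 1, 1, 0, 18), 8.5),
]

# Prints "00:00:00 10.0 12.5", then "00:10:00 9.0 8.5".
for start, group in groupby_time(rows, timedelta(minutes=10), origin).items():
    print(start.time(), first(group), last(group))
```

Note that first() and last() look at the timestamp, not the input order: the 00:04 reading arrives after the 00:07 one but is neither the oldest nor the newest in its bucket.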

Operators

DataFrame operators have the following basic syntax:
  • DataFrame_instance.loc[arguments]
  • DataFrame_instance.iloc[arguments]
  • DataFrame_instance[arguments]

All operators are filters. Another filter is the filter() method.

DataFrame Operator | Description
--- | ---
index[] | Returns only the filtered rows of DataFrame_instance. The filter is a logical expression composed of DataFrame columns and Python literals.
loc[] | Returns a new DataFrame that has only the filtered columns and rows of DataFrame_instance, accessed by label.
iloc[] | Returns a new DataFrame that has only the filtered columns and rows of DataFrame_instance, accessed by integer position.
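The difference between the three operators can be sketched with rows held as Python dictionaries. In teradataml, a row filter is written with column expressions, roughly df[(df.sales > 100) & (df.region == "east")]; here a Python predicate plays that role. The where, loc, and iloc helpers and the sample rows are hypothetical, and the "id" column is assumed to act as the index label.

```python
# Hypothetical rows; the "id" column is assumed to act as the index label.
columns = ["id", "region", "sales"]
rows = [
    {"id": "a", "region": "east", "sales": 120},
    {"id": "b", "region": "west", "sales": 80},
    {"id": "c", "region": "east", "sales": 200},
]


def where(rows, predicate):
    """Like df[expression]: keep rows for which the logical expression is true."""
    return [row for row in rows if predicate(row)]


def loc(rows, labels, cols):
    """Like df.loc[...]: select rows by index label and columns by name."""
    wanted = set(labels)
    return [{c: row[c] for c in cols} for row in rows if row["id"] in wanted]


def iloc(rows, row_positions, col_positions, columns=columns):
    """Like df.iloc[...]: select rows and columns by integer position."""
    names = [columns[i] for i in col_positions]
    return [{c: rows[i][c] for c in names} for i in row_positions]


east_big = where(rows, lambda r: r["sales"] > 100 and r["region"] == "east")
print([r["id"] for r in east_big])       # ['a', 'c']
print(loc(rows, ["b", "c"], ["sales"]))  # [{'sales': 80}, {'sales': 200}]
print(iloc(rows, [0, 2], [0, 2]))        # [{'id': 'a', 'sales': 120}, {'id': 'c', 'sales': 200}]
```

As in the table, none of the helpers modifies rows; each returns a new list.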