DataFrame Manipulation | Teradata Python Package - Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

You can manipulate a DataFrame with methods and operators. DataFrames created with the DataFrame() constructor or with the DataFrame.from_table() and DataFrame.from_query() functions all have the same methods and operators.

DataFrame Methods

A DataFrame method has the basic syntax DataFrame.method(arguments). Using the specified DataFrame and arguments, the method returns a new DataFrame. The specified DataFrame remains unchanged.
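The "returns a new DataFrame, original unchanged" behavior can be sketched locally. Because teradataml needs a live Vantage connection, this minimal sketch uses pandas as a stand-in, on the assumption that its assign() and dropna() methods behave analogously here. The column names and values are hypothetical sample data, and exact teradataml signatures should be checked against the teradataml reference.

```python
import pandas as pd

# pandas stand-in for a teradataml DataFrame (assumption: analogous semantics).
# The columns "id" and "gpa" are hypothetical sample data.
df = pd.DataFrame({"id": [1, 2, 3], "gpa": [3.5, None, 3.9]})

# Each call follows the DataFrame.method(arguments) pattern and returns a new object.
with_flag = df.assign(honors=df["gpa"] > 3.7)  # adds a column expression
no_nulls = df.dropna()                         # removes the row with a null value

# The original DataFrame is unchanged by either call.
print(list(df.columns))
print(list(with_flag.columns))
print(len(df), len(no_nulls))
```

Because each method returns a new object, calls can be chained without modifying the DataFrame they start from.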

apply() Method
Applies a user-defined function (UDF) to each row in a teradataml DataFrame, leveraging the APPLY table operator of the Open Analytics Framework.
assign() Method
Assigns new column expressions in a DataFrame.
concat() Method
Concatenates two teradataml DataFrame objects along the index axis.
describe() Method
Generates statistics for numeric columns. Computes the count, mean, std, min, percentiles, and max for numeric columns.
drop() Method
Drops specified labels from rows or columns in a DataFrame.
dropna() Method
Removes rows with null values in a DataFrame.
filter() Method
Returns only the filtered columns or rows (based on the index) of a DataFrame.
The filter is specified by item, like, or regex.
Other filters are the index[], loc[], and iloc[] operators.
get() Method
Retrieves required columns from a DataFrame using column names as key.
get_values() Method
Retrieves all values (only) present in a DataFrame.
groupby() Method
Returns all columns of a DataFrame, grouped as specified.
head() Method
Returns the first n rows of a DataFrame.
itertuples() Method
Iterates over teradataml DataFrame rows as namedtuples.
join() Method
Joins two different teradataml DataFrames together.
map_partition() Method
Applies a function to a group or partition of rows in a teradataml DataFrame and returns a teradataml DataFrame.
map_row() Method
Applies a function to every row in a teradataml DataFrame and returns a teradataml DataFrame.
merge() Method
Merges two teradataml DataFrames together.
sample() Method
Samples rows from a DataFrame, directly or based on conditions.
select() Method
Returns only the selected columns of a DataFrame.
set_index() Method
Assigns one or more existing columns as the new index to a DataFrame.
show_query() Method
Returns underlying SQL for the teradataml DataFrame.
sort() Method
Returns all columns of a DataFrame, sorted as specified.
sort_index() Method
Returns the DataFrame sorted by labels (along an axis) in either ascending or descending order.
squeeze() Method
Squeezes one-dimensional axis objects into a scalar for DataFrames with a single element, or a Series object for a DataFrame with a single column.
tail() Method
Returns the last n rows of the sorted DataFrame.
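The three filter kinds listed for the filter() method (item, like, regex) can be illustrated with a short sketch. It again uses pandas as a local stand-in, on the assumption that a filter() call with items, like, and regex arguments selects columns the same way. The column names are hypothetical, and the teradataml reference remains the authority on exact argument names and defaults.

```python
import pandas as pd

# pandas stand-in for a teradataml DataFrame; all column names are hypothetical.
df = pd.DataFrame(
    {
        "masters": ["yes", "no", "no"],
        "gpa": [3.5, 3.9, 3.2],
        "stats": ["Advanced", "Novice", "Beginner"],
        "admitted": [1, 1, 0],
    }
)

by_items = df.filter(items=["masters", "gpa"])  # exact column names
by_like = df.filter(like="ad")                  # column names containing "ad"
by_regex = df.filter(regex="^s")                # column names matching the pattern

print(list(by_items.columns))
print(list(by_like.columns))
print(list(by_regex.columns))
```

Each call keeps every row and narrows only the columns. The source DataFrame still has all four columns afterward.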

For a list of regular aggregate functions supported by DataFrame, see Regular Aggregate Functions Supported by DataFrame.

For a list of time series aggregate functions, see Time Series Aggregate Functions.

  • The describe() and get_values() methods do not return a teradataml DataFrame.
  • The groupby() method returns an instance of teradataml DataFrameGroupBy, which inherits from teradataml DataFrame.
  • The groupby_time() and resample() methods return an instance of teradataml DataFrameGroupByTime, which inherits from teradataml DataFrame.

Functions

drop_duplicate() Function
Drops duplicate rows from a teradataml DataFrame and returns the distinct rows.

Operators

A DataFrame operator has one of the following basic syntaxes:
  • DataFrame.loc[arguments]
  • DataFrame.iloc[arguments]
  • DataFrame[arguments]

All operators are filters. Another filter is the filter() method.

index[] Operator
Returns only the filtered rows of a DataFrame.
Filter uses logical expressions composed of DataFrame columns and Python literals.
loc[] Operator
Returns a new DataFrame that has only the filtered columns and rows of a DataFrame, accessed by labels.
iloc[] Operator
Returns a new DataFrame that has only the filtered columns and rows of a DataFrame, accessed by integer values.
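
The three operator forms can be compared side by side in a small sketch. It uses pandas as a local stand-in, on the assumption that the bracket, loc[], and iloc[] operators filter analogously. The "id" index and the column names are hypothetical, and teradataml may differ in details such as row ordering.

```python
import pandas as pd

# pandas stand-in for a teradataml DataFrame; the index and columns are hypothetical.
df = pd.DataFrame(
    {"gpa": [3.5, 3.9, 3.2], "admitted": [1, 1, 0]},
    index=pd.Index([13, 26, 40], name="id"),
)

# DataFrame[arguments]: rows that satisfy a logical expression built from
# DataFrame columns and Python literals.
high_gpa = df[df["gpa"] > 3.4]

# DataFrame.loc[arguments]: rows and columns selected by label.
by_label = df.loc[[13, 40], ["gpa"]]

# DataFrame.iloc[arguments]: rows and columns selected by integer position.
by_position = df.iloc[0:2, 0:1]

print(list(high_gpa.index))
print(list(by_label.index), list(by_label.columns))
print(by_position.shape)
```

All three forms return a new, filtered object and leave the source DataFrame untouched, which matches the statement that all operators are filters.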