Column Specification in Analytics Database Analytic Functions | teradataml - Column Specification in Analytics Database Analytic Functions - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment
VantageCloud
VantageCore
Edition
Enterprise
IntelliFlex
VMware
Product
Teradata Package for Python
Release Number
20.00
Published
March 2024
Language
English (United States)
Last Update
2024-04-09
dita:mapPath
nvi1706202040305.ditamap
dita:ditavalPath
plt1683835213376.ditaval
dita:id
rkb1531260709148
Product Category
Teradata Vantage

Column Specification

Some analytic functions in the Analytics Database provide arguments for selecting or performing operations on multiple columns.

teradataml allows user to specify columns in the following ways:
  • Pass a single column name.

    For example, column_arg = "col1"

  • Pass multiple columns (specific columns only).

    For example, column_arg = ["col1", "col3", "col8"]

  • Pass multiple columns using DataFrame.columns and slice filtering.

    For example, column_arg = list(set(df.columns[2:10]) - set(df.columns[5:7]))

  • Pass multiple column as a column range.

    For example, column_arg = "column_range"

Specifying Column Range column_range

Column range can be specified in the following ways:
  • Without column exclusion.

    Syntax: "start_column:end_column"

    Must be passed as string.
  • With column exclusion.

    Syntax: ["start_column:end_column", "-exclude_column1", ...]

    Must be passed as a list of strings.
The start_column and end_column can be:
  • Column names.

    For example, "column1:column2".

  • Nonnegative integers that represent the indexes of columns in the table. The first column has index 0.

    For example, "0:4" specifies the first five columns in the table.

  • Empty.
    For example,
    • ":4" or ":columnD" specifies all columns up to and including the column with index 4 or columnD.
    • "4:" or "columnD:" specifies the column with index 4 (or columnD) and all columns after it.
    • ":" specifies all columns in the table.

The exclude_column is a column in the specified range, represented by either its name or its index.

For example, ["0:99", "-[50]", "-column10"] specifies the columns with indexes from 0 to 99, except the column with index 50 and column10.

Column ranges cannot overlap, and cannot include any specified column.
When handling column name containing range separators:
  • Column can be enclosed in double quotes.
    For example, if DataFrame has columns :columnA, columnB: and :column:C:, columns can be selected by enclosing in double quotes:
    • "\":columnA\"" or '":columnA"'
    • "\"columnB:\"" or '"columnB:"'
    • "\":column:C:\"" or '":column:C:"'
  • You can always use column range and exclusion by index instead of column names.

Column Range Examples

Assume "insect_sprays" DataFrame has columns groupA, :groupB, groupC:CC, groupD:, groupE:EE:EEE and :groupF:, the following examples show how to select range of columns for the group_columns argument in the ANOVA function.

Example 1: Specify range index of columns.
ANOVA(data=insect_sprays,
      group_columns="2:5",
      alpha = 0.025)
Example 2: Specify range index of columns with first column empty.
ANOVA(data=insect_sprays,
      group_columns=":5",
      alpha=0.025)
Example 3: Specify range of columns by their names. Column has separator, hence enclose it in double quotes.
ANOVA(data=insect_sprays,
      group_columns= '"groupC:CC":"groupE:EE:EEE"',
      alpha = 0.05)
Example 4: Specify range of columns by their names with second column empty. Column has separator, hence enclose it in double quotes.
ANOVA(data=insect_sprays,
      group_columns='"groupE:EE:EEE":',
      alpha = 0.05)
Example 5: Specify range of columns for all columns. Column has separator, hence enclose it in double quotes.
ANOVA(data=insect_sprays,
      group_columns=[':', "-\":groupB\""], # or [':', "-[2]"]
      alpha = 0.05)