replace() | DataFrame Manipulation | Teradata Package for Python - replace() - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905

Use the replace() function to replace every occurence of to_replace with the value in the columns mentioned in subset. When subset is not provided, this function replaces occurences of to_replace in all columns.

Required Arguments:

  • to_replace: Specifies a ColumnExpression or a literal that the function searches for values in the Column. Use ColumnExpression when you want to match the condition based on a DataFrameColumn function, else use literal.

    Supported column types: CHAR, VARCHAR, FLOAT, INTEGER, DECIMAL.

    Only ColumnExpressions generated from DataFrameColumn functions are supported. BinaryExpressions are not supported.

    For example: Consider teradataml DataFrame has two columns COL1, COL2. df.COL1.abs() is supported but df.COL1 == df.COL2 is not supported.

  • value: Specifies a ColumnExpression or a literal that replaces the to_replace in the column. Use ColumnExpression only when the value you want to replace is based on a DataFrameColumn function; otherwise, use literal.

    Supported column types: CHAR, VARCHAR, FLOAT, INTEGER, DECIMAL.

    This argument is required argument when to_replace is not a dictionary. Otherwise, it is optional and ignored.

    Only ColumnExpressions generated from DataFrameColumn functions are supported. BinaryExpressions are not supported.

    For example: Consider teradataml DataFrame has two columns COL1, COL2. df.COL1.abs() is supported but df.COL1 == df.COL2 is not supported.

Optional Arguments:
  • subset: Specifies the columns to consider for replacing the values.

Example Setup

  • Load the data to run the example.
    >>> load_example_data("dataframe", "admissions_train")
  • Create a DataFrame on 'admissions_train' table.
    >>> df = DataFrame("admissions_train")
    >>> print(df)
       masters   gpa     stats programming  admitted
    id
    15     yes  4.00  Advanced    Advanced         1
    34     yes  3.85  Advanced    Beginner         0
    13      no  4.00  Advanced      Novice         1
    38     yes  2.65  Advanced    Beginner         1
    5       no  3.44    Novice      Novice         0
    40     yes  3.95    Novice    Beginner         0
    7      yes  2.33    Novice      Novice         1
    22     yes  3.46    Novice    Beginner         0
    26     yes  3.57  Advanced    Advanced         1
    17      no  3.83  Advanced    Advanced         1

Example 1: Replace the string 'Advanced' with 'Good' in columns 'stats' and 'programming'

>>> res = df.replace({"Advanced": "Good", "Beginner": "starter"}, subset=["stats", "programming"])
>>> print(res)
   masters   gpa   stats programming  admitted
id
15     yes  4.00    Good        Good         1
7      yes  2.33  Novice      Novice         1
22     yes  3.46  Novice     starter         0
17      no  3.83    Good        Good         1
13      no  4.00    Good      Novice         1
38     yes  2.65    Good     starter         1
26     yes  3.57    Good        Good         1
5       no  3.44  Novice      Novice         0
34     yes  3.85    Good     starter         0
40     yes  3.95  Novice     starter         0

Example 2: Replace the string 'Advanced' with 'Good' and 'Beginner' with 'starter' in columns 'stats' and 'programming'

>>> res = df.replace({"Advanced": "Good", "Beginner": "starter"}, subset=["stats", "programming"])
>>> print(res)
   masters   gpa   stats programming  admitted
id
15     yes  4.00    Good        Good         1
7      yes  2.33  Novice      Novice         1
22     yes  3.46  Novice     starter         0
17      no  3.83    Good        Good         1
13      no  4.00    Good      Novice         1
38     yes  2.65    Good     starter         1
26     yes  3.57    Good        Good         1
5       no  3.44  Novice      Novice         0
34     yes  3.85    Good     starter         0
40     yes  3.95  Novice     starter         0

Example 3: Append the string '_New' to 'stats' column when values in 'programming' and 'stats' are same

>>> res = df.replace({df.programming: df.stats+"_New"}, subset=["stats"])
>>> print(res)
   masters   gpa         stats programming  admitted
id
15     yes  4.00  Advanced_New    Advanced         1
34     yes  3.85      Advanced    Beginner         0
13      no  4.00      Advanced      Novice         1
38     yes  2.65      Advanced    Beginner         1
5       no  3.44    Novice_New      Novice         0
40     yes  3.95        Novice    Beginner         0
7      yes  2.33    Novice_New      Novice         1
22     yes  3.46        Novice    Beginner         0
26     yes  3.57  Advanced_New    Advanced         1
17      no  3.83  Advanced_New    Advanced         1

Example 4: Round the values of 'gpa' to its nearest integer

>>> res = df.replace({df.gpa: df.gpa.round(0)}, subset=["gpa"])
>>> print(res)
   masters  gpa     stats programming  admitted
id
15     yes  4.0  Advanced    Advanced         1
7      yes  2.0    Novice      Novice         1
22     yes  3.0    Novice    Beginner         0
17      no  4.0  Advanced    Advanced         1
13      no  4.0  Advanced      Novice         1
38     yes  3.0  Advanced    Beginner         1
26     yes  4.0  Advanced    Advanced         1
5       no  3.0    Novice      Novice         0
34     yes  4.0  Advanced    Beginner         0
40     yes  4.0    Novice    Beginner         0

Example 5: Replace the value of 'masters' with '1' if value is 'yes' and with '0' if value is 'no'

>>> res = df.replace({'yes': 1, 'no': 0}, subset=["masters"])
>>> print(res)
   masters   gpa     stats programming  admitted
id
15       1  4.00  Advanced    Advanced         1
7        1  2.33    Novice      Novice         1
22       1  3.46    Novice    Beginner         0
17       0  3.83  Advanced    Advanced         1
13       0  4.00  Advanced      Novice         1
38       1  2.65  Advanced    Beginner         1
26       1  3.57  Advanced    Advanced         1
5        0  3.44    Novice      Novice         0
34       1  3.85  Advanced    Beginner         0
40       1  3.95    Novice    Beginner         0