drop_duplicate() Function | Teradata Package for Python - Teradata Vantage

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-04-03

Use the drop_duplicate() function to drop duplicate rows from a teradataml DataFrame and return the distinct values from the DataFrame.

Optional Argument:
  • column_names: Specifies the names of the columns whose duplicate values are dropped, so that only the distinct values of those columns are returned.

    If not specified, all columns in the DataFrame are considered for the operation.
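
The behavior described above can be modeled in plain Python. The sketch below is not teradataml code and needs no Vantage connection; distinct_rows() is a hypothetical helper written only for illustration, and the rows are the 'masters', 'programming' and 'admitted' values of a few rows from the admissions_train sample used in the examples on this page. It assumes that column_names may be a single name or a list of names, as in the examples, and that the result keeps only the named columns.

```python
# Pure-Python model of the semantics; this is NOT the teradataml implementation.
# Values copied from the sample rows with id 13, 26, 5, 19, 15, 40 and 7.
rows = [
    {"masters": "no",  "programming": "Novice",   "admitted": 1},
    {"masters": "yes", "programming": "Advanced", "admitted": 1},
    {"masters": "no",  "programming": "Novice",   "admitted": 0},
    {"masters": "yes", "programming": "Advanced", "admitted": 0},
    {"masters": "yes", "programming": "Advanced", "admitted": 1},
    {"masters": "yes", "programming": "Beginner", "admitted": 0},
    {"masters": "yes", "programming": "Novice",   "admitted": 1},
]


def distinct_rows(rows, column_names=None):
    """Hypothetical helper: keep one row per distinct combination of column_names."""
    if not rows:
        return []
    if column_names is None:
        # No columns given: consider every column of the row.
        column_names = list(rows[0].keys())
    elif isinstance(column_names, str):
        # A single column name is treated like a one-element list.
        column_names = [column_names]

    seen = set()
    result = []
    for row in rows:
        key = tuple(row[name] for name in column_names)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(column_names, key)))
    return result


one_column = distinct_rows(rows, "programming")
two_columns = distinct_rows(rows, ["programming", "admitted"])
all_columns = distinct_rows(rows)

print(one_column)
print(len(two_columns), len(all_columns))
```

In this model, naming fewer columns can only reduce the number of distinct rows, because rows that differ only in an unnamed column collapse into one. The helper returns rows in first-seen order, which is a property of this sketch only and says nothing about the order returned by Vantage.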

Example Setup

In these examples, the "admissions_train" dataset is used.

>>> from teradataml import *
>>> load_example_data("dataframe", "admissions_train")
>>> df = DataFrame("admissions_train")
# Print dataframe.
>>> df
      masters   gpa     stats programming admitted
   id
   13      no  4.00  Advanced      Novice        1
   26     yes  3.57  Advanced    Advanced        1
   5       no  3.44    Novice      Novice        0
   19     yes  1.98  Advanced    Advanced        0
   15     yes  4.00  Advanced    Advanced        1
   40     yes  3.95    Novice    Beginner        0
   7      yes  2.33    Novice      Novice        1
   22     yes  3.46    Novice    Beginner        0
   36      no  3.00  Advanced      Novice        0
   38     yes  2.65  Advanced    Beginner        1

Example 1: Get the distinct values of the column 'programming'

>>> df.drop_duplicate("programming")
  programming
0      Novice
1    Beginner
2    Advanced

Example 2: Get the distinct combinations of values of the columns 'programming' and 'admitted'

>>> df.drop_duplicate(["programming","admitted"])
  programming  admitted
0    Beginner         0
1    Advanced         1
2    Beginner         1
3    Advanced         0
4      Novice         1
5      Novice         0
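
The rows in the Example 2 output are not in sorted order, so when checking such a result it is probably safer to compare it as a set than as an ordered list. The following is a minimal plain-Python sketch of that check, not teradataml code. It assumes the ten rows displayed in Example Setup, which may be only a sample of the table, and copies the ('programming', 'admitted') pairs by hand from that display and from the Example 2 output.

```python
# ('programming', 'admitted') pairs of the ten rows displayed in Example Setup,
# in display order (id 13, 26, 5, 19, 15, 40, 7, 22, 36, 38).
sample_pairs = [
    ("Novice", 1), ("Advanced", 1), ("Novice", 0), ("Advanced", 0),
    ("Advanced", 1), ("Beginner", 0), ("Novice", 1), ("Beginner", 0),
    ("Novice", 0), ("Beginner", 1),
]

# Rows of the Example 2 output, in the order they were printed.
example_2_output = [
    ("Beginner", 0), ("Advanced", 1), ("Beginner", 1),
    ("Advanced", 0), ("Novice", 1), ("Novice", 0),
]

# The output must not contain duplicates itself.
has_no_duplicates = len(example_2_output) == len(set(example_2_output))

# Order-insensitive comparison: the same combinations, whatever the row order.
same_combinations = set(sample_pairs) == set(example_2_output)

print(has_no_duplicates, same_combinations)
```

Both checks pass for the values shown on this page: the six combinations in the output are exactly the combinations that occur in the displayed rows.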