Use the drop_duplicate() function to drop duplicate rows from a teradataml DataFrame and return its distinct values.
Optional Argument:
- column_names: Specifies the name of a column, or a list of column names, whose duplicate values are dropped to return the distinct values.
  If not specified, all columns in the DataFrame are considered.
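The semantics can be illustrated without a database connection. The following is a minimal pure-Python sketch, assuming rows are plain dictionaries with hashable values; the helper `distinct_rows` is hypothetical and is not part of teradataml, which performs the operation inside the database. It keeps the first occurrence of each distinct combination of the selected column values and, like the documented default, falls back to all columns when none are given.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Union

# A row is modeled as a mapping from column name to value (an assumption for this sketch).
Row = Dict[str, object]


def distinct_rows(rows: Iterable[Row],
                  column_names: Optional[Union[str, Sequence[str]]] = None) -> List[Row]:
    """Hypothetical helper (not a teradataml API): project each row onto
    column_names and keep only the first row for each distinct value combination."""
    rows = list(rows)
    if column_names is None:
        # No columns given: use every column, mirroring the documented default.
        columns = list(rows[0].keys()) if rows else []
    elif isinstance(column_names, str):
        # Accept a single column name as well as a list of names.
        columns = [column_names]
    else:
        columns = list(column_names)

    seen = set()
    result: List[Row] = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:  # first occurrence of this combination
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


data = [
    {"id": 1, "stats": "Novice", "admitted": 0},
    {"id": 2, "stats": "Novice", "admitted": 0},
    {"id": 3, "stats": "Advanced", "admitted": 1},
]

print(distinct_rows(data, "stats"))
# [{'stats': 'Novice'}, {'stats': 'Advanced'}]
print(distinct_rows(data, ["stats", "admitted"]))
# [{'stats': 'Novice', 'admitted': 0}, {'stats': 'Advanced', 'admitted': 1}]
print(len(distinct_rows(data)))  # 3, because every row has a different id
```

This sketch returns rows in first-seen order. A database result may be ordered differently, since SQL does not guarantee row order without an ORDER BY clause, so it is safer to compare distinct results as sets.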
Example Setup
These examples use the "admissions_train" dataset.
>>> from teradataml import *
>>> load_example_data("dataframe", "admissions_train")
>>> df = DataFrame("admissions_train")
# Print the DataFrame.
>>> df
    masters   gpa     stats  programming  admitted
id
13       no  4.00  Advanced       Novice         1
26      yes  3.57  Advanced     Advanced         1
5        no  3.44    Novice       Novice         0
19      yes  1.98  Advanced     Advanced         0
15      yes  4.00  Advanced     Advanced         1
40      yes  3.95    Novice     Beginner         0
7       yes  2.33    Novice       Novice         1
22      yes  3.46    Novice     Beginner         0
36       no  3.00  Advanced       Novice         0
38      yes  2.65  Advanced     Beginner         1
Example 1: Get the distinct values of the column 'programming'.
>>> df.drop_duplicate("programming")
  programming
0      Novice
1    Beginner
2    Advanced
Example 2: Get the distinct combinations of values of the columns 'programming' and 'admitted'.
>>> df.drop_duplicate(["programming", "admitted"])
  programming  admitted
0    Beginner         0
1    Advanced         1
2    Beginner         1
3    Advanced         0
4      Novice         1
5      Novice         0
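As a rough cross-check, both examples can be reproduced in plain Python on the ten rows printed in the setup. This is only an illustration, not teradataml code: it uses that printed subset rather than the full admissions_train table, so it is not guaranteed to match in general, although for these rows the sets of distinct values coincide with the outputs of Examples 1 and 2. `dict.fromkeys` is used because it keeps the first occurrence of each key in insertion order.

```python
# (id, programming, admitted) for the ten rows printed by `df` in the setup.
SAMPLE = [
    (13, "Novice", 1),
    (26, "Advanced", 1),
    (5, "Novice", 0),
    (19, "Advanced", 0),
    (15, "Advanced", 1),
    (40, "Beginner", 0),
    (7, "Novice", 1),
    (22, "Beginner", 0),
    (36, "Novice", 0),
    (38, "Beginner", 1),
]

# Example 1: distinct values of 'programming', in order of first appearance.
distinct_programming = list(dict.fromkeys(prog for _, prog, _ in SAMPLE))

# Example 2: distinct (programming, admitted) combinations, in order of first appearance.
distinct_pairs = list(dict.fromkeys((prog, adm) for _, prog, adm in SAMPLE))

print(distinct_programming)  # ['Novice', 'Advanced', 'Beginner']
print(len(distinct_pairs))   # 6
```

The order differs from the DataFrame output above, which is expected: only the set of distinct values is meaningful here.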