td_except | SET Operations | Teradata Package for Python - td_except - Teradata Vantage

Teradata® VantageCloud Lake

Deployment
VantageCloud
Edition
Lake
Product
Teradata Vantage
Published
January 2023
Language
English (United States)
Last Update
2024-04-03
dita:mapPath
phg1621910019905.ditamap
dita:ditavalPath
pny1626732985837.ditaval
dita:id
phg1621910019905
Use the td_except() function to return the rows that appear in the first teradataml DataFrame or GeoDataFrame, and not in other teradataml DataFrames or GeoDataFrames, along the index axis.
This function must be applied to data frames of the same type: either all teradataml DataFrames, or all GeoDataFrames.

Example Prerequisites

>>> from teradataml import load_example_data
>>> load_example_data("dataframe", "setop_test1")
>>> load_example_data("dataframe", "setop_test2")
>>> from teradataml.dataframe import dataframe
>>> from teradataml.dataframe.setop import td_except

Example 1: Run td_except() on rows from two DataFrames, using default signature

This example applies the except operation on rows from two teradataml DataFrames when using default signature of the function.

>>> df1 = DataFrame('setop_test1')
>>> df1
   masters   gpa     stats programming  admitted
id                                             
62      no  3.70  Advanced    Advanced         1
53     yes  3.50  Beginner      Novice         1
69      no  3.96  Advanced    Advanced         1
61     yes  4.00  Advanced    Advanced         1
58      no  3.13  Advanced    Advanced         1
51     yes  3.76  Beginner    Beginner         0
68      no  1.87  Advanced      Novice         1
66      no  3.87    Novice    Beginner         1
60      no  4.00  Advanced      Novice         1
59      no  3.65    Novice      Novice         1
>>> df2 = DataFrame('setop_test2')
>>> df2
   masters   gpa     stats programming  admitted
id                                             
12      no  3.65    Novice      Novice         1
15     yes  4.00  Advanced    Advanced         1
14     yes  3.45  Advanced    Advanced         0
20     yes  3.90  Advanced    Advanced         1
18     yes  3.81  Advanced    Advanced         1
17      no  3.83  Advanced    Advanced         1
13      no  4.00  Advanced      Novice         1
11      no  3.13  Advanced    Advanced         1
60      no  4.00  Advanced      Novice         1
19     yes  1.98  Advanced    Advanced         0
>>> idf = td_except([df1[df1.id<55] , df2])
>>> idf
   masters   gpa     stats programming  admitted
id                                             
51     yes  3.76  Beginner    Beginner         0
50     yes  3.95  Beginner    Beginner         0
54     yes  3.50  Beginner    Advanced         1
52      no  3.70    Novice    Beginner         1
53     yes  3.50  Beginner      Novice         1
53     yes  3.50  Beginner      Novice         1

Example 2: Run td_except() on rows from two DataFrames, discarding duplicate rows

This examples applies the except operation on rows from the two teradataml DataFrames from previous example, discarding duplicate rows from the result by passing allow_duplicates = False.

>>> idf = td_except([df1[df1.id<55] , df2], allow_duplicates=False)
>>> idf
   masters   gpa     stats programming  admitted
id                                             
54     yes  3.50  Beginner    Advanced         1
51     yes  3.76  Beginner    Beginner         0
53     yes  3.50  Beginner      Novice         1
50     yes  3.95  Beginner    Beginner         0
52      no  3.70    Novice    Beginner         1

Example 3: Run td_except() on more than two DataFrames

This example shows what happens when td_except is used on more than two teradataml DataFrames. In this example, you have three teradataml DataFrames as df1, df2 & df3, the operation is applied on df1 and df2 first, and then the operation is applied again on the result and df3.

>>> df3 = df1[df1.gpa <= 3.9]
>>> # Effective operation here would be, (df1-df2)-df3
>>> idf = td_except([df1, df2, df3])
>>> idf
   masters   gpa     stats programming  admitted
id                                             
61     yes  4.00  Advanced    Advanced         1
50     yes  3.95  Beginner    Beginner         0
69      no  3.96  Advanced    Advanced         1

Example 4: Run td_except on two GeoDataFrames

  • Create GeoDataFrames
    >>> geo_dataframe = GeoDataFrame('sample_shapes')
    >>> geo_dataframe1 = geo_dataframe[geo_dataframe.skey == 1004].select(['skey','linestrings'])
    
    >>> geo_dataframe1
    skey            linestrings
    1004  LINESTRING (10 20 30,40 50 60,70 80 80)
    >>> geo_dataframe2 = geo_dataframe[geo_dataframe.skey < 1010].select(['skey','linestrings'])
    
    >>> geo_dataframe2
    skey                        linestrings
    1009            MULTILINESTRING ((10 20 30,40 50 60),(70 80 80,90 100 110))
    1005                                         LINESTRING (1 3 6,3 0 6,6 0 1)
    1004                                LINESTRING (10 20 30,40 50 60,70 80 80)
    1002                                               LINESTRING (1 3,3 0,0 1)
    1001                                           LINESTRING (1 1,2 2,3 3,4 4)
    1003                       LINESTRING (1.35 3.6456,3.6756 0.23,0.345 1.756)
    1007                            MULTILINESTRING ((1 1,1 3,6 3),(10 5,20 1))
    1006           LINESTRING (1.35 3.6456 4.5,3.6756 0.23 6.8,0.345 1.756 8.9)
    1008  MULTILINESTRING ((1 3,3 0,0 1),(1.35 3.6456,3.6756 0.23,0.345 1.756))
    
  • Run td_except
    >>> td_except([geo_dataframe2,geo_dataframe1])
    skey                    linestrings
    1008  MULTILINESTRING ((1 3,3 0,0 1),(1.35 3.6456,3.6756 0.23,0.345 1.756))
    1003                       LINESTRING (1.35 3.6456,3.6756 0.23,0.345 1.756)
    1005                                         LINESTRING (1 3 6,3 0 6,6 0 1)
    1006           LINESTRING (1.35 3.6456 4.5,3.6756 0.23 6.8,0.345 1.756 8.9)
    1009            MULTILINESTRING ((10 20 30,40 50 60),(70 80 80,90 100 110))
    1001                                           LINESTRING (1 1,2 2,3 3,4 4)
    1007                            MULTILINESTRING ((1 1,1 3,6 3),(10 5,20 1))
    1002                                               LINESTRING (1 3,3 0,0 1)