Use the td_intersect() function to find the rows at the intersection of a list of teradataml DataFrames or GeoDataFrames along the index axis. The function returns a DataFrame or a GeoDataFrame containing the rows common to all input DataFrames or GeoDataFrames.
Example Prerequisites
>>> from teradataml import load_example_data
>>> load_example_data("dataframe", "setop_test1")
>>> load_example_data("dataframe", "setop_test2")
>>> from teradataml.dataframe.dataframe import DataFrame
>>> from teradataml.dataframe.setop import td_intersect
Example 1: Run td_intersect() on rows from two DataFrames, using the default signature
This example gets the intersection of rows from two teradataml DataFrames using the function's default signature.
>>> df1 = DataFrame('setop_test1')
>>> df1
   masters   gpa     stats programming  admitted
id
62      no  3.70  Advanced    Advanced         1
53     yes  3.50  Beginner      Novice         1
69      no  3.96  Advanced    Advanced         1
61     yes  4.00  Advanced    Advanced         1
58      no  3.13  Advanced    Advanced         1
51     yes  3.76  Beginner    Beginner         0
68      no  1.87  Advanced      Novice         1
66      no  3.87    Novice    Beginner         1
60      no  4.00  Advanced      Novice         1
59      no  3.65    Novice      Novice         1
>>> df2 = DataFrame('setop_test2')
>>> df2
   masters   gpa     stats programming  admitted
id
12      no  3.65    Novice      Novice         1
15     yes  4.00  Advanced    Advanced         1
14     yes  3.45  Advanced    Advanced         0
20     yes  3.90  Advanced    Advanced         1
18     yes  3.81  Advanced    Advanced         1
17      no  3.83  Advanced    Advanced         1
13      no  4.00  Advanced      Novice         1
11      no  3.13  Advanced    Advanced         1
60      no  4.00  Advanced      Novice         1
19     yes  1.98  Advanced    Advanced         0
>>> idf = td_intersect([df1, df2])
>>> idf
   masters   gpa     stats programming  admitted
id
64     yes  3.81  Advanced    Advanced         1
60      no  4.00  Advanced      Novice         1
58      no  3.13  Advanced    Advanced         1
68      no  1.87  Advanced      Novice         1
66      no  3.87    Novice    Beginner         1
60      no  4.00  Advanced      Novice         1
62      no  3.70  Advanced    Advanced         1
Example 2: Run td_intersect() on rows from two DataFrames, discarding duplicate rows
This example applies the intersect operation to rows from the two teradataml DataFrames in the previous example, discarding duplicate rows from the result by passing allow_duplicates=False.
>>> idf = td_intersect([df1, df2], allow_duplicates=False)
>>> idf
   masters   gpa     stats programming  admitted
id
64     yes  3.81  Advanced    Advanced         1
60      no  4.00  Advanced      Novice         1
58      no  3.13  Advanced    Advanced         1
68      no  1.87  Advanced      Novice         1
66      no  3.87    Novice    Beginner         1
62      no  3.70  Advanced    Advanced         1
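The difference between Examples 1 and 2 can be pictured with plain Python collections. The sketch below is an analogy only and does not call teradataml. It assumes that the default behavior corresponds to multiset semantics (in the style of SQL INTERSECT ALL), where a common row is kept as many times as it occurs in every input, and that allow_duplicates=False corresponds to set semantics. The helper names intersect_all and intersect_distinct and the sample rows are hypothetical.

```python
from collections import Counter

def intersect_all(left, right):
    """Multiset intersection: keep each row min(count_left, count_right) times."""
    common = Counter(left) & Counter(right)
    return sorted(common.elements())

def intersect_distinct(left, right):
    """Set intersection: keep each common row exactly once."""
    return sorted(set(left) & set(right))

# Hypothetical rows as (id, masters, gpa) tuples, not the contents of the
# example tables. Row 60 occurs twice on both sides.
rows_a = [(60, "no", 4.00), (60, "no", 4.00), (58, "no", 3.13), (53, "yes", 3.50)]
rows_b = [(60, "no", 4.00), (60, "no", 4.00), (58, "no", 3.13), (12, "no", 3.65)]

with_duplicates = intersect_all(rows_a, rows_b)
without_duplicates = intersect_distinct(rows_a, rows_b)

print(with_duplicates)     # row 60 is kept twice
print(without_duplicates)  # row 60 is kept once
```

Under these assumptions, a row that is duplicated in only one of the inputs would appear once in both results, because the multiset intersection keeps the smaller of the two counts.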
Example 3: Run td_intersect() on more than two DataFrames
This example shows how td_intersect() handles more than two teradataml DataFrames. Given three teradataml DataFrames df1, df2, and df3, the operation is applied to df1 and df2 first, and then applied again to that result and df3.
>>> df3 = df1[df1.gpa <= 3.5]
>>> df3
   masters   gpa     stats programming  admitted
id
58      no  3.13  Advanced    Advanced         1
67     yes  3.46    Novice    Beginner         0
54     yes  3.50  Beginner    Advanced         1
68      no  1.87  Advanced      Novice         1
53     yes  3.50  Beginner      Novice         1
>>> # The effective operation here is (df1 intersect df2) intersect df3
>>> idf = td_intersect([df1, df2, df3])
>>> idf
   masters   gpa     stats programming  admitted
id
58      no  3.13  Advanced    Advanced         1
68      no  1.87  Advanced      Novice         1
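The left-to-right evaluation described in this example can be sketched as a fold over the input list. This is a plain-Python illustration, not teradataml code: it uses built-in sets in place of DataFrames, and the helper name intersect_many and the id values are hypothetical.

```python
from functools import reduce

def intersect_many(frames):
    """Fold a pairwise set intersection over the inputs, left to right."""
    return reduce(lambda acc, nxt: acc & nxt, frames)

# Hypothetical id sets standing in for three inputs.
ids_1 = {51, 53, 58, 59, 60, 61, 62, 66, 68, 69}
ids_2 = {58, 60, 62, 64, 66, 68}
ids_3 = {53, 54, 58, 67, 68}

# Equivalent to (ids_1 & ids_2) & ids_3.
result = intersect_many([ids_1, ids_2, ids_3])
print(sorted(result))  # [58, 68]
```

Because set intersection is associative, the grouping does not change the result for plain sets. The fold simply mirrors the order in which the pairwise operation is described above.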
Example 4: Perform intersection of two GeoDataFrames
- Create GeoDataFrames
>>> from teradataml import GeoDataFrame
>>> load_example_data("geodataframe", "sample_shapes")
>>> geo_dataframe = GeoDataFrame('sample_shapes')
>>> geo_dataframe1 = geo_dataframe[geo_dataframe.skey == 1004].select(['skey','linestrings'])
>>> geo_dataframe1
skey  linestrings
1004  LINESTRING (10 20 30,40 50 60,70 80 80)
>>> geo_dataframe2 = geo_dataframe[geo_dataframe.skey < 1010].select(['skey','linestrings'])
>>> geo_dataframe2
skey  linestrings
1009  MULTILINESTRING ((10 20 30,40 50 60),(70 80 80,90 100 110))
1005  LINESTRING (1 3 6,3 0 6,6 0 1)
1004  LINESTRING (10 20 30,40 50 60,70 80 80)
1002  LINESTRING (1 3,3 0,0 1)
1001  LINESTRING (1 1,2 2,3 3,4 4)
1003  LINESTRING (1.35 3.6456,3.6756 0.23,0.345 1.756)
1007  MULTILINESTRING ((1 1,1 3,6 3),(10 5,20 1))
1006  LINESTRING (1.35 3.6456 4.5,3.6756 0.23 6.8,0.345 1.756 8.9)
1008  MULTILINESTRING ((1 3,3 0,0 1),(1.35 3.6456,3.6756 0.23,0.345 1.756))
- Perform intersection
>>> td_intersect([geo_dataframe1, geo_dataframe2])
skey  linestrings
1004  LINESTRING (10 20 30,40 50 60,70 80 80)