td_intersect - Teradata Python Package

Teradata® Python Package User Guide

Product
Teradata Python Package
Release Number
16.20
Published
February 2020
Language
English (United States)
Last Update
2020-02-29
dita:id
B700-4006
lifecycle
previous
Product Category
Teradata Vantage

Use the td_intersect() function to find the intersection of a list of teradataml DataFrames along the index axis. The function returns a DataFrame containing the rows common to all input DataFrames.

Example Prerequisites

>>> from teradataml import load_example_data
>>> load_example_data("dataframe", "setop_test1")
>>> load_example_data("dataframe", "setop_test2")
>>> from teradataml.dataframe.dataframe import DataFrame
>>> from teradataml.dataframe.setop import td_intersect

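Example: Run td_intersect() on rows from two DataFrames, using the default signature

This example gets the intersection of rows from two teradataml DataFrames using the default signature of the function. With the default signature, duplicate rows are retained in the result. Each printed DataFrame shows only a sample of its rows, so the result can include rows that are not visible in the printed inputs.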

>>> df1 = DataFrame('setop_test1')
>>> df1
   masters   gpa     stats programming  admitted
id                                             
62      no  3.70  Advanced    Advanced         1
53     yes  3.50  Beginner      Novice         1
69      no  3.96  Advanced    Advanced         1
61     yes  4.00  Advanced    Advanced         1
58      no  3.13  Advanced    Advanced         1
51     yes  3.76  Beginner    Beginner         0
68      no  1.87  Advanced      Novice         1
66      no  3.87    Novice    Beginner         1
60      no  4.00  Advanced      Novice         1
59      no  3.65    Novice      Novice         1
>>> df2 = DataFrame('setop_test2')
>>> df2
   masters   gpa     stats programming  admitted
id                                             
12      no  3.65    Novice      Novice         1
15     yes  4.00  Advanced    Advanced         1
14     yes  3.45  Advanced    Advanced         0
20     yes  3.90  Advanced    Advanced         1
18     yes  3.81  Advanced    Advanced         1
17      no  3.83  Advanced    Advanced         1
13      no  4.00  Advanced      Novice         1
11      no  3.13  Advanced    Advanced         1
60      no  4.00  Advanced      Novice         1
19     yes  1.98  Advanced    Advanced         0
>>> idf = td_intersect([df1, df2])
>>> idf
   masters   gpa     stats programming  admitted
id                                             
64     yes  3.81  Advanced    Advanced         1
60      no  4.00  Advanced      Novice         1
58      no  3.13  Advanced    Advanced         1
68      no  1.87  Advanced      Novice         1
66      no  3.87    Novice    Beginner         1
60      no  4.00  Advanced      Novice         1
62      no  3.70  Advanced    Advanced         1

Example: Run td_intersect() on rows from two DataFrames, discarding duplicate rows

This example applies the intersect operation to the two teradataml DataFrames from the previous example and discards duplicate rows from the result by passing allow_duplicates=False.

>>> idf = td_intersect([df1, df2], allow_duplicates=False)
>>> idf
   masters   gpa     stats programming  admitted
id                                             
64     yes  3.81  Advanced    Advanced         1
60      no  4.00  Advanced      Novice         1
58      no  3.13  Advanced    Advanced         1
68      no  1.87  Advanced      Novice         1
66      no  3.87    Novice    Beginner         1
62      no  3.70  Advanced    Advanced         1
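
The effect of allow_duplicates can be pictured with plain Python collections, without a Vantage connection. The sketch below is an illustration only, not how teradataml is implemented. It assumes that allow_duplicates=True behaves like SQL INTERSECT ALL (a common row is kept as many times as it appears in both inputs) and that allow_duplicates=False behaves like INTERSECT (each common row appears once). The helper name intersect_rows and the sample tuples are hypothetical.

```python
from collections import Counter

def intersect_rows(left, right, allow_duplicates=True):
    """Return rows common to both inputs (hypothetical helper, not part of teradataml)."""
    # Counter & Counter keeps each row min(count_left, count_right) times.
    common = Counter(left) & Counter(right)
    if allow_duplicates:
        # Expand the counts back into repeated rows.
        return sorted(common.elements())
    # Iterating a Counter yields each distinct row once.
    return sorted(common)

# Made-up rows as (id, masters, gpa) tuples; id 60 appears twice in both inputs.
rows1 = [(60, "no", 4.00), (60, "no", 4.00), (58, "no", 3.13), (53, "yes", 3.50)]
rows2 = [(60, "no", 4.00), (60, "no", 4.00), (58, "no", 3.13), (12, "no", 3.65)]

with_dups = intersect_rows(rows1, rows2)
without_dups = intersect_rows(rows1, rows2, allow_duplicates=False)
print(with_dups)      # row 60 is listed twice
print(without_dups)   # row 60 is listed once
```

In this sketch only the row with id 60 is affected by the flag, which mirrors the difference between the two td_intersect() results shown above.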

Example: Run td_intersect() on more than two DataFrames

This example shows what happens when td_intersect() is used on more than two teradataml DataFrames. With three teradataml DataFrames df1, df2, and df3, the operation is applied to df1 and df2 first, and then applied again to that result and df3.

>>> df3 = df1[df1.gpa <= 3.5]
>>> df3
   masters   gpa     stats programming  admitted
id                                             
58      no  3.13  Advanced    Advanced         1
67     yes  3.46    Novice    Beginner         0
54     yes  3.50  Beginner    Advanced         1
68      no  1.87  Advanced      Novice         1
53     yes  3.50  Beginner      Novice         1
>>> # The effective operation here is (df1 intersect df2) intersect df3
>>> idf = td_intersect([df1, df2, df3])
>>> idf
   masters   gpa     stats programming  admitted
id                                             
58      no  3.13  Advanced    Advanced         1
68      no  1.87  Advanced      Novice         1
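
The left-to-right evaluation in this example can be sketched with functools.reduce over plain Python sets. This is an illustration under stated assumptions, not the teradataml implementation: rows are reduced to their id values, duplicates are ignored, and the id sets are made up for the example. The helper name chain_intersect is hypothetical.

```python
from functools import reduce

def chain_intersect(frames):
    """Fold a list of sets pairwise, left to right (hypothetical helper)."""
    # For [a, b, c] this computes (a & b) & c.
    return reduce(lambda acc, nxt: acc & nxt, frames)

# Made-up id sets standing in for df1, df2, and df3.
ids1 = {58, 60, 62, 66, 68}
ids2 = {58, 60, 64, 68}
ids3 = {53, 54, 58, 67, 68}

result = chain_intersect([ids1, ids2, ids3])
print(sorted(result))
```

Because set intersection is associative, the grouping does not change the final set in this sketch; the left-to-right order only describes how the intermediate results are formed.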