merge() Method | Teradata Python Package - merge() Method

merge() Method | Teradata Python Package - merge() Method - Teradata Package for Python

Teradata® Package for Python User Guide

Product

Teradata Package for Python

Release Number

17.00

Published

November 2021

Language

English (United States)

Last Update

2022-01-14

dita:mapPath

bol1585763678431.ditamap

dita:ditavalPath

ayr1485454803741.ditaval

dita:id

B700-4006

lifecycle

Product Category

Teradata Vantage

Use the merge() function to merge two teradataml DataFrame objects together. The merge is done by performing a database-style join operation by columns or indexes.

Supported merge operations are:

inner: Returns only matching rows, non-matching rows are eliminated;
left: Returns all matching rows, plus non-matching rows from the left teradataml DataFrame;
right: Returns all matching rows, plus non-matching rows from the right teradataml DataFrame;
full: Returns all rows from both teradataml DataFrames, including non matching rows.

When multiple merge conditions are given, they are joined using AND boolean operator. Other boolean operators are not supported.
Nesting of merge on conditions in column expressions using & and | is not supported.
For example: on = [(df1.a == df1.b) & (df1.c == df1.d)]. You can use [df1.a == df1.b, df1.c == df1.d] instead.

The valid form of on conditions

The on parameter specifies the list of conditions that indicate the columns to be join keys. Supported join operators are =, ==, <, <=, >, >=, <> and != .

= and <> operators are not supported when using DataFrame columns as operands.

The on conditions can take the following forms:

String comparisons, in the form of col1 <= col2, where col1 is the column of left dataframe df1 and col2 is the column of right dataframe df2. For example:
1. ["a","b"] indicates df1.a = df2.a and df1.b = df2.b.
2. ["a = b", "c == d"] indicates df1.a = df2.b and df1.c = df2.d.
3. ["a <= b", "c > d"] indicates df1.a <= df2.b and df1.c > df2.d.
4. ["a < b", "c >= d"] indicates df1.a < df2.b and df1.c >= df2.d.
5. ["a <> b"] indicates df1.a != df2.b. Same is the case for ["a != b"].
Column comparisons, in the form of df1.col1 <= df1.col2, where col1 is the column of left dataframe df1 and col2 is the column of right dataframe df2. For example:
1. [df1.a == df2.a, df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b.
2. [df1.a == df2.b, df1.c == df2.d] indicates df1.a = df2.b and df1.c = df2.d.
3. [df1.a <= df2.b and df1.c > df2.d] indicates df1.a <= df2.b and df1.c > df2.d.
4. [df1.a < df2.b and df1.c >= df2.d] indicates df1.a < df2.b and df1.c >= df2.d.
5. df1.a != df2.b indicates df1.a != df2.b.
The combination of both string comparisons and comparisons as column expressions. Some examples are explained below:
1. ["a", df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b.
2. [df1.a <= df2.b, "c > d"] indicates df1.a <= df2.b and df1.c > df2.d.

Refer to teradataml DataFrame Column for more information about ColumnExpressions in teradataml.

Example Prerequisite

Set up teradataml DataFrames for merge.

>>> from datetime import datetime, timedelta

>>> dob = datetime.strptime('31101991', '%d%m%Y').date()

>>> df1 = pd.DataFrame(data={'col1': [1, 2,3],
                           'col2': ['teradata','analytics','platform'],
                           'col3': [1.3, 2.3, 3.3],
                           'col5': ['a','b','c'],
                           'col 6': [dob, dob + timedelta(1), dob + timedelta(2)],
                           "'col8'":[3,4,5]})
 
>>> df2 = pd.DataFrame(data={'col1': [1, 2, 3],
                                'col4': ['teradata', 'analytics', 'are you'],
                                'col3': [1.3, 2.3, 4.3],
                                'col7':['a','b','d'],
                                'col 6': [dob, dob + timedelta(1), dob + timedelta(3)],
                                "'col8'": [3, 4, 5]})

Persist the Pandas DataFrames in Vantage:

>>> copy_to_sql(df1, "t1", primary_index="col1")

>>> copy_to_sql(df2, "t2", primary_index="col1")

>>> df1 = DataFrame("table1")

>>> df2 = DataFrame("table2")

Display the DataFrames.

>>> df1
                 'col8'       col 6       col2  col3 col5
            col1                                        
            2         4  1991-11-01  analytics   2.3    b
            1         3  1991-10-31   teradata   1.3    a
            3         5  1991-11-02   platform   3.3    c
 
>>> df2
                 'col8'       col 6  col3       col4 col7
            col1                                        
            2         4  1991-11-01   2.3  analytics    b
            1         3  1991-10-31   1.3   teradata    a
            3         5  1991-11-03   4.3    are you    d

Example 1: Specify merge conditions as string using 'on' argument as well as DataFrame indexes as merge keys

>>> df1.merge(right = df2, how = "left", on = ["col3","col2=col4"], use_index = True, lsuffix = "t1", rsuffix = "t2")
 
  t2_col1 col5    t2_col 6 t1_col1 t2_'col8'  t1_col3       col4  t2_col3  col7       col2    t1_col 6 t1_'col8'
0       2    b  1991-11-01       2         4      2.3  analytics      2.3     b  analytics  1991-11-01         4
1       1    a  1991-10-31       1         3      1.3   teradata      1.3     a   teradata  1991-10-31         3
2    None    c        None       3      None      3.3       None      NaN  None   platform  1991-11-02         5

Example 2: Specify 'on' conditions as ColumnExpression and DataFrame indexes as merge keys

>>> df1.merge(right = df2, how = "left", on = [df1.col1, df1.col3], use_index = True, lsuffix = "t1", rsuffix = "t2")
 
  t1_col1  t2_col1       col2  t1_col3  t2_col3 col5    t1_col 6    t2_col 6  t1_'col8'  t2_'col8'       col4  col7
0        2      2.0  analytics      2.3      2.3    b  1991-06-23  1991-06-23          4        4.0  analytics     b
1        1      1.0   teradata      1.3      1.3    a  1991-06-22  1991-06-22          3        3.0   teradata     a
2        3      NaN   platform      3.3      NaN    c  1991-06-24        None          5        NaN       None  None

Example 3: Specify left_on, right_on conditions along with DataFrame indexes as merge keys

>>> df1.merge(right = df2, how = "right", left_on = "col2", right_on = "col4", use_index = True, lsuffix = "t1", rsuffix = "t2")
  t1_col1 t2_col1       col2  t1_col3  t2_col3  col5    t1_col 6    t2_col 6 t1_'col8' t2_'col8'       col4 col7
0       2       2  analytics      2.3      2.3     b  1991-11-01  1991-11-01         4         4  analytics    b
1       1       1   teradata      1.3      1.3     a  1991-10-31  1991-10-31         3         3   teradata    a
2    None       3       None      NaN      4.3  None        None  1991-11-03      None         5    are you    d

Example 4: DataFrames to be merged do not contain common columns

If DataFrames to be merged do not contain common columns, lsuffix and rsuffix are not required.

>>> new_df1 = df1.select(['col2', 'col5'])

>>> new_df2 = df2.select(['col4', 'col7'])

>>> new_df1
  col5       col2
0    b  analytics
1    a   teradata
2    c   platform

>>> new_df2
  col7       col4
0    b  analytics
1    a   teradata
2    d    are you

>>> new_df1.merge(right = new_df2, how = "inner", on = "col5=col7")
  col5       col4       col2 col7
0    b  analytics  analytics    b
1    a   teradata   teradata    a

Example 5: No merge conditions are specified

When no merge conditions are specified, teradataml DataFrame indexes are used as merge keys.

>>> df1.merge(right = df2, how = "full", lsuffix = "t1", rsuffix = "t2")
  t2_col1 col5    t2_col 6 t1_col1 t2_'col8'  t1_col3       col4  t2_col3 col7       col2    t1_col 6 t1_'col8'
0       2    b  1991-11-01       2         4      2.3  analytics      2.3    b  analytics  1991-11-01         4
1       1    a  1991-10-31       1         3      1.3   teradata      1.3    a   teradata  1991-10-31         3
2       3    c  1991-11-03       3         5      3.3    are you      4.3    d   platform  1991-11-02         5