merge() Method

Teradata® Python Package User Guide

brand
Teradata Vantage
prodname
Teradata Python Package
vrm_release
16.20
category
User Guide
featnum
B700-4006-098K

Use the merge() function to merge two teradataml DataFrame objects together. The merge is done by performing a database-style join operation by columns or indexes.

Supported merge operations are:
  • inner: Returns only matching rows, non-matching rows are eliminated;
  • left: Returns all matching rows, plus non-matching rows from the left teradataml DataFrame;
  • right: Returns all matching rows, plus non-matching rows from the right teradataml DataFrame;
  • full: Returns all rows from both teradataml DataFrames, including non matching rows.
Supported merge operator is "=", which means that merge is done based on equality operator. No other operator is supported.

Example Prerequisite

Set up teradataml DataFrames for merge.

>>> from datetime import datetime, timedelta
>>> dob = datetime.strptime('31101991', '%d%m%Y').date()
>>> df1 = pd.DataFrame(data={'col1': [1, 2,3],
                           'col2': ['teradata','analytics','platform'],
                           'col3': [1.3, 2.3, 3.3],
                           'col5': ['a','b','c'],
                           'col 6': [dob, dob + timedelta(1), dob + timedelta(2)],
                           "'col8'":[3,4,5]})
 
>>> df2 = pd.DataFrame(data={'col1': [1, 2, 3],
                                'col4': ['teradata', 'analytics', 'are you'],
                                'col3': [1.3, 2.3, 4.3],
                                'col7':['a','b','d'],
                                'col 6': [dob, dob + timedelta(1), dob + timedelta(3)],
                                "'col8'": [3, 4, 5]})
>>> copy_to_sql(df1, "t1", primary_index="col1")

>>> copy_to_sql(df2, "t2", primary_index="col1")
>>> df1 = DataFrame("table1")

>>> df2 = DataFrame("table2")

Display the DataFrames.

>>> df1
                 'col8'       col 6       col2  col3 col5
            col1                                        
            2         4  1991-11-01  analytics   2.3    b
            1         3  1991-10-31   teradata   1.3    a
            3         5  1991-11-02   platform   3.3    c
 
>>> df2
                 'col8'       col 6  col3       col4 col7
            col1                                        
            2         4  1991-11-01   2.3  analytics    b
            1         3  1991-10-31   1.3   teradata    a
            3         5  1991-11-03   4.3    are you    d

Example: Specify both types of 'on' conditions as well as teradataml DataFrame indexes as merge keys

>>> df3 = df1.merge(right = df2, how = "left", on = ["col3","col2=col4"], use_index = True, lsuffix = "t1", rsuffix = "t2")
>>> df3
 
     t2_col1 col5    t2_col 6 t1_col1 t2_'col8'  t1_col3       col4  t2_col3  col7       col2    t1_col 6 t1_'col8'
 
0       2    b  1991-11-01       2         4      2.3  analytics      2.3     b  analytics  1991-11-01         4
1       1    a  1991-10-31       1         3      1.3   teradata      1.3     a   teradata  1991-10-31         3
2    None    c        None       3      None      3.3       None      NaN  None   platform  1991-11-02         5

Example: Specify 'left_on', 'right_on' conditions along with teradataml DataFrame indexes as merge keys

>>> df4 = df1.merge(right = df2, how = "right", left_on = "a", right_on = "c", use_index = True, lsuffix = "t1", rsuffix = "t2")
>>> df34
             
    t2_col1  col5    t2_col 6 t1_col1 t2_'col8'  t1_col3       col4  t2_col3 col7       col2    t1_col 6 t1_'col8'
0       2     b  1991-11-01       2         4      2.3  analytics      2.3    b  analytics  1991-11-01         4
1       1     a  1991-10-31       1         3      1.3   teradata      1.3    a   teradata  1991-10-31         3
2       3  None  1991-11-03    None         5      NaN    are you      4.3    d       None        None      None

Example: No merge conditions are specified, teradataml DataFrame indexes are used as merge keys

>>> df5 = df1.merge(right = df2, how = "full", lsuffix = "t1", rsuffix = "t2")
>>> df5
 
    t2_col1 col5    t2_col 6 t1_col1 t2_'col8'  t1_col3       col4  t2_col3 col7       col2    t1_col 6 t1_'col8'
0       2    b  1991-11-01       2         4      2.3  analytics      2.3    b  analytics  1991-11-01         4
1       1    a  1991-10-31       1         3      1.3   teradata      1.3    a   teradata  1991-10-31         3
2       3    c  1991-11-03       3         5      3.3    are you      4.3    d   platform  1991-11-02         5