Use the merge() function to merge two teradataml DataFrame objects together. The merge is done by performing a database-style join operation by columns or indexes.
- inner: Returns only matching rows, non-matching rows are eliminated;
- left: Returns all matching rows, plus non-matching rows from the left teradataml DataFrame;
- right: Returns all matching rows, plus non-matching rows from the right teradataml DataFrame;
- full: Returns all rows from both teradataml DataFrames, including non matching rows.
- When multiple merge conditions are given, they are joined using AND boolean operator. Other boolean operators are not supported.
- Nesting of merge on conditions in column expressions using & and | is not supported.
For example: on = [(df1.a == df1.b) & (df1.c == df1.d)]. You can use [df1.a == df1.b, df1.c == df1.d] instead.
The valid form of on conditions
The on parameter specifies the list of conditions that indicate the columns to be join keys. Supported join operators are =, ==, <, <=, >, >=, <> and != .
= and <> operators are not supported when using DataFrame columns as operands.
The on conditions can take the following forms:
- String comparisons, in the form of col1 <= col2, where col1 is the column of left dataframe df1 and col2 is the column of right dataframe df2. For example:
- ["a","b"] indicates df1.a = df2.a and df1.b = df2.b.
- ["a = b", "c == d"] indicates df1.a = df2.b and df1.c = df2.d.
- ["a <= b", "c > d"] indicates df1.a <= df2.b and df1.c > df2.d.
- ["a < b", "c >= d"] indicates df1.a < df2.b and df1.c >= df2.d.
- ["a <> b"] indicates df1.a != df2.b. Same is the case for ["a != b"].
- Column comparisons, in the form of df1.col1 <= df1.col2, where col1 is the column of left dataframe df1 and col2 is the column of right dataframe df2. For example:
- [df1.a == df2.a, df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b.
- [df1.a == df2.b, df1.c == df2.d] indicates df1.a = df2.b and df1.c = df2.d.
- [df1.a <= df2.b and df1.c > df2.d] indicates df1.a <= df2.b and df1.c > df2.d.
- [df1.a < df2.b and df1.c >= df2.d] indicates df1.a < df2.b and df1.c >= df2.d.
- df1.a != df2.b indicates df1.a != df2.b.
- The combination of both string comparisons and comparisons as column expressions. Some examples are explained below:
- ["a", df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b.
- [df1.a <= df2.b, "c > d"] indicates df1.a <= df2.b and df1.c > df2.d.
Refer to teradataml DataFrame Column for more information about ColumnExpressions in teradataml.
Example Prerequisite
Set up teradataml DataFrames for merge.
>>> from datetime import datetime, timedelta
>>> dob = datetime.strptime('31101991', '%d%m%Y').date()
>>> df1 = pd.DataFrame(data={'col1': [1, 2,3], 'col2': ['teradata','analytics','platform'], 'col3': [1.3, 2.3, 3.3], 'col5': ['a','b','c'], 'col 6': [dob, dob + timedelta(1), dob + timedelta(2)], "'col8'":[3,4,5]}) >>> df2 = pd.DataFrame(data={'col1': [1, 2, 3], 'col4': ['teradata', 'analytics', 'are you'], 'col3': [1.3, 2.3, 4.3], 'col7':['a','b','d'], 'col 6': [dob, dob + timedelta(1), dob + timedelta(3)], "'col8'": [3, 4, 5]})Persist the Pandas DataFrames in Vantage:
>>> copy_to_sql(df1, "t1", primary_index="col1") >>> copy_to_sql(df2, "t2", primary_index="col1")
>>> df1 = DataFrame("table1") >>> df2 = DataFrame("table2")
Display the DataFrames.
>>> df1 'col8' col 6 col2 col3 col5 col1 2 4 1991-11-01 analytics 2.3 b 1 3 1991-10-31 teradata 1.3 a 3 5 1991-11-02 platform 3.3 c >>> df2 'col8' col 6 col3 col4 col7 col1 2 4 1991-11-01 2.3 analytics b 1 3 1991-10-31 1.3 teradata a 3 5 1991-11-03 4.3 are you d
Example 1: Specify merge conditions as string using 'on' argument as well as DataFrame indexes as merge keys
>>> df1.merge(right = df2, how = "left", on = ["col3","col2=col4"], use_index = True, lsuffix = "t1", rsuffix = "t2") t2_col1 col5 t2_col 6 t1_col1 t2_'col8' t1_col3 col4 t2_col3 col7 col2 t1_col 6 t1_'col8' 0 2 b 1991-11-01 2 4 2.3 analytics 2.3 b analytics 1991-11-01 4 1 1 a 1991-10-31 1 3 1.3 teradata 1.3 a teradata 1991-10-31 3 2 None c None 3 None 3.3 None NaN None platform 1991-11-02 5
Example 2: Specify 'on' conditions as ColumnExpression and DataFrame indexes as merge keys
>>> df1.merge(right = df2, how = "left", on = [df1.col1, df1.col3], use_index = True, lsuffix = "t1", rsuffix = "t2") t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 t1_col 6 t2_col 6 t1_'col8' t2_'col8' col4 col7 0 2 2.0 analytics 2.3 2.3 b 1991-06-23 1991-06-23 4 4.0 analytics b 1 1 1.0 teradata 1.3 1.3 a 1991-06-22 1991-06-22 3 3.0 teradata a 2 3 NaN platform 3.3 NaN c 1991-06-24 None 5 NaN None None
Example 3: Specify left_on, right_on conditions along with DataFrame indexes as merge keys
>>> df1.merge(right = df2, how = "right", left_on = "col2", right_on = "col4", use_index = True, lsuffix = "t1", rsuffix = "t2") t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 t1_col 6 t2_col 6 t1_'col8' t2_'col8' col4 col7 0 2 2 analytics 2.3 2.3 b 1991-11-01 1991-11-01 4 4 analytics b 1 1 1 teradata 1.3 1.3 a 1991-10-31 1991-10-31 3 3 teradata a 2 None 3 None NaN 4.3 None None 1991-11-03 None 5 are you d
Example 4: DataFrames to be merged do not contain common columns
If DataFrames to be merged do not contain common columns, lsuffix and rsuffix are not required.
>>> new_df1 = df1.select(['col2', 'col5']) >>> new_df2 = df2.select(['col4', 'col7'])
>>> new_df1 col5 col2 0 b analytics 1 a teradata 2 c platform >>> new_df2 col7 col4 0 b analytics 1 a teradata 2 d are you
>>> new_df1.merge(right = new_df2, how = "inner", on = "col5=col7") col5 col4 col2 col7 0 b analytics analytics b 1 a teradata teradata a
Example 5: No merge conditions are specified
When no merge conditions are specified, teradataml DataFrame indexes are used as merge keys.
>>> df1.merge(right = df2, how = "full", lsuffix = "t1", rsuffix = "t2") t2_col1 col5 t2_col 6 t1_col1 t2_'col8' t1_col3 col4 t2_col3 col7 col2 t1_col 6 t1_'col8' 0 2 b 1991-11-01 2 4 2.3 analytics 2.3 b analytics 1991-11-01 4 1 1 a 1991-10-31 1 3 1.3 teradata 1.3 a teradata 1991-10-31 3 2 3 c 1991-11-03 3 5 3.3 are you 4.3 d platform 1991-11-02 5