Teradata Package for Python Function Reference | 20.00 - merge - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference - 20.00
- Deployment
- VantageCloud
- VantageCore
- Edition
- Enterprise
- IntelliFlex
- VMware
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Enterprise_2000
- lifecycle
- latest
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrame.merge = merge(self, right, on=None, how='inner', left_on=None, right_on=None, use_index=False, lsuffix=None, rsuffix=None)
- DESCRIPTION:
Merges two teradataml DataFrames together.
Supported merge operations are:
- cross: Returns cartesian product between the two dataframes.
- inner: Returns only matching rows, non-matching rows are eliminated.
- left: Returns all matching rows plus non-matching rows from the left teradataml DataFrame.
- right: Returns all matching rows plus non-matching rows from the right teradataml DataFrame.
- full: Returns all rows from both teradataml DataFrames, including non matching rows.
PARAMETERS:
right:
Required Argument.
Specifies right teradataml DataFrame on which merge is to be performed.
Types: teradataml DataFrame
on:
Optional Argument.
Specifies list of conditions that indicate the columns used for the merge.
When no arguments are provided for this condition, the merge is performed using the indexes
of the teradataml DataFrames. Both teradataml DataFrames are required to have index labels to
perform a merge operation when no arguments are provided for this condition.
When either teradataml DataFrame does not have a valid index label in the above case,
an exception is thrown.
• String comparisons, in the form of "col1 <= col2", where col1 is
the column of left DataFrame df1 and col2 is the column of right
DataFrame df2.
Examples:
1. ["a","b"] indicates df1.a = df2.a and df1.b = df2.b.
2. ["a = b", "c = d"] indicates df1.a = df2.b and df1.c = df2.d
3. ["a <= b", "c > d"] indicates df1.a <= df2.b and df1.c > df2.d.
4. ["a < b", "c >= d"] indicates df1.a < df2.b and df1.c >= df2.d.
5. ["a <> b"] indicates df1.a != df2.b. Same is the case for ["a != b"].
• Column comparisons, in the form of df1.col1 <= df2.col2, where col1
is the column of left DataFrame df1 and col2 is the column of right
DataFrame df2.
Examples:
1. [df1.a == df2.a, df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b.
2. [df1.a == df2.b, df1.c == df2.d] indicates df1.a = df2.b and df1.c = df2.d.
3. [df1.a <= df2.b and df1.c > df2.d] indicates df1.a <= df2.b and df1.c > df2.d.
4. [df1.a < df2.b and df1.c >= df2.d] indicates df1.a < df2.b and df1.c >= df2.d.
5. df1.a != df2.b indicates df1.a != df2.b.
• The combination of both string comparisons and comparisons as column expressions.
Examples:
1. ["a", df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b.
2. [df1.a <= df2.b, "c > d"] indicates df1.a <= df2.b and df1.c > df2.d.
Default Value: None
Types: str or ColumnExpression or List of strings(str) or ColumnExpressions
how:
Optional Argument.
Specifies the type of merge to perform. Supports inner, left, right, full and cross merge operations.
When how is "cross", the arguments on, left_on, right_on and use_index are ignored.
Default Value: "inner".
Types: str
left_on:
Optional Argument.
Specifies column to merge on, in the left teradataml DataFrame.
When both the 'on' and 'left_on' parameters are unspecified, the index columns
of the teradataml DataFrames are used to perform the merge operation.
Default Value: None.
Types: str or ColumnExpression or List of strings(str) or ColumnExpressions
right_on:
Optional Argument.
Specifies column to merge on, in the right teradataml DataFrame.
When both the 'on' and 'right_on' parameters are unspecified, the index columns
of the teradataml DataFrames are used to perform the merge operation.
Default Value: None.
Types: str or ColumnExpression or List of strings(str) or ColumnExpressions
use_index:
Optional Argument.
Specifies whether (or not) to use index from the teradataml DataFrames as the merge key(s).
When False, and 'on', 'left_on', and 'right_on' are all unspecified, the index columns
of the teradataml DataFrames are used to perform the merge operation.
Default value: False.
Types: bool
lsuffix:
Optional Argument.
Specifies suffix to be added to the left table columns.
Default Value: None.
Types: str
Note: A suffix is required if teradataml DataFrames being merged have columns
with the same name.
rsuffix:
Optional Argument.
Specifies suffix to be added to the right table columns.
Default Value: None.
Types: str
Note: A suffix is required if teradataml DataFrames being merged have columns
with the same name.
RAISES:
TeradataMlException
RETURNS:
teradataml DataFrame
EXAMPLES:
# Example set-up teradataml DataFrames for merge
>>> from datetime import datetime, timedelta
>>> dob = datetime.strptime('31101991', '%d%m%Y').date()
>>> df1 = pd.DataFrame(data={'col1': [1, 2,3],
'col2': ['teradata','analytics','platform'],
'col3': [1.3, 2.3, 3.3],
'col5': ['a','b','c'],
'col 6': [dob, dob + timedelta(1), dob + timedelta(2)],
"'col8'":[3,4,5]})
>>> df2 = pd.DataFrame(data={'col1': [1, 2, 3],
'col4': ['teradata', 'analytics', 'are you'],
'col3': [1.3, 2.3, 4.3],
'col7':['a','b','d'],
'col 6': [dob, dob + timedelta(1), dob + timedelta(3)],
"'col8'": [3, 4, 5]})
>>> # Persist the Pandas DataFrames in Vantage.
>>> copy_to_sql(df1, "table1", primary_index="col1")
>>> copy_to_sql(df2, "table2", primary_index="col1")
>>> df1 = DataFrame("table1")
>>> df2 = DataFrame("table2")
>>> df1
'col8' col 6 col2 col3 col5
col1
2 4 1991-11-01 analytics 2.3 b
1 3 1991-10-31 teradata 1.3 a
3 5 1991-11-02 platform 3.3 c
>>> df2
'col8' col 6 col3 col4 col7
col1
2 4 1991-11-01 2.3 analytics b
1 3 1991-10-31 1.3 teradata a
3 5 1991-11-03 4.3 are you d
>>> # 1) Specify both types of 'on' conditions and DataFrame indexes as merge keys:
>>> df1.merge(right = df2, how = "left", on = ["col3","col2=col4"], use_index = True, lsuffix = "t1", rsuffix = "t2")
t2_col1 col5 t2_col 6 t1_col1 t2_'col8' t1_col3 col4 t2_col3 col7 col2 t1_col 6 t1_'col8'
0 2 b 1991-11-01 2 4 2.3 analytics 2.3 b analytics 1991-11-01 4
1 1 a 1991-10-31 1 3 1.3 teradata 1.3 a teradata 1991-10-31 3
2 None c None 3 None 3.3 None NaN None platform 1991-11-02 5
>>> # 2) Specify 'on' conditions as ColumnExpression and DataFrame indexes as merge keys:
>>> df1.merge(right = df2, how = "left", on = [df1.col1, df1.col3], use_index = True, lsuffix = "t1", rsuffix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 t1_col 6 t2_col 6 t1_'col8' t2_'col8' col4 col7
0 2 2.0 analytics 2.3 2.3 b 1991-06-23 1991-06-23 4 4.0 analytics b
1 1 1.0 teradata 1.3 1.3 a 1991-06-22 1991-06-22 3 3.0 teradata a
2 3 NaN platform 3.3 NaN c 1991-06-24 None 5 NaN None None
>>> # 3) Specify left_on, right_on conditions along with DataFrame indexes as merge keys:
>>> df1.merge(right = df2, how = "right", left_on = "col2", right_on = "col4", use_index = True, lsuffix = "t1", rsuffix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 t1_col 6 t2_col 6 t1_'col8' t2_'col8' col4 col7
0 2 2 analytics 2.3 2.3 b 1991-11-01 1991-11-01 4 4 analytics b
1 1 1 teradata 1.3 1.3 a 1991-10-31 1991-10-31 3 3 teradata a
2 None 3 None NaN 4.3 None None 1991-11-03 None 5 are you d
>>> # 4) If teradataml DataFrames to be merged do not contain common columns, lsuffix and
# rsuffix are not required:
>>> new_df1 = df1.select(['col2', 'col5'])
>>> new_df2 = df2.select(['col4', 'col7'])
>>> new_df1
col5 col2
0 b analytics
1 a teradata
2 c platform
>>> new_df2
col7 col4
0 b analytics
1 a teradata
2 d are you
>>> new_df1.merge(right = new_df2, how = "inner", on = "col5=col7")
col5 col4 col2 col7
0 b analytics analytics b
1 a teradata teradata a
>>> # 5) When no merge conditions are specified, teradataml DataFrame
# indexes are used as merge keys.
>>> df1.merge(right = df2, how = "full", lsuffix = "t1", rsuffix = "t2")
t2_col1 col5 t2_col 6 t1_col1 t2_'col8' t1_col3 col4 t2_col3 col7 col2 t1_col 6 t1_'col8'
0 2 b 1991-11-01 2 4 2.3 analytics 2.3 b analytics 1991-11-01 4
1 1 a 1991-10-31 1 3 1.3 teradata 1.3 a teradata 1991-10-31 3
2 3 c 1991-11-03 3 5 3.3 are you 4.3 d platform 1991-11-02 5