join() Method | Teradata Package for Python - join() Method - Teradata Package for Python

Teradata® Package for Python User Guide

Deployment
VantageCloud
VantageCore
Edition
VMware
Enterprise
IntelliFlex
Product
Teradata Package for Python
Release Number
20.00
Published
March 2025
ft:locale
en-US
ft:lastEdition
2026-02-20
dita:mapPath
nvi1706202040305.ditamap
dita:ditavalPath
plt1683835213376.ditaval
dita:id
rkb1531260709148
Product Category
Teradata Vantage

Use the join() function to join two teradataml DataFrame objects together. The operation is performed by carrying out a database-style join using teradataml specified as keys.

This method does not support operations on array columns.

Required Parameter

other
Specifies right teradataml DataFrame on which join is to be performed.

Optional Parameters

how
Specifies the type of the join performed.
Permitted values are the supported join operations:
  • 'left': Returns all matching rows, plus non-matching rows from the left teradataml DataFrame;

    This is the default value for the argument.

  • 'inner': Returns only matching rows, non-matching rows are eliminated;
  • 'right': Returns all matching rows, plus non-matching rows from the right teradataml DataFrame;
  • 'full': Returns all rows from both teradataml DataFrames, including non matching rows;
  • 'cross': Returns all rows from both tables where each row from the first table is joined with each row from the second table. The result of the join is a Cartesian cross product.

Default value is 'left'.

on
Specifies list of conditions that indicate the columns to be join keys. Supported join operators are =, ==, <, <=, >, >=, <> and !=.
  • = and <> operators are not supported when using DataFrame columns as operands.
  • This argument is optional when how is set to 'cross'; otherwise, it is required.

    If it is specified when how is set to 'cross', it is ignored.

See the following section for valid forms of on conditions.

lsuffix
Specifies the suffix to be added to the left table column.
rsuffix
Specifies the suffix to be added to the right table column.
lprefix
Specifies the prefix to be added to the left and table column.
rprefix
Specifies the prefix to be added to the right table column.

Valid forms of on conditions

The on conditions can take the following forms:
  • String comparisons, in the form of "col1 <= col2", where col1 is the column of left dataframe df1 and col2 is the column of right dataframe df2. For example:
    • ["a","b"] indicates df1.a = df2.a and df1.b = df2.b
    • ["a = b", "c == d"] indicates df1.a = df2.b and df1.c = df2.d
    • ["a <= b", "c > d"] indicates df1.a <= df2.b and df1.c > df2.d
    • ["a < b", "c >= d"] indicates df1.a < df2.b and df1.c >= df2.d
    • ["a <> b"] indicates df1.a != df2.b. Same is the case for ["a != b"]
  • Column comparisons, in the form of df1.col1 <= df1.col2, where col1 is the column of left dataframe df1 and col2 is the column of right dataframe df2. For example:
    • [df1.a == df2.a, df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b
    • [df1.a == df2.b, df1.c == df2.d] indicates df1.a = df2.b and df1.c = df2.d
    • [df1.a <= df2.b and df1.c > df2.d] indicates df1.a <= df2.b and df1.c > df2.d
    • [df1.a < df2.b and df1.c >= df2.d] indicates df1.a < df2.b and df1.c >= df2.d
    • df1.a != df2.b indicates df1.a != df2.b
  • The combination of both string comparisons and comparisons as column expressions. For example:
    • ["a", df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b
    • [df1.a <= df2.b, "c > d"] indicates df1.a <= df2.b and df1.c > df2.d
  • ColumnExpressions containing FunctionExpressions which represent SQL functions invoked on DataFrame Columns. For example:
    • (df1.a.round(1) - df2.a.round(1)).mod(2.5) > 2
    • df1.a.floor() - df2.b.floor() > 2
  • When multiple join conditions are given, they are joined using AND boolean operator. Other boolean operators are not supported.
  • Nesting of join on conditions in column expressions using & and | is not supported. For example, on = [(df1.a == df1.b) & (df1.c == df1.d)] is unsupported nested join on condition. You can use [df1.a == df1.b, df1.c == df1.d] instead.
  • For a cross join operation, the 'on' argument is ignored.

Example Prerequisite

Set up teradataml DataFrames for join operations.

>>> from datetime import datetime, timedelta
>>> dob = datetime.strptime('31101991', '%d%m%Y').date()
>>> pdf1 = pd.DataFrame(data={'col1': [1, 2,3],
                           'col2': ['teradata','analytics','platform'],
                           'col3': [1.3, 2.3, 3.3],
                           'col5': ['a','b','c'],
                           'col 6': [dob, dob + timedelta(1), dob + timedelta(2)],
                           "'col8'":[3,4,5]})
>>> pdf2 = pd.DataFrame(data={'col1': [1, 2, 3],
                                'col4': ['teradata', 'analytics', 'are you'],
                                'col3': [1.3, 2.3, 4.3],
                                'col7':['a','b','d'],
                                'col 6': [dob, dob + timedelta(1), dob + timedelta(3)],
                                "'col8'": [3, 4, 5]})
>>> copy_to_sql(pdf1, "t1", primary_index="col1")
>>> copy_to_sql(pdf2, "t2", primary_index="col1")
>>> df1 = DataFrame("t1")
>>> df2 = DataFrame("t2")
Display the DataFrames.
>>> df1
                 'col8'       col 6       col2  col3 col5
            col1                                        
            2         4  1991-11-01  analytics   2.3    b
            1         3  1991-10-31   teradata   1.3    a
            3         5  1991-11-02   platform   3.3    c
>>> df2
                 'col8'       col 6  col3       col4 col7
            col1                                        
            2         4  1991-11-01   2.3  analytics    b
            1         3  1991-10-31   1.3   teradata    a
            3         5  1991-11-03   4.3    are you    d

Example 1: Use 'on' and 'how' parameters to join the DataFrames

Specify a join condition using the 'on' parameter and the type of join using the 'how' parameter.

If teradataml DataFrames have columns with the same names, prefixes are required.
# Using string expressions for on conditions.
>>> df3 = df1.join(other = df2, on = ["col3","col5=col7"], how = "left", lprefix = "t1", rprefix = "t2")
>>> df3
 
   t2_'col8' t1_col1 col5    t2_col 6       col2 t1_'col8'  col7  t1_col3  t2_col3    t1_col 6 t2_col1      col4
0      None       3    c        None   platform         5  None      3.3      NaN  1991-11-02    None       None
1         4       2    b  1991-11-01  analytics         4     b      2.3      2.3  1991-11-01       2  analytics
2         3       1    a  1991-10-31   teradata         3     a      1.3      1.3  1991-10-31       1   teradata
# Using dataframe column expressions instead of string expressions for on conditions.
>>> df3 = df1.join(other = df2, on = [df1.col3 == df2.col3, df1.col5 == df2.col7], how = "left", lprefix = "t1", rprefix = "t2")
>>> df3
 
   t2_'col8' t1_col1 col5    t2_col 6       col2 t1_'col8'  col7  t1_col3  t2_col3    t1_col 6 t2_col1      col4
0      None       3    c        None   platform         5  None      3.3      NaN  1991-11-02    None       None
1         4       2    b  1991-11-01  analytics         4     b      2.3      2.3  1991-11-01       2  analytics
2         3       1    a  1991-10-31   teradata         3     a      1.3      1.3  1991-10-31       1   teradata

Example 2: Specify a required type of join using the 'how' parameter

# Using string expressions for on conditions.
>>> df3 = df1.join(other = df2, on = ["col3","'col8'"], how = "right", lprefix = "t1", rprefix = "t2")
>>> df3
 
  t2_'col8' t1_col1  col5    t2_col 6       col2 t1_'col8' col7  t1_col3  t2_col3    t1_col 6 t2_col1       col4
0         5    None  None  1991-11-03       None      None    d      NaN      4.3        None       3    are you
1         4       2     b  1991-11-01  analytics         4    b      2.3      2.3  1991-11-01       2  analytics
2         3       1     a  1991-10-31   teradata         3    a      1.3      1.3  1991-10-31       1   teradata
# Using dataframe column expressions instead of string expressions for some of on conditions.
>>> df3 = df1.join(other = df2, on = [df1.col3 == df2.col3,"'col8'"], how = "right", lprefix = "t1", rprefix = "t2")
>>> df3
 
  t2_'col8' t1_col1  col5    t2_col 6       col2 t1_'col8' col7  t1_col3  t2_col3    t1_col 6 t2_col1       col4
0         5    None  None  1991-11-03       None      None    d      NaN      4.3        None       3    are you
1         4       2     b  1991-11-01  analytics         4    b      2.3      2.3  1991-11-01       2  analytics
2         3       1     a  1991-10-31   teradata         3    a      1.3      1.3  1991-10-31       1   teradata

Example 3: Use the default value for 'how' parameter

When the 'how' parameter is not provided, a left join is used by default.

>>> df3 = df1.join(other = df2, on = df1.col3 == df2.col3, lprefix = "t1", rprefix = "t2")
>>> df3
 
  t2_'col8' t1_col1 col5    t2_col 6       col2 t1_'col8'  col7  t1_col3  t2_col3    t1_col 6 t2_col1       col4
0      None       3    c        None   platform         5  None      3.3      NaN  1991-11-02    None       None
1         4       2    b  1991-11-01  analytics         4     b      2.3      2.3  1991-11-01       2  analytics
2         3       1    a  1991-10-31   teradata         3     a      1.3      1.3  1991-10-31       1   teradata

Example 4: How cross join works

This example cross join "admissions_train" with "admissions_train".

>>> df1 = DataFrame("admissions_train").head(3).sort("id")
>>> print(df1)
   masters   gpa     stats programming admitted
id                                            
1      yes  3.95  Beginner    Beginner        0
2      yes  3.76  Beginner    Beginner        0
3       no  3.70    Novice    Beginner        1
>>> df2 = DataFrame("admissions_train").head(3).sort("id")
>>> print(df2)
   masters   gpa     stats programming admitted
id                                            
1      yes  3.95  Beginner    Beginner        0
2      yes  3.76  Beginner    Beginner        0
3       no  3.70    Novice    Beginner        1
>>> df3 = df1.join(other=df2, how="cross", lprefix="l", rprefix="r")
>>> df3.set_index("l_id").sort("l_id")
     r_id l_masters r_masters  l_gpa  r_gpa   l_stats   r_stats l_programming r_programming l_admitted r_admitted
l_id                                                                                                            
1       3       yes        no   3.95   3.70  Beginner    Novice      Beginner      Beginner          0          1
1       1       yes       yes   3.95   3.95  Beginner  Beginner      Beginner      Beginner          0          0
1       2       yes       yes   3.95   3.76  Beginner  Beginner      Beginner      Beginner          0          0
2       1       yes       yes   3.76   3.95  Beginner  Beginner      Beginner      Beginner          0          0
2       2       yes       yes   3.76   3.76  Beginner  Beginner      Beginner      Beginner          0          0
2       3       yes        no   3.76   3.70  Beginner    Novice      Beginner      Beginner          0          1
3       1        no       yes   3.70   3.95    Novice  Beginner      Beginner      Beginner          1          0
3       2        no       yes   3.70   3.76    Novice  Beginner      Beginner      Beginner          1          0
3       3        no        no   3.70   3.70    Novice    Novice      Beginner      Beginner          1          1

See merge() Method for examples using suffix arguments lsuffix and rsuffix.

Example 5: Using not equal operation in ColumnExpression condition

>>> df1.join(other = df2, on = ["col5==col7",df1.col2 != df2.col4], how = "full", lprefix = "t1", rprefix = "t2")
    t1_col1     t2_col1       col2    t1_col3    t2_col3    col5       col4     col7
0         1        None   teradata        1.3       None       a       None     None
1         2        None  analytics        2.3       None       b       None     None
2      None           2       None       None        2.3    None   analytics       b
3      None           1       None       None        1.3    None    teradata       a
4         3        None   platform        3.3        None      c        None    None
5      None           3       None       None         4.3   None     are you       d

Example 6: Using only one string expression with < > operation

>>> df1.join(other = df2, on = "col2<>col4", how = "left", lprefix = "t1", rprefix = "t2")
     t1_col1    t2_col1          col2    t1_col3     t2_col3    col5         col4      col7
0          2          3     analytics        2.3         4.3       b      are you         d
1          2          1     analytics        2.3         1.3       b     teradata         a
2          3          2      platform        3.3         2.3       c    analytics         b
3          1          2      teradata        1.3         2.3       a    analytics         b
4          3          1      platform        3.3         1.3       c     teradata         a
5          1          3      teradata        1.3         4.3       a      are you         d
6          3          3      platform        3.3         4.3       c      are you         d

Example 7: Using only one ColumnExpression in "on" argument conditions

>>> df1.join(other = df2, on = df1.col5 != df2.col7, how = "full", lprefix = "t1", rprefix = "t2")
    t1_col1   t2_col1        col2    t1_col3   t2_col3    col5          col4    col7
0         1         3    teradata        1.3       4.3       a       are you       d
1         3         1    platform        3.3       1.3       c      teradata       a
2         1         2    teradata        1.3       2.3       a     analytics       b
3         3         2    platform        3.3       2.3       c     analytics       b
4         2         1   analytics        2.3       1.3       b      teradata       a
5         3         3    platform        3.3       4.3       c       are you       d
6         2         3   analytics        2.3       4.3       b       are you       d

Example 8: cross join "admissions_train" with "admissions_train"

>>> df1 = DataFrame("admissions_train").head(3).sort("id")
>>> df2 = DataFrame("admissions_train").head(3).sort("id")
>>> df3 = df1.join(other=df2, how="cross", lprefix="l", rprefix="r")
>>> df3.set_index("l_id").sort("l_id")
      l_programming     r_stats    r_id     r_gpa    r_programming   l_gpa     l_stats   r_admitted   l_admitted   l_masters   r_masters
l_id
1           Beginner      Novice       3      3.70         Beginner    3.95    Beginner            1            0         yes          no
1           Beginner    Beginner       1      3.95         Beginner    3.95    Beginner            0            0         yes         yes
1           Beginner    Beginner       2      3.76         Beginner    3.95    Beginner            0            0         yes         yes
2           Beginner    Beginner       1      3.95         Beginner    3.76    Beginner            0            0         yes         yes
2           Beginner    Beginner       2      3.76         Beginner    3.76    Beginner            0            0         yes         yes
2           Beginner      Novice       3      3.70         Beginner    3.76    Beginner            1            0         yes          no
3           Beginner    Beginner       1      3.95         Beginner    3.70      Novice            0            1          no         yes
3           Beginner    Beginner       2      3.76         Beginner    3.70      Novice            0            1          no         yes
3           Beginner      Novice       3      3.70         Beginner    3.70      Novice            1            1          no          no

Example 9: Perform self join using aliased DataFrame

# Create an aliased DataFrame.
>>> lhs = DataFrame("admissions_train").head(3).sort("id")
>>> rhs = lhs.alias("rhs")
# Use aliased DataFrame for self join.
>>> joined_df = lhs.join(other=rhs, how="cross", lprefix="l", rprefix="r")
>>> joined_df
     l_id   r_id    l_masters   r_masters     l_gpa   r_gpa     l_stats      r_stats  l_programming    r_programming   l_admitted r_admitted
0       1      3          yes          no      3.95    3.70    Beginner       Novice       Beginner         Beginner            0          1
1       2      2          yes         yes      3.76    3.76    Beginner     Beginner       Beginner         Beginner            0          0
2       2      3          yes          no      3.76    3.70    Beginner       Novice       Beginner         Beginner            0          1
3       3      1           no         yes      3.70    3.95      Novice     Beginner       Beginner         Beginner            1          0
4       3      3           no          no      3.70    3.70      Novice       Novice       Beginner         Beginner            1          1
5       3      2           no         yes      3.70    3.76      Novice     Beginner       Beginner         Beginner            1          0
6       2      1          yes         yes      3.76    3.95    Beginner     Beginner       Beginner         Beginner            0          0
7       1      2          yes         yes      3.95    3.76    Beginner     Beginner       Beginner         Beginner            0          0
8       1      1          yes         yes      3.95    3.95    Beginner     Beginner       Beginner         Beginner            0          0

Example 10: Perform join with compound 'on' condition having more than one binary operator

>>> rhs_2 = lhs.assign(double_gpa=lhs.gpa * 2)
>>> joined_df_2 = lhs.join(rhs_2, on=rhs_2.double_gpa == lhs.gpa * 2, how="left", lprefix="l", rprefix="r")
>>> joined_df_2
    l_id    r_id  l_masters   r_masters    l_gpa    r_gpa      l_stats     r_stats   l_programming    r_programming   l_admitted  r_admitted   double_gpa
0      3       3         no          no     3.70     3.70       Novice      Novice        Beginner         Beginner            1           1         7.40
1      2       2        yes         yes     3.76     3.76     Beginner    Beginner        Beginner         Beginner            0           0         7.52
2      1       1        yes         yes     3.95     3.95     Beginner    Beginner        Beginner         Beginner            0           0         7.90

Example 11: Perform join on DataFrame with 'on' condition having FunctionExpression

>>> df = DataFrame("admissions_train")
>>> df2 = df.alias("rhs_df")
>>> joined_df_3 = df.join(df2, on=(df.gpa.round(1) - df2.gpa.round(1)).mod(2.5) > 2, how="inner", lprefix="l")
>>> joined_df_3.sort(["id", "l_id"])
   l_id id l_masters masters   l_gpa     gpa     l_stats        stats    l_programming    programming    l_admitted   admitted
0     1 24       yes     no     3.95    1.87    Beginner     Advanced         Beginner         Novice             0          1
1    13 24        no     no      4.0    1.87    Advanced     Advanced           Novice         Novice             1          1
2    15 24       yes     no      4.0    1.87    Advanced     Advanced         Advanced         Novice             1          1
3    25 24        no     no     3.96    1.87    Advanced     Advanced         Advanced         Novice             1          1
4    27 24       yes     no     3.96    1.87    Advanced     Advanced         Advanced         Novice             0          1
5    29 24       yes     no      4.0    1.87      Novice     Advanced         Beginner         Novice             0          1
6    40 24       yes     no     3.95    1.87      Novice     Advanced         Beginner         Novice             0          1