Use the join() function to join two teradataml DataFrame objects together. The operation is performed by carrying out a database-style join using teradataml specified as keys.
Required Arguments:
- other: Specifies right teradataml DataFrame on which join is to be performed.
Optional Arguments:
- how: Specifies the type of the join performed.Permitted values are the supported join operations:
- 'left': Returns all matching rows, plus non-matching rows from the left teradataml DataFrame;
This is the default value for the argument.
- 'inner': Returns only matching rows, non-matching rows are eliminated;
- 'right': Returns all matching rows, plus non-matching rows from the right teradataml DataFrame;
- 'full': Returns all rows from both teradataml DataFrames, including non matching rows;
- 'cross': Returns all rows from both tables where each row from the first table is joined with each row from the second table. The result of the join is a Cartesian cross product.
Default value is 'left'.
- 'left': Returns all matching rows, plus non-matching rows from the left teradataml DataFrame;
- on: Specifies list of conditions that indicate the columns to be join keys. Supported join operators are =, ==, <, <=, >, >=, <> and !=.= and <> operators are not supported when using DataFrame columns as operands.This argument is optional when how is set to 'cross'; otherwise, it is required.
If it is specified when how is set to 'cross', it is ignored.
See the following section for valid forms of on conditions.
- lsuffix and rsuffix: Specify the suffix to be added to the left and right table columns, respectively.
- lprefix and rprefix: Specify the prefix to be added to the left and right table columns, respectively.
Valid forms of on conditions
The on conditions can take the following forms:
- String comparisons, in the form of "col1 <= col2", where col1 is the column of left dataframe df1 and col2 is the column of right dataframe df2. For example:
- ["a","b"] indicates df1.a = df2.a and df1.b = df2.b
- ["a = b", "c == d"] indicates df1.a = df2.b and df1.c = df2.d
- ["a <= b", "c > d"] indicates df1.a <= df2.b and df1.c > df2.d
- ["a < b", "c >= d"] indicates df1.a < df2.b and df1.c >= df2.d
- ["a <> b"] indicates df1.a != df2.b. Same is the case for ["a != b"]
- Column comparisons, in the form of df1.col1 <= df1.col2, where col1 is the column of left dataframe df1 and col2 is the column of right dataframe df2. For example:
- [df1.a == df2.a, df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b
- [df1.a == df2.b, df1.c == df2.d] indicates df1.a = df2.b and df1.c = df2.d
- [df1.a <= df2.b and df1.c > df2.d] indicates df1.a <= df2.b and df1.c > df2.d
- [df1.a < df2.b and df1.c >= df2.d] indicates df1.a < df2.b and df1.c >= df2.d
- df1.a != df2.b indicates df1.a != df2.b
- The combination of both string comparisons and comparisons as column expressions. For example:
- ["a", df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b
- [df1.a <= df2.b, "c > d"] indicates df1.a <= df2.b and df1.c > df2.d
- When multiple join conditions are given, they are joined using AND boolean operator. Other boolean operators are not supported.
- Nesting of join on conditions in column expressions using & and | is not supported. For example, on = [(df1.a == df1.b) & (df1.c == df1.d)] is unsupported nested join on condition. You can use [df1.a == df1.b, df1.c == df1.d] instead.
- For a cross join operation, the 'on' argument is ignored.
Example Prerequisite
Set up teradataml DataFrames for join operations.
>>> from datetime import datetime, timedelta
>>> dob = datetime.strptime('31101991', '%d%m%Y').date()
>>> pdf1 = pd.DataFrame(data={'col1': [1, 2,3], 'col2': ['teradata','analytics','platform'], 'col3': [1.3, 2.3, 3.3], 'col5': ['a','b','c'], 'col 6': [dob, dob + timedelta(1), dob + timedelta(2)], "'col8'":[3,4,5]})
>>> pdf2 = pd.DataFrame(data={'col1': [1, 2, 3], 'col4': ['teradata', 'analytics', 'are you'], 'col3': [1.3, 2.3, 4.3], 'col7':['a','b','d'], 'col 6': [dob, dob + timedelta(1), dob + timedelta(3)], "'col8'": [3, 4, 5]})
>>> copy_to_sql(pdf1, "t1", primary_index="col1")
>>> copy_to_sql(pdf2, "t2", primary_index="col1")
>>> df1 = DataFrame("t1")
>>> df2 = DataFrame("t2")
Display the DataFrames.
>>> df1 'col8' col 6 col2 col3 col5 col1 2 4 1991-11-01 analytics 2.3 b 1 3 1991-10-31 teradata 1.3 a 3 5 1991-11-02 platform 3.3 c
>>> df2 'col8' col 6 col3 col4 col7 col1 2 4 1991-11-01 2.3 analytics b 1 3 1991-10-31 1.3 teradata a 3 5 1991-11-03 4.3 are you d
Example 1: Use 'on' and 'how' parameters to join the DataFrames
Specify a join condition using the 'on' parameter and the type of join using the 'how' parameter.
If teradataml DataFrames have columns with the same names, prefixes are required.
# Using string expressions for on conditions. >>> df3 = df1.join(other = df2, on = ["col3","col5=col7"], how = "left", lprefix = "t1", rprefix = "t2") >>> df3 t2_'col8' t1_col1 col5 t2_col 6 col2 t1_'col8' col7 t1_col3 t2_col3 t1_col 6 t2_col1 col4 0 None 3 c None platform 5 None 3.3 NaN 1991-11-02 None None 1 4 2 b 1991-11-01 analytics 4 b 2.3 2.3 1991-11-01 2 analytics 2 3 1 a 1991-10-31 teradata 3 a 1.3 1.3 1991-10-31 1 teradata
# Using dataframe column expressions instead of string expressions for on conditions. >>> df3 = df1.join(other = df2, on = [df1.col3 == df2.col3, df1.col5 == df2.col7], how = "left", lprefix = "t1", rprefix = "t2") >>> df3 t2_'col8' t1_col1 col5 t2_col 6 col2 t1_'col8' col7 t1_col3 t2_col3 t1_col 6 t2_col1 col4 0 None 3 c None platform 5 None 3.3 NaN 1991-11-02 None None 1 4 2 b 1991-11-01 analytics 4 b 2.3 2.3 1991-11-01 2 analytics 2 3 1 a 1991-10-31 teradata 3 a 1.3 1.3 1991-10-31 1 teradata
Example 2: Specify a required type of join using the 'how' parameter
# Using string expressions for on conditions. >>> df3 = df1.join(other = df2, on = ["col3","'col8'"], how = "right", lprefix = "t1", rprefix = "t2") >>> df3 t2_'col8' t1_col1 col5 t2_col 6 col2 t1_'col8' col7 t1_col3 t2_col3 t1_col 6 t2_col1 col4 0 5 None None 1991-11-03 None None d NaN 4.3 None 3 are you 1 4 2 b 1991-11-01 analytics 4 b 2.3 2.3 1991-11-01 2 analytics 2 3 1 a 1991-10-31 teradata 3 a 1.3 1.3 1991-10-31 1 teradata
# Using dataframe column expressions instead of string expressions for some of on conditions. >>> df3 = df1.join(other = df2, on = [df1.col3 == df2.col3,"'col8'"], how = "right", lprefix = "t1", rprefix = "t2") >>> df3 t2_'col8' t1_col1 col5 t2_col 6 col2 t1_'col8' col7 t1_col3 t2_col3 t1_col 6 t2_col1 col4 0 5 None None 1991-11-03 None None d NaN 4.3 None 3 are you 1 4 2 b 1991-11-01 analytics 4 b 2.3 2.3 1991-11-01 2 analytics 2 3 1 a 1991-10-31 teradata 3 a 1.3 1.3 1991-10-31 1 teradata
Example 3: Use the default value for 'how' parameter
When the 'how' parameter is not provided, a left join is used by default.
>>> df3 = df1.join(other = df2, on = df1.col3 == df2.col3, lprefix = "t1", rprefix = "t2")
>>> df3 t2_'col8' t1_col1 col5 t2_col 6 col2 t1_'col8' col7 t1_col3 t2_col3 t1_col 6 t2_col1 col4 0 None 3 c None platform 5 None 3.3 NaN 1991-11-02 None None 1 4 2 b 1991-11-01 analytics 4 b 2.3 2.3 1991-11-01 2 analytics 2 3 1 a 1991-10-31 teradata 3 a 1.3 1.3 1991-10-31 1 teradata
Example 4: How cross join works
This example cross join "admissions_train" with "admissions_train".
>>> df1 = DataFrame("admissions_train").head(3).sort("id") >>> print(df1) masters gpa stats programming admitted id 1 yes 3.95 Beginner Beginner 0 2 yes 3.76 Beginner Beginner 0 3 no 3.70 Novice Beginner 1
>>> df2 = DataFrame("admissions_train").head(3).sort("id") >>> print(df2) masters gpa stats programming admitted id 1 yes 3.95 Beginner Beginner 0 2 yes 3.76 Beginner Beginner 0 3 no 3.70 Novice Beginner 1
>>> df3 = df1.join(other=df2, how="cross", lprefix="l", rprefix="r") >>> df3.set_index("l_id").sort("l_id") r_id l_masters r_masters l_gpa r_gpa l_stats r_stats l_programming r_programming l_admitted r_admitted l_id 1 3 yes no 3.95 3.70 Beginner Novice Beginner Beginner 0 1 1 1 yes yes 3.95 3.95 Beginner Beginner Beginner Beginner 0 0 1 2 yes yes 3.95 3.76 Beginner Beginner Beginner Beginner 0 0 2 1 yes yes 3.76 3.95 Beginner Beginner Beginner Beginner 0 0 2 2 yes yes 3.76 3.76 Beginner Beginner Beginner Beginner 0 0 2 3 yes no 3.76 3.70 Beginner Novice Beginner Beginner 0 1 3 1 no yes 3.70 3.95 Novice Beginner Beginner Beginner 1 0 3 2 no yes 3.70 3.76 Novice Beginner Beginner Beginner 1 0 3 3 no no 3.70 3.70 Novice Novice Beginner Beginner 1 1
See merge() Method for examples using suffix arguments lsuffix and rsuffix.