Teradata Package for Python Function Reference on VantageCloud Lake - join - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference on VantageCloud Lake
- Deployment
- VantageCloud
- Edition
- Lake
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Lake_2000
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrame.join = join(self, other, on=None, how='left', lsuffix=None, rsuffix=None, lprefix=None, rprefix=None)
- DESCRIPTION:
Joins two different teradataml DataFrames together based on column comparisons
specified in argument 'on' and type of join is specified in the argument 'how'.
Supported join operations are:
• Inner join: Returns only matching rows, non matching rows are eliminated.
• Left outer join: Returns all matching rows plus non matching rows from the left table.
• Right outer join: Returns all matching rows plus non matching rows from the right table.
• Full outer join: Returns all rows from both tables, including non matching rows.
• Cross join: Returns all rows from both tables where each row from the first table
is joined with each row from the second table. The result of the join
is a cartesian cross product.
Note: For a cross join, the 'on' argument is ignored.
Supported join operators are =, ==, <, <=, >, >=, <> and != (= and <> operators are
not supported when using DataFrame columns as operands).
Notes:
1. When multiple join conditions are given as a list string/ColumnExpression,
they are joined using AND operator.
2. Two or more on conditions can be combined using & and | operators
and can be passed as single ColumnExpression.
You can use (df1.a == df1.b) & (df1.c == df1.d) in place of
[df1.a == df1.b, df1.c == df1.d].
3. Two or more on conditions can not be combined using pythonic 'and'
and 'or'.
You can use (df1.a == df1.b) & (df1.c == df1.d) in place of
[df1.a == df1.b and df1.c == df1.d].
4. Performing self join using same DataFrame object in 'other'
argument is not supported. In order to perform self join,
first create aliased DataFrame using alias() API and pass it
for 'other' argument. Refer to Example 10 in EXAMPLES section.
PARAMETERS:
other:
Required Argument.
Specifies the right teradataml DataFrame on which join is to be performed.
Types: teradataml DataFrame
on:
Optional argument when "how" is "cross", otherwise required.
If specified when "how" is "cross", it is ignored.
Specifies list of conditions that indicate the columns to be join keys.
It can take the following forms:
• String comparisons, in the form of "col1 <= col2", where col1 is
the column of left dataframe df1 and col2 is the column of right
dataframe df2.
Examples:
1. ["a","b"] indicates df1.a = df2.a and df1.b = df2.b.
2. ["a = b", "c == d"] indicates df1.a = df2.b and df1.c = df2.d.
3. ["a <= b", "c > d"] indicates df1.a <= df2.b and df1.c > df2.d.
4. ["a < b", "c >= d"] indicates df1.a < df2.b and df1.c >= df2.d.
5. ["a <> b"] indicates df1.a != df2.b. Same is the case for ["a != b"].
• Column comparisons, in the form of df1.col1 <= df2.col2, where col1
is the column of left dataframe df1 and col2 is the column of right
dataframe df2.
Examples:
1. [df1.a == df2.a, df1.b == df2.b] indicates df1.a = df2.a AND df1.b = df2.b.
2. [df1.a == df2.b, df1.c == df2.d] indicates df1.a = df2.b AND df1.c = df2.d.
3. [df1.a <= df2.b & df1.c > df2.d] indicates df1.a <= df2.b AND df1.c > df2.d.
4. [df1.a < df2.b | df1.c >= df2.d] indicates df1.a < df2.b OR df1.c >= df2.d.
5. df1.a != df2.b indicates df1.a != df2.b.
• The combination of both string comparisons and comparisons as column expressions.
Examples:
1. ["a", df1.b == df2.b] indicates df1.a = df2.a AND df1.b = df2.b.
2. [df1.a <= df2.b, "c > d"] indicates df1.a <= df2.b AND df1.c > df2.d.
• ColumnExpressions containing FunctionExpressions which represent SQL functions
invoked on DataFrame Columns.
Examples:
1. (df1.a.round(1) - df2.a.round(1)).mod(2.5) > 2
2. df1.a.floor() - df2.b.floor() > 2
Types: str (or) ColumnExpression (or) List of strings(str) or ColumnExpressions
how:
Optional Argument.
Specifies the type of join to perform.
Default value is "left".
Permitted Values : "inner", "left", "right", "full" and "cross"
Types: str
lsuffix:
Optional Argument.
Specifies the suffix to be added to the left table columns.
Default Value: None.
Types: str
rsuffix:
Optional Argument.
Specifies the suffix to be added to the right table columns.
Default Value: None.
Types: str
lprefix:
Optional Argument.
Specifies the prefix to be added to the left table columns.
Default Value: None.
Types: str
rprefix:
Optional Argument.
Specifies the prefix to be added to the right table columns.
Default Value: None.
Types: str
RETURNS:
teradataml DataFrame
RAISES:
TeradataMlException
EXAMPLES:
# Load the data to run the example.
>>> from teradataml import load_example_data
>>> load_example_data("dataframe", ["join_table1", "join_table2"])
>>> load_example_data("glm", "admissions_train") # used in cross join
>>> df1 = DataFrame("join_table1")
>>> df2 = DataFrame("join_table2")
# Print dataframe.
>>> df1
col2 col3 col5
col1
2 analytics 2.3 b
1 teradata 1.3 a
3 platform 3.3 c
# Print dataframe.
>>> df2
col4 col3 col7
col1
2 analytics 2.3 b
1 teradata 1.3 a
3 are you 4.3 d
# Example 1: Both "on" argument conditions as strings.
>>> df1.join(other = df2, on = ["col2=col4", "col1"], how = "inner", lprefix = "t1", rprefix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 2 2 analytics 2.3 2.3 b analytics b
1 1 1 teradata 1.3 1.3 a teradata a
# Example 2: One "on" argument condition is ColumnExpression and other is string having two
# columns with left outer join.
>>> df1.join(df2, on = [df1.col2 == df2.col4,"col5 = col7"], how = "left", lprefix = "t1", rprefix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 3 None platform 3.3 None c None None
1 2 2 analytics 2.3 2.3 b analytics b
2 1 1 teradata 1.3 1.3 a teradata a
# Example 3: One "on" argument condition is ColumnExpression and other is string having only one column.
>>> df1.join(other = df2, on = [df1.col2 == df2.col4,"col3"], how = "inner", lprefix = "t1", rprefix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 2 2 analytics 2.3 2.3 b analytics b
1 1 1 teradata 1.3 1.3 a teradata a
# Example 4: One "on" argument condition is ColumnExpression and other is string having two
# columns with full join.
>>> df1.join(other = df2, on = ["col2=col4",df1.col5 == df2.col7], how = "full", lprefix = "t1", rprefix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 3 None platform 3.3 None c None None
1 None 3 None None 4.3 None are you d
2 1 1 teradata 1.3 1.3 a teradata a
3 2 2 analytics 2.3 2.3 b analytics b
# Example 5: Using not equal operation in ColumnExpression condition.
>>> df1.join(other = df2, on = ["col5==col7",df1.col2 != df2.col4], how = "full", lprefix = "t1", rprefix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 1 None teradata 1.3 None a None None
1 2 None analytics 2.3 None b None None
2 None 2 None None 2.3 None analytics b
3 None 1 None None 1.3 None teradata a
4 3 None platform 3.3 None c None None
5 None 3 None None 4.3 None are you d
# Example 6: Using only one string expression with <> operation.
>>> df1.join(other = df2, on = "col2<>col4", how = "left", lprefix = "t1", rprefix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 2 3 analytics 2.3 4.3 b are you d
1 2 1 analytics 2.3 1.3 b teradata a
2 3 2 platform 3.3 2.3 c analytics b
3 1 2 teradata 1.3 2.3 a analytics b
4 3 1 platform 3.3 1.3 c teradata a
5 1 3 teradata 1.3 4.3 a are you d
6 3 3 platform 3.3 4.3 c are you d
# Example 7: Using only one ColumnExpression in "on" argument conditions.
>>> df1.join(other = df2, on = df1.col5 != df2.col7, how = "full", lprefix = "t1", rprefix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 1 3 teradata 1.3 4.3 a are you d
1 3 1 platform 3.3 1.3 c teradata a
2 1 2 teradata 1.3 2.3 a analytics b
3 3 2 platform 3.3 2.3 c analytics b
4 2 1 analytics 2.3 1.3 b teradata a
5 3 3 platform 3.3 4.3 c are you d
6 2 3 analytics 2.3 4.3 b are you d
# Example 8: Both "on" argument conditions as ColumnExpressions.
>>> df1.join(df2, on = [df1.col2 == df2.col4, df1.col5 > df2.col7], how = "right", lprefix = "t1", rprefix ="t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 None 2 None None 2.3 None analytics b
1 None 1 None None 1.3 None teradata a
2 None 3 None None 4.3 None are you d
# Example 9: cross join "admissions_train" with "admissions_train".
>>> df1 = DataFrame("admissions_train").head(3).sort("id")
>>> print(df1)
masters gpa stats programming admitted
id
1 yes 3.95 Beginner Beginner 0
2 yes 3.76 Beginner Beginner 0
3 no 3.70 Novice Beginner 1
>>> df2 = DataFrame("admissions_train").head(3).sort("id")
>>> print(df2)
masters gpa stats programming admitted
id
1 yes 3.95 Beginner Beginner 0
2 yes 3.76 Beginner Beginner 0
3 no 3.70 Novice Beginner 1
>>> df3 = df1.join(other=df2, how="cross", lprefix="l", rprefix="r")
>>> df3.set_index("l_id").sort("l_id")
l_programming r_stats r_id r_gpa r_programming l_gpa l_stats r_admitted l_admitted l_masters r_masters
l_id
1 Beginner Novice 3 3.70 Beginner 3.95 Beginner 1 0 yes no
1 Beginner Beginner 1 3.95 Beginner 3.95 Beginner 0 0 yes yes
1 Beginner Beginner 2 3.76 Beginner 3.95 Beginner 0 0 yes yes
2 Beginner Beginner 1 3.95 Beginner 3.76 Beginner 0 0 yes yes
2 Beginner Beginner 2 3.76 Beginner 3.76 Beginner 0 0 yes yes
2 Beginner Novice 3 3.70 Beginner 3.76 Beginner 1 0 yes no
3 Beginner Beginner 1 3.95 Beginner 3.70 Novice 0 1 no yes
3 Beginner Beginner 2 3.76 Beginner 3.70 Novice 0 1 no yes
3 Beginner Novice 3 3.70 Beginner 3.70 Novice 1 1 no no
# Example 10: Perform self join using aliased DataFrame.
# Create an aliased DataFrame.
>>> lhs = DataFrame("admissions_train").head(3).sort("id")
>>> rhs = lhs.alias("rhs")
# Use aliased DataFrame for self join.
>>> joined_df = lhs.join(other=rhs, how="cross", lprefix="l", rprefix="r")
>>> joined_df
l_id r_id l_masters r_masters l_gpa r_gpa l_stats r_stats l_programming r_programming l_admitted r_admitted
0 1 3 yes no 3.95 3.70 Beginner Novice Beginner Beginner 0 1
1 2 2 yes yes 3.76 3.76 Beginner Beginner Beginner Beginner 0 0
2 2 3 yes no 3.76 3.70 Beginner Novice Beginner Beginner 0 1
3 3 1 no yes 3.70 3.95 Novice Beginner Beginner Beginner 1 0
4 3 3 no no 3.70 3.70 Novice Novice Beginner Beginner 1 1
5 3 2 no yes 3.70 3.76 Novice Beginner Beginner Beginner 1 0
6 2 1 yes yes 3.76 3.95 Beginner Beginner Beginner Beginner 0 0
7 1 2 yes yes 3.95 3.76 Beginner Beginner Beginner Beginner 0 0
8 1 1 yes yes 3.95 3.95 Beginner Beginner Beginner Beginner 0 0
# Example 11: Perform join with compound 'on' condition having
# more than one binary operator.
>>> rhs_2 = lhs.assign(double_gpa=lhs.gpa * 2)
>>> joined_df_2 = lhs.join(rhs_2, on=rhs_2.double_gpa == lhs.gpa * 2, how="left", lprefix="l", rprefix="r")
>>> joined_df_2
l_id r_id l_masters r_masters l_gpa r_gpa l_stats r_stats l_programming r_programming l_admitted r_admitted double_gpa
0 3 3 no no 3.70 3.70 Novice Novice Beginner Beginner 1 1 7.40
1 2 2 yes yes 3.76 3.76 Beginner Beginner Beginner Beginner 0 0 7.52
2 1 1 yes yes 3.95 3.95 Beginner Beginner Beginner Beginner 0 0 7.90
# Example 12: Perform join on DataFrames with 'on' condition
# having FunctionExpression.
>>> df = DataFrame("admissions_train")
>>> df2 = df.alias("rhs_df")
>>> joined_df_3 = df.join(df2, on=(df.gpa.round(1) - df2.gpa.round(1)).mod(2.5) > 2,
>>> how="inner", lprefix="l")
>>> joined_df_3.sort(["id", "l_id"])
l_id id l_masters masters l_gpa gpa l_stats stats l_programming programming l_admitted admitted
0 1 24 yes no 3.95 1.87 Beginner Advanced Beginner Novice 0 1
1 13 24 no no 4.0 1.87 Advanced Advanced Novice Novice 1 1
2 15 24 yes no 4.0 1.87 Advanced Advanced Advanced Novice 1 1
3 25 24 no no 3.96 1.87 Advanced Advanced Advanced Novice 1 1
4 27 24 yes no 3.96 1.87 Advanced Advanced Advanced Novice 0 1
5 29 24 yes no 4.0 1.87 Novice Advanced Beginner Novice 0 1
6 40 24 yes no 3.95 1.87 Novice Advanced Beginner Novice 0 1