Teradata Package for Python Function Reference | 17.10 - join - Teradata Package for Python
Teradata® Package for Python Function Reference
- Product
- Teradata Package for Python
- Release Number
- 17.10
- Published
- April 2022
- Language
- English (United States)
- Last Update
- 2022-08-19
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrame.join = join(self, other, on=None, how='left', lsuffix=None, rsuffix=None)
- DESCRIPTION:
Joins two different teradataml DataFrames together based on column comparisons
specified in argument 'on' and type of join is specified in the argument 'how'.
Supported join operations are:
• Inner join: Returns only matching rows, non matching rows are eliminated.
• Left outer join: Returns all matching rows plus non matching rows from the left table.
• Right outer join: Returns all matching rows plus non matching rows from the right table.
• Full outer join: Returns all rows from both tables, including non matching rows.
• Cross join: Returns all rows from both tables where each row from the first table
is joined with each row from the second table. The result of the join
is a cartesian cross product.
Note: For a cross join, the 'on' argument is ignored.
Supported join operators are =, ==, <, <=, >, >=, <> and != (= and <> operators are
not supported when using DataFrame columns as operands).
Note:
1. When multiple join conditions are given, they are joined using AND boolean
operator. Other boolean operators are not supported.
2. Nesting of join on conditions in column expressions using & and | is not
supported. The example for unsupported nested join on conditions is:
on = [(df1.a == df1.b) & (df1.c == df1.d)]
You can use [df1.a == df1.b, df1.c == df1.d] in place of
[(df1.a == df1.b) & (df1.c == df1.d)].
PARAMETERS:
other:
Required Argument.
Specifies the right teradataml DataFrame on which join is to be performed.
Types: teradataml DataFrame
on:
Optional argument when "how" is "cross", otherwise required.
If specified when "how" is "cross", it is ignored.
Specifies list of conditions that indicate the columns to be join keys.
It can take the following forms:
• String comparisons, in the form of "col1 <= col2", where col1 is
the column of left dataframe df1 and col2 is the column of right
dataframe df2.
Examples:
1. ["a","b"] indicates df1.a = df2.a and df1.b = df2.b.
2. ["a = b", "c == d"] indicates df1.a = df2.b and df1.c = df2.d.
3. ["a <= b", "c > d"] indicates df1.a <= df2.b and df1.c > df2.d.
4. ["a < b", "c >= d"] indicates df1.a < df2.b and df1.c >= df2.d.
5. ["a <> b"] indicates df1.a != df2.b. Same is the case for ["a != b"].
• Column comparisons, in the form of df1.col1 <= df2.col2, where col1
is the column of left dataframe df1 and col2 is the column of right
dataframe df2.
Examples:
1. [df1.a == df2.a, df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b.
2. [df1.a == df2.b, df1.c == df2.d] indicates df1.a = df2.b and df1.c = df2.d.
3. [df1.a <= df2.b and df1.c > df2.d] indicates df1.a <= df2.b and df1.c > df2.d.
4. [df1.a < df2.b and df1.c >= df2.d] indicates df1.a < df2.b and df1.c >= df2.d.
5. df1.a != df2.b indicates df1.a != df2.b.
• The combination of both string comparisons and comparisons as column expressions.
Examples:
1. ["a", df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b.
2. [df1.a <= df2.b, "c > d"] indicates df1.a <= df2.b and df1.c > df2.d.
Types: str (or) ColumnExpression (or) List of strings(str) or ColumnExpressions
how:
Optional Argument.
Specifies the type of join to perform.
Default value is "left".
Permitted Values : "inner", "left", "right", "full" and "cross"
Types: str
lsuffix:
Optional Argument.
Specifies the suffix to be added to the left table columns.
Default Value: None.
Types: str
rsuffix:
Optional Argument.
Specifies the suffix to be added to the right table columns.
Default Value: None.
Types: str
RETURNS:
teradataml DataFrame
RAISES:
TeradataMlException
EXAMPLES:
# Load the data to run the example.
>>> from teradataml import load_example_data
>>> load_example_data("dataframe", ["join_table1", "join_table2"])
>>> load_example_data("glm", "admissions_train") # used in cross join
>>> df1 = DataFrame("join_table1")
>>> df2 = DataFrame("join_table2")
# Print dataframe.
>>> df1
col2 col3 col5
col1
2 analytics 2.3 b
1 teradata 1.3 a
3 platform 3.3 c
# Print dataframe.
>>> df2
col4 col3 col7
col1
2 analytics 2.3 b
1 teradata 1.3 a
3 are you 4.3 d
# Both on conditions as strings.
>>> df1.join(other = df2, on = ["col2=col4", "col1"], how = "inner", lsuffix = "t1", rsuffix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 2 2 analytics 2.3 2.3 b analytics b
1 1 1 teradata 1.3 1.3 a teradata a
# One on condition is ColumnExpression and other is string having two columns with left
# outer join.
>>> df1.join(df2, on = [df1.col2 == df2.col4,"col5 = col7"], how = "left", lsuffix = "t1", rsuffix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 3 None platform 3.3 NaN c None None
1 2 2 analytics 2.3 2.3 b analytics b
2 1 1 teradata 1.3 1.3 a teradata a
# One on condition is ColumnExpression and other is string having only one column.
>>> df1.join(other = df2, on = [df1.col2 == df2.col4,"col3"], how = "inner", lsuffix = "t1", rsuffix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 2 2 analytics 2.3 2.3 b analytics b
1 1 1 teradata 1.3 1.3 a teradata a
# One on condition is ColumnExpression and other is string having two columns with
# full join.
>>> df1.join(other = df2, on = ["col2=col4",df1.col5 == df2.col7], how = "full", lsuffix = "t1", rsuffix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 3 None platform 3.3 NaN c None None
1 None 3 None NaN 4.3 None are you d
2 1 1 teradata 1.3 1.3 a teradata a
3 2 2 analytics 2.3 2.3 b analytics b
# Using not equal operation in ColumnExpression condition.
>>> df1.join(other = df2, on = ["col5==col7",df1.col2 != df2.col4], how = "full", lsuffix = "t1", rsuffix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 1 None teradata 1.3 NaN a None None
1 2 None analytics 2.3 NaN b None None
2 None 2 None NaN 2.3 None analytics b
3 None 1 None NaN 1.3 None teradata a
4 3 None platform 3.3 NaN c None None
5 None 3 None NaN 4.3 None are you d
# Using only one string expression with <> operation.
>>> df1.join(other = df2, on = "col2<>col4", how = "left", lsuffix = "t1", rsuffix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 2 3 analytics 2.3 4.3 b are you d
1 2 1 analytics 2.3 1.3 b teradata a
2 3 2 platform 3.3 2.3 c analytics b
3 1 2 teradata 1.3 2.3 a analytics b
4 3 1 platform 3.3 1.3 c teradata a
5 1 3 teradata 1.3 4.3 a are you d
6 3 3 platform 3.3 4.3 c are you d
# Using only one ColumnExpression in on conditions.
>>> df1.join(other = df2, on = df1.col5 != df2.col7, how = "full", lsuffix = "t1", rsuffix = "t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 1 3 teradata 1.3 4.3 a are you d
1 3 1 platform 3.3 1.3 c teradata a
2 1 2 teradata 1.3 2.3 a analytics b
3 3 2 platform 3.3 2.3 c analytics b
4 2 1 analytics 2.3 1.3 b teradata a
5 3 3 platform 3.3 4.3 c are you d
6 2 3 analytics 2.3 4.3 b are you d
# Both on conditions as ColumnExpressions.
>>> df1.join(df2, on = [df1.col2 == df2.col4, df1.col5 > df2.col7], how = "right", lsuffix = "t1", rsuffix ="t2")
t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7
0 None 2 None None 2.3 None analytics b
1 None 1 None None 1.3 None teradata a
2 None 3 None None 4.3 None are you d
# cross join "admissions_train" with "admissions_train".
>>> df1 = DataFrame("admissions_train").head(3).sort("id")
>>> print(df1)
masters gpa stats programming admitted
id
1 yes 3.95 Beginner Beginner 0
2 yes 3.76 Beginner Beginner 0
3 no 3.70 Novice Beginner 1
>>> df2 = DataFrame("admissions_train").head(3).sort("id")
>>> print(df2)
masters gpa stats programming admitted
id
1 yes 3.95 Beginner Beginner 0
2 yes 3.76 Beginner Beginner 0
3 no 3.70 Novice Beginner 1
>>> df3 = df1.join(other=df2, how="cross", lsuffix="l", rsuffix="r")
>>> df3.set_index("l_id").sort("l_id")
l_programming r_stats r_id r_gpa r_programming l_gpa l_stats r_admitted l_admitted l_masters r_masters
l_id
1 Beginner Novice 3 3.70 Beginner 3.95 Beginner 1 0 yes no
1 Beginner Beginner 1 3.95 Beginner 3.95 Beginner 0 0 yes yes
1 Beginner Beginner 2 3.76 Beginner 3.95 Beginner 0 0 yes yes
2 Beginner Beginner 1 3.95 Beginner 3.76 Beginner 0 0 yes yes
2 Beginner Beginner 2 3.76 Beginner 3.76 Beginner 0 0 yes yes
2 Beginner Novice 3 3.70 Beginner 3.76 Beginner 1 0 yes no
3 Beginner Beginner 1 3.95 Beginner 3.70 Novice 0 1 no yes
3 Beginner Beginner 2 3.76 Beginner 3.70 Novice 0 1 no yes
3 Beginner Novice 3 3.70 Beginner 3.70 Novice 1 1 no no