Teradata Package for Python Function Reference | 17.10 - join - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.10

Published

April 2022

Language

English (United States)

Last Update

2022-08-19

lifecycle

Product Category

Teradata Vantage

teradataml.dataframe.dataframe.DataFrame.join = join(self, other, on=None, how='left', lsuffix=None, rsuffix=None): DESCRIPTION: Joins two different teradataml DataFrames together based on column comparisons specified in argument 'on' and type of join is specified in the argument 'how'. Supported join operations are: • Inner join: Returns only matching rows, non matching rows are eliminated. • Left outer join: Returns all matching rows plus non matching rows from the left table. • Right outer join: Returns all matching rows plus non matching rows from the right table. • Full outer join: Returns all rows from both tables, including non matching rows. • Cross join: Returns all rows from both tables where each row from the first table is joined with each row from the second table. The result of the join is a cartesian cross product. Note: For a cross join, the 'on' argument is ignored. Supported join operators are =, ==, <, <=, >, >=, <> and != (= and <> operators are not supported when using DataFrame columns as operands). Note: 1. When multiple join conditions are given, they are joined using AND boolean operator. Other boolean operators are not supported. 2. Nesting of join on conditions in column expressions using & and | is not supported. The example for unsupported nested join on conditions is: on = [(df1.a == df1.b) & (df1.c == df1.d)] You can use [df1.a == df1.b, df1.c == df1.d] in place of [(df1.a == df1.b) & (df1.c == df1.d)]. PARAMETERS: other: Required Argument. Specifies the right teradataml DataFrame on which join is to be performed. Types: teradataml DataFrame on: Optional argument when "how" is "cross", otherwise required. If specified when "how" is "cross", it is ignored. Specifies list of conditions that indicate the columns to be join keys. It can take the following forms: • String comparisons, in the form of "col1 <= col2", where col1 is the column of left dataframe df1 and col2 is the column of right dataframe df2. Examples: 1. ["a","b"] indicates df1.a = df2.a and df1.b = df2.b. 2. ["a = b", "c == d"] indicates df1.a = df2.b and df1.c = df2.d. 3. ["a <= b", "c > d"] indicates df1.a <= df2.b and df1.c > df2.d. 4. ["a < b", "c >= d"] indicates df1.a < df2.b and df1.c >= df2.d. 5. ["a <> b"] indicates df1.a != df2.b. Same is the case for ["a != b"]. • Column comparisons, in the form of df1.col1 <= df2.col2, where col1 is the column of left dataframe df1 and col2 is the column of right dataframe df2. Examples: 1. [df1.a == df2.a, df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b. 2. [df1.a == df2.b, df1.c == df2.d] indicates df1.a = df2.b and df1.c = df2.d. 3. [df1.a <= df2.b and df1.c > df2.d] indicates df1.a <= df2.b and df1.c > df2.d. 4. [df1.a < df2.b and df1.c >= df2.d] indicates df1.a < df2.b and df1.c >= df2.d. 5. df1.a != df2.b indicates df1.a != df2.b. • The combination of both string comparisons and comparisons as column expressions. Examples: 1. ["a", df1.b == df2.b] indicates df1.a = df2.a and df1.b = df2.b. 2. [df1.a <= df2.b, "c > d"] indicates df1.a <= df2.b and df1.c > df2.d. Types: str (or) ColumnExpression (or) List of strings(str) or ColumnExpressions how: Optional Argument. Specifies the type of join to perform. Default value is "left". Permitted Values : "inner", "left", "right", "full" and "cross" Types: str lsuffix: Optional Argument. Specifies the suffix to be added to the left table columns. Default Value: None. Types: str rsuffix: Optional Argument. Specifies the suffix to be added to the right table columns. Default Value: None. Types: str RETURNS: teradataml DataFrame RAISES: TeradataMlException EXAMPLES: # Load the data to run the example. >>> from teradataml import load_example_data >>> load_example_data("dataframe", ["join_table1", "join_table2"]) >>> load_example_data("glm", "admissions_train") # used in cross join >>> df1 = DataFrame("join_table1") >>> df2 = DataFrame("join_table2") # Print dataframe. >>> df1 col2 col3 col5 col1 2 analytics 2.3 b 1 teradata 1.3 a 3 platform 3.3 c # Print dataframe. >>> df2 col4 col3 col7 col1 2 analytics 2.3 b 1 teradata 1.3 a 3 are you 4.3 d # Both on conditions as strings. >>> df1.join(other = df2, on = ["col2=col4", "col1"], how = "inner", lsuffix = "t1", rsuffix = "t2") t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7 0 2 2 analytics 2.3 2.3 b analytics b 1 1 1 teradata 1.3 1.3 a teradata a # One on condition is ColumnExpression and other is string having two columns with left # outer join. >>> df1.join(df2, on = [df1.col2 == df2.col4,"col5 = col7"], how = "left", lsuffix = "t1", rsuffix = "t2") t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7 0 3 None platform 3.3 NaN c None None 1 2 2 analytics 2.3 2.3 b analytics b 2 1 1 teradata 1.3 1.3 a teradata a # One on condition is ColumnExpression and other is string having only one column. >>> df1.join(other = df2, on = [df1.col2 == df2.col4,"col3"], how = "inner", lsuffix = "t1", rsuffix = "t2") t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7 0 2 2 analytics 2.3 2.3 b analytics b 1 1 1 teradata 1.3 1.3 a teradata a # One on condition is ColumnExpression and other is string having two columns with # full join. >>> df1.join(other = df2, on = ["col2=col4",df1.col5 == df2.col7], how = "full", lsuffix = "t1", rsuffix = "t2") t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7 0 3 None platform 3.3 NaN c None None 1 None 3 None NaN 4.3 None are you d 2 1 1 teradata 1.3 1.3 a teradata a 3 2 2 analytics 2.3 2.3 b analytics b # Using not equal operation in ColumnExpression condition. >>> df1.join(other = df2, on = ["col5==col7",df1.col2 != df2.col4], how = "full", lsuffix = "t1", rsuffix = "t2") t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7 0 1 None teradata 1.3 NaN a None None 1 2 None analytics 2.3 NaN b None None 2 None 2 None NaN 2.3 None analytics b 3 None 1 None NaN 1.3 None teradata a 4 3 None platform 3.3 NaN c None None 5 None 3 None NaN 4.3 None are you d # Using only one string expression with <> operation. >>> df1.join(other = df2, on = "col2<>col4", how = "left", lsuffix = "t1", rsuffix = "t2") t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7 0 2 3 analytics 2.3 4.3 b are you d 1 2 1 analytics 2.3 1.3 b teradata a 2 3 2 platform 3.3 2.3 c analytics b 3 1 2 teradata 1.3 2.3 a analytics b 4 3 1 platform 3.3 1.3 c teradata a 5 1 3 teradata 1.3 4.3 a are you d 6 3 3 platform 3.3 4.3 c are you d # Using only one ColumnExpression in on conditions. >>> df1.join(other = df2, on = df1.col5 != df2.col7, how = "full", lsuffix = "t1", rsuffix = "t2") t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7 0 1 3 teradata 1.3 4.3 a are you d 1 3 1 platform 3.3 1.3 c teradata a 2 1 2 teradata 1.3 2.3 a analytics b 3 3 2 platform 3.3 2.3 c analytics b 4 2 1 analytics 2.3 1.3 b teradata a 5 3 3 platform 3.3 4.3 c are you d 6 2 3 analytics 2.3 4.3 b are you d # Both on conditions as ColumnExpressions. >>> df1.join(df2, on = [df1.col2 == df2.col4, df1.col5 > df2.col7], how = "right", lsuffix = "t1", rsuffix ="t2") t1_col1 t2_col1 col2 t1_col3 t2_col3 col5 col4 col7 0 None 2 None None 2.3 None analytics b 1 None 1 None None 1.3 None teradata a 2 None 3 None None 4.3 None are you d # cross join "admissions_train" with "admissions_train". >>> df1 = DataFrame("admissions_train").head(3).sort("id") >>> print(df1) masters gpa stats programming admitted id 1 yes 3.95 Beginner Beginner 0 2 yes 3.76 Beginner Beginner 0 3 no 3.70 Novice Beginner 1 >>> df2 = DataFrame("admissions_train").head(3).sort("id") >>> print(df2) masters gpa stats programming admitted id 1 yes 3.95 Beginner Beginner 0 2 yes 3.76 Beginner Beginner 0 3 no 3.70 Novice Beginner 1 >>> df3 = df1.join(other=df2, how="cross", lsuffix="l", rsuffix="r") >>> df3.set_index("l_id").sort("l_id") l_programming r_stats r_id r_gpa r_programming l_gpa l_stats r_admitted l_admitted l_masters r_masters l_id 1 Beginner Novice 3 3.70 Beginner 3.95 Beginner 1 0 yes no 1 Beginner Beginner 1 3.95 Beginner 3.95 Beginner 0 0 yes yes 1 Beginner Beginner 2 3.76 Beginner 3.95 Beginner 0 0 yes yes 2 Beginner Beginner 1 3.95 Beginner 3.76 Beginner 0 0 yes yes 2 Beginner Beginner 2 3.76 Beginner 3.76 Beginner 0 0 yes yes 2 Beginner Novice 3 3.70 Beginner 3.76 Beginner 1 0 yes no 3 Beginner Beginner 1 3.95 Beginner 3.70 Novice 0 1 no yes 3 Beginner Beginner 2 3.76 Beginner 3.70 Novice 0 1 no yes 3 Beginner Novice 3 3.70 Beginner 3.70 Novice 1 1 no no