Teradata® Package for Python Function Reference
Product: Teradata Package for Python
Release Number: 17.00
Published: November 2021
Language: English (United States)
Last Update: 2021-11-19
Lifecycle: previous
Product Category: Teradata Vantage
teradataml.dataframe.setop.concat = concat(df_list, join='OUTER', allow_duplicates=True, sort=False, ignore_index=False)
DESCRIPTION:
Concatenates a list of teradataml DataFrames along the index axis.
PARAMETERS:
df_list:
Required argument.
Specifies the list of teradataml DataFrames to concatenate.
Types: list of teradataml DataFrames
join:
Optional argument.
Specifies how to handle the columns axis when the DataFrames do not all have the same columns.
Supported values are:
• 'OUTER': Projects all columns from all the DataFrames.
Rows that come from a DataFrame lacking a given column have a SQL NULL value in that column.
• 'INNER': Projects only the columns common to all DataFrames.
Default value: 'OUTER'
Permitted values: 'INNER', 'OUTER'
Types: str
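As a rough illustration of the two join modes, the following pure-Python sketch models only which columns are projected; it does not call teradataml. The helper name projected_columns is hypothetical, and the 'OUTER' column order (first appearance across the list) is an assumption chosen to match the unsorted output in the EXAMPLES section.

```python
def projected_columns(column_lists, join="OUTER"):
    """Model the columns a concat would project (hypothetical helper)."""
    mode = join.upper()  # the examples on this page pass 'inner' in lowercase
    if mode not in ("INNER", "OUTER"):
        raise ValueError("join must be 'INNER' or 'OUTER'")
    # Ordered union of column names, in order of first appearance (assumption).
    union = []
    for cols in column_lists:
        for col in cols:
            if col not in union:
                union.append(col)
    if mode == "OUTER":
        return union
    # INNER: keep only the columns present in every input.
    return [col for col in union if all(col in cols for cols in column_lists)]


# Column names taken from df1 and df2 in the EXAMPLES section.
df1_cols = ["id", "stats", "masters", "gpa"]
df2_cols = ["id", "stats", "programming", "admitted"]

outer_cols = projected_columns([df1_cols, df2_cols], join="OUTER")
inner_cols = projected_columns([df1_cols, df2_cols], join="inner")
print(outer_cols)
print(inner_cols)
```

With 'OUTER', the values for masters and gpa in rows coming from df2 (and for programming and admitted in rows coming from df1) are the ones filled with NULL.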
allow_duplicates:
Optional argument.
Specifies whether the result of the concatenation may contain duplicate rows.
If False, duplicate rows are removed from the result.
Default value: True
Types: bool
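The effect of this flag resembles the difference between SQL UNION ALL and UNION; that comparison is an assumption made here for illustration and is not stated on this page. A minimal sketch on plain tuples, using a hypothetical helper named concat_rows, could look like this:

```python
def concat_rows(row_lists, allow_duplicates=True):
    """Model row concatenation on plain tuples (hypothetical helper)."""
    combined = [row for rows in row_lists for row in rows]
    if allow_duplicates:
        return combined
    # Drop repeated rows, keeping the first occurrence of each.
    seen = set()
    unique = []
    for row in combined:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


rows_a = [(19, "Advanced"), (24, "Advanced")]
rows_b = [(24, "Advanced"), (13, "Advanced")]

with_dups = concat_rows([rows_a, rows_b])
without_dups = concat_rows([rows_a, rows_b], allow_duplicates=False)
print(len(with_dups), len(without_dups))
```

The sketch keeps input order only for readability; the row order of the outputs in the EXAMPLES section varies between calls.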
sort:
Optional argument.
Specifies whether to sort the columns axis if it is not already aligned;
applies when the join argument is set to 'OUTER'.
Default value: False
Types: bool
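The column ordering can be sketched in plain Python as follows. The helper outer_column_order is hypothetical, and two points are assumptions rather than documented behaviour: that "already aligned" means every input has identical columns in identical order, and that sorting is lexicographic (which is consistent with the sort=True output in the EXAMPLES section).

```python
def outer_column_order(column_lists, sort=False):
    """Model the column order of an 'OUTER' concat (hypothetical helper)."""
    # Ordered union of column names, in order of first appearance.
    union = []
    for cols in column_lists:
        for col in cols:
            if col not in union:
                union.append(col)
    # Assumption: "aligned" means all inputs list the same columns in the same order.
    first = list(column_lists[0])
    already_aligned = all(list(cols) == first for cols in column_lists)
    if sort and not already_aligned:
        return sorted(union)
    return union


# Non-index columns of df1 and df2 from the EXAMPLES section.
df1_cols = ["stats", "masters", "gpa"]
df2_cols = ["stats", "programming", "admitted"]

unsorted_order = outer_column_order([df1_cols, df2_cols])
sorted_order = outer_column_order([df1_cols, df2_cols], sort=True)
print(unsorted_order)
print(sorted_order)
```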
ignore_index:
Optional argument.
Specifies whether to ignore the index columns of the input DataFrames in the resulting DataFrame.
If True, the index columns are ignored in the concat operation.
Default value: False
Types: bool
RETURNS:
teradataml DataFrame
RAISES:
TeradataMlException
EXAMPLES:
>>> from teradataml import DataFrame, load_example_data
>>> load_example_data("dataframe", "admissions_train")
>>> from teradataml.dataframe.setop import concat
>>>
>>> # Default options
>>> df = DataFrame('admissions_train')
>>> df1 = df[df.gpa == 4].select(['id', 'stats', 'masters', 'gpa'])
>>> df1
       stats  masters  gpa
id
13  Advanced       no  4.0
29    Novice      yes  4.0
15  Advanced      yes  4.0
>>> df2 = df[df.gpa < 2].select(['id', 'stats', 'programming', 'admitted'])
>>> df2
       stats  programming  admitted
id
24  Advanced       Novice         1
19  Advanced     Advanced         0
>>> cdf = concat([df1, df2])
>>> cdf
       stats  masters  gpa  programming  admitted
id
19  Advanced     None  NaN     Advanced         0
24  Advanced     None  NaN       Novice         1
13  Advanced       no  4.0         None      None
29    Novice      yes  4.0         None      None
15  Advanced      yes  4.0         None      None
>>>
>>> # concat more than two DataFrames
>>> df3 = df[df.gpa == 3].select(['id', 'stats', 'programming', 'gpa'])
>>> df3
       stats  programming  gpa
id
36  Advanced       Novice  3.0
>>> cdf = concat([df1, df2, df3])
>>> cdf
       stats  masters  gpa  programming  admitted
id
15  Advanced      yes  4.0         None       NaN
19  Advanced     None  NaN     Advanced       0.0
36  Advanced     None  3.0       Novice       NaN
29    Novice      yes  4.0         None       NaN
13  Advanced       no  4.0         None       NaN
24  Advanced     None  NaN       Novice       1.0
>>> # join = 'inner'
>>> cdf = concat([df1, df2], join='inner')
>>> cdf
       stats
id
19  Advanced
24  Advanced
13  Advanced
29    Novice
15  Advanced
>>>
>>> # allow_duplicates = True (default)
>>> cdf = concat([df1, df2])
>>> cdf
       stats  masters  gpa  programming  admitted
id
19  Advanced     None  NaN     Advanced         0
24  Advanced     None  NaN       Novice         1
13  Advanced       no  4.0         None      None
29    Novice      yes  4.0         None      None
15  Advanced      yes  4.0         None      None
>>> cdf = concat([cdf, df2])
>>> cdf
       stats  masters  gpa  programming  admitted
id
19  Advanced     None  NaN     Advanced         0
13  Advanced       no  4.0         None      None
24  Advanced     None  NaN       Novice         1
24  Advanced     None  NaN       Novice         1
19  Advanced     None  NaN     Advanced         0
29    Novice      yes  4.0         None      None
15  Advanced      yes  4.0         None      None
>>>
>>> # allow_duplicates = False
>>> cdf = concat([cdf, df2], allow_duplicates=False)
>>> cdf
       stats  masters  gpa  programming  admitted
id
19  Advanced     None  NaN     Advanced         0
29    Novice      yes  4.0         None      None
24  Advanced     None  NaN       Novice         1
15  Advanced      yes  4.0         None      None
13  Advanced       no  4.0         None      None
>>>
>>> # sort = True
>>> cdf = concat([df1, df2], sort=True)
>>> cdf
    admitted  gpa  masters  programming     stats
id
19         0  NaN     None     Advanced  Advanced
24         1  NaN     None       Novice  Advanced
13      None  4.0       no         None  Advanced
29      None  4.0      yes         None    Novice
15      None  4.0      yes         None  Advanced
>>>
>>> # ignore_index = True
>>> cdf = concat([df1, df2], ignore_index=True)
>>> cdf
      stats  masters  gpa  programming  admitted
0  Advanced      yes  4.0         None       NaN
1  Advanced     None  NaN     Advanced       0.0
2    Novice      yes  4.0         None       NaN
3  Advanced     None  NaN       Novice       1.0
4  Advanced       no  4.0         None       NaN