Teradata® Package for Python Function Reference - 20.00: concat
Deployment: VantageCloud, VantageCore
Edition: Enterprise, IntelliFlex, VMware
Product: Teradata Package for Python
Published: December 2024
Language: English (United States)
Last Update: 2024-12-19
dita:id: TeradataPython_FxRef_Enterprise_2000
Product Category: Teradata Vantage
teradataml.dataframe.setop.concat(df_list, join='OUTER', allow_duplicates=True, sort=False, ignore_index=False)
Concatenates a list of teradataml DataFrames, GeoDataFrames, or both along the index axis.
df_list:
Required argument.
Specifies a list of teradataml DataFrames, GeoDataFrames, or both on which to perform the concatenation.
Types: list of teradataml DataFrames and/or GeoDataFrames
join:
Optional argument.
Specifies how to handle the columns axis, that is, which columns of the input DataFrames appear in the result.
Supported values are:
• 'OUTER': Projects all columns from all the DataFrames. A column that is absent from a given DataFrame has SQL NULL values in the rows that come from that DataFrame.
• 'INNER': Projects only the columns common to all DataFrames.
Default value: 'OUTER'
Permitted values: 'INNER', 'OUTER'
Types: str
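The column rules for join can be modeled in plain Python. The sketch below is a hypothetical in-memory model, not teradataml code: `concat_model`, the `Frame` alias, and the (columns, rows) tuples are illustrative names, and None stands in for SQL NULL. The sample rows reproduce df1 and df2 from the examples, with id kept as an ordinary column.

```python
from typing import List, Sequence, Tuple

# Hypothetical stand-in for a DataFrame: (column names, list of row tuples).
Frame = Tuple[List[str], List[tuple]]


def concat_model(frames: Sequence[Frame], join: str = "OUTER") -> Frame:
    """Stack the rows of several frames, modeling the 'OUTER'/'INNER' column rules."""
    join = join.upper()
    if join == "OUTER":
        # Every column from every frame, in first-seen order.
        columns: List[str] = []
        for cols, _ in frames:
            for col in cols:
                if col not in columns:
                    columns.append(col)
    elif join == "INNER":
        # Only the columns shared by all frames, in the first frame's order.
        common = set(frames[0][0]).intersection(*(cols for cols, _ in frames[1:]))
        columns = [col for col in frames[0][0] if col in common]
    else:
        raise ValueError("join must be 'INNER' or 'OUTER'")

    result: List[tuple] = []
    for cols, rows in frames:
        position = {col: i for i, col in enumerate(cols)}
        for row in rows:
            # A column this frame lacks is filled with None (standing in for SQL NULL).
            result.append(tuple(row[position[c]] if c in position else None for c in columns))
    return columns, result


# Sample data taken from df1 and df2 in the examples.
DF1: Frame = (["id", "stats", "masters", "gpa"],
              [(13, "Advanced", "no", 4.0), (29, "Novice", "yes", 4.0), (15, "Advanced", "yes", 4.0)])
DF2: Frame = (["id", "stats", "programming", "admitted"],
              [(24, "Advanced", "Novice", 1), (19, "Advanced", "Advanced", 0)])

if __name__ == "__main__":
    print(concat_model([DF1, DF2]))                # six columns; None where a frame lacks a column
    print(concat_model([DF1, DF2], join="inner"))  # only id and stats
```

As in the examples, OUTER yields id, stats, masters, gpa, programming, and admitted, while INNER keeps only id and stats.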
allow_duplicates:
Optional argument.
Specifies whether the result of the concatenation can contain duplicate rows. If False, duplicate rows are removed from the result.
Default value: True
Types: bool
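Conceptually, allow_duplicates=True keeps every stacked row (in the spirit of SQL UNION ALL) and allow_duplicates=False keeps one copy of each distinct row (in the spirit of SQL UNION). That mapping is an assumption for illustration, not a statement of the SQL teradataml generates. The hypothetical `stack_rows` helper below models it on rows already aligned to the same columns, with None standing in for SQL NULL (displayed as None or NaN in the examples). Unlike the database, it keeps first-seen order; the rows in the examples come back in no particular order.

```python
from typing import List, Sequence


def stack_rows(parts: Sequence[List[tuple]], allow_duplicates: bool = True) -> List[tuple]:
    """Hypothetical model of allow_duplicates on already-aligned row tuples."""
    stacked = [row for part in parts for row in part]
    if allow_duplicates:
        return stacked  # every row is kept, duplicates included
    seen = set()
    unique: List[tuple] = []
    for row in stacked:
        # None compares equal to None here, so rows with NULLs in the same
        # positions count as duplicates.
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Rows of cdf from the examples: (id, stats, masters, gpa, programming, admitted).
CDF = [
    (19, "Advanced", None, None, "Advanced", 0),
    (24, "Advanced", None, None, "Novice", 1),
    (13, "Advanced", "no", 4.0, None, None),
    (29, "Novice", "yes", 4.0, None, None),
    (15, "Advanced", "yes", 4.0, None, None),
]
# Rows of df2, aligned to the same six columns.
DF2_ALIGNED = [
    (24, "Advanced", None, None, "Novice", 1),
    (19, "Advanced", None, None, "Advanced", 0),
]

if __name__ == "__main__":
    print(len(stack_rows([CDF, DF2_ALIGNED])))                          # 7
    print(len(stack_rows([CDF, DF2_ALIGNED], allow_duplicates=False)))  # 5
```

The counts match the examples: seven rows with duplicates allowed, five without.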
sort:
Optional argument.
Specifies whether to sort the columns axis, if it is not already aligned, when the join argument is set to 'OUTER'.
Default value: False
Types: bool
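The effect of sort on the result column order with join='OUTER' can be sketched as follows. `outer_columns` is a hypothetical helper, and treating "already aligned" as "every input has the same column list" is an assumption, not documented teradataml behavior. Index columns such as id are left out, because in the sort=True example they remain the index rather than being sorted with the other columns.

```python
from typing import List, Sequence


def outer_columns(column_lists: Sequence[List[str]], sort: bool = False) -> List[str]:
    """Hypothetical model of the result column order for join='OUTER'."""
    first = list(column_lists[0])
    if all(list(cols) == first for cols in column_lists):
        # Assumption: inputs with identical column lists are already aligned
        # and keep their order, whatever the value of sort.
        return first
    merged: List[str] = []
    for cols in column_lists:
        for col in cols:
            if col not in merged:
                merged.append(col)  # first-seen order
    return sorted(merged) if sort else merged


if __name__ == "__main__":
    df1_cols = ["stats", "masters", "gpa"]
    df2_cols = ["stats", "programming", "admitted"]
    print(outer_columns([df1_cols, df2_cols]))
    # ['stats', 'masters', 'gpa', 'programming', 'admitted']
    print(outer_columns([df1_cols, df2_cols], sort=True))
    # ['admitted', 'gpa', 'masters', 'programming', 'stats']
```

Both orders agree with the column headers shown in the default and sort=True examples.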
ignore_index:
Optional argument.
Specifies whether to ignore the index columns of the input DataFrames in the resulting DataFrame.
If True, the index columns are ignored in the concat operation.
Default value: False
Types: bool
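A minimal model of ignore_index, assuming (as the ignore_index=True example suggests) that the ignored index is replaced by consecutive integers starting at 0. `concat_index` is a hypothetical helper that works only on index labels.

```python
from typing import Hashable, List, Sequence


def concat_index(index_parts: Sequence[List[Hashable]], ignore_index: bool = False) -> List[Hashable]:
    """Hypothetical model of the index of a concatenated result."""
    if ignore_index:
        # Original labels are dropped; rows are numbered 0, 1, 2, ...
        total = sum(len(part) for part in index_parts)
        return list(range(total))
    # Otherwise the original labels are kept, duplicates included.
    return [label for part in index_parts for label in part]


if __name__ == "__main__":
    ids = [[13, 29, 15], [24, 19]]  # id values of df1 and df2 in the examples
    print(concat_index(ids))                     # [13, 29, 15, 24, 19]
    print(concat_index(ids, ignore_index=True))  # [0, 1, 2, 3, 4]
```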
Returns: a teradataml DataFrame if the result does not contain any geometry data; otherwise, a teradataml GeoDataFrame.
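The return-type rule amounts to a check on the column types of the result. In the sketch below, `result_class_name` is hypothetical, and the set of geometry type names (GEOMETRY, MBR, MBB) is an assumption chosen for illustration; teradataml's own check may differ.

```python
from typing import Dict

# Assumption: type names treated as geometry data in this model.
GEOMETRY_TYPE_NAMES = frozenset({"GEOMETRY", "MBR", "MBB"})


def result_class_name(result_column_types: Dict[str, str]) -> str:
    """Hypothetical model: GeoDataFrame if any result column holds geometry data."""
    has_geometry = any(
        type_name.upper() in GEOMETRY_TYPE_NAMES for type_name in result_column_types.values()
    )
    return "GeoDataFrame" if has_geometry else "DataFrame"


if __name__ == "__main__":
    # Column types loosely modeled on the DataFrame + GeoDataFrame example.
    print(result_class_name({"id": "INTEGER", "stats": "VARCHAR", "skey": "INTEGER", "polygons": "GEOMETRY"}))  # GeoDataFrame
    print(result_class_name({"id": "INTEGER", "stats": "VARCHAR"}))  # DataFrame
```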
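>>> # Assumes a connection to Vantage was already created with create_context().
>>> from teradataml import DataFrame, GeoDataFrame, load_example_data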
>>> load_example_data("dataframe", "admissions_train")
>>> load_example_data("geodataframe", ["sample_shapes"])
>>> from teradataml.dataframe import concat
>>> # Default options
>>> df = DataFrame('admissions_train')
>>> df1 = df[df.gpa == 4].select(['id', 'stats', 'masters', 'gpa'])
>>> df1
       stats  masters  gpa
id
13  Advanced       no  4.0
29    Novice      yes  4.0
15  Advanced      yes  4.0
>>> df2 = df[df.gpa < 2].select(['id', 'stats', 'programming', 'admitted'])
>>> df2
       stats  programming  admitted
id
24  Advanced       Novice         1
19  Advanced     Advanced         0
>>> cdf = concat([df1, df2])
>>> cdf
       stats  masters  gpa  programming  admitted
id
19  Advanced     None  NaN     Advanced         0
24  Advanced     None  NaN       Novice         1
13  Advanced       no  4.0         None      None
29    Novice      yes  4.0         None      None
15  Advanced      yes  4.0         None      None
>>> # concat more than two DataFrames
>>> df3 = df[df.gpa == 3].select(['id', 'stats', 'programming', 'gpa'])
>>> df3
       stats  programming  gpa
id
36  Advanced       Novice  3.0
>>> cdf = concat([df1, df2, df3])
>>> cdf
       stats  masters  gpa  programming  admitted
id
15  Advanced      yes  4.0         None       NaN
19  Advanced     None  NaN     Advanced       0.0
36  Advanced     None  3.0       Novice       NaN
29    Novice      yes  4.0         None       NaN
13  Advanced       no  4.0         None       NaN
24  Advanced     None  NaN       Novice       1.0
>>> # join = 'inner'
>>> cdf = concat([df1, df2], join='inner')
>>> cdf
       stats
id
19  Advanced
24  Advanced
13  Advanced
29    Novice
15  Advanced
>>> # allow_duplicates = True (default)
>>> cdf = concat([df1, df2])
>>> cdf
       stats  masters  gpa  programming  admitted
id
19  Advanced     None  NaN     Advanced         0
24  Advanced     None  NaN       Novice         1
13  Advanced       no  4.0         None      None
29    Novice      yes  4.0         None      None
15  Advanced      yes  4.0         None      None
>>> cdf = concat([cdf, df2])
>>> cdf
       stats  masters  gpa  programming  admitted
id
19  Advanced     None  NaN     Advanced         0
13  Advanced       no  4.0         None      None
24  Advanced     None  NaN       Novice         1
24  Advanced     None  NaN       Novice         1
19  Advanced     None  NaN     Advanced         0
29    Novice      yes  4.0         None      None
15  Advanced      yes  4.0         None      None
>>> # allow_duplicates = False
>>> cdf = concat([cdf, df2], allow_duplicates=False)
>>> cdf
       stats  masters  gpa  programming  admitted
id
19  Advanced     None  NaN     Advanced         0
29    Novice      yes  4.0         None      None
24  Advanced     None  NaN       Novice         1
15  Advanced      yes  4.0         None      None
13  Advanced       no  4.0         None      None
>>> # sort = True
>>> cdf = concat([df1, df2], sort=True)
>>> cdf
    admitted  gpa  masters  programming     stats
id
19         0  NaN     None     Advanced  Advanced
24         1  NaN     None       Novice  Advanced
13      None  4.0       no         None  Advanced
29      None  4.0      yes         None    Novice
15      None  4.0      yes         None  Advanced
>>> # ignore_index = True
>>> cdf = concat([df1, df2], ignore_index=True)
>>> cdf
      stats  masters  gpa  programming  admitted
0  Advanced      yes  4.0         None       NaN
1  Advanced     None  NaN     Advanced       0.0
2    Novice      yes  4.0         None       NaN
3  Advanced     None  NaN       Novice       1.0
4  Advanced       no  4.0         None       NaN
>>> # Perform concatenation of two GeoDataFrames
>>> geo_dataframe = GeoDataFrame('sample_shapes')
>>> geo_dataframe1 = geo_dataframe[geo_dataframe.skey == 1004].select(['skey', 'linestrings'])
>>> geo_dataframe1
skey linestrings
1004 LINESTRING (10 20 30,40 50 60,70 80 80)
>>> geo_dataframe2 = geo_dataframe[geo_dataframe.skey < 1010].select(['skey', 'polygons'])
>>> geo_dataframe2
skey polygons
1009 MULTIPOLYGON (((0 0 0,0 20 20,20 20 20,20 0 20,0 0 0)),((50 50 50,50 90 90,90 90 90,90 50 90,50 50 50)))
1005 POLYGON ((0 0 0,0 0 20.435,0.0 20.435 0,0.0 20.435 20.435,20.435 0.0 0,20.435 0.0 20.435,20.435 20.435 0,20.435 20.435 20.435,0 0 0))
1004 POLYGON ((0 0 0,0 10 20,20 20 30,20 10 0,0 0 0),(5 5 5,5 10 10,10 10 10,10 10 5,5 5 5))
1002 POLYGON ((0 0,0 20,20 20,20 0,0 0),(5 5,5 10,10 10,10 5,5 5))
1001 POLYGON ((0 0,0 20,20 20,20 0,0 0))
1003 POLYGON ((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8))
1007 MULTIPOLYGON (((1 1,1 3,6 3,6 0,1 1)),((10 5,10 10,20 10,20 5,10 5)))
1006 POLYGON ((0 0 0,0 0 20,0 20 0,0 20 20,20 0 0,20 0 20,20 20 0,20 20 20,0 0 0))
1008 MULTIPOLYGON (((0 0,0 20,20 20,20 0,0 0)),((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8)))
>>> concat([geo_dataframe1, geo_dataframe2])
skey linestrings polygons
1009 None MULTIPOLYGON (((0 0 0,0 20 20,20 20 20,20 0 20,0 0 0)),((50 50 50,50 90 90,90 90 90,90 50 90,50 50 50)))
1005 None POLYGON ((0 0 0,0 0 20.435,0.0 20.435 0,0.0 20.435 20.435,20.435 0.0 0,20.435 0.0 20.435,20.435 20.435 0,20.435 20.435 20.435,0 0 0))
1004 LINESTRING (10 20 30,40 50 60,70 80 80) None
1004 None POLYGON ((0 0 0,0 10 20,20 20 30,20 10 0,0 0 0),(5 5 5,5 10 10,10 10 10,10 10 5,5 5 5))
1003 None POLYGON ((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8))
1001 None POLYGON ((0 0,0 20,20 20,20 0,0 0))
1002 None POLYGON ((0 0,0 20,20 20,20 0,0 0),(5 5,5 10,10 10,10 5,5 5))
1007 None MULTIPOLYGON (((1 1,1 3,6 3,6 0,1 1)),((10 5,10 10,20 10,20 5,10 5)))
1006 None POLYGON ((0 0 0,0 0 20,0 20 0,0 20 20,20 0 0,20 0 20,20 20 0,20 20 20,0 0 0))
1008 None MULTIPOLYGON (((0 0,0 20,20 20,20 0,0 0)),((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8)))
>>> # Perform concatenation of a DataFrame and a GeoDataFrame, which returns a GeoDataFrame.
>>> normal_df = df.select(['id', 'stats'])
>>> normal_df
       stats
id
34  Advanced
32  Advanced
11  Advanced
40    Novice
38  Advanced
36  Advanced
7     Novice
26  Advanced
19  Advanced
13  Advanced
>>> geo_df = geo_dataframe[geo_dataframe.skey < 1010].select(['skey', 'polygons'])
>>> geo_df
skey polygons
1003 POLYGON ((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8))
1008 MULTIPOLYGON (((0 0,0 20,20 20,20 0,0 0)),((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8)))
1006 POLYGON ((0 0 0,0 0 20,0 20 0,0 20 20,20 0 0,20 0 20,20 20 0,20 20 20,0 0 0))
1009 MULTIPOLYGON (((0 0 0,0 20 20,20 20 20,20 0 20,0 0 0)),((50 50 50,50 90 90,90 90 90,90 50 90,50 50 50)))
1005 POLYGON ((0 0 0,0 0 20.435,0.0 20.435 0,0.0 20.435 20.435,20.435 0.0 0,20.435 0.0 20.435,20.435 20.435 0,20.435 20.435 20.435,0 0 0))
1007 MULTIPOLYGON (((1 1,1 3,6 3,6 0,1 1)),((10 5,10 10,20 10,20 5,10 5)))
1001 POLYGON ((0 0,0 20,20 20,20 0,0 0))
1002 POLYGON ((0 0,0 20,20 20,20 0,0 0),(5 5,5 10,10 10,10 5,5 5))
1004 POLYGON ((0 0 0,0 10 20,20 20 30,20 10 0,0 0 0),(5 5 5,5 10 10,10 10 10,10 10 5,5 5 5))
>>> idf = concat([normal_df, geo_df])
>>> idf
       stats  skey  polygons
id
38  Advanced  None      None
7     Novice  None      None
26  Advanced  None      None
17  Advanced  None      None
34  Advanced  None      None
13  Advanced  None      None
32  Advanced  None      None
11  Advanced  None      None
15  Advanced  None      None
36  Advanced  None      None