Teradata Package for Python Function Reference | 17.10 - concat

Teradata® Package for Python Function Reference

Product: Teradata Package for Python
Release Number: 17.10
Published: April 2022
Language: English (United States)
Last Update: 2022-08-19
Product Category: Teradata Vantage

teradataml.dataframe.setop.concat = concat(df_list, join='OUTER', allow_duplicates=True, sort=False, ignore_index=False)

DESCRIPTION:
    Concatenates a list of teradataml DataFrames, GeoDataFrames, or both along the index axis.

PARAMETERS:
    df_list:
        Required argument.
        Specifies a list of teradataml DataFrames, GeoDataFrames, or both on which the concatenation is to be performed.
        Types: list of teradataml DataFrames and/or GeoDataFrames

    join:
        Optional argument.
        Specifies how to handle the columns axis when the inputs do not all have the same columns.
        Supported values are:
        • 'OUTER': Projects all columns from all the DataFrames. A column that is missing from an input DataFrame holds SQL NULL in the rows that come from that DataFrame.
        • 'INNER': Projects only the columns common to all DataFrames.
        Default value: 'OUTER'
        Permitted values: 'INNER', 'OUTER'
        Types: str

    allow_duplicates:
        Optional argument.
        Specifies whether the result of the concatenation can contain duplicate rows.
        Default value: True
        Types: bool

    sort:
        Optional argument.
        Specifies whether to sort the columns axis if it is not already aligned. Applies when the join argument is set to 'OUTER'.
        Default value: False
        Types: bool

    ignore_index:
        Optional argument.
        Specifies whether to ignore the index columns in the resulting DataFrame. If True, the index columns are ignored in the concat operation.
        Default value: False
        Types: bool

RETURNS:
    teradataml DataFrame if the result does not contain any geometry data; otherwise, teradataml GeoDataFrame.
RAISES:
    TeradataMlException

EXAMPLES:
    >>> from teradataml import DataFrame, GeoDataFrame, load_example_data
    >>> load_example_data("dataframe", "admissions_train")
    >>> load_example_data("geodataframe", ["sample_shapes"])
    >>> from teradataml.dataframe import concat
    >>>
    >>> # Default options
    >>> df = DataFrame('admissions_train')
    >>> df1 = df[df.gpa == 4].select(['id', 'stats', 'masters', 'gpa'])
    >>> df1
           stats masters  gpa
    id
    13  Advanced      no  4.0
    29    Novice     yes  4.0
    15  Advanced     yes  4.0
    >>> df2 = df[df.gpa < 2].select(['id', 'stats', 'programming', 'admitted'])
    >>> df2
           stats programming admitted
    id
    24  Advanced      Novice        1
    19  Advanced    Advanced        0
    >>> cdf = concat([df1, df2])
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    24  Advanced    None  NaN      Novice        1
    13  Advanced      no  4.0        None     None
    29    Novice     yes  4.0        None     None
    15  Advanced     yes  4.0        None     None
    >>>
    >>> # concat more than two DataFrames
    >>> df3 = df[df.gpa == 3].select(['id', 'stats', 'programming', 'gpa'])
    >>> df3
           stats programming  gpa
    id
    36  Advanced      Novice  3.0
    >>> cdf = concat([df1, df2, df3])
    >>> cdf
           stats masters  gpa programming  admitted
    id
    15  Advanced     yes  4.0        None       NaN
    19  Advanced    None  NaN    Advanced       0.0
    36  Advanced    None  3.0      Novice       NaN
    29    Novice     yes  4.0        None       NaN
    13  Advanced      no  4.0        None       NaN
    24  Advanced    None  NaN      Novice       1.0
    >>>
    >>> # join = 'inner'
    >>> cdf = concat([df1, df2], join='inner')
    >>> cdf
           stats
    id
    19  Advanced
    24  Advanced
    13  Advanced
    29    Novice
    15  Advanced
    >>>
    >>> # allow_duplicates = True (default)
    >>> cdf = concat([df1, df2])
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    24  Advanced    None  NaN      Novice        1
    13  Advanced      no  4.0        None     None
    29    Novice     yes  4.0        None     None
    15  Advanced     yes  4.0        None     None
    >>> cdf = concat([cdf, df2])
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    13  Advanced      no  4.0        None     None
    24  Advanced    None  NaN      Novice        1
    24  Advanced    None  NaN      Novice        1
    19  Advanced    None  NaN    Advanced        0
    29    Novice     yes  4.0        None     None
    15  Advanced     yes  4.0        None     None
    >>>
    >>> # allow_duplicates = False
    >>> cdf = concat([cdf, df2], allow_duplicates=False)
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    29    Novice     yes  4.0        None     None
    24  Advanced    None  NaN      Novice        1
    15  Advanced     yes  4.0        None     None
    13  Advanced      no  4.0        None     None
    >>>
    >>> # sort = True
    >>> cdf = concat([df1, df2], sort=True)
    >>> cdf
       admitted  gpa masters programming     stats
    id
    19        0  NaN    None    Advanced  Advanced
    24        1  NaN    None      Novice  Advanced
    13     None  4.0      no        None  Advanced
    29     None  4.0     yes        None    Novice
    15     None  4.0     yes        None  Advanced
    >>>
    >>> # ignore_index = True
    >>> cdf = concat([df1, df2], ignore_index=True)
    >>> cdf
          stats masters  gpa programming  admitted
    0  Advanced     yes  4.0        None       NaN
    1  Advanced    None  NaN    Advanced       0.0
    2    Novice     yes  4.0        None       NaN
    3  Advanced    None  NaN      Novice       1.0
    4  Advanced      no  4.0        None       NaN
    >>>
    >>> # Perform concatenation of two GeoDataFrames
    >>> geo_dataframe = GeoDataFrame('sample_shapes')
    >>> geo_dataframe1 = geo_dataframe[geo_dataframe.skey == 1004].select(['skey', 'linestrings'])
    >>> geo_dataframe1
    skey  linestrings
    1004  LINESTRING (10 20 30,40 50 60,70 80 80)
    >>> geo_dataframe2 = geo_dataframe[geo_dataframe.skey < 1010].select(['skey', 'polygons'])
    >>> geo_dataframe2
    skey  polygons
    1009  MULTIPOLYGON (((0 0 0,0 20 20,20 20 20,20 0 20,0 0 0)),((50 50 50,50 90 90,90 90 90,90 50 90,50 50 50)))
    1005  POLYGON ((0 0 0,0 0 20.435,0.0 20.435 0,0.0 20.435 20.435,20.435 0.0 0,20.435 0.0 20.435,20.435 20.435 0,20.435 20.435 20.435,0 0 0))
    1004  POLYGON ((0 0 0,0 10 20,20 20 30,20 10 0,0 0 0),(5 5 5,5 10 10,10 10 10,10 10 5,5 5 5))
    1002  POLYGON ((0 0,0 20,20 20,20 0,0 0),(5 5,5 10,10 10,10 5,5 5))
    1001  POLYGON ((0 0,0 20,20 20,20 0,0 0))
    1003  POLYGON ((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8))
    1007  MULTIPOLYGON (((1 1,1 3,6 3,6 0,1 1)),((10 5,10 10,20 10,20 5,10 5)))
    1006  POLYGON ((0 0 0,0 0 20,0 20 0,0 20 20,20 0 0,20 0 20,20 20 0,20 20 20,0 0 0))
    1008  MULTIPOLYGON (((0 0,0 20,20 20,20 0,0 0)),((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8)))
    >>> concat([geo_dataframe1, geo_dataframe2])
    skey  linestrings                              polygons
    1009  None                                     MULTIPOLYGON (((0 0 0,0 20 20,20 20 20,20 0 20,0 0 0)),((50 50 50,50 90 90,90 90 90,90 50 90,50 50 50)))
    1005  None                                     POLYGON ((0 0 0,0 0 20.435,0.0 20.435 0,0.0 20.435 20.435,20.435 0.0 0,20.435 0.0 20.435,20.435 20.435 0,20.435 20.435 20.435,0 0 0))
    1004  LINESTRING (10 20 30,40 50 60,70 80 80)  None
    1004  None                                     POLYGON ((0 0 0,0 10 20,20 20 30,20 10 0,0 0 0),(5 5 5,5 10 10,10 10 10,10 10 5,5 5 5))
    1003  None                                     POLYGON ((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8))
    1001  None                                     POLYGON ((0 0,0 20,20 20,20 0,0 0))
    1002  None                                     POLYGON ((0 0,0 20,20 20,20 0,0 0),(5 5,5 10,10 10,10 5,5 5))
    1007  None                                     MULTIPOLYGON (((1 1,1 3,6 3,6 0,1 1)),((10 5,10 10,20 10,20 5,10 5)))
    1006  None                                     POLYGON ((0 0 0,0 0 20,0 20 0,0 20 20,20 0 0,20 0 20,20 20 0,20 20 20,0 0 0))
    1008  None                                     MULTIPOLYGON (((0 0,0 20,20 20,20 0,0 0)),((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8)))
    >>>
    >>> # Perform concatenation of a DataFrame and a GeoDataFrame, which returns a GeoDataFrame.
    >>> normal_df = df.select(['id', 'stats'])
    >>> normal_df
           stats
    id
    34  Advanced
    32  Advanced
    11  Advanced
    40    Novice
    38  Advanced
    36  Advanced
    7     Novice
    26  Advanced
    19  Advanced
    13  Advanced
    >>> geo_df = geo_dataframe[geo_dataframe.skey < 1010].select(['skey', 'polygons'])
    >>> geo_df
    skey  polygons
    1003  POLYGON ((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8))
    1008  MULTIPOLYGON (((0 0,0 20,20 20,20 0,0 0)),((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8)))
    1006  POLYGON ((0 0 0,0 0 20,0 20 0,0 20 20,20 0 0,20 0 20,20 20 0,20 20 20,0 0 0))
    1009  MULTIPOLYGON (((0 0 0,0 20 20,20 20 20,20 0 20,0 0 0)),((50 50 50,50 90 90,90 90 90,90 50 90,50 50 50)))
    1005  POLYGON ((0 0 0,0 0 20.435,0.0 20.435 0,0.0 20.435 20.435,20.435 0.0 0,20.435 0.0 20.435,20.435 20.435 0,20.435 20.435 20.435,0 0 0))
    1007  MULTIPOLYGON (((1 1,1 3,6 3,6 0,1 1)),((10 5,10 10,20 10,20 5,10 5)))
    1001  POLYGON ((0 0,0 20,20 20,20 0,0 0))
    1002  POLYGON ((0 0,0 20,20 20,20 0,0 0),(5 5,5 10,10 10,10 5,5 5))
    1004  POLYGON ((0 0 0,0 10 20,20 20 30,20 10 0,0 0 0),(5 5 5,5 10 10,10 10 10,10 10 5,5 5 5))
    >>> idf = concat([normal_df, geo_df])
    >>> idf
           stats  skey polygons
    id
    38  Advanced  None     None
    7     Novice  None     None
    26  Advanced  None     None
    17  Advanced  None     None
    34  Advanced  None     None
    13  Advanced  None     None
    32  Advanced  None     None
    11  Advanced  None     None
    15  Advanced  None     None
    36  Advanced  None     None
    >>>
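
The effect of the join, sort, and allow_duplicates arguments on the shape of the result can be sketched in plain Python, without a Vantage connection. This is an illustrative model only, not how teradataml implements concat. The helpers projected_columns and model_concat are hypothetical names introduced here, lists of dicts stand in for DataFrames, None stands in for SQL NULL, and the index column is left out for brevity. Row order in a real result is not guaranteed, whereas this model simply keeps input order.

```python
# Illustrative model only: these helpers are hypothetical and are NOT part of teradataml.

def projected_columns(column_lists, join="OUTER", sort=False):
    """Return the column names a concat-like operation would project."""
    mode = join.upper()
    if mode not in ("INNER", "OUTER"):
        raise ValueError("join must be 'INNER' or 'OUTER'")
    if not column_lists:
        return []
    if mode == "OUTER":
        # Union of all columns, kept in order of first appearance.
        columns = []
        for cols in column_lists:
            for col in cols:
                if col not in columns:
                    columns.append(col)
        # Assumption taken from the sort description: sorting applies to 'OUTER'.
        return sorted(columns) if sort else columns
    # 'INNER': only the columns present in every input, in the first input's order.
    first, rest = column_lists[0], column_lists[1:]
    return [col for col in first if all(col in cols for cols in rest)]


def model_concat(frames, join="OUTER", allow_duplicates=True, sort=False):
    """frames is a list of (columns, rows); each row is a dict keyed by column name."""
    columns = projected_columns([cols for cols, _ in frames], join, sort)
    rows = []
    for _, frame_rows in frames:
        for row in frame_rows:
            # A column missing from this input becomes None (standing in for SQL NULL).
            aligned = tuple(row.get(col) for col in columns)
            if allow_duplicates or aligned not in rows:
                rows.append(aligned)
    return columns, rows


# Sample data loosely mirroring df1 and df2 from the examples (index column omitted).
df1 = (["stats", "masters", "gpa"], [
    {"stats": "Advanced", "masters": "no", "gpa": 4.0},
    {"stats": "Novice", "masters": "yes", "gpa": 4.0},
    {"stats": "Advanced", "masters": "yes", "gpa": 4.0},
])
df2 = (["stats", "programming", "admitted"], [
    {"stats": "Advanced", "programming": "Novice", "admitted": 1},
    {"stats": "Advanced", "programming": "Advanced", "admitted": 0},
])

outer_cols, outer_rows = model_concat([df1, df2])
inner_cols, inner_rows = model_concat([df1, df2], join="inner")
_, with_dups = model_concat([df1, df2, df2])
_, without_dups = model_concat([df1, df2, df2], allow_duplicates=False)

print(outer_cols)
print(inner_cols)
print(len(with_dups), len(without_dups))
```

In this model, 'OUTER' yields the five columns shown in the default example, 'INNER' keeps only stats, and repeating df2 adds two rows unless allow_duplicates is False.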