concat | SET Operations | Teradata Package for Python

Teradata® VantageCloud Lake

Deployment: VantageCloud
Edition: Lake
Product: Teradata Vantage
Published: January 2023
Language: English (United States)
Last Update: 2024-02-17

Use the concat() function to concatenate a list of teradataml DataFrames, GeoDataFrames, or both, along the index axis, that is, by stacking their rows. The concatenation runs in the database as a UNION ALL operation (duplicate rows kept) or a UNION operation (duplicate rows removed).

If the list contains both teradataml DataFrames and GeoDataFrames, the result contains geometry data, so the function returns a GeoDataFrame. See Example 6.

Example Prerequisites

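>>> # Assumes an active Vantage connection, for example one created with create_context()
>>> from teradataml import DataFrame, GeoDataFrame, concat
>>> df = DataFrame("admissions_train")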
>>> df
   masters   gpa     stats programming admitted
id
22     yes  3.46    Novice    Beginner        0
36      no  3.00  Advanced      Novice        0
15     yes  4.00  Advanced    Advanced        1
38     yes  2.65  Advanced    Beginner        1
5       no  3.44    Novice      Novice        0
17      no  3.83  Advanced    Advanced        1
34     yes  3.85  Advanced    Beginner        0
13      no  4.00  Advanced      Novice        1
26     yes  3.57  Advanced    Advanced        1
19     yes  1.98  Advanced    Advanced        0
>>> df1 = df[df.gpa == 4].select(['id', 'stats', 'masters', 'gpa'])
>>> df1
       stats masters  gpa
id
13  Advanced      no  4.0
29    Novice     yes  4.0
15  Advanced     yes  4.0
>>> df2 = df[df.gpa < 2].select(['id', 'stats', 'programming', 'admitted'])
>>> df2
       stats programming admitted
id
24  Advanced      Novice        1
19  Advanced    Advanced        0

Example 1: Run concat() with default values for optional arguments

>>> cdf = concat([df1,df2])
>>> cdf
       stats masters  gpa programming admitted
id
19  Advanced    None  NaN    Advanced        0
24  Advanced    None  NaN      Novice        1
13  Advanced      no  4.0        None     None
29    Novice     yes  4.0        None     None
15  Advanced     yes  4.0        None     None

Example 2: Run concat() with optional argument "join"

Set join='inner' to keep only the columns that are common to all input DataFrames.

>>> cdf = concat([df1,df2], join='inner')
>>> cdf
       stats
id
19  Advanced
24  Advanced
13  Advanced
29    Novice
15  Advanced
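
The column selection in Examples 1 and 2 can be pictured with a small plain-Python emulation. This is only a sketch of the behavior shown in the outputs above, not teradataml code: rows are modeled as dicts, the helper name concat_rows is hypothetical, and missing values are filled with None (the real output shows NaN for missing numeric values).

```python
# Plain-Python emulation of how the "join" argument selects result columns.
# NOT teradataml code: rows are dicts and concat_rows is a hypothetical helper.

def concat_rows(frames, join="outer"):
    """Stack lists of dict rows, aligning columns as 'outer' or 'inner'."""
    if join not in ("outer", "inner"):
        raise ValueError("join must be 'outer' or 'inner'")

    # Column names of each frame, in order of first appearance.
    per_frame = []
    for rows in frames:
        cols = []
        for row in rows:
            for name in row:
                if name not in cols:
                    cols.append(name)
        per_frame.append(cols)

    # Union of all column names, again in order of first appearance.
    all_cols = []
    for cols in per_frame:
        for name in cols:
            if name not in all_cols:
                all_cols.append(name)

    if join == "inner":
        # Keep only the columns present in every frame.
        columns = [c for c in all_cols if all(c in cols for cols in per_frame)]
    else:
        columns = all_cols

    # Rows keep their values; columns a row lacks are filled with None.
    return [{c: row.get(c) for c in columns} for rows in frames for row in rows]


df1_rows = [
    {"id": 13, "stats": "Advanced", "masters": "no", "gpa": 4.0},
    {"id": 29, "stats": "Novice", "masters": "yes", "gpa": 4.0},
    {"id": 15, "stats": "Advanced", "masters": "yes", "gpa": 4.0},
]
df2_rows = [
    {"id": 24, "stats": "Advanced", "programming": "Novice", "admitted": 1},
    {"id": 19, "stats": "Advanced", "programming": "Advanced", "admitted": 0},
]

outer = concat_rows([df1_rows, df2_rows])                # default, as in Example 1
inner = concat_rows([df1_rows, df2_rows], join="inner")  # as in Example 2

print(list(outer[0]))  # all columns from both inputs
print(list(inner[0]))  # only the shared columns: id and stats
```

In both cases the number of rows is the same; only the set of columns differs.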

Example 3: Run concat() with optional argument "allow_duplicates"

  • Set allow_duplicates = True (default)
    >>> cdf = concat([df1,df2])
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    24  Advanced    None  NaN      Novice        1
    13  Advanced      no  4.0        None     None
    29    Novice     yes  4.0        None     None
    15  Advanced     yes  4.0        None     None
    >>> cdf = concat([cdf,df2])
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    13  Advanced      no  4.0        None     None
    24  Advanced    None  NaN      Novice        1
    24  Advanced    None  NaN      Novice        1
    19  Advanced    None  NaN    Advanced        0
    29    Novice     yes  4.0        None     None
    15  Advanced     yes  4.0        None     None
  • Set allow_duplicates = False to remove duplicate rows from the result
    >>> cdf = concat([cdf,df2], allow_duplicates=False)
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    29    Novice     yes  4.0        None     None
    24  Advanced    None  NaN      Novice        1
    15  Advanced     yes  4.0        None     None
    13  Advanced      no  4.0        None     None
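
The difference between the two settings mirrors the difference between SQL UNION ALL and UNION. The sketch below illustrates that difference with Python's built-in sqlite3 module standing in for the database. It assumes that allow_duplicates=True corresponds to UNION ALL and allow_duplicates=False to UNION, which matches the outputs above; the tables t1 and t2 are made up for the illustration and no Vantage system is involved.

```python
import sqlite3

# In-memory SQLite stands in for the database. It is not Vantage, and the
# table names t1 and t2 are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, stats TEXT)")
conn.execute("CREATE TABLE t2 (id INTEGER, stats TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(19, "Advanced"), (24, "Advanced"), (13, "Advanced")],
)
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?)",
    [(24, "Advanced"), (19, "Advanced")],
)

# UNION ALL keeps every row from both inputs, including repeats.
union_all = conn.execute(
    "SELECT id, stats FROM t1 UNION ALL SELECT id, stats FROM t2"
).fetchall()

# UNION removes duplicate rows from the combined result.
union = conn.execute(
    "SELECT id, stats FROM t1 UNION SELECT id, stats FROM t2"
).fetchall()

print(len(union_all))  # 5 rows: ids 19 and 24 appear twice
print(len(union))      # 3 rows: each distinct row once
conn.close()
```

Note that a row counts as a duplicate only when every column value matches, not just the index value.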

Example 4: Run concat() with optional argument "sort"

Set sort=True to order the columns of the result by name.

>>> cdf = concat([df1,df2], sort=True)
>>> cdf
   admitted  gpa masters programming     stats
id
19        0  NaN    None    Advanced  Advanced
24        1  NaN    None      Novice  Advanced
13     None  4.0      no        None  Advanced
29     None  4.0     yes        None    Novice
15     None  4.0     yes        None  Advanced
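
Comparing this output with Example 1 suggests how the column order is derived. The following plain-Python sketch reproduces both orders from the column lists of df1 and df2. The helper outer_columns is hypothetical and not part of teradataml, and the index column id is left out because it is displayed separately.

```python
# Hypothetical helper, not part of teradataml: derives the column order of an
# outer-join result from the column lists of the input DataFrames.

def outer_columns(column_lists, sort=False):
    seen = []
    for cols in column_lists:
        for name in cols:
            if name not in seen:
                seen.append(name)  # first-appearance order
    return sorted(seen) if sort else seen


df1_cols = ["stats", "masters", "gpa"]
df2_cols = ["stats", "programming", "admitted"]

print(outer_columns([df1_cols, df2_cols]))
# ['stats', 'masters', 'gpa', 'programming', 'admitted']
print(outer_columns([df1_cols, df2_cols], sort=True))
# ['admitted', 'gpa', 'masters', 'programming', 'stats']
```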

Example 5: Perform concatenation of two GeoDataFrames

  • Create GeoDataFrames
    >>> geo_dataframe = GeoDataFrame('sample_shapes')
    >>> geo_dataframe1 = geo_dataframe[geo_dataframe.skey == 1004].select(['skey','linestrings'])
    
    >>> geo_dataframe1
    skey            linestrings
    1004  LINESTRING (10 20 30,40 50 60,70 80 80)
    >>> geo_dataframe2 = geo_dataframe[geo_dataframe.skey < 1010].select(['skey','polygons'])
    
    >>> geo_dataframe2
    skey                                                              polygons
    1009                               MULTIPOLYGON (((0 0 0,0 20 20,20 20 20,20 0 20,0 0 0)),((50 50 50,50 90 90,90 90 90,90 50 90,50 50 50)))
    1005  POLYGON ((0 0 0,0 0 20.435,0.0 20.435 0,0.0 20.435 20.435,20.435 0.0 0,20.435 0.0 20.435,20.435 20.435 0,20.435 20.435 20.435,0 0 0))
    1004                                                POLYGON ((0 0 0,0 10 20,20 20 30,20 10 0,0 0 0),(5 5 5,5 10 10,10 10 10,10 10 5,5 5 5))
    1002                                                                          POLYGON ((0 0,0 20,20 20,20 0,0 0),(5 5,5 10,10 10,10 5,5 5))
    1001                                                                                                    POLYGON ((0 0,0 20,20 20,20 0,0 0))
    1003                                                                                POLYGON ((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8))
    1007                                                                  MULTIPOLYGON (((1 1,1 3,6 3,6 0,1 1)),((10 5,10 10,20 10,20 5,10 5)))
    1006                                                          POLYGON ((0 0 0,0 0 20,0 20 0,0 20 20,20 0 0,20 0 20,20 20 0,20 20 20,0 0 0))
    1008                                             MULTIPOLYGON (((0 0,0 20,20 20,20 0,0 0)),((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8)))
  • Perform concatenation
    >>> concat([geo_dataframe1,geo_dataframe2])
    skey                                    linestrings                                 polygons
    1009                                     None                               MULTIPOLYGON (((0 0 0,0 20 20,20 20 20,20 0 20,0 0 0)),((50 50 50,50 90 90,90 90 90,90 50 90,50 50 50)))
    1005                                     None  POLYGON ((0 0 0,0 0 20.435,0.0 20.435 0,0.0 20.435 20.435,20.435 0.0 0,20.435 0.0 20.435,20.435 20.435 0,20.435 20.435 20.435,0 0 0))
    1004  LINESTRING (10 20 30,40 50 60,70 80 80)                                                                                                                                   None
    1004                                     None                                                POLYGON ((0 0 0,0 10 20,20 20 30,20 10 0,0 0 0),(5 5 5,5 10 10,10 10 10,10 10 5,5 5 5))
    1003                                     None                                                                                POLYGON ((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8))
    1001                                     None                                                                                                    POLYGON ((0 0,0 20,20 20,20 0,0 0))
    1002                                     None                                                                          POLYGON ((0 0,0 20,20 20,20 0,0 0),(5 5,5 10,10 10,10 5,5 5))
    1007                                     None                                                                  MULTIPOLYGON (((1 1,1 3,6 3,6 0,1 1)),((10 5,10 10,20 10,20 5,10 5)))
    1006                                     None                                                          POLYGON ((0 0 0,0 0 20,0 20 0,0 20 20,20 0 0,20 0 20,20 20 0,20 20 20,0 0 0))
    1008                                     None                                             MULTIPOLYGON (((0 0,0 20,20 20,20 0,0 0)),((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8)))
    

Example 6: Perform concatenation of a DataFrame and GeoDataFrame

  • Create a DataFrame and a GeoDataFrame, then perform concatenation
    >>> normal_df = df.select(['id','stats'])
    
    >>> normal_df
        stats
    id
    34  Advanced
    32  Advanced
    11  Advanced
    40    Novice
    38  Advanced
    36  Advanced
    7     Novice
    26  Advanced
    19  Advanced
    13  Advanced
    >>> geo_df = geo_dataframe[geo_dataframe.skey < 1010].select(['skey', 'polygons'])
    
    >>> geo_df
    skey                                                                            polygons
    1003                                                                                POLYGON ((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8))
    1008                                             MULTIPOLYGON (((0 0,0 20,20 20,20 0,0 0)),((0.6 0.8,0.6 20.8,20.6 20.8,20.6 0.8,0.6 0.8)))
    1006                                                          POLYGON ((0 0 0,0 0 20,0 20 0,0 20 20,20 0 0,20 0 20,20 20 0,20 20 20,0 0 0))
    1009                               MULTIPOLYGON (((0 0 0,0 20 20,20 20 20,20 0 20,0 0 0)),((50 50 50,50 90 90,90 90 90,90 50 90,50 50 50)))
    1005  POLYGON ((0 0 0,0 0 20.435,0.0 20.435 0,0.0 20.435 20.435,20.435 0.0 0,20.435 0.0 20.435,20.435 20.435 0,20.435 20.435 20.435,0 0 0))
    1007                                                                  MULTIPOLYGON (((1 1,1 3,6 3,6 0,1 1)),((10 5,10 10,20 10,20 5,10 5)))
    1001                                                                                                    POLYGON ((0 0,0 20,20 20,20 0,0 0))
    1002                                                                          POLYGON ((0 0,0 20,20 20,20 0,0 0),(5 5,5 10,10 10,10 5,5 5))
    1004                                                POLYGON ((0 0 0,0 10 20,20 20 30,20 10 0,0 0 0),(5 5 5,5 10 10,10 10 10,10 10 5,5 5 5))
    >>> idf = concat([normal_df,geo_df])
    
    >>> idf
        stats     skey   polygons
    id
    38  Advanced  None     None
    7     Novice  None     None
    26  Advanced  None     None
    17  Advanced  None     None
    34  Advanced  None     None
    13  Advanced  None     None
    32  Advanced  None     None
    11  Advanced  None     None
    15  Advanced  None     None
    36  Advanced  None     None