concat

Teradata® Package for Python User Guide

Product: Teradata Package for Python
Release Number: 17.00
Published: November 2021
Language: English (United States)
Last Update: 2022-01-14
dita:id: B700-4006
Lifecycle: previous
Product Category: Teradata Vantage

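Use the concat() API to concatenate two or more teradataml DataFrame objects along the index axis, that is, to stack their rows into a single DataFrame. The operation is carried out as a database-style UNION or UNION ALL: UNION ALL keeps duplicate rows (allow_duplicates=True, the default), and UNION removes them (allow_duplicates=False).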
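The following is a minimal in-memory sketch of these semantics in plain Python, with no teradataml dependency. The helper concat_model and its (columns, rows) input format are hypothetical and exist only for illustration; the real API works on teradataml DataFrames and runs in the database. Unlike teradataml, the sketch treats id as an ordinary column rather than the index.

```python
# Hypothetical, in-memory model of concat() semantics; not teradataml code.
# Each input is a (columns, rows) pair, with rows as tuples aligned to columns.

def concat_model(frames, join="outer", allow_duplicates=True):
    # Collect every column name, in first-seen order across the inputs.
    columns = []
    for cols, _ in frames:
        columns += [c for c in cols if c not in columns]

    if join == "inner":
        # Keep only the columns that every input has.
        columns = [c for c in columns if all(c in cols for cols, _ in frames)]
    elif join != "outer":
        raise ValueError("join must be 'outer' or 'inner'")

    # Stack the rows (UNION ALL); a missing column becomes None (SQL NULL).
    result = []
    for cols, rows in frames:
        for row in rows:
            values = dict(zip(cols, row))
            result.append(tuple(values.get(c) for c in columns))

    if not allow_duplicates:
        # UNION: keep only the first occurrence of each distinct row.
        result = list(dict.fromkeys(result))
    return columns, result


# Local stand-ins for df1 and df2 from the examples below.
df1 = (["id", "stats", "masters", "gpa"],
       [(13, "Advanced", "no", 4.0), (29, "Novice", "yes", 4.0),
        (15, "Advanced", "yes", 4.0)])
df2 = (["id", "stats", "programming", "admitted"],
       [(24, "Advanced", "Novice", 1), (19, "Advanced", "Advanced", 0)])

cols, rows = concat_model([df1, df2])  # outer join, duplicates kept
print(cols)
for row in rows:
    print(row)
print(len(concat_model([df1, df2, df2], allow_duplicates=False)[1]))
```

A real SQL UNION does not guarantee any row order; the sketch keeps the first occurrence of each row only to make its output deterministic.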

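Example Prerequisites

The examples use the admissions_train table and assume that a connection has already been created with create_context(). Printing a DataFrame shows only a limited number of rows, which is why ids such as 29 and 24 appear in df1 and df2 even though they are absent from the printout of df.

>>> from teradataml import DataFrame, concat
>>> df = DataFrame("admissions_train")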
>>> df
   masters   gpa     stats programming admitted
id
22     yes  3.46    Novice    Beginner        0
36      no  3.00  Advanced      Novice        0
15     yes  4.00  Advanced    Advanced        1
38     yes  2.65  Advanced    Beginner        1
5       no  3.44    Novice      Novice        0
17      no  3.83  Advanced    Advanced        1
34     yes  3.85  Advanced    Beginner        0
13      no  4.00  Advanced      Novice        1
26     yes  3.57  Advanced    Advanced        1
19     yes  1.98  Advanced    Advanced        0
>>> df1 = df[df.gpa == 4].select(['id', 'stats', 'masters', 'gpa'])
>>> df1
       stats masters  gpa
id
13  Advanced      no  4.0
29    Novice     yes  4.0
15  Advanced     yes  4.0
>>> df2 = df[df.gpa < 2].select(['id', 'stats', 'programming', 'admitted'])
>>> df2
       stats programming admitted
id
24  Advanced      Novice        1
19  Advanced    Advanced        0

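Example 1: Run concat() with default values for optional arguments

With the defaults, the result contains every column from both inputs. A column that an input lacks is filled with None (shown as NaN in the floating-point gpa column), and duplicate rows are kept.

>>> cdf = concat([df1, df2])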
>>> cdf
       stats masters  gpa programming admitted
id
19  Advanced    None  NaN    Advanced        0
24  Advanced    None  NaN      Novice        1
13  Advanced      no  4.0        None     None
29    Novice     yes  4.0        None     None
15  Advanced     yes  4.0        None     None
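Conceptually, the default (outer) concat behaves like a SQL UNION ALL in which each SELECT supplies NULL for the columns its input lacks. The sketch below shows this with Python's built-in sqlite3 module. The tables df1 and df2 are local stand-ins and the query is an illustrative assumption, not the SQL that teradataml actually generates.

```python
import sqlite3

# In-memory stand-ins for df1 and df2 (same rows as in the examples).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE df1 (id INTEGER, stats TEXT, masters TEXT, gpa REAL)")
con.execute("CREATE TABLE df2 (id INTEGER, stats TEXT, programming TEXT, admitted INTEGER)")
con.executemany("INSERT INTO df1 VALUES (?, ?, ?, ?)",
                [(13, "Advanced", "no", 4.0), (29, "Novice", "yes", 4.0),
                 (15, "Advanced", "yes", 4.0)])
con.executemany("INSERT INTO df2 VALUES (?, ?, ?, ?)",
                [(24, "Advanced", "Novice", 1), (19, "Advanced", "Advanced", 0)])

# Outer alignment: every column from either input, NULL-padded where missing.
# UNION ALL keeps duplicate rows, matching allow_duplicates=True.
outer_sql = """
    SELECT id, stats, masters, gpa, NULL AS programming, NULL AS admitted FROM df1
    UNION ALL
    SELECT id, stats, NULL, NULL, programming, admitted FROM df2
"""
rows = con.execute(outer_sql).fetchall()
for row in sorted(rows):
    print(row)
```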

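Example 2: Run concat() with optional argument "join"

Set join='inner' to keep only the columns that all input DataFrames share. Here that is stats alone, because id is the index.

>>> cdf = concat([df1, df2], join='inner')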
>>> cdf
       stats
id
19  Advanced
24  Advanced
13  Advanced
29    Novice
15  Advanced
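A minimal sketch of the inner variant generates the query from each input's column list and keeps only the shared columns. build_union_sql is a hypothetical helper and SQLite stands in for Vantage; unlike the teradataml output, id is an ordinary column here rather than the index.

```python
import sqlite3


def build_union_sql(tables, join="outer", allow_duplicates=True):
    """Hypothetical helper: build a UNION [ALL] query over named tables.

    tables is a list of (table_name, column_list) pairs.
    """
    columns = []
    for _, cols in tables:
        columns += [c for c in cols if c not in columns]
    if join == "inner":
        # Only the columns shared by every table survive.
        columns = [c for c in columns if all(c in cols for _, cols in tables)]
    selects = []
    for name, cols in tables:
        # Pad columns this table lacks with NULL so every SELECT lines up.
        exprs = [c if c in cols else f"NULL AS {c}" for c in columns]
        selects.append(f"SELECT {', '.join(exprs)} FROM {name}")
    op = " UNION ALL " if allow_duplicates else " UNION "
    return op.join(selects)


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE df1 (id INTEGER, stats TEXT, masters TEXT, gpa REAL)")
con.execute("CREATE TABLE df2 (id INTEGER, stats TEXT, programming TEXT, admitted INTEGER)")
con.executemany("INSERT INTO df1 VALUES (?, ?, ?, ?)",
                [(13, "Advanced", "no", 4.0), (29, "Novice", "yes", 4.0),
                 (15, "Advanced", "yes", 4.0)])
con.executemany("INSERT INTO df2 VALUES (?, ?, ?, ?)",
                [(24, "Advanced", "Novice", 1), (19, "Advanced", "Advanced", 0)])

tables = [("df1", ["id", "stats", "masters", "gpa"]),
          ("df2", ["id", "stats", "programming", "admitted"])]
sql = build_union_sql(tables, join="inner")
print(sql)  # SELECT id, stats FROM df1 UNION ALL SELECT id, stats FROM df2
rows = con.execute(sql).fetchall()
```

Calling build_union_sql(tables) with the default join instead produces the NULL-padded outer query.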

Example 3: Run concat() with optional argument "allow_duplicates"

  • Set allow_duplicates=True (default) to keep duplicate rows in the result.
    >>> cdf = concat([df1, df2])
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    24  Advanced    None  NaN      Novice        1
    13  Advanced      no  4.0        None     None
    29    Novice     yes  4.0        None     None
    15  Advanced     yes  4.0        None     None
    
    Concatenating df2 with this result again yields duplicate rows for ids 19 and 24:
    >>> cdf = concat([cdf, df2])
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    13  Advanced      no  4.0        None     None
    24  Advanced    None  NaN      Novice        1
    24  Advanced    None  NaN      Novice        1
    19  Advanced    None  NaN    Advanced        0
    29    Novice     yes  4.0        None     None
    15  Advanced     yes  4.0        None     None
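  • Set allow_duplicates=False to remove duplicate rows. Here cdf is the seven-row result of the previous step, so the repeated rows for ids 19 and 24 are dropped.
    >>> cdf = concat([cdf, df2], allow_duplicates=False)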
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    29    Novice     yes  4.0        None     None
    24  Advanced    None  NaN      Novice        1
    15  Advanced     yes  4.0        None     None
    13  Advanced      no  4.0        None     None
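In SQL terms, the two allow_duplicates settings correspond to UNION ALL and UNION. The following sqlite3 sketch replays both steps of this example on local stand-in tables; the table names and queries are assumptions for illustration, not teradataml's generated SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# cdf holds the five rows produced by concat([df1, df2]) in Example 1.
con.execute("CREATE TABLE cdf (id INTEGER, stats TEXT, masters TEXT, gpa REAL, "
            "programming TEXT, admitted INTEGER)")
con.executemany("INSERT INTO cdf VALUES (?, ?, ?, ?, ?, ?)", [
    (19, "Advanced", None, None, "Advanced", 0),
    (24, "Advanced", None, None, "Novice", 1),
    (13, "Advanced", "no", 4.0, None, None),
    (29, "Novice", "yes", 4.0, None, None),
    (15, "Advanced", "yes", 4.0, None, None),
])
con.execute("CREATE TABLE df2 (id INTEGER, stats TEXT, programming TEXT, admitted INTEGER)")
con.executemany("INSERT INTO df2 VALUES (?, ?, ?, ?)",
                [(24, "Advanced", "Novice", 1), (19, "Advanced", "Advanced", 0)])

# df2 is NULL-padded to cdf's column layout before the set operation.
rhs = "SELECT id, stats, NULL, NULL, programming, admitted FROM df2"

# Step 1: UNION ALL keeps repeats, like allow_duplicates=True (7 rows).
con.execute(f"CREATE TABLE cdf7 AS SELECT * FROM cdf UNION ALL {rhs}")
with_dups = con.execute("SELECT * FROM cdf7").fetchall()

# Step 2: UNION removes repeats, like allow_duplicates=False.
no_dups = con.execute(f"SELECT * FROM cdf7 UNION {rhs}").fetchall()

print(len(with_dups), len(no_dups))
```

UNION treats NULLs as equal when eliminating duplicates, so the NULL-padded copies of ids 19 and 24 collapse into single rows. It also removes repeats that were already present within cdf7 itself.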

Example 4: Run concat() with optional argument "sort"

Set sort=True to sort the columns of the result alphabetically by name.

>>> cdf = concat([df1, df2], sort=True)
>>> cdf
   admitted  gpa masters programming     stats
id
19        0  NaN    None    Advanced  Advanced
24        1  NaN    None      Novice  Advanced
13     None  4.0      no        None  Advanced
29     None  4.0     yes        None    Novice
15     None  4.0     yes        None  Advanced
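The column order in Examples 1 and 4 follows a simple rule, as observed in the outputs above: by default, columns appear in the order they are first seen across the inputs; with sort=True, they are sorted by name. result_columns is a hypothetical helper that encodes this observed rule; it is not part of teradataml.

```python
def result_columns(column_lists, sort=False):
    """Hypothetical sketch of the column order of an outer concat."""
    ordered = []
    for cols in column_lists:
        for c in cols:
            if c not in ordered:
                ordered.append(c)  # first-seen order across the inputs
    # sort=True replaces first-seen order with alphabetical order.
    return sorted(ordered) if sort else ordered


# Non-index columns of df1 and df2 from the examples.
df1_cols = ["stats", "masters", "gpa"]
df2_cols = ["stats", "programming", "admitted"]
print(result_columns([df1_cols, df2_cols]))
print(result_columns([df1_cols, df2_cols], sort=True))
```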