Teradata Package for Python Function Reference - concat

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.00

Published

November 2021

Language

English (United States)

Last Update

2021-11-19

Product Category

Teradata Vantage

teradataml.dataframe.setop.concat = concat(df_list, join='OUTER', allow_duplicates=True, sort=False, ignore_index=False)

DESCRIPTION:
    Concatenates a list of teradataml DataFrames along the index axis.

PARAMETERS:
    df_list:
        Required argument.
        Specifies the list of teradataml DataFrames on which the
        concatenation is to be performed.
        Types: list of teradataml DataFrames

    join:
        Optional argument.
        Specifies how to handle columns that are not common to all
        DataFrames.
        Supported values are:
        • 'OUTER': Projects all columns from all the DataFrames.
                   A column that is missing from a DataFrame is filled
                   with SQL NULL for that DataFrame's rows.
        • 'INNER': Projects only the columns common to all DataFrames.
        Default value: 'OUTER'
        Permitted values: 'INNER', 'OUTER'
        Types: str

    allow_duplicates:
        Optional argument.
        Specifies whether the result of the concatenation can contain
        duplicate rows.
        Default value: True
        Types: bool

    sort:
        Optional argument.
        Specifies whether to sort the columns axis when it is not already
        aligned and the join argument is set to 'OUTER'.
        Default value: False
        Types: bool

    ignore_index:
        Optional argument.
        Specifies whether to ignore the index columns in the resulting
        DataFrame. If True, the index columns are ignored in the concat
        operation.
        Default value: False
        Types: bool

RETURNS:
    teradataml DataFrame

RAISES:
    TeradataMlException

EXAMPLES:
    >>> from teradataml import DataFrame, load_example_data
    >>> load_example_data("dataframe", "admissions_train")
    >>> from teradataml.dataframe.setop import concat
    >>>
    >>> # Default options
    >>> df = DataFrame('admissions_train')
    >>> df1 = df[df.gpa == 4].select(['id', 'stats', 'masters', 'gpa'])
    >>> df1
           stats masters  gpa
    id
    13  Advanced      no  4.0
    29    Novice     yes  4.0
    15  Advanced     yes  4.0
    >>> df2 = df[df.gpa < 2].select(['id', 'stats', 'programming', 'admitted'])
    >>> df2
           stats programming admitted
    id
    24  Advanced      Novice        1
    19  Advanced    Advanced        0
    >>> cdf = concat([df1, df2])
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    24  Advanced    None  NaN      Novice        1
    13  Advanced      no  4.0        None     None
    29    Novice     yes  4.0        None     None
    15  Advanced     yes  4.0        None     None
    >>>
    >>> # concat more than two DataFrames
    >>> df3 = df[df.gpa == 3].select(['id', 'stats', 'programming', 'gpa'])
    >>> df3
           stats programming  gpa
    id
    36  Advanced      Novice  3.0
    >>> cdf = concat([df1, df2, df3])
    >>> cdf
           stats masters  gpa programming  admitted
    id
    15  Advanced     yes  4.0        None       NaN
    19  Advanced    None  NaN    Advanced       0.0
    36  Advanced    None  3.0      Novice       NaN
    29    Novice     yes  4.0        None       NaN
    13  Advanced      no  4.0        None       NaN
    24  Advanced    None  NaN      Novice       1.0
    >>>
    >>> # join = 'inner'
    >>> cdf = concat([df1, df2], join='inner')
    >>> cdf
           stats
    id
    19  Advanced
    24  Advanced
    13  Advanced
    29    Novice
    15  Advanced
    >>>
    >>> # allow_duplicates = True (default)
    >>> cdf = concat([df1, df2])
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    24  Advanced    None  NaN      Novice        1
    13  Advanced      no  4.0        None     None
    29    Novice     yes  4.0        None     None
    15  Advanced     yes  4.0        None     None
    >>> cdf = concat([cdf, df2])
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    13  Advanced      no  4.0        None     None
    24  Advanced    None  NaN      Novice        1
    24  Advanced    None  NaN      Novice        1
    19  Advanced    None  NaN    Advanced        0
    29    Novice     yes  4.0        None     None
    15  Advanced     yes  4.0        None     None
    >>>
    >>> # allow_duplicates = False
    >>> cdf = concat([cdf, df2], allow_duplicates=False)
    >>> cdf
           stats masters  gpa programming admitted
    id
    19  Advanced    None  NaN    Advanced        0
    29    Novice     yes  4.0        None     None
    24  Advanced    None  NaN      Novice        1
    15  Advanced     yes  4.0        None     None
    13  Advanced      no  4.0        None     None
    >>>
    >>> # sort = True
    >>> cdf = concat([df1, df2], sort=True)
    >>> cdf
       admitted  gpa masters programming     stats
    id
    19        0  NaN    None    Advanced  Advanced
    24        1  NaN    None      Novice  Advanced
    13     None  4.0      no        None  Advanced
    29     None  4.0     yes        None    Novice
    15     None  4.0     yes        None  Advanced
    >>>
    >>> # ignore_index = True
    >>> cdf = concat([df1, df2], ignore_index=True)
    >>> cdf
          stats masters  gpa programming  admitted
    0  Advanced     yes  4.0        None       NaN
    1  Advanced    None  NaN    Advanced       0.0
    2    Novice     yes  4.0        None       NaN
    3  Advanced    None  NaN      Novice       1.0
    4  Advanced      no  4.0        None       NaN
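
The column-projection and duplicate-handling rules described for join, allow_duplicates and sort can be explored without a Vantage connection. The following is a plain-Python sketch of those rules only, not teradataml's implementation. The name `concat_model`, the `(columns, rows)` pair used in place of a DataFrame, and the preserved row order are all assumptions made for this illustration (the outputs in the examples show rows in varying order). The sketch treats `id` as an ordinary column and does not model ignore_index.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for a DataFrame: (column names, list of row dicts).
Frame = Tuple[List[str], List[Dict[str, Any]]]


def concat_model(frames: List[Frame], join: str = "OUTER",
                 allow_duplicates: bool = True,
                 sort: bool = False) -> Tuple[List[str], List[tuple]]:
    """Toy model of the documented column and duplicate rules (not teradataml code)."""
    if not frames:
        raise ValueError("frames must not be empty")
    mode = join.upper()
    if mode not in ("INNER", "OUTER"):
        raise ValueError("join must be 'INNER' or 'OUTER'")

    if mode == "OUTER":
        # Project every column from every frame, in first-seen order.
        columns: List[str] = []
        for cols, _ in frames:
            for col in cols:
                if col not in columns:
                    columns.append(col)
        if sort:
            columns.sort()
    else:
        # Keep only the columns that appear in all frames.
        columns = [col for col in frames[0][0]
                   if all(col in cols for cols, _ in frames[1:])]

    result: List[tuple] = []
    seen = set()
    for _, rows in frames:
        for row in rows:
            # A column absent from this frame becomes None (standing in for SQL NULL).
            projected = tuple(row.get(col) for col in columns)
            if not allow_duplicates:
                if projected in seen:
                    continue
                seen.add(projected)
            result.append(projected)
    return columns, result


# Sample rows copied from the admissions_train examples.
df1: Frame = (["id", "stats", "masters", "gpa"], [
    {"id": 13, "stats": "Advanced", "masters": "no", "gpa": 4.0},
    {"id": 29, "stats": "Novice", "masters": "yes", "gpa": 4.0},
    {"id": 15, "stats": "Advanced", "masters": "yes", "gpa": 4.0},
])
df2: Frame = (["id", "stats", "programming", "admitted"], [
    {"id": 24, "stats": "Advanced", "programming": "Novice", "admitted": 1},
    {"id": 19, "stats": "Advanced", "programming": "Advanced", "admitted": 0},
])

outer_cols, outer_rows = concat_model([df1, df2])
inner_cols, inner_rows = concat_model([df1, df2], join="inner")
_, with_dups = concat_model([df1, df2, df2])
_, without_dups = concat_model([df1, df2, df2], allow_duplicates=False)

print(outer_cols)   # ['id', 'stats', 'masters', 'gpa', 'programming', 'admitted']
print(inner_cols)   # ['id', 'stats']
print(len(with_dups), len(without_dups))  # 7 5
```

The row counts match the allow_duplicates examples: concatenating df2 a second time yields seven rows when duplicates are kept and five when they are removed.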