Use the Array method to create array objects that can be used with the DataFrame Assign method.
To use array utilities, you must have either the UDTTYPE or UDTMETHOD privilege on the SYSUDTLIB database. Use the following commands to grant the required permissions:
GRANT UDTMETHOD ON SYSUDTLIB TO <USER>; GRANT SELECT ON UDTInfo TO <USER>;
Required Parameter
- elements
- Specifies the elements of similar Teradata types or literal values or numpy arrays to store in an array.If an empty tuple is provided, the atype defaults to ARRAY_VARCHAR with scope 100.
Optional Parameters
- atype
- Specifies the Teradata type of the array elements.
If the argument is not specified, then teradataml infers the type based on the first element of the array.
- default_null
- Specifies whether the array type prepopulates the missing values with NULL or not. When set to True, missing values are populated with NULL values, otherwise they are undefined.
Default value: False
Example setup
>>> from teradataml import load_example_data, Array, DataFrame >>> from teradatasqlalchemy.types import ARRAY_NUMBER, ARRAY_TIMESTAMP >>> import datetime
Load the 'sales' data and create a DataFrame.
>>> load_example_data("dataframe", "sales")
>>> df = DataFrame("sales")
Output
Feb Jan Mar Apr datetime accounts Orange Inc 210.0 NaN NaN 250.0 04/01/2017 Blue Inc 90.0 50.0 95.0 101.0 04/01/2017 Jones LLC 200.0 150.0 140.0 180.0 04/01/2017 Alpha Co 210.0 200.0 215.0 250.0 04/01/2017 Yellow Inc 90.0 NaN NaN NaN 04/01/2017 Red Inc 200.0 150.0 140.0 NaN 04/01/2017
Example 1: Create an Array with literal values and add it as a new column to the DataFrame
>>> arr = Array((1, 2, 3)) >>> arr
Array [elements=(1, 2, 3), atype=ARRAY_INTEGER, default_null=False]
Create 'arr_col' column using Array object.
>>> res = df.assign(arr_col = arr) >>> res
Output
Feb Jan Mar Apr datetime arr_col accounts Red Inc 200.0 150.0 140.0 NaN 04/01/2017 (1,2,3) Blue Inc 90.0 50.0 95.0 101.0 04/01/2017 (1,2,3) Jones LLC 200.0 150.0 140.0 180.0 04/01/2017 (1,2,3) Alpha Co 210.0 200.0 215.0 250.0 04/01/2017 (1,2,3) Yellow Inc 90.0 NaN NaN NaN 04/01/2017 (1,2,3) Orange Inc 210.0 NaN NaN 250.0 04/01/2017 (1,2,3)
>>> res.tdtypes
accounts VARCHAR(length=20, charset='LATIN')
Feb FLOAT()
Jan BIGINT()
Mar BIGINT()
Apr BIGINT()
datetime DATE()
arr_col ARRAY_INTEGER('[100]')
Example 2: Create an Array with DataFrame columns with atype and default_null as True and add it as a new column to the DataFrame
>>> arr = Array((df.Jan, df.Feb, df.Mar), atype=ARRAY_NUMBER('[5]'), default_null=True)
>>> arr
Output
Array [elements=(<teradataml.dataframe.sql._SQLColumnExpression object at 0x00000194F3E9BE80>, <teradataml.dataframe.sql._SQLColumnExpression object at 0x00000194F3E9B610>, <teradataml.dataframe.sql._SQLColumnExpression object at 0x00000194F3E9B1F0>), atype=ARRAY_NUMBER, default_null=True]
Create 'arr_col' column using Array object.
>>> res = df.assign(arr_col = arr) >>> res
Output
Feb Jan Mar Apr datetime arr_col accounts Yellow Inc 90.0 NaN NaN NaN 04/01/2017 (NULL,90,NULL,NULL,NULL) Jones LLC 200.0 150.0 140.0 180.0 04/01/2017 (150,200,140,NULL,NULL) Alpha Co 210.0 200.0 215.0 250.0 04/01/2017 (200,210,215,NULL,NULL) Red Inc 200.0 150.0 140.0 NaN 04/01/2017 (150,200,140,NULL,NULL) Blue Inc 90.0 50.0 95.0 101.0 04/01/2017 (50,90,95,NULL,NULL) Orange Inc 210.0 NaN NaN 250.0 04/01/2017 (NULL,210,NULL,NULL,NULL)
>>> res.tdtypes
Output
accounts VARCHAR(length=20, charset='LATIN')
Feb FLOAT()
Jan BIGINT()
Mar BIGINT()
Apr BIGINT()
datetime DATE()
arr_col ARRAY_NUMBER('[5]', default_null=True)
Example 3: Creating an Array to store timestamp-type values and adding it as a new column to the DataFrame
>>> arr = Array((datetime.datetime(2024, 6, 20, 12, 0, 52), datetime.datetime(2025, 6, 2, 15, 32)), atype=ARRAY_TIMESTAMP('[1:2]', timezone=True))
>>> arr
Output
Array [elements=(datetime.datetime(2024, 6, 20, 12, 0, 52), datetime.datetime(2025, 6, 2, 15, 32)), atype=ARRAY_TIMESTAMP, default_null=False]
Create 'arr_col' column using Array object.
>>> res = df.assign(arr_col = arr) >>> res
Output
Feb Jan Mar Apr datetime arr_col accounts Alpha Co 210.0 200.0 215.0 250.0 04/01/2017 (2024-06-20 12:00:52.000000-00:00,2025-06-02 15:32:00.000000-00:00) Jones LLC 200.0 150.0 140.0 180.0 04/01/2017 (2024-06-20 12:00:52.000000-00:00,2025-06-02 15:32:00.000000-00:00) Blue Inc 90.0 50.0 95.0 101.0 04/01/2017 (2024-06-20 12:00:52.000000-00:00,2025-06-02 15:32:00.000000-00:00) Orange Inc 210.0 NaN NaN 250.0 04/01/2017 (2024-06-20 12:00:52.000000-00:00,2025-06-02 15:32:00.000000-00:00) Yellow Inc 90.0 NaN NaN NaN 04/01/2017 (2024-06-20 12:00:52.000000-00:00,2025-06-02 15:32:00.000000-00:00) Red Inc 200.0 150.0 140.0 NaN 04/01/2017 (2024-06-20 12:00:52.000000-00:00,2025-06-02 15:32:00.000000-00:00)
>>> res.tdtypes
Output
accounts VARCHAR(length=20, charset='LATIN')
Feb FLOAT()
Jan BIGINT()
Mar BIGINT()
Apr BIGINT()
datetime DATE()
arr_col ARRAY_TIMESTAMP('[1:2]', timezone=True)