Use the cast() method to cast a DataFrame column (Column Expression) to a specified teradatasqlalchemy type. It can be used with the DataFrame methods assign and filter.
- type_: Specifies a teradatasqlalchemy type or an object of a teradatasqlalchemy type that the column needs to be cast to.
Default value is None.
Types: teradatasqlalchemy type or object of teradatasqlalchemy type
- format: Specifies a variable length string containing formatting characters that define the display format for the data type.
Formats can be specified for columns that have character, numeric, byte, DateTime, Period or UDT data types.
Teradata supports different formats. Refer to "Data Type Formats and Format Phrases" in Teradata Vantage™ - Data Types and Literals, B035-1143.Default value: None
Types: str
- timezone: Specifies the timezone string.
Refer to Teradata Vantage™ - SQL Date and Time Functions and Expressions, B035-1211 for supported timezones.
Type: ColumnExpression or str.
Example Setup
>>> load_example_data("dataframe","admissions_train")
>>> df = DataFrame('admissions_train')
>>> df
masters gpa stats programming admitted id 13 no 4.00 Advanced Novice 1 26 yes 3.57 Advanced Advanced 1 5 no 3.44 Novice Novice 0 19 yes 1.98 Advanced Advanced 0 15 yes 4.00 Advanced Advanced 1 40 yes 3.95 Novice Beginner 0 7 yes 2.33 Novice Novice 1 22 yes 3.46 Novice Beginner 0 36 no 3.00 Advanced Novice 0 38 yes 2.65 Advanced Beginner 1
>>> df.dtypes
id int masters str gpa float stats str programming str admitted int
Example 1: Use assign to create a new DataFrame with a new column, casted to VARCHAR(5) type
Use assign() to create a new DataFrame with a new column 'char_id', which is the 'id' column (of type INTEGER) cast to VARCHAR(5) (object of teradatasqlalchemy type corresponding to SQL Type VARCHAR(5)).
>>> from teradatasqlalchemy import VARCHAR
>>> new_df = df.assign(char_id = df.id.cast(type_=VARCHAR(5)))
>>> new_df
masters gpa stats programming admitted char_id id 5 no 3.44 Novice Novice 0 5 34 yes 3.85 Advanced Beginner 0 34 13 no 4.00 Advanced Novice 1 13 40 yes 3.95 Novice Beginner 0 40 22 yes 3.46 Novice Beginner 0 22 19 yes 1.98 Advanced Advanced 0 19 36 no 3.00 Advanced Novice 0 36 15 yes 4.00 Advanced Advanced 1 15 7 yes 2.33 Novice Novice 1 7 17 no 3.83 Advanced Advanced 1 17
>>> new_df.dtypes
id int masters str gpa float stats str programming str admitted int char_id str
Example 2: Use assign to create a new DataFrame with a new column, casted to VARCHAR type
Use assign() to create a new DataFrame with a new column 'char_id', which is the 'id' column (of type INTEGER) cast to VARCHAR (teradatasqlalchemy type corresponding to SQL Type VARCHAR).
>>> new_df_2 = df.assign(char_id = df.id.cast(type_=VARCHAR))
>>> new_df_2
masters gpa stats programming admitted char_id id 5 no 3.44 Novice Novice 0 5 34 yes 3.85 Advanced Beginner 0 34 13 no 4.00 Advanced Novice 1 13 40 yes 3.95 Novice Beginner 0 40 22 yes 3.46 Novice Beginner 0 22 19 yes 1.98 Advanced Advanced 0 19 36 no 3.00 Advanced Novice 0 36 15 yes 4.00 Advanced Advanced 1 15 7 yes 2.33 Novice Novice 1 7 17 no 3.83 Advanced Advanced 1 17
>>> new_df_2.dtypes
id int masters str gpa float stats str programming str admitted int char_id str
Example 3: Filter data with match on a column cast to another type
>>> df[df.id.cast(VARCHAR(5)) == '1']
masters gpa stats programming admitted id 1 yes 3.95 Beginner Beginner 0
>>> df[df.id.cast(VARCHAR) == '1']
masters gpa stats programming admitted id 1 yes 3.95 Beginner Beginner 0
Example 4: Create new DataFrame casting 'timestamp_col' column (of type VARCHAR) to TIMESTAMP, using format
>>> dataframe_dict = {"id": [100, 200,300],
>>> "timestamp_col": ['1000-01-10 23:00:12-02:00', '2015-01-08 13:00:00+12:00', '2014-12-10 10:00:35-08:00'], >>> "timezone_col": ["GMT", "America Pacific", "GMT+10"]} >>> pandas_df = pd.DataFrame(dataframe_dict) >>> copy_to_sql(pandas_df, table_name = 'new_table', if_exists = 'replace')
>>> df1 = DataFrame("new_table") >>> df1
id timestamp_col timezone_col 300 2014-12-10 10:00:35-08:00 GMT+10 200 2015-01-08 13:00:00+12:00 America Pacific 100 1000-01-10 23:00:12-02:00 GMT
>>> df1.dtypes
id int timestamp_col str timezone_col str
>>> new_df1 = df1.assign(new_col = df1.timestamp_col.cast(TIMESTAMP, format='Y4-MM-DDBHH:MI:SSBZ'))
id timestamp_col timezone_col new_col 300 2014-12-10 10:00:35-08:00 GMT+10 2014-12-10 18:00:35 200 2015-01-08 13:00:00+12:00 America Pacific 2015-01-08 01:00:00 100 1000-01-10 23:00:12-02:00 GMT 1000-01-11 01:00:12
>>> new_df1.tdtypes
id int timestamp_col str timezone_col str new_col datetime.datetime
Example 5: Create new DataFrame casting 'id' column (of type INTEGER) to VARCHAR, using format
>>> new_df2 = df1.assign(new_col = df1.id.cast(VARCHAR, format='zzz.zz'))
id timestamp_col timezone_col new_col 300 2014-12-10 10:00:35-08:00 GMT+10 300.00 200 2015-01-08 13:00:00+12:00 America Pacific 200.00 100 1000-01-10 23:00:12-02:00 GMT 100.00
>>> new_df2.dtypes
id int timestamp_col str timezone_col str new_col str
Example 6: Create new DataFrame casting 'timestamp_with_timezone' column (of type TIMESTAMP) to TIMESTAMP WITH TIMEZONE, with offset 'GMT+10'
>>> new_df3 = new_df1.assign(timestamp_with_timezone = new_df1.new_col.cast(TIMESTAMP(timezone=True), timezone='GMT+10'))
id timestamp_col timezone_col new_col timestamp_with_timezone 300 2014-12-10 10:00:35-08:00 GMT+10 2014-12-10 18:00:35 2014-12-11 04:00:35.000000+10:00 200 2015-01-08 13:00:00+12:00 America Pacific 2015-01-08 01:00:00 2015-01-08 11:00:00.000000+10:00 100 1000-01-10 23:00:12-02:00 GMT 1000-01-11 01:00:12 1000-01-11 11:00:12.000000+10:00
>>> new_df3.dtypes
id int timestamp_col str timezone_col str new_col datetime.datetime timestamp_with_timezone datetime.datetime