Teradata Package for Python Function Reference | 20.00 - FillNa - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference - 20.00
- Deployment
- VantageCloud
- VantageCore
- Edition
- Enterprise
- IntelliFlex
- VMware
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Enterprise_2000
- Product Category
- Teradata Vantage
- teradataml.analytics.Transformations.FillNa.__init__ = __init__(self, style='mean', value=None, columns=None, out_columns=None, datatype=None)
- DESCRIPTION:
FillNa allows user to perform missing value/null replacement transformations.
Note:
Output of this function is passed to "fillna" argument of "Transform"
function from Vantage Analytic Library.
PARAMETERS:
style:
Optional Argument.
Specifies the nullstyle for missing value/null value replacement.
A literal value, the mean, median, mode, or an imputed value joined
from another table can be used as the replacement value. The median
value can be requested with or without averaging of two middle values
when there is an even number of values.
Literal value replacement is supported for numeric, character, and
date data types.
Mean value replacement is supported for columns of numeric type or
date type.
Median without averaging, mode, and imputed value replacement are
valid for any supported type. Median with averaging is supported
only for numeric and date type.
Permitted Values: 'literal', 'mean', 'median', 'mode', 'median_wo_mean',
'imputed'
Default Value: 'mean'
Types: str
value:
Optional Argument. Required when "style" is 'literal' or 'imputed'.
Specifies the value to be used for null replacement transformations.
Notes:
1. When "style" is 'imputed', value must be of type teradataml
DataFrame.
2. When "style" is 'literal', value can be of any type.
3. If date values are entered as string, the keyword 'DATE' must precede
the date value, and do not enclose in single quotes OR
pass a datetime.date object.
For example,
value='DATE 1987-06-09'
value=date(1987, 6, 9)
Types: teradataml DataFrame, bool, int, str, float, datetime.date
columns:
Optional Argument.
Specifies the names of the columns.
Types: str or list of str
out_columns:
Optional Argument.
Specifies the names of the output columns.
Notes:
Number of elements in "columns" and "out_columns" must be same.
Types: str or list of str
datatype:
Optional Argument.
Specifies the name of the intended datatype of the output column.
Intended data types for the output column can be specified using either the
teradatasqlalchemy types or the permitted strings mentioned below:
-------------------------------------------------------------------
| If intended SQL Data Type is | Permitted Value to be passed is |
|-------------------------------------------------------------------|
| bigint | bigint |
| byteint | byteint |
| char(n) | char,n |
| date | date |
| decimal(m,n) | decimal,m,n |
| float | float |
| integer | integer |
| number(*) | number |
| number(n) | number,n |
| number(*,n) | number,*,n |
| number(n,n) | number,n,n |
| smallint | smallint |
| time(p) | time,p |
| timestamp(p) | timestamp,p |
| varchar(n) | varchar,n |
--------------------------------------------------------------------
Notes:
1. char without a size is not supported.
2. number(*) does not include the * in its datatype format.
Examples:
1. If intended datatype for the output column is "bigint", then
pass string "bigint" to the argument as shown below:
datatype="bigint"
2. If intended datatype for the output column is "decimal(3,5)", then
pass string "decimal,3,5" to the argument as shown below:
datatype="decimal,3,5"
Types: str, BIGINT, BYTEINT, CHAR, DATE, DECIMAL, FLOAT, INTEGER, NUMBER, SMALLINT, TIME,
TIMESTAMP, VARCHAR.
RETURNS:
An instance of FillNa class.
RAISES:
TeradataMlException, TypeError, ValueError
EXAMPLE:
# Note:
# To run any transformation, user needs to use Transform() function from
# Vantage Analytic Library.
# To do so import valib first and set the "val_install_location".
>>> from teradataml import configure, DataFrame, FillNa, load_example_data, valib
>>> configure.val_install_location = "SYSLIB"
>>>
# Load example data.
>>> load_example_data("dataframe", ["sales", "employee_info"])
>>>
# Create the required DataFrames.
>>> sales = DataFrame("sales")
>>> sales
Feb Jan Mar Apr datetime
accounts
Alpha Co 210.0 200.0 215.0 250.0 04/01/2017
Blue Inc 90.0 50.0 95.0 101.0 04/01/2017
Yellow Inc 90.0 NaN NaN NaN 04/01/2017
Jones LLC 200.0 150.0 140.0 180.0 04/01/2017
Red Inc 200.0 150.0 140.0 NaN 04/01/2017
Orange Inc 210.0 NaN NaN 250.0 04/01/2017
>>>
# Example 1: Replace missing values in columns 'Jan' and 'Mar', with
# a literal value 0. Output columns are named as 'january' and
# 'march' respectively.
>>> fillna_literal = FillNa(style="literal", value=0, columns=["Jan", "Mar"],
... out_columns=["january", "march"])
>>> obj = valib.Transform(data=sales, fillna=fillna_literal, key_columns="accounts")
>>> obj.result
accounts january march
0 Blue Inc 50 95
1 Orange Inc 0 0
2 Red Inc 150 140
3 Yellow Inc 0 0
4 Jones LLC 150 140
5 Alpha Co 200 215
>>>
# Example 2: Replace missing values in column 'Jan' with 'median' value from
# that column. Output column produced in the output is named as
# 'Jan2'.
>>> fillna_median = FillNa(style="median", columns="Jan", out_columns="Jan2")
>>> obj = valib.Transform(data=sales, fillna=fillna_median, key_columns="accounts")
>>> obj.result
accounts Jan2
0 Alpha Co 200.000
1 Red Inc 150.000
2 Orange Inc 150.000
3 Jones LLC 150.000
4 Yellow Inc 150.000
5 Blue Inc 50.000
>>>
# Example 3: Replace missing values in column 'Apr' with a median value
# without mean from that column.
>>> fillna_mwm = FillNa(style="median_wo_mean", columns="Apr")
>>> obj = valib.Transform(data=sales, fillna=fillna_mwm, key_columns="accounts")
>>> obj.result
accounts Apr
0 Alpha Co 250
1 Blue Inc 101
2 Yellow Inc 180
3 Jones LLC 180
4 Red Inc 180
5 Orange Inc 250
>>>
# Example 4: Replace missing values in column 'Apr' with 'mode' value from
# that column. Output column produced in the output is named as
# 'Apr2000'.
>>> fillna_mode = FillNa(style="mode", columns="Apr", out_columns="Apr2000")
>>> obj = valib.Transform(data=sales, fillna=fillna_mode, key_columns="accounts")
>>> obj.result
accounts Apr2000
0 Blue Inc 101
1 Orange Inc 250
2 Red Inc 250
3 Yellow Inc 250
4 Jones LLC 180
5 Alpha Co 250
>>>
# Example 5: Replace missing values in columns 'masters' and 'programming' using
# 'imputed' style.
>>> load_example_data("dataframe", ["admissions_train_nulls", "admissions_train"])
# Dataframe to be used for 'imputed' style replacement.
>>> admissions_train = DataFrame("admissions_train")
>>> admissions_train
masters gpa stats programming admitted
id
22 yes 3.46 Novice Beginner 0
26 yes 3.57 Advanced Advanced 1
5 no 3.44 Novice Novice 0
17 no 3.83 Advanced Advanced 1
13 no 4.00 Advanced Novice 1
19 yes 1.98 Advanced Advanced 0
36 no 3.00 Advanced Novice 0
15 yes 4.00 Advanced Advanced 1
34 yes 3.85 Advanced Beginner 0
38 yes 2.65 Advanced Beginner 1
>>>
# DataFrame containing NULL values in columns "programming" and "masters".
>>> admissions_train_nulls = DataFrame("admissions_train_nulls")
>>> admissions_train_nulls
masters gpa stats programming admitted
id
5 no 3.44 Novice Novice 0
7 yes 2.33 Novice Novice 1
22 None 3.46 Novice None 0
19 yes 1.98 Advanced Advanced 0
15 None 4.00 Advanced Advanced 1
17 None 3.83 Advanced Advanced 1
34 None 3.85 Advanced Beginner 0
13 no 4.00 Advanced Novice 1
36 no 3.00 Advanced Novice 0
40 yes 3.95 Novice Beginner 0
>>>
# Replace NULL values in the columns "masters" and "programming"
# in admissions_train_nulls dataframe with the values in the corresponding
# columns' values in admissions_train dataframe.
>>> fillna_imputed = FillNa(style="imputed",
... columns=["masters", "programming"],
... value=admissions_train)
>>> obj = valib.Transform(data=admissions_train_nulls,
... fillna=fillna_imputed,
... key_columns="id")
>>> obj.result
id masters programming
0 22 yes Beginner
1 36 no Novice
2 15 yes Advanced
3 38 yes Beginner
4 5 no Novice
5 17 no Advanced
6 34 yes Beginner
7 13 no Novice
8 26 yes Advanced
9 19 yes Advanced
>>>
# Example 6: This example shows how one can operate on date and character
# columns. Example also showcases using multiple missing value
# treatment techniques in one single call for variable
# transformation.
# Create the required DataFrames.
>>> einfo = DataFrame("employee_info")
>>> einfo
first_name marks dob joined_date
employee_no
100 abcd None None None
112 None None None 18/12/05
101 abcde None None 02/12/05
>>>
# Using literal style for missing value treatment on a date type
# column "joined_date".
>>> fillna_1 = FillNa(style="literal", value="DATE 1995-12-23",
... columns="joined_date", out_columns="date1")
# Using literal style for missing value treatment on a character type
# column "first_name". Repalce missing values with 'FNU', i.e.,
# First Name Unknown.
>>> fillna_2 = FillNa(style="literal", value="FNU", columns="first_name",
... out_columns="char1")
# Using mean value for missing value treatment on a date type
# column "joined_date".
>>> fillna_3 = FillNa(style="mean", columns="joined_date",
... out_columns="date2")
# Using median value for missing value treatment on a date type
# column "joined_date".
>>> fillna_4 = FillNa(style="median", columns="joined_date",
... out_columns="date2A")
# Using median value without mean for missing value treatment on
# a date type column "joined_date".
>>> fillna_5 = FillNa(style="median_wo_mean", columns="joined_date",
... out_columns="date3")
# Using mode value for missing value treatment on a date type
# column "joined_date".
>>> fillna_6 = FillNa(style="mode", columns="joined_date",
... out_columns="date4")
# Using median value without mean for missing value treatment on
# a character type column "first_name".
>>> fillna_7 = FillNa(style="median_wo_mean", columns="first_name",
... out_columns="char2")
# Using mode value for missing value treatment on a character type
# column "first_name".
>>> fillna_8 = FillNa(style="mode", columns="first_name",
... out_columns="char3")
# Perform the missing value transformations using Transform() function
# from Vantage Analytic Library.
>>> obj = valib.Transform(data=einfo,
... fillna=[fillna_1, fillna_2, fillna_3, fillna_4,
... fillna_5, fillna_6, fillna_7, fillna_8],
... key_columns="employee_no")
>>> obj.result
employee_no date1 char1 date2 date2A date3 date4 char2 char3
0 112 18/12/05 FNU 18/12/05 18/12/05 18/12/05 18/12/05 abcd abcd
1 101 02/12/05 abcde 02/12/05 02/12/05 02/12/05 02/12/05 abcde abcde
2 100 95/12/23 abcd 60/12/04 60/12/04 02/12/05 02/12/05 abcd abcd
>>>