Use the dropna() method to remove rows with null values in a DataFrame.
Arguments:
- how: Optional. Specifies how rows are removed. Permitted values are 'any' and 'all'.
  - 'any': Removes rows that have at least one null value.
  - 'all': Removes rows in which all values are null.
- thresh: Optional. Specifies the minimum number of non-null values a row must have to be kept.
- subset: Optional. Specifies the column names to check, in array-like format. Use this argument to limit the search for null values to specific columns.
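The row-filtering rule described by these arguments can be modeled in plain Python. This is only a sketch of the semantics, not teradataml code: the function name drop_null_rows and the list-of-dicts row format are hypothetical, and the choice to let thresh take precedence over how is an assumption.

```python
def drop_null_rows(rows, how="any", thresh=None, subset=None):
    """Plain-Python model of the filtering rule; not teradataml code.

    rows   : list of dicts, one dict per row; None marks a null.
    how    : 'any' drops a row with at least one null,
             'all' drops a row only when every inspected value is null.
    thresh : if given, keep rows with at least this many non-null values
             (assumed here to take precedence over `how`).
    subset : column names to inspect; defaults to every column.
    """
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    kept = []
    for row in rows:
        columns = list(row) if subset is None else list(subset)
        non_null = sum(row[c] is not None for c in columns)
        if thresh is not None:
            keep = non_null >= thresh
        elif how == "any":
            keep = non_null == len(columns)
        else:  # how == 'all'
            keep = non_null > 0
        if keep:
            kept.append(row)
    return kept


# Small illustrative data set (not the "sales" table).
rows = [
    {"accounts": "Blue Inc", "Jan": 50, "Mar": 95},
    {"accounts": "Red Inc", "Jan": 150, "Mar": None},
    {"accounts": "Yellow Inc", "Jan": None, "Mar": None},
]

any_kept = [r["accounts"] for r in drop_null_rows(rows)]
all_kept = [r["accounts"] for r in drop_null_rows(rows, how="all", subset=["Jan", "Mar"])]
thresh_kept = [r["accounts"] for r in drop_null_rows(rows, thresh=2)]

print(any_kept)     # ['Blue Inc']
print(all_kept)     # ['Blue Inc', 'Red Inc']
print(thresh_kept)  # ['Blue Inc', 'Red Inc']
```

With how='all' and a subset, a row is dropped only when every listed column is null, which is why "Red Inc" survives in the second call even though it has a null in 'Mar'.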
Examples Prerequisite
Assume the table "sales" exists and a DataFrame "df" is created with the command:
>>> df = DataFrame("sales")
>>> df
              Feb   Jan   Mar   Apr    datetime
accounts
Jones LLC   200.0   150   140   180  2017-04-01
Yellow Inc   90.0  None  None  None  2017-04-01
Orange Inc  210.0  None  None   250  2017-04-01
Blue Inc     90.0    50    95   101  2017-04-01
Alpha Co    210.0   200   215   250  2017-04-01
Red Inc     200.0   150   140  None  2017-04-01
Example 1: Drop rows with at least one Null value
>>> df.dropna()
              Feb   Jan   Mar   Apr    datetime
accounts
Blue Inc     90.0    50    95   101  2017-04-01
Jones LLC   200.0   150   140   180  2017-04-01
Alpha Co    210.0   200   215   250  2017-04-01
Example 2: Keep rows with at least four Non-Null values
>>> df.dropna(thresh=4)
              Feb   Jan   Mar   Apr    datetime
accounts
Jones LLC   200.0   150   140   180  2017-04-01
Blue Inc     90.0    50    95   101  2017-04-01
Orange Inc  210.0  None  None   250  2017-04-01
Alpha Co    210.0   200   215   250  2017-04-01
Red Inc     200.0   150   140  None  2017-04-01
Example 3: Keep rows with at least five Non-Null values
>>> df.dropna(thresh=5)
              Feb   Jan   Mar   Apr    datetime
accounts
Alpha Co    210.0   200   215   250  2017-04-01
Jones LLC   200.0   150   140   180  2017-04-01
Blue Inc     90.0    50    95   101  2017-04-01
Red Inc     200.0   150   140  None  2017-04-01
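To see why Examples 2 and 3 keep the rows they do, it can help to tally the non-null values in each row of the "sales" data. The plain-Python sketch below does only that arithmetic and does not call teradataml. It assumes the "accounts" value is counted together with the other columns, which is inferred from the outputs above (for instance, "Orange Inc" is kept at thresh=4) rather than stated in the text.

```python
# Rows copied from the "sales" example; None marks a null.
# Column order: accounts, Feb, Jan, Mar, Apr, datetime
sales = [
    ("Jones LLC", 200.0, 150, 140, 180, "2017-04-01"),
    ("Yellow Inc", 90.0, None, None, None, "2017-04-01"),
    ("Orange Inc", 210.0, None, None, 250, "2017-04-01"),
    ("Blue Inc", 90.0, 50, 95, 101, "2017-04-01"),
    ("Alpha Co", 210.0, 200, 215, 250, "2017-04-01"),
    ("Red Inc", 200.0, 150, 140, None, "2017-04-01"),
]

# Assumption: the "accounts" value counts toward the non-null total.
non_null = {row[0]: sum(value is not None for value in row) for row in sales}

for name, count in non_null.items():
    print(name, count)

# Rows that satisfy each threshold, sorted by name for easy comparison.
kept_at_4 = sorted(name for name, count in non_null.items() if count >= 4)
kept_at_5 = sorted(name for name, count in non_null.items() if count >= 5)
print(kept_at_4)
print(kept_at_5)
```

Under that assumption, "Orange Inc" has four non-null values and "Red Inc" has five, so thresh=4 keeps both and thresh=5 keeps only "Red Inc" of the two. "Yellow Inc" has three and is dropped in both examples.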
Example 4: Drop rows with all Null values in columns 'Jan' and 'Mar'
>>> df.dropna(how='all', subset=['Jan','Mar'])
              Feb   Jan   Mar   Apr    datetime
accounts
Alpha Co    210.0   200   215   250  2017-04-01
Jones LLC   200.0   150   140   180  2017-04-01
Red Inc     200.0   150   140  None  2017-04-01
Blue Inc     90.0    50    95   101  2017-04-01