Teradata Package for Python Function Reference | 17.10 - dropna

Teradata® Package for Python Function Reference

Product: Teradata Package for Python
Release Number: 17.10
Published: April 2022
Language: English (United States)
Last Update: 2022-08-19
Lifecycle: previous
Product Category: Teradata Vantage
teradataml.geospatial.geodataframe.GeoDataFrame.dropna = dropna(self, how='any', thresh=None, subset=None)
DESCRIPTION:
    Removes rows that contain null values, as determined by the 'how', 'thresh', and 'subset' arguments.
 
PARAMETERS:
    how:
        Optional Argument.
        Specifies the condition under which a row is removed.
        'any' removes rows that contain at least one null value.
        'all' removes rows in which every value is null.
        Default Value: 'any'
        Permitted Values: 'any' or 'all'
        Types: str
 
    thresh:
        Optional Argument.
        Specifies the minimum number of non-null values a row must contain to be kept.
        Types: int
 
    subset:
        Optional Argument.
        Specifies the column name, or list of column names, to check for null values.
        Types: str OR list of Strings (str)
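The interaction of 'how', 'thresh', and 'subset' can be modelled as a per-row predicate. The following is a plain-Python sketch for illustration only, not teradataml's implementation: nulls are modelled as Python None, rows as dictionaries, and `keep_row` is a hypothetical helper. It also assumes, as in pandas, that 'thresh' takes precedence over 'how' when both are supplied; this documentation does not state the precedence.

```python
from typing import Any, Mapping, Optional, Sequence, Union


def keep_row(
    row: Mapping[str, Any],
    how: str = "any",
    thresh: Optional[int] = None,
    subset: Union[str, Sequence[str], None] = None,
) -> bool:
    """Return True if the row would survive dropna (illustrative model only).

    Null values are modelled as Python None. Assumption borrowed from
    pandas: thresh, when given, takes precedence over how.
    """
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    if isinstance(subset, str):
        subset = [subset]  # a single column name is also accepted
    # Only the columns in subset (or all columns, if subset is None) are checked.
    columns = list(row) if subset is None else list(subset)
    non_null = sum(row[c] is not None for c in columns)
    if thresh is not None:
        return non_null >= thresh
    if how == "any":
        return non_null == len(columns)  # any null in the checked columns drops the row
    return non_null > 0  # 'all': drop only when every checked column is null


# A row shaped like skey 1003 in the examples below.
row = {"skey": 1003, "geosequence": None}
print(keep_row(row))                 # False
print(keep_row(row, how="all"))      # True
print(keep_row(row, thresh=2))       # False
print(keep_row(row, subset="skey"))  # True
```

Restricting 'subset' to a column that is never null (here 'skey') keeps the row even under the default how='any'.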
 
RETURNS:
    teradataml GeoDataFrame
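A teradataml GeoDataFrame refers to data held in Vantage, so the filtering is carried out there rather than in Python. Conceptually, the returned GeoDataFrame keeps the rows that satisfy a SQL predicate such as the one built by the hypothetical helper `dropna_where_clause` below. This is a sketch of the idea only; the SQL that teradataml actually generates may differ.

```python
from typing import Optional, Sequence


def dropna_where_clause(
    columns: Sequence[str], how: str = "any", thresh: Optional[int] = None
) -> str:
    """Build an illustrative SQL predicate selecting the rows dropna keeps.

    Hypothetical helper: column names are inserted unquoted, whereas real
    SQL generation should quote and escape identifiers.
    """
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    if thresh is not None:
        # Count the non-null values in each row and compare with the threshold.
        counts = " + ".join(
            f"CASE WHEN {c} IS NOT NULL THEN 1 ELSE 0 END" for c in columns
        )
        return f"({counts}) >= {thresh}"
    checks = [f"{c} IS NOT NULL" for c in columns]
    # 'any': every checked column must be non-null.
    # 'all': a single non-null column is enough to keep the row.
    joiner = " AND " if how == "any" else " OR "
    return joiner.join(checks)


print(dropna_where_clause(["skey", "geosequence"]))
# skey IS NOT NULL AND geosequence IS NOT NULL
print(dropna_where_clause(["skey", "geosequence"], how="all"))
# skey IS NOT NULL OR geosequence IS NOT NULL
```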
 
RAISE:
    TeradataMlException
 
EXAMPLES:
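    # Requires an active connection to Vantage, for example one created with create_context().
    >>> from teradataml import GeoDataFrame, load_example_data
    >>> load_example_data("geodataframe", "sample_shapes")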
    >>> df = GeoDataFrame('sample_shapes').select(["skey", "geosequence"])
    >>> df
                                                                                                                                                                         geosequence
    skey
    1006                                                                                                                                                                        None
    1001                           GEOSEQUENCE((10 20,30 40,50 60),(2007-08-22 12:05:23.560000,2007-08-22 12:08:25.140000,2007-08-22 12:11:41.520000),(1,2,3),(2,10,12,11,18,21,19))
    1002  GEOSEQUENCE((10 10,15 15,-2 0),(2007-03-14 01:35:00.000000,2007-03-14 01:35:05.000000,2007-03-14 01:35:08.000000),(1222,1223,1224),(2,12.1,3.14159,2.78128,-10,-11,100.1))
    1010                                                                                                                                                                        None
    1004                                                                                                                                                                        None
    1003                                                                                                                                                                        None
    1008                                                                                                                                                                        None
    1005                                                                                                                                                                        None
    1007                                                                                                                                                                        None
    1009                                                                                                                                                                        None
    >>>
 
    # Drop the rows where at least one element is null.
    >>> df.dropna()
                                                                                                                                                                         geosequence
    skey
    1002  GEOSEQUENCE((10 10,15 15,-2 0),(2007-03-14 01:35:00.000000,2007-03-14 01:35:05.000000,2007-03-14 01:35:08.000000),(1222,1223,1224),(2,12.1,3.14159,2.78128,-10,-11,100.1))
    1001                           GEOSEQUENCE((10 20,30 40,50 60),(2007-08-22 12:05:23.560000,2007-08-22 12:08:25.140000,2007-08-22 12:11:41.520000),(1,2,3),(2,10,12,11,18,21,19))
    >>>
 
    # Drop the rows in which both 'skey' and 'geosequence' are null.
    >>> df.dropna(how='all', subset=['skey','geosequence'])
                                                                                                                                                                         geosequence
    skey
    1008                                                                                                                                                                        None
    1003                                                                                                                                                                        None
    1009                                                                                                                                                                        None
    1007                                                                                                                                                                        None
    1005                                                                                                                                                                        None
    1001                           GEOSEQUENCE((10 20,30 40,50 60),(2007-08-22 12:05:23.560000,2007-08-22 12:08:25.140000,2007-08-22 12:11:41.520000),(1,2,3),(2,10,12,11,18,21,19))
    1006                                                                                                                                                                        None
    1004                                                                                                                                                                        None
    1010                                                                                                                                                                        None
    1002  GEOSEQUENCE((10 10,15 15,-2 0),(2007-03-14 01:35:00.000000,2007-03-14 01:35:05.000000,2007-03-14 01:35:08.000000),(1222,1223,1224),(2,12.1,3.14159,2.78128,-10,-11,100.1))
    >>>
 
    # Keep only the rows with at least 2 non-null values.
    >>> df.dropna(thresh=2)
                                                                                                                                                                         geosequence
    skey
    1001                           GEOSEQUENCE((10 20,30 40,50 60),(2007-08-22 12:05:23.560000,2007-08-22 12:08:25.140000,2007-08-22 12:11:41.520000),(1,2,3),(2,10,12,11,18,21,19))
    1002  GEOSEQUENCE((10 10,15 15,-2 0),(2007-03-14 01:35:00.000000,2007-03-14 01:35:05.000000,2007-03-14 01:35:08.000000),(1222,1223,1224),(2,12.1,3.14159,2.78128,-10,-11,100.1))
    >>>
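The three results above follow from the null pattern of the sample data: 'skey' is never null, and 'geosequence' is null for every row except 1001 and 1002. The plain-Python sketch below replays the three calls on that pattern without a Vantage connection, modelling nulls as None and abbreviating the geometry text. Results are sorted because the rows above are displayed in no particular order.

```python
# Null pattern of the sample rows: only 1001 and 1002 have a non-null
# geosequence (the geometry values are abbreviated, not real data).
rows = {
    1001: "GEOSEQUENCE(...)",
    1002: "GEOSEQUENCE(...)",
    **{skey: None for skey in (1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010)},
}


def non_null_count(skey, geosequence):
    """Count the non-null values in the (skey, geosequence) pair."""
    return (skey is not None) + (geosequence is not None)


# how='any': keep rows with no null values at all.
kept_any = sorted(k for k, g in rows.items() if non_null_count(k, g) == 2)
# how='all', subset=['skey', 'geosequence']: keep rows with at least one non-null value.
kept_all = sorted(k for k, g in rows.items() if non_null_count(k, g) >= 1)
# thresh=2: keep rows with at least two non-null values.
kept_thresh = sorted(k for k, g in rows.items() if non_null_count(k, g) >= 2)

print(kept_any)       # [1001, 1002]
print(len(kept_all))  # 10
print(kept_thresh)    # [1001, 1002]
```

With exactly two columns, thresh=2 and how='any' impose the same condition, which is why the first and third examples return the same rows; and because 'skey' is never null, how='all' keeps all ten rows.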