Teradata Package for Python Function Reference - count - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product
Teradata Package for Python
Release Number
17.00
Published
November 2021
Language
English (United States)
Last Update
2021-11-19
lifecycle
previous
Product Category
Teradata Vantage
teradataml.dataframe.dataframe.DataFrame.count = count(self, distinct=False)
DESCRIPTION:
    Returns column-wise count of the dataframe.
 
PARAMETERS:
    distinct:
        Optional Argument.
        Specifies whether to exclude duplicate values while calculating the count.
        Default Value: False
        Types: bool
 
RETURNS:
    teradataml DataFrame object with count() operation
    performed.
 
RAISES:
    TeradataMLException
    1. EXECUTION_FAILED - If count() operation fails to
        generate the column-wise count of the dataframe.
 
        Possible error message:
        Failed to perform 'count'. (Followed by error message).
 
    2. TDMLDF_AGGREGATE_COMBINED_ERR - If the count() operation
        doesn't support all the columns in the dataframe.
 
        Possible error message:
        No results. Below is/are the error message(s):
        All selected columns [(col2 -  PERIOD_TIME), (col3 -
        BLOB)] is/are unsupported for 'count' operation.
 
EXAMPLES :
    # Load the data to run the example.
    >>> from teradataml.data.load_example_data import load_example_data
    >>> load_example_data("dataframe", ["employee_info"])
 
    # Create teradataml dataframe.
    >>> df1 = DataFrame("employee_info")
    >>> print(df1)
                first_name marks   dob joined_date
    employee_no
    101              abcde  None  None    02/12/05
    100               abcd  None  None        None
    112               None  None  None    18/12/05
    >>>
 
    # Select only subset of columns from the DataFrame.
    >>> df2 = df1.select(['employee_no', 'first_name', 'marks'])
 
    # Prints count of the values in all the selected columns
    # (excluding None types).
    >>> df2.count()
      count_employee_no count_first_name count_marks
    0                 3                2           0
    >>>
 
    #
    # Using count() as Time Series Aggregate.
    #
    >>> # Load the example datasets.
    ... load_example_data("dataframe", ["ocean_buoys_seq"])
    >>>
 
    #
    # Time Series Aggregate Example 1: Executing count() function on DataFrame created on
    #                                  sequenced PTI table. We will consider all rows for the
    #                                  columns while calculating the count.
    #
    >>> ocean_buoys_seq = DataFrame("ocean_buoys_seq")
    >>> ocean_buoys_seq.columns
    ['TD_TIMECODE', 'TD_SEQNO', 'buoyid', 'salinity', 'temperature', 'dates']
    >>> ocean_buoys_seq
                           TD_TIMECODE  TD_SEQNO  salinity  temperature       dates
    buoyid
    44      2014-01-06 10:00:25.122200         6        55           43  2014-06-06
    44      2014-01-06 10:01:25.122200         8        55           53  2014-08-08
    44      2014-01-06 10:01:25.122200        20        55           54  2015-08-20
    1       2014-01-06 09:01:25.122200        11        55           70  2014-11-11
    1       2014-01-06 09:02:25.122200        12        55           71  2014-12-12
    1       2014-01-06 09:02:25.122200        24        55           78  2015-12-24
    1       2014-01-06 09:03:25.122200        13        55           72  2015-01-13
    1       2014-01-06 09:03:25.122200        25        55           79  2016-01-25
    1       2014-01-06 09:01:25.122200        23        55           77  2015-11-23
    44      2014-01-06 10:00:26.122200         7        55           43  2014-07-07
    >>>
 
    # To use count() as Time Series Aggregate we must run groupby_time() first, followed by count().
    >>> ocean_buoys_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="2cy", value_expression="buoyid", fill="NULLS")
    >>> ocean_buoys_grpby1.count().sort(["TIMECODE_RANGE", "buoyid"])
                                          TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  count_TD_TIMECODE  count_TD_SEQNO  count_salinity  count_temperature  count_dates
    0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0                  5               5               5                  4            5
    1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1                  6               6               6                  6            6
    2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2                  3               3               3                  3            3
    3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      22                  1               1               1                  1            1
    4  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44                 13              13              13                 13           13
    >>>
 
    #
    # Time Series Aggregate Example 2: Executing count() function on DataFrame created on
    #                                  sequenced PTI table. We will consider DISTINCT rows for the
    #                                  columns while calculating the count.
    #
    # To use count() as Time Series Aggregate we must run groupby_time() first, followed by count().
    >>> ocean_buoys_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="2cy",value_expression="buoyid", fill="NULLS")
    >>> ocean_buoys_grpby1.count(distinct = True).sort(["TIMECODE_RANGE", "buoyid"])
                                          TIMECODE_RANGE  GROUP BY TIME(CAL_YEARS(2))  buoyid  count_TD_TIMECODE  count_TD_SEQNO  count_salinity  count_temperature  count_dates
    0  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       0                  4               5               1                  3            5
    1  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       1                  3               6               1                  6            6
    2  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2       2                  3               3               1                  3            3
    3  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      22                  1               1               1                  1            1
    4  ('2014-01-01 00:00:00.000000-00:00', '2016-01-...                            2      44                 10              13               1                  5           13
    >>>