Teradata Package for Python Function Reference | 20.00 - count - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.
Teradata® Package for Python Function Reference - 20.00
- Deployment
- VantageCloud
- VantageCore
- Edition
- Enterprise
- IntelliFlex
- VMware
- Product
- Teradata Package for Python
- Release Number
- 20.00.00.03
- Published
- December 2024
- ft:locale
- en-US
- ft:lastEdition
- 2024-12-19
- dita:id
- TeradataPython_FxRef_Enterprise_2000
- lifecycle
- latest
- Product Category
- Teradata Vantage
- teradataml.dataframe.dataframe.DataFrame.count = count(self, distinct=False)
- DESCRIPTION:
Returns column-wise count of the dataframe.
PARAMETERS:
distinct:
Optional Argument.
Specifies whether to exclude duplicate values while calculating the count.
Default Value: False
Types: bool
RETURNS:
teradataml DataFrame object with count() operation
performed.
RAISES:
TeradataMLException
1. EXECUTION_FAILED - If count() operation fails to
generate the column-wise count of the dataframe.
Possible error message:
Failed to perform 'count'. (Followed by error message).
2. TDMLDF_AGGREGATE_COMBINED_ERR - If the count() operation
doesn't support all the columns in the dataframe.
Possible error message:
No results. Below is/are the error message(s):
All selected columns [(col2 - PERIOD_TIME), (col3 -
BLOB)] is/are unsupported for 'count' operation.
EXAMPLES :
# Load the data to run the example.
>>> from teradataml.data.load_example_data import load_example_data
>>> load_example_data("dataframe", ["employee_info"])
# Create teradataml dataframe.
>>> df1 = DataFrame("employee_info")
>>> print(df1)
first_name marks dob joined_date
employee_no
101 abcde None None 02/12/05
100 abcd None None None
112 None None None 18/12/05
>>>
# Select only subset of columns from the DataFrame.
>>> df2 = df1.select(['employee_no', 'first_name', 'marks'])
# Prints count of the values in all the selected columns
# (excluding None types).
>>> df2.count()
count_employee_no count_first_name count_marks
0 3 2 0
>>>
#
# Using count() as Time Series Aggregate.
#
>>> # Load the example datasets.
... load_example_data("dataframe", ["ocean_buoys_seq"])
>>>
#
# Time Series Aggregate Example 1: Executing count() function on DataFrame created on
# sequenced PTI table. We will consider all rows for the
# columns while calculating the count.
#
>>> ocean_buoys_seq = DataFrame("ocean_buoys_seq")
>>> ocean_buoys_seq.columns
['TD_TIMECODE', 'TD_SEQNO', 'buoyid', 'salinity', 'temperature', 'dates']
>>> ocean_buoys_seq
TD_TIMECODE TD_SEQNO salinity temperature dates
buoyid
44 2014-01-06 10:00:25.122200 6 55 43 2014-06-06
44 2014-01-06 10:01:25.122200 8 55 53 2014-08-08
44 2014-01-06 10:01:25.122200 20 55 54 2015-08-20
1 2014-01-06 09:01:25.122200 11 55 70 2014-11-11
1 2014-01-06 09:02:25.122200 12 55 71 2014-12-12
1 2014-01-06 09:02:25.122200 24 55 78 2015-12-24
1 2014-01-06 09:03:25.122200 13 55 72 2015-01-13
1 2014-01-06 09:03:25.122200 25 55 79 2016-01-25
1 2014-01-06 09:01:25.122200 23 55 77 2015-11-23
44 2014-01-06 10:00:26.122200 7 55 43 2014-07-07
>>>
# To use count() as Time Series Aggregate we must run groupby_time() first, followed by count().
>>> ocean_buoys_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="2cy", value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.count().sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(CAL_YEARS(2)) buoyid count_TD_TIMECODE count_TD_SEQNO count_salinity count_temperature count_dates
0 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 0 5 5 5 4 5
1 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 1 6 6 6 6 6
2 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 2 3 3 3 3 3
3 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 22 1 1 1 1 1
4 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 44 13 13 13 13 13
>>>
#
# Time Series Aggregate Example 2: Executing count() function on DataFrame created on
# sequenced PTI table. We will consider DISTINCT rows for the
# columns while calculating the count.
#
# To use count() as Time Series Aggregate we must run groupby_time() first, followed by count().
>>> ocean_buoys_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="2cy",value_expression="buoyid", fill="NULLS")
>>> ocean_buoys_grpby1.count(distinct = True).sort(["TIMECODE_RANGE", "buoyid"])
TIMECODE_RANGE GROUP BY TIME(CAL_YEARS(2)) buoyid count_TD_TIMECODE count_TD_SEQNO count_salinity count_temperature count_dates
0 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 0 4 5 1 3 5
1 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 1 3 6 1 6 6
2 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 2 3 3 1 3 3
3 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 22 1 1 1 1 1
4 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 44 10 13 1 5 13
>>>