Teradata Package for Python Function Reference - count - Teradata Package for Python - Look here for syntax, methods and examples for the functions included in the Teradata Package for Python.

Teradata® Package for Python Function Reference

Product

Teradata Package for Python

Release Number

17.00

Published

November 2021

Language

English (United States)

Last Update

2021-11-19

lifecycle

Product Category

Teradata Vantage

teradataml.dataframe.dataframe.DataFrame.count = count(self, distinct=False): DESCRIPTION: Returns column-wise count of the dataframe. PARAMETERS: distinct: Optional Argument. Specifies whether to exclude duplicate values while calculating the count. Default Value: False Types: bool RETURNS: teradataml DataFrame object with count() operation performed. RAISES: TeradataMLException 1. EXECUTION_FAILED - If count() operation fails to generate the column-wise count of the dataframe. Possible error message: Failed to perform 'count'. (Followed by error message). 2. TDMLDF_AGGREGATE_COMBINED_ERR - If the count() operation doesn't support all the columns in the dataframe. Possible error message: No results. Below is/are the error message(s): All selected columns [(col2 - PERIOD_TIME), (col3 - BLOB)] is/are unsupported for 'count' operation. EXAMPLES : # Load the data to run the example. >>> from teradataml.data.load_example_data import load_example_data >>> load_example_data("dataframe", ["employee_info"]) # Create teradataml dataframe. >>> df1 = DataFrame("employee_info") >>> print(df1) first_name marks dob joined_date employee_no 101 abcde None None 02/12/05 100 abcd None None None 112 None None None 18/12/05 >>> # Select only subset of columns from the DataFrame. >>> df2 = df1.select(['employee_no', 'first_name', 'marks']) # Prints count of the values in all the selected columns # (excluding None types). >>> df2.count() count_employee_no count_first_name count_marks 0 3 2 0 >>> # # Using count() as Time Series Aggregate. # >>> # Load the example datasets. ... load_example_data("dataframe", ["ocean_buoys_seq"]) >>> # # Time Series Aggregate Example 1: Executing count() function on DataFrame created on # sequenced PTI table. We will consider all rows for the # columns while calculating the count. # >>> ocean_buoys_seq = DataFrame("ocean_buoys_seq") >>> ocean_buoys_seq.columns ['TD_TIMECODE', 'TD_SEQNO', 'buoyid', 'salinity', 'temperature', 'dates'] >>> ocean_buoys_seq TD_TIMECODE TD_SEQNO salinity temperature dates buoyid 44 2014-01-06 10:00:25.122200 6 55 43 2014-06-06 44 2014-01-06 10:01:25.122200 8 55 53 2014-08-08 44 2014-01-06 10:01:25.122200 20 55 54 2015-08-20 1 2014-01-06 09:01:25.122200 11 55 70 2014-11-11 1 2014-01-06 09:02:25.122200 12 55 71 2014-12-12 1 2014-01-06 09:02:25.122200 24 55 78 2015-12-24 1 2014-01-06 09:03:25.122200 13 55 72 2015-01-13 1 2014-01-06 09:03:25.122200 25 55 79 2016-01-25 1 2014-01-06 09:01:25.122200 23 55 77 2015-11-23 44 2014-01-06 10:00:26.122200 7 55 43 2014-07-07 >>> # To use count() as Time Series Aggregate we must run groupby_time() first, followed by count(). >>> ocean_buoys_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="2cy", value_expression="buoyid", fill="NULLS") >>> ocean_buoys_grpby1.count().sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(CAL_YEARS(2)) buoyid count_TD_TIMECODE count_TD_SEQNO count_salinity count_temperature count_dates 0 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 0 5 5 5 4 5 1 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 1 6 6 6 6 6 2 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 2 3 3 3 3 3 3 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 22 1 1 1 1 1 4 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 44 13 13 13 13 13 >>> # # Time Series Aggregate Example 2: Executing count() function on DataFrame created on # sequenced PTI table. We will consider DISTINCT rows for the # columns while calculating the count. # # To use count() as Time Series Aggregate we must run groupby_time() first, followed by count(). >>> ocean_buoys_grpby1 = ocean_buoys_seq.groupby_time(timebucket_duration="2cy",value_expression="buoyid", fill="NULLS") >>> ocean_buoys_grpby1.count(distinct = True).sort(["TIMECODE_RANGE", "buoyid"]) TIMECODE_RANGE GROUP BY TIME(CAL_YEARS(2)) buoyid count_TD_TIMECODE count_TD_SEQNO count_salinity count_temperature count_dates 0 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 0 4 5 1 3 5 1 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 1 3 6 1 6 6 2 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 2 3 3 1 3 3 3 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 22 1 1 1 1 1 4 ('2014-01-01 00:00:00.000000-00:00', '2016-01-... 2 44 10 13 1 5 13 >>>