Analysis of Row Partitioning Benefits
The partitioning definition used for “Scenario 1” on page 376 partitioned by month because many of the queries use months as their basic time unit.
Another option is to attempt to optimize access further by changing the granularity
of the partitioning scheme by partitioning at a finer level. Suppose the DBA considers
partitioning the table by day instead of by month. The table now has about 760 partitions
(25 months), with those corresponding to future dates in the current month being empty.
The results of telescoping the partition granularity for this table and query workload
look like this:
The effort to delete the oldest month of data is substantially the same as it was
for the same table defined with 25 primary index partitions. The run time for that
job is also substantially the same. Making the simplifying assumption that all months
have 30 days, the DBA would have to delete 30 smaller partitions with the oldest month
of data instead of deleting one larger partition, but the same number of rows are
deleted, and the elapsed time to delete them is roughly the same.
The nightly inserts benefit from the finer partitioning because instead of being concentrated
in one or two partitions out of 25, the inserted rows are now directed to three, four,
or at most five partitions out of 760; well under 1% of the partitions. Most of the
inserts are directed to one partition, the one containing the activity for the day
just completed.
The short-running primary index access queries are not impacted either way by having
760 partitions instead of 25.
The ad hoc queries that usually analyze two or three months of data are also little
changed, now accessing roughly 60 out of 760 partitions rather than 2 out of 25, roughly
the same percentage of the defined partitions for the primary index. Queries varying
by the time of month, however, might realize some performance gain with the larger
number of partitions. For example, a query submitted on the fourth day of the month
might be likely to analyze the current day and the previous 33 days of data, while
a query submitted later in the month might restrict the analysis to that calendar
month. In this case, the 34 day query would involve 68 out of 760 partitions, instead
of 4 out of 25, a significantly smaller percentage.
The analysis queries examining 24 months of data run in about the same length of time
because they touch most of the rows in the table in either case.
The joins are not impacted by the number of partitions because there are no direct
joins against this table.