Table Structure and Update Schedule, Query Workload, and Problem Statement
This case study is a continuation of the scenario presented in Scenario 1. The table structure, update schedule, query workload, and problem statement are unchanged. The scenario evaluates whether the partitioning definition for this table can be improved further.
Analysis of Row Partitioning Benefits
The partitioning definition used for Scenario 1 partitioned by month because many of the queries use months as their basic time unit. Another option is to attempt to optimize access further by changing the granularity of the partitioning scheme by partitioning at a finer level. Suppose the DBA considers partitioning the table by day instead of by month. The table now has about 760 partitions (25 months), with those corresponding to future dates in the current month being empty.
The results of telescoping the partition granularity for this table and query workload look like this:
- The effort to delete the oldest month of data is substantially the same as it was for the same table defined with 25 primary index partitions. The run time for that job is also substantially the same. Making the simplifying assumption that all months have 30 days, the DBA would have to delete 30 smaller partitions with the oldest month of data instead of deleting one larger partition, but the same number of rows are deleted, and the elapsed time to delete them is roughly the same.
- The nightly inserts benefit from the finer partitioning because instead of being concentrated in one or two partitions out of 25, the inserted rows are now directed to three, four, or at most five partitions out of 760; well under 1% of the partitions. Most of the inserts are directed to one partition, the one containing the activity for the day just completed.
- The short-running primary index access queries are not impacted either way by having 760 partitions instead of 25.
- The ad hoc queries that usually analyze two or three months of data are also little changed, now accessing roughly 60 out of 760 partitions rather than 2 out of 25, roughly the same percentage of the defined partitions for the primary index. Queries varying by the time of month, however, might realize some performance gain with the larger number of partitions. For example, a query submitted on the fourth day of the month might be likely to analyze the current day and the previous 33 days of data, while a query submitted later in the month might restrict the analysis to that calendar month. In this case, the 34 day query would involve 68 out of 760 partitions, instead of 4 out of 25, a significantly smaller percentage.
- The analysis queries examining 24 months of data run in about the same length of time because they touch most of the rows in the table in either case.
- The joins are not impacted by the number of partitions because there are no direct joins against this table.
Increasing the number of partitions from 25 to 760 has only a modest effect on performance for this particular workload. The greatest gain is for queries that analyze only a few days of transactions.