Data Warehousing Institute 2005 Follow-Up Survey
Three years later, a TDWI survey of many of the same industry sources reported the following list of most frequent contributors to data quality problems in their organizations:
Source | Percentage |
Data entry by employees | 75 |
Inconsistent definitions for common terms | 75 |
Data migration or conversion projects | 46 |
Mixed expectations by users | 40 |
External data | 38 |
Data entry by customers* | 26 |
System errors | 25 |
Changes to source systems | 20 |
Other | 7 |
* This includes typographical errors and information typed into the wrong fields made when entering data into forms on the World Wide Web. Note that human error can also have a significant effect on the security of your site (Liginlal, Sim, and Khansa, 2009).
Source: Philip Russom, Taking Data Quality to the Enterprise Through Data Governance, The Data Warehousing Institute, 2005.
The percentages do not add to 100 because each one is the share of survey respondents who reported that source as a significant contributor to data quality problems in their organization, not the share of all errors contributed by that source. A respondent could report more than one source.
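The point about percentages not adding to 100 can be illustrated with a minimal sketch. The four respondents below are hypothetical and are not taken from the TDWI survey; they only show how per-source shares behave when each respondent may name several sources.

```python
from collections import Counter

# Hypothetical responses (not survey data): each respondent may name
# several sources of data quality problems.
responses = [
    {"Data entry by employees", "External data"},
    {"Data entry by employees", "System errors"},
    {"Data entry by employees"},
    {"External data", "System errors"},
]

# Count how many respondents named each source.
counts = Counter(source for answer in responses for source in answer)

# Share of respondents naming each source, as a percentage.
percentages = {
    source: 100 * n / len(responses) for source, n in counts.items()
}

# Because respondents overlap, the shares can sum to more than 100.
total = sum(percentages.values())

for source, pct in sorted(percentages.items()):
    print(f"{source}: {pct:.0f}%")
print(f"Sum of shares: {total:.0f}%")
```

Each share is computed against the number of respondents, not the number of reported problems, which is why the column total carries no meaning on its own.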
The 2002 and 2005 surveys cover the same error sources, except that the 2005 survey adds the category “Inconsistent definitions for common terms,” which tied with “Data entry by employees” as the most frequently reported source of data quality problems.
Data Quality Problem Source | 2002 Survey Score | 2005 Survey Score | Change (points) |
Data entry by employees | 76 | 75 | -1 |
Inconsistent definitions for common terms | ∅ | 75 | n/a |
Data migration or conversion problems | 48 | 46 | -2 |
Mixed expectations by users | 46 | 40 | -6 |
External data | 34 | 38 | 4 |
Data entry by customers | 25 | 26 | 1 |
System errors | 26 | 25 | -1 |
Changes to source systems | 53 | 20 | -33 |
Other | 12 | 7 | -5 |
Sources: |
The category “inconsistent definitions for common terms” was not measured in the 2002 survey.
Note that over the three years, the change in each score, whether positive or negative, is small, with two possible exceptions. Each change is the difference between the two survey scores, measured in percentage points. The exceptions are the relatively modest 6-point drop in “Mixed expectations by users” and the much larger 33-point drop in “Changes to source systems.”
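The change figures can be recomputed from the two survey scores with a short sketch. The scores are copied from the comparison table above; the function name and the cutoff of 6 points are assumptions made for illustration, since the source names no threshold for a notable change. The category that was not measured in 2002 is treated as having no computable change.

```python
# Scores (percent of respondents) copied from the comparison table.
# None marks a category that was not measured in the 2002 survey.
scores = {
    "Data entry by employees": (76, 75),
    "Inconsistent definitions for common terms": (None, 75),
    "Data migration or conversion problems": (48, 46),
    "Mixed expectations by users": (46, 40),
    "External data": (34, 38),
    "Data entry by customers": (25, 26),
    "System errors": (26, 25),
    "Changes to source systems": (53, 20),
    "Other": (12, 7),
}


def point_change(score_2002, score_2005):
    """Return the 2005 score minus the 2002 score, or None if either is missing."""
    if score_2002 is None or score_2005 is None:
        return None
    return score_2005 - score_2002


changes = {source: point_change(a, b) for source, (a, b) in scores.items()}

# Hypothetical cutoff for a change worth a closer look (an assumption,
# not a figure from the survey).
THRESHOLD = 6
notable = [
    source
    for source, change in changes.items()
    if change is not None and abs(change) >= THRESHOLD
]

for source, change in changes.items():
    label = "n/a" if change is None else format(change, "+d")
    print(f"{source}: {label}")
print("Notable changes:", notable)
```

With this cutoff, the two categories flagged are the same two exceptions discussed in the text; a different cutoff would give a different list.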
It is interesting to note, however, that among the 79% of organizations surveyed that have a data quality initiative in place, the team leading that initiative in the 2005 survey is most likely to be the data warehousing group, whereas in the 2002 survey the data warehousing team was second to the IT department as the most likely leader. Unfortunately, a whopping 42% of those surveyed had no plans to institute a data governance initiative, while a mere 8% had such an initiative already in place. Another 33% had an initiative under consideration, and the remaining 17% had one in either its design or implementation phase.