(Image by Scott Adams)
Governments around the world increasingly face the challenge of making public service and administration transparent, effective and efficient. Open data is a key component in accomplishing these objectives. Many researchers have noted that one part of evaluating the success of an open data program is looking at data quality. For this discussion, I’d like to define data quality to include the following dimensions: completeness, timeliness, accuracy and credibility. In part 1 of this blog, I’ll discuss completeness and timeliness. In part 2, I’ll discuss the accuracy and credibility dimensions of data quality. I will end the “mini-series” with some case examples of good data quality.
Completeness and timeliness are key measurements of data quality, as discussed in Chapter 3 of “Measuring E-government Efficiency” by Manuel Pedro Rodriguez-Bolivar. I want to pause a moment and point out that I agree with Douglas Hubbard, author of “How to Measure Anything”, that the de facto definition of “measurement” is an approximation, not a certainty. So with that in mind, let’s begin by describing an approximation of what we mean by completeness and timeliness.
I agree with Rodriguez-Bolivar, who says that data are complete if a description is available and the data can be downloaded, are machine readable and are linked to other related datasets. A key metric of completeness is whether there are meta-tags describing each particular dataset. Each meta-tag should contain descriptive text. However, only those familiar with the subject matter can evaluate whether that text makes sense or whether it actually relates to the dataset. Data can be downloaded if, for each available resource, there is a tag containing a download link. A dataset is machine readable if its resources are published in formats that allow computer processing. A dataset is linked to other datasets if links are listed in the [relationships] tag.
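To make the four completeness criteria concrete, here is a minimal Python sketch of how they might be checked against one dataset's metadata record. The field names (description, resources, url, format, relationships) and the list of machine-readable formats are my own assumptions for illustration, not something prescribed by Rodriguez-Bolivar or by any particular data portal. Note that the sketch only checks that a description exists; whether the text makes sense still needs a subject matter expert.

```python
# Formats treated as machine readable here; this set is an assumption.
MACHINE_READABLE = {"csv", "json", "xml", "rdf"}


def completeness_checks(dataset: dict) -> dict:
    """Return one True/False result per completeness criterion.

    `dataset` is a hypothetical metadata record; all field names are
    illustrative assumptions.
    """
    resources = dataset.get("resources", [])
    return {
        # A non-empty description text is present.
        "has_description": bool(str(dataset.get("description", "")).strip()),
        # Every resource carries a download link.
        "downloadable": bool(resources) and all(r.get("url") for r in resources),
        # Every resource is published in a format a computer can process.
        "machine_readable": bool(resources) and all(
            str(r.get("format", "")).lower() in MACHINE_READABLE for r in resources
        ),
        # At least one link to a related dataset is listed.
        "linked": bool(dataset.get("relationships")),
    }


def completeness_score(dataset: dict) -> float:
    """Share of the criteria that pass, as a rough approximation."""
    checks = completeness_checks(dataset)
    return sum(checks.values()) / len(checks)
```

For example, a record with a description and a single CSV resource with a download link, but no listed relationships, passes three of the four checks. In keeping with Hubbard's point, the resulting score is an approximation of completeness, not a certainty.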
Data are timely if they contain information describing their timeliness. For example, timeliness could be measured by what period the data in a dataset cover, how often the data are updated, and when the last update was. Expectations for the time between when data are expected and when they are delivered vary greatly between datasets. According to Information Management and many others, there is an increasing demand for data to be real-time. One of the exciting parts of data management will be providing complete and timely data.
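As a rough illustration of the three timeliness measures described above (period covered, update frequency and last update), here is a minimal Python sketch. The field names (period_start, period_end, update_frequency_days, last_updated) are hypothetical, and treating a dataset as overdue once its stated update interval has passed is my own simplifying assumption.

```python
from datetime import date, timedelta


def timeliness_checks(dataset: dict, today: date) -> dict:
    """Check whether timeliness information is documented and still current.

    `dataset` is a hypothetical metadata record; all field names are
    illustrative assumptions.
    """
    # The period covered by the data is stated.
    covers_period = (
        dataset.get("period_start") is not None
        and dataset.get("period_end") is not None
    )
    frequency = dataset.get("update_frequency_days")  # expected days between updates
    last_updated = dataset.get("last_updated")        # a datetime.date

    documented = covers_period and frequency is not None and last_updated is not None

    # Overdue can only be judged when both frequency and last update are known.
    overdue = None
    if frequency is not None and last_updated is not None:
        overdue = today > last_updated + timedelta(days=frequency)

    return {"documented": documented, "overdue": overdue}
```

A dataset that promises updates every 30 days and was last updated on 1 January would be flagged as overdue on 15 February, but not on 20 January. When the metadata are missing, the sketch reports overdue as None instead of guessing.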