Last week I talked about completeness and timeliness as measures of data quality. This week I want to delve into two more dimensions of quality – accuracy and credibility. Every organization today faces data quality concerns, so I felt it was worth spending another blog post on the topic. Without accurate and credible data, organizational decisions suffer. I agree with Prashanta Chandramohan from Pivot Point that poor data quality has consequences in both analytical and operational environments. According to an article by Melissa Data, data is accurate when it is a true representation of the expected values. Credibility depends on the source: it is a measure of how reliable and believable the data is.
In his book Journey to Data Quality, Yang Lee says that the state of the data in an organization must be assessed before determining whether it is accurate. Lee discusses several assessment methods, including an Information Quality Assessment (IQA) survey tool from the Cambridge Research Group. In the IQA survey, accuracy is measured by whether all data is presented consistently in the same format, includes all necessary values, and is complete. I would love to see a similar assessment tool in my organization, since our data is often gathered from multiple sources over time by multiple data stewards. Although the IQA survey itself may be subjective and not 100% accurate, it is a great starting point and documents the current state of data quality.
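The kinds of checks the IQA survey asks about can also be automated. Here is a minimal sketch of such checks in Python; the function name, field names, and date format are my own assumptions for illustration, not part of any real IQA tool:

```python
import re

def assess_records(records, date_pattern=r"^\d{4}-\d{2}-\d{2}$"):
    """Score a list of record dicts on two IQA-style dimensions:
    format consistency (does the 'date' field match the expected format?)
    and completeness (are all fields populated?)."""
    consistent = 0
    complete = 0
    for rec in records:
        # Completeness: every field has a non-empty value.
        if all(v not in (None, "") for v in rec.values()):
            complete += 1
        # Format consistency: the date field matches the expected pattern.
        if re.match(date_pattern, rec.get("date", "")):
            consistent += 1
    n = len(records) or 1  # avoid division by zero on an empty list
    return {"format_consistency": consistent / n, "completeness": complete / n}

sample = [
    {"date": "2019-03-01", "value": "42"},
    {"date": "03/01/2019", "value": ""},  # wrong date format, missing value
]
print(assess_records(sample))  # {'format_consistency': 0.5, 'completeness': 0.5}
```

Scores like these won't replace the human judgment the survey captures, but they give a repeatable baseline to track over time.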
Finally, I will turn to a brief discussion of data credibility. In his book Virtual Unreality, Charles Seife points out that “a wrong piece of information, a digital brain-altering virus can spread at the speed of light through the internet and quickly find a home among a dispersed but digitally interconnected group of true believers. This group acts as a reservoir for the bad idea, allowing it to gather strength and reinfect people.” Another famous quote, attributed to Charles Spurgeon, is that “a lie will go round the world while truth is pulling its boots on.” It is more crucial now than ever that data be believable and trustworthy.
According to Gartner and the Data Warehousing Institute, by 2020 the average organization will be handling over 30 zettabytes of data. With that much data, it’s important that as much of it as possible can be trusted. A clear, well-thought-out data policy is just one step in establishing data credibility. Treating information as a product and keeping it aligned with business goals is also key.
An article by Data First describes the importance of knowing the data source. Data consumers can gauge trustworthiness by considering where the data is published and how it is funded. It’s always a good idea to think critically about which organization produced the data and whether it has an inherent bias. Many universities also publish guidelines for evaluating the credibility of information on the web.
The aspects of data quality we’ve covered over the last two weeks deserve discussion and implementation in every organization. Data completeness, timeliness, accuracy, and credibility form the backbone of great data. As Ted Friedman so aptly summarizes, “Data is useful. High-quality, well-understood, auditable data is priceless.”