Organizations are increasingly relying on analytics and advanced data visualization techniques to deliver incremental business value. However, when data quality hampers their efforts, the credibility of their entire analytics strategy becomes questionable.
Because analytics is traditionally seen as a presentation of a broad landscape of data points, it is often assumed that data quality issues can be ignored since they would not impact broader trends. But should bad data be ignored to allow analytics to proceed? Or should organizations stall to address data quality issues?
Most analytics programs are designed on the belief that removing outliers (data points that diverge sharply from the overall pattern and may therefore indicate an error) is all that is needed to make sense of the data. However, organizations that simply discard these data points make assumptions without considering important dimensions of the data, and those assumptions can lead to very different decisions. This approach not only makes the analysis dubious but often leads to incorrect conclusions.
The practice of removing outliers can strip a significant share of the data set from the analysis, in some cases as much as 40 percent. Discarding that much data reduces, rather than increases, confidence in the analysis.
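To make the alternative concrete, here is a minimal sketch that flags outliers instead of deleting them and reports what share of the records outright removal would discard. It assumes Tukey's interquartile-range (IQR) fences as the outlier rule; that rule and the function names `iqr_fences`, `flag_outliers` and `share_flagged` are illustrative choices, not the method described in the article.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return (low, high) Tukey fences; k=1.5 is a common, assumed default."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points (Python 3.8+)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Pair each value with an outlier flag instead of dropping it."""
    low, high = iqr_fences(values, k)
    return [(v, v < low or v > high) for v in values]


def share_flagged(flags):
    """Fraction of records that deleting every outlier would discard."""
    return sum(1 for _, is_outlier in flags if is_outlier) / len(flags)


if __name__ == "__main__":
    # Hypothetical readings, e.g. daily fuel consumption for one vessel.
    readings = [10, 11, 12, 13, 12, 11, 100]
    flags = flag_outliers(readings)
    print(flags)
    print(f"share flagged: {share_flagged(flags):.1%}")
```

Keeping the flag alongside each record lets analysts investigate why a point diverges (a data entry error, or a genuine event such as a weather delay) before deciding whether to correct, keep or exclude it. Checking the flagged share first also exposes cases where a rule would quietly throw away a large part of the data set.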
But that doesn’t mean data must be 100 percent validated before it can be used for analytics. Indeed, companies should clean only the data they intend to use for a specific analysis.
In “Data Quality for Analytics: clean input drives better decisions,” an article in the Fall 2015 edition of CROSSINGS: The Sapient Journal of Business Transformation, we use a scenario from the shipping industry to highlight the dependence on quality data and discuss how companies can address data quality in parallel with the deployment of their analytics platforms to deliver greater business value.
Through this example, we make the case for a practical, fit-for-purpose approach to data quality management built on the following guidelines:
- Tackle analytics with an eye on data quality
- Rely on analytics use cases to prioritize data quality hot spots
- Decide on a strategy for outliers and use the 80/20 rule when pruning the data set
- Ensure decisions are trustworthy and make data quality stick by addressing root causes and implementing a monitoring effort
- More than any other program, make this one business-led for optimum results
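One way to combine the use-case and 80/20 guidelines is to rank data quality issues only for the fields a given analytics use case actually consumes, then fix the smallest set of fields that accounts for roughly 80 percent of those issues. The sketch below assumes issue counts per field are already available; the function name `pareto_hot_spots` and the field names are hypothetical.

```python
def pareto_hot_spots(issue_counts, fields_in_use, threshold=0.8):
    """Return the fields, among those a use case needs, that together
    account for at least `threshold` of that use case's issues.

    issue_counts  -- dict mapping field name to number of failed checks
    fields_in_use -- set of fields the analytics use case depends on
    """
    # Ignore fields the use case never touches: clean only what is used.
    relevant = {f: c for f, c in issue_counts.items()
                if f in fields_in_use and c > 0}
    total = sum(relevant.values())
    if total == 0:
        return []

    selected, covered = [], 0
    # Walk fields from most to fewest issues until the threshold is met.
    for field, count in sorted(relevant.items(),
                               key=lambda kv: kv[1], reverse=True):
        selected.append(field)
        covered += count
        if covered / total >= threshold:
            break
    return selected


if __name__ == "__main__":
    # Hypothetical issue counts for a voyage-performance use case.
    counts = {"fuel_consumed": 120, "port_arrival": 50,
              "vessel_speed": 20, "cargo_grade": 300, "draft": 10}
    needed = {"fuel_consumed", "port_arrival", "vessel_speed", "draft"}
    print(pareto_hot_spots(counts, needed))
```

In this made-up example, `cargo_grade` has the most issues but is excluded because the use case does not rely on it, so effort goes first to `fuel_consumed` and `port_arrival`. Rerunning the same counts on each new data load is one simple way to start the monitoring effort the guidelines call for.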
Niko Papadakos – Director
Niko Papadakos is a Director at Sapient Global Markets in Houston, focusing on data. He has more than 20 years of experience across financial services, energy and transportation. Niko joined Sapient Global Markets in 2004 and has led project engagements in key accounts involving data modeling, reference and market data strategy and implementation, information architecture, data governance and data quality.
Mohit Sharma – Senior Manager and Enterprise Architect
Mohit Sharma is a Senior Manager and Enterprise Architect with eight years of experience in the design and implementation of solutions for oil and gas trading and supply management. During this time, Mohit was engaged in multiple large and complex enterprise transformation programs for oil and gas majors. Most recently, he developed a total cost of ownership (TCO) model for a major North American gas trading implementation.
Mohit Arora – Senior Manager
Mohit Arora is a Senior Manager at Sapient Global Markets and is based in Houston. He has over 11 years of experience leading large data management programs for energy trading and risk management clients as well as for major investment banks and asset management firms. Mohit is an expert in data management and has a strong track record of delivering data programs spanning reference data management, trade data centralization, data migration, analytics, data quality and data governance.
Kunal Bahl – Senior Manager
Kunal Bahl is a Senior Manager in Sapient Global Markets’ Midstream Practice based in San Francisco. He focuses on marine transportation, and his recent assignments include leading a data integration and analytics program for an integrated oil company, process automation for another integrated oil company and power trading system integration for a regional transmission authority.