Data quality goes mainstream.
Numerous studies document the pervasiveness of poor-quality data (see sidebar). In a survey of more than 1,000 professionals in seven countries, Experian (www.experian.com) found that more than 90% of respondents consider some of their customer data to be inaccurate and, on average, they believe that one-fourth of their data is inaccurate. Although 94% of companies want to use their data to optimize customer or prospect experiences, only 22% are using techniques that could be considered optimal. These techniques include centralizing responsibility for data quality, routinely monitoring it, and using a platform approach. The remainder of the companies, nearly 80%, are either unaware of their data quality, take a reactive approach, or are proactive to some degree but not sophisticated in their techniques.
The costs of failing to reach an acceptable standard for data quality are high; research conducted by Gartner (www.gartner.com) indicates that organizations believe they lose approximately $10-15 million per year as a result of poor data quality. IBM published an estimate of $1.3 trillion as the cost of poor-quality data in the U.S. each year. Poor data can also undermine customer satisfaction and loyalty, and it consumes employee time spent handling errors. Forrester (www.forrester.com) found that one-third of analysts spend more than 40% of their time validating data before it can be used for decision making.
Why the neglect of data quality? Often it has been because the business case for data quality was not made clearly enough. "It is important to show how improved data will help the value chain," said Seth Earley, CEO of Earley Information Science (www.earley.com), a professional services firm specializing in helping firms get more value from their data content and knowledge. "Companies may have a difficult time making the connection to the core value proposition for this type of initiative. Data is only valuable when you are applying it to a problem, and stakeholders need to understand the connection. Often, the people who own the data are not the ones that feel the pain, so that connection is especially important."
Sources of data errors are numerous and varied, but the most common one is "human error" in data entry, such as typos, entering data in the wrong field, or failing to enter a piece of data. These errors may be committed by a customer service representative in a call center, or by the customer when entering data in an online form. Other errors may emerge when duplicate or near-duplicate records are produced. In the case of product data, changes in the product may be missed, or flawed communication may lead to the owner of the data not knowing about a new product introduction. Individuals may move or change their email address, and the new information is not received.
A plethora of data quality options
Once an organization has recognized the value of its data and a commitment has been made to improve quality, a variety of solutions can be implemented. Some data quality tools are integrated with applications such as product information management systems. Others are standalone products that are dedicated solely to data quality. Some are focused on a particular vertical such as pharmaceuticals or HR. Still others provide a suite of data management solutions, with data quality being one part. Data preparation software, for example, may include data quality as a component.
"Verification can be as basic as validating a date or ZIP code format, or determining whether a Social Security number is correctly structured for the United States versus Canada," said Jitesh Ghai, senior vice president and general manager, data governance and privacy, Informatica (www.informatica.com). "Our software can also look up an address to see whether it exists, and can also do more sophisticated things such as using machine learning to develop a confidence score on whether an individual in one data source is the same one as referenced in another."
Informatica has a long history in data integration, and is now focused on enterprise cloud and hybrid data environments. The company also expanded into master data management (MDM) in recognition of the fact that data is often fragmented among multiple systems. "To have a comprehensive view of the customer," continued Ghai, "business users must be assured that the data is complete, consistent, and timely."
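The basic format checks Ghai describes (a date, a ZIP code, the structure of a national ID number) can be sketched in a few lines of Python. This is a minimal illustration, not Informatica's implementation: the patterns and function names are assumptions made for the example, and they check layout only, not whether the value belongs to a real person or place.

```python
import re
from datetime import datetime

# Assumed patterns for illustration only; commercial tools apply far richer rules.
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")            # US ZIP or ZIP+4
US_SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")      # US layout: 3-2-4 digits
# Canadian layout: 3-3-3 digits. Separators are required here because a bare
# nine-digit string could belong to either country.
CA_SIN_RE = re.compile(r"\d{3}[- ]\d{3}[- ]\d{3}")


def is_valid_date(text, fmt="%Y-%m-%d"):
    """Return True if text parses as a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def is_valid_zip(text):
    """Return True if text has the shape of a US ZIP or ZIP+4 code."""
    return ZIP_RE.fullmatch(text) is not None


def national_id_format(text):
    """Return "US", "CA", or None depending on which layout the text matches."""
    if US_SSN_RE.fullmatch(text):
        return "US"
    if CA_SIN_RE.fullmatch(text):
        return "CA"
    return None
```

Checks like these catch typos and values entered in the wrong field; confirming that an address actually exists requires a lookup against reference data, which this sketch does not attempt.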
As an example of how data quality ripples through an organization, Toyota North America (www.toyota.com) delved deeply into KPIs that were used in different departments. "Each one was calculating KPIs based on a different set of underlying data," Ghai commented. "Data scientists were working to understand customer preferences so that the company could design the cars that customers wanted. Data quality was basic to this process."
Another gap was that the sales department did not have access to marketing or manufacturing data. Informatica helped them build a data lake to provide visibility across departments. "For example, manufacturing was unaware of the impact of the subprime housing crisis on sales," noted Ghai. "Therefore, they did not adjust their production volume to account for the drop in sales of new cars. Data completeness and timeliness were vital in this context."
Identifying data quality as an issue
Companies that seek help with a business issue do not always recognize at first that data quality is the underlying problem. "When we are approached to provide an analytics solution," said Jake Freivald, VP of product marketing at Information Builders (www.informationbuilders.com), "they may not be aware that they have a data quality issue. If it turns out they do, they need to address that first."
Information Builders' platform emphasizes three primary capabilities: integration, integrity, and intelligence, with the goal of producing actionable analytics. Its Omni-Gen software solution provides real-time data standardization, cleansing, and remediation so that data quality issues can be detected and resolved, in order to provide the so-called golden records that allow for valid analytics.
Information Builders has developed a vertical solution for the healthcare industry, Omni-HealthData (www.omnihealthdata.com), which is being used at St. Luke's University Hospital (www.slhn.org) in the mid-Atlantic area, as well as in other health networks in the U.S. and Canada. The St. Luke's implementation began with a vision of strategic business priorities and then focused on what analytics would be required to achieve them. The organization determined what data would be required in order to ensure quality care and a positive patient experience, and to meet the challenges of value-based reimbursement and cost containment.
Data from 31 different systems was cleansed and standardized to reconcile demographic information from patients. Duplicate records were eliminated and errors were corrected. The resulting data allowed accurate reporting and analysis, and data quality is continuously improved through St. Luke's data governance program.
Combining automation and human intervention
A data quality solution should combine automation and human intervention: scaling up cannot be done manually, but some judgment calls require human knowledge and insight. "We leverage human and AI input," said Freivald, "to enable business users to make recommendations about data quality rules, and then also surface likely rules to automate intelligent recommendations about what rules should be in place. Machine learning is about trying to surface data quality rules so they can be used as part of the larger enterprise model."
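One common way to divide the work between software and people is to score candidate duplicate records automatically, merge only the near-certain matches, and send borderline pairs to a reviewer. The sketch below illustrates that idea and is not any vendor's method: the thresholds, field names, and function names are hypothetical, and a simple string similarity from Python's standard library stands in for a learned confidence score.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; in practice they would be tuned on reviewed samples.
AUTO_MERGE = 0.92
NEEDS_REVIEW = 0.75

# Hypothetical fields used to compare two records.
MATCH_FIELDS = ("name", "address")


def normalize(record):
    """Join the matching fields, lowercase them, and collapse whitespace."""
    joined = " ".join(str(record[field]) for field in MATCH_FIELDS)
    return " ".join(joined.lower().split())


def match_score(a, b):
    """Similarity in [0, 1]; a stand-in for a learned confidence score."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def decide(score):
    """Map a score to an action: merge, human review, or treat as distinct."""
    if score >= AUTO_MERGE:
        return "merge"
    if score >= NEEDS_REVIEW:
        return "review"
    return "distinct"


def triage(a, b):
    """Score a pair of records and return the action to take."""
    return decide(match_score(a, b))
```

Only the middle band reaches a person, so the automated path absorbs the volume while the judgment calls stay with someone who knows the data.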
"It is hard to get a budget for data quality," explained Freivald, "but when the data quality is directly and explicitly tied to a business strategy, people see its importance." In some cases, Freivald advised, it's a good idea to start small and get a high-visibility success, and then build out. "Data quality is not an engaging topic on its own," he added, "but when it is put in a meaningful business context, it is easier to get people on board."
As organizations continue to recognize the value of data as an asset, the market for data quality solutions will grow. Gartner reported that the market for data quality software tools reached $1.61 billion in 2017 (the most recent year for which the company has data), an increase of 11.6% over 2016. Given the proliferation of customer data, medical data, Internet of Things streams, and other sources, this market growth is likely to accelerate in coming years as businesses support new initiatives for data quality.
BY JUDITH LAMONT, KMWORLD SENIOR WRITER
Judith Lamont, Ph.D., is a research analyst with Zentek Corp.
Customer Data: How Bad Is It?
One study of data quality reported in Harvard Business Review was carried out by managers participating in executive training given by university and corporate trainers. The participants came from companies and government agencies and from a variety of functional areas, including customer service and HR. They selected 10-15 critical work-related attributes from 100 work units (records), and checked each one for errors.
Nearly half the records had at least one critical error; for one-fourth of respondents, only 30% of the records were correct, and for half the respondents, fewer than 60% were correct. Only 3% of the records were deemed acceptable even using a low standard.
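The arithmetic behind a record-level measurement like this is simple: a record counts as correct only if none of its checked attributes is flagged, and the score is the share of such records. The sketch below shows that calculation under stated assumptions; the function names are hypothetical and the sample data is made up for illustration, not taken from the study.

```python
def record_is_clean(error_flags):
    """A record is correct only if none of its attributes is flagged as an error."""
    return not any(error_flags)


def share_of_clean_records(records):
    """Return the fraction of records that have no flagged attribute."""
    if not records:
        raise ValueError("need at least one record")
    return sum(record_is_clean(flags) for flags in records) / len(records)


# Made-up sample: four records, three attributes each; True marks an error.
sample = [
    [False, False, False],
    [False, True, False],
    [False, False, False],
    [True, True, False],
]

print(share_of_clean_records(sample))  # 2 of the 4 records are error-free
```

Because a single bad attribute disqualifies the whole record, the record-level score is always at or below the attribute-level accuracy, which is why the results can look so stark.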
Many advertisers rely on information from third parties to target their ads, particularly online ads. Deloitte (www.deloitte.com) asked roughly 100 of its own employees to review information about themselves that was available through a consumer data broker's portal in order to evaluate its accuracy. The data covered numerous variables, grouped into six categories such as demographic, economic, and purchasing history. The employees were asked to indicate whether the data for each variable was correct, and also to indicate by category whether the percent of data correct overall was zero, 25%, 50%, 75%, or 100%.
Results showed that for nearly half of the variables, the data was inaccurate for half of the respondents. Nearly 60% of respondents rated the overall accuracy of the demographic category to be 50% or less, despite the fact that information such as birth date and marital status can be obtained from a variety of reliable sources. Fewer than half said their purchase activity as listed in the portal was correct. The bottom line is that when data is being used to target customers or predict what they might want to purchase, the initiatives are on shaky ground unless effort is devoted to improving data quality.