“Dirty data may mean a dead-end to business value. Even worse, it can have a negative impact on business value. Dirty data can cause minor problems or be catastrophic.
A catastrophic problem would be losing a customer or having to take a major financial write-off [ ] less of an impact is an invoice going to the wrong department at a customer site but the customer routing it correctly to fix your mistake, again and again.
The impression you leave with the customer is that you are out of control or that you do not care.”
— Olin Thompson, Why Systems Fail: The Dead-End Of Dirty Data
The term ‘dirty data’ is vague, but the catastrophes that invalid data can cause are not. ‘Invalid data’ is a better term to describe the real problem confronting customer database systems.
Data that, under examination, become misleading information to an end user are invalid data. Their quality is low because of perceived shortcomings in uniqueness, completeness, timeliness, accuracy and consistency.
In non-technical terms, if the data captured at one end of the line arrive at your end as information that is duplicated, incomplete, too old to meet your needs, imprecise or contradictory, then you have a severe problem on your hands, especially if this information describes your valued customer.
Yet the problem is not the data itself. The problem lies with your expectations about the information that should result from your scrutiny of these data.
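The five shortcomings named above (duplicated, incomplete, too old, imprecise, contradictory) can each be turned into a simple count. What follows is a minimal sketch, not a definitive method: the records, the field names, the two-year staleness threshold and the agreed country code are all assumptions made up for illustration.

```python
from datetime import date

# Hypothetical customer records; every field name and value is illustrative.
records = [
    {"name": "Ann Lee", "phone": "555-0101", "email": "ann@example.com",
     "country": "US", "updated": date(2024, 5, 1)},
    {"name": "Ann Lee", "phone": "555-0101", "email": "",
     "country": "US", "updated": date(2019, 1, 15)},
    {"name": "Raj Patel", "phone": "555-0199", "email": "raj.example.com",
     "country": "USA", "updated": date(2024, 3, 9)},
]

AGREED_COUNTRY_CODES = {"US"}  # assumed standard, for illustration only


def profile(rows, today, max_age_days=730):
    """Count records that fall short on each of the five quality dimensions."""
    issues = {"duplicated": 0, "incomplete": 0, "stale": 0,
              "inaccurate": 0, "inconsistent": 0}
    seen = set()
    for row in rows:
        # Uniqueness: same name and phone as an earlier record.
        key = (row["name"].lower(), row["phone"])
        if key in seen:
            issues["duplicated"] += 1
        seen.add(key)
        # Completeness and accuracy of the email field.
        if not row["email"]:
            issues["incomplete"] += 1
        elif "@" not in row["email"]:
            issues["inaccurate"] += 1
        # Timeliness: last update older than the assumed threshold.
        if (today - row["updated"]).days > max_age_days:
            issues["stale"] += 1
        # Consistency: value outside the agreed code list.
        if row["country"] not in AGREED_COUNTRY_CODES:
            issues["inconsistent"] += 1
    return issues


report = profile(records, today=date(2024, 6, 1))
print(report)
```

Every threshold in this sketch encodes somebody's expectation of the data, which is exactly where the trouble starts.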
That Dirty Customer Data
Data usefulness depends on a user’s conversion of said data into actionable information, and on the subsequent employment of this information to meet a user’s anticipated needs.
This is not a fanciful distinction between data and information.
A customer support call center, for example, might flawlessly capture every text character and every digit (i.e. data) that make up a customer name and phone number (i.e. information), because call center agents need them to reach customers by phone. They won’t call this dirty customer data. But if you examine the address and email data for these records, what quality would you expect to find there?
If you realize upfront that data entry procedures were established primarily to facilitate the use of a phone by call center agents to call the customer, then it should come as no surprise that postal or email data in the same system was not top priority to collect, whether correctly or at all.
A marketer in need of sending out a direct mail promotion who is made to depend on customer address or email entries from this call center data source might characterize this source as containing ‘dirty customer data’, even if the call center agents had been perfectly satisfied with its quality for their own purposes.
Suddenly we have a ‘dirty customer data’ problem, or simply a data quality problem, that is actually an information expectations management problem.
The marketer expects reliable postal addresses from a call center data set when that postal data actually sit in the shipping department’s system and not necessarily in the call center.
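The call center example can be sketched in a few lines: the same records score as fully usable for one user and mostly unusable for another. This is only an illustration under stated assumptions; the records, the field names and the list of fields each user needs are hypothetical.

```python
# Hypothetical call center records: phone captured carefully, the rest not.
call_center = [
    {"name": "Ann Lee", "phone": "555-0101", "address": "", "email": ""},
    {"name": "Raj Patel", "phone": "555-0199", "address": "12 Elm St", "email": ""},
    {"name": "Mia Chen", "phone": "555-0142", "address": "", "email": "mia@example.com"},
]

# Fields each kind of user needs; assumed for illustration.
needs = {
    "call_center_agent": ["name", "phone"],
    "direct_mail_marketer": ["name", "address"],
}


def fit_for_use(rows, required):
    """Share of rows in which every required field is populated."""
    usable = sum(all(row.get(field) for field in required) for row in rows)
    return usable / len(rows)


fitness = {user: fit_for_use(call_center, fields) for user, fields in needs.items()}
for user, share in fitness.items():
    print(f"{user}: {share:.0%} of records usable")
```

The data did not change between the two measurements; only the expectation placed on them did.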
Now replicate this situation by extracting data from many other sources (e.g. sales, accounting, production, marketing) to consolidate and centralize for access by a diverse community of end-users with a multiplicity of usage demands, and we’re looking at a nightmare in the making.
This is the cascade effect that keeps “dirty customer data” circulating throughout the company and complicating customer treatment.
Nothing can kill a customer database faster than poor information surfacing from its entrails. Dirty customer data is lethal.
It can ruin the system’s reputation overnight. The bad news spreads like the plague, and the system is bound to be quarantined before it infects any others.
Step One In Customer Data Cleansing
- Before populating a customer database, conduct an exploratory data analysis. If your analysis turns up strange data patterns, secure clear and reliable business data definitions and data transformation rules from the people who control data entry and who believe the data to be reliable for the immediate use for which it was captured. The meaning behind such data rests with these subject matter experts.
- It may be possible to automate some data transformation procedures to ensure standardization of these data elements at their point of entry or at some consolidation point.
- However, in my experience, configuring such data quality management solutions, whether in-house or by outsourcing the job to a team of experts, has always depended on delineating these data transformation rules upfront, before assigning someone to own them as data quality standards for the long haul.
- Find data stewards for every contributing data stream and appoint them to guard these quality standards.
- Herein lies the key to a successful management of data quality as bits of customer behavior and customer profile elements flow steadily into the customer database from multiple sources.
- These stewards become subject matter experts on the operations process that each data source system is meant to support in the company.
- For example, one steward becomes an expert on how Accounting enters, transforms and maintains financial data. Another becomes an expert in how Marketing does the same with response data. Another in Sales comes to understand intimately how orders and product data get processed.
- These stewards also become experts in the relationships that these data under their care have to other data streams, as the streams commingle under the administration of one data quality czar responsible for the quality within the central system. This means that the czar makes sure that the data can be useful to anyone accessing it, because meaning is not assumed but goes out with every extract from that central system.
- Measure the opportunity cost of failing to steward your data or of having to correct it manually, and use this estimate to seek the funding to hire the workers, promote existing staff or license the quality management tools and services that eliminate the dirty customer data concern.
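One way to picture the steps above is a small registry in which each data transformation rule is written down once, assigned to a named steward, and applied at the point of entry or consolidation. This is a minimal sketch under assumptions: the `Rule` class, the three rules and the steward titles are hypothetical, and real rules would come from the subject matter experts who own each data stream.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """One agreed data quality standard and the steward who owns it."""
    field: str
    steward: str                      # hypothetical owner of the rule
    description: str
    transform: Callable[[str], str]


# Illustrative rules only; the actual standards belong to the stewards.
RULES = [
    Rule("name", "Sales steward", "Collapse whitespace, title case",
         lambda s: " ".join(s.split()).title()),
    Rule("phone", "Call center steward", "Keep digits only",
         lambda s: re.sub(r"\D", "", s)),
    Rule("email", "Marketing steward", "Trim and lowercase",
         lambda s: s.strip().lower()),
]


def standardize(record):
    """Return a copy of the record with every applicable rule applied."""
    clean = dict(record)
    for rule in RULES:
        value = clean.get(rule.field)
        if value is not None:
            clean[rule.field] = rule.transform(value)
    return clean


raw = {"name": "  ann   LEE ", "phone": "(555) 010-0101", "email": " Ann@Example.COM "}
cleaned = standardize(raw)
print(cleaned)
```

Keeping the steward and a plain-language description beside each transformation means the meaning of a rule travels with the rule, rather than living only in someone's head.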
Of all these criteria, the stewards are your biggest success factor. They will build the information inventory, maintain the data glossaries and facilitate the answers to the questions that analysts further down the stream will need explained to them to make sure the information in the database won’t be referring to “dirty customers.”