My way of joking is telling the truth; that is the funniest joke in the world.
— George Bernard Shaw
I once worked for a court jester. She was highly favored by her king, who had hired her to amuse him and had given her license to tell him the truth wrapped in lies.
She liked to play the fool and did it well. The word fool comes from the Latin follis, which means wind bag. She lived up to her name exquisitely.
But there was only one problem with this idyllic situation. Her troupe of actors was no joke.
The most important members in her group were responsible for data quality remediation.
In the span of 3 years they had spent nearly 1 million dollars fighting the dragon of garbage data. This is a difficult dragon to tame, because garbage data never dies. It can only be reduced to insignificance by way of data quality management techniques, such as data entry quality control, documented business data processing rules, data profiling and other data quality definition tactics.
When the garbage data dragon roams freely through the corporate kingdom’s causeways and byways, it feeds on refuse. Every village in the kingdom is able to consume what it needs to operate and survive. But the moment a village is asked to share its bounty with another, it rushes into the local castle, raises the drawbridge and asserts its fiefdom rights.
Whatever leftovers remain outside the castle or anything that the villagers toss over the castle walls into the moat below becomes the dragon’s spread.
The larger the kingdom, the greater the meal. Unimpeded, the garbage data dragon can grow to awesome dimensions this way. In such cases, his voluminous size can be fettered to the rock of ongoing data quality assessment only by way of complex data quality software. But this can be a rather sizable siegeworks in its own right, and liable to collapse if handled by inexperienced fools.
Enter The Court Jester
To achieve data quality remediation, the court jester directed her troupe down the path of data quality assurance. On the face of it, this sounds like a most fitting route to take. After all, who wouldn’t want to be assured that the data quality in a given location of the kingdom is healthy?
We all know that healthy data produce healthy brains. And healthy brains make sound decisions for the kingdom.
However, this was only the wind bag blowing hot air again.
In fact, the route toward data quality remediation is first identified through the data quality issues that key villagers point out, and then confirmed as rightly understood through a data quality audit.
If you fail to do this by deploying a group of expert scouts, you risk sending your troupe down an unstable path of unclear data quality objectives.
This can lead directly into garbage data’s lair. And you face the disadvantage of having no master data management map in hand to help you find another way to confront the monster, whether because the volume of garbage data proves too daunting to bring down in one attack, or because you need an escape route after spending much of your resources on a fight you failed to win.
But the king wanted a joke with every story of successful dragon slaying. So, after many successful attempts at binding the behemoth, he sent a jester before us to slay the dragon using hot air.
It was the dragon’s breath that was the real thing.
No Humor In The Dragon
In the quest for data quality remediation there seems to be one data quality control tool that even court jesters find easy to understand. It is called deduplication software. It is often thought to be a cure-all, almost magical, particularly by those in the village of marketing, who believe there is only one purpose to such a tool: “Make it deduplicate my contacts list.”
To be sure, the deduplication software tool can be a powerful weapon in the arsenal of every garbage data dragon slayer. I used it quite effectively for years to knock down a dragon’s wings through a company of warriors aptly called Catapult Data in San Jose, California. This was a terrific team of data quality improvement experts who helped me enormously to achieve data quality remediation. But deduplication software has some severe limitations, because a garbage data dragon is not all wings.
The garbage data dragon is a complex beast able to cloak itself by shifting shape. This makes winning the data quality remediation war extremely difficult.
One day you’re fighting this devil’s interpretability, trying to determine whether it can be understood from its many duplicate records, and the next thing you know you’re battling its granularity, as you realize that you cannot understand the extent of duplication without a deeper analysis of the monster’s cellular structure, one that lets you make an educated comparison between duplicates before you rid yourself of them.
No sooner have you gotten a clear picture of granularity than the monster rears its head and exposes its deadly claws of incompleteness, which means that you are unable to fully grasp the deeper details of the data because you don’t have enough of it to make a sound assessment of what to do with what you do have before you.
While fighting the demons of granularity and incompleteness, this shape-shifter raises an impenetrable scaly shield of untraceable, undocumented variations, meaning that what seems to have come from one place looks like it came from several unrelated places in the kingdom and there is no way to trace it back or prove that it is what it seems to be.
This is usually the time when this infernal lizard launches its most deadly fiery venom, a noxious stream of imprecise, time-related data values that, under pressure, make decision-support and report reliability for users impossible.
Facing the impatience of the villagers for results, spent in time and resources, and perhaps well over their heads in the use of expert tools and techniques that they did not understand, many valiant fighters all too often see their quest for data quality remediation come to a discouraging end, with not much to show for it except battle scars and a few remarks about the impossibility of taming the great garbage data dragon.
But the dragon can be tamed. It takes time and an entire village to do it, and that village’s success will only encourage other villages to join in the fight.
But the king must be willing to dismiss all court jesters or, at least, not make them head of the dragon hunting party.
How To Tame The Garbage Data Dragon And Achieve Data Quality Remediation
To remedy the data quality problem, begin by demolishing the fiefdoms that produce the trash on which the garbage data dragon feeds. Set some precise data quality objectives based on clearly identified data quality issues that specific people would be willing to own until data quality remediation is achieved.
Run a data quality audit that can give substance to the issues identified, and prioritize them according to the impact that resolving them would have on the business.
Since you’re after data quality improvement, consider that you will need data quality definition centralized under a master data management plan. This is the only way to conduct a holistic data quality assessment that will cover business data processing, tools and people analysis.
That’s what answers the question ‘What is data quality?’ Then take your weapons of process, tools and people and, with plan in hand, venture down the road of total quality management to bring the garbage data dragon down to size.
What happened to the court jester in my story? No sooner had I left that kingdom than she got eaten by the garbage data monster. It seems she persisted so long in joking about the need to keep things simple in what is a complex fight that she unfettered the fiend and ended up shooting spitballs at it, when the only thing that had managed to get it under control, after a long siege of many years, was the fierceness, cunning and persistence of many warriors in shining armor.