Introduction
I would like to begin by thanking Ember for allowing me to be a guest blogger, and by explaining the motivation behind this blog.
The “need for speed” in today’s world has led to a sweeping rush in which old ideas are discarded too quickly. This includes sound data management principles that are critical to your business. We seem to have forgotten that getting the right data is just as important as getting it fast.
Witness the dialog in Alice’s Adventures in Wonderland:
“Would you tell me, please, which way I ought to go from here?”
“That depends a good deal on where you want to get to,” said the Cat.
“I don’t much care where—” said Alice.
“Then it doesn’t matter which way you go,” said the Cat.
“—so long as I get SOMEWHERE,” Alice added as an explanation.
“Oh, you’re sure to do that,” said the Cat, “if you only walk long enough.”
It seems to me that we are in a hurry to “get SOMEWHERE”, regardless of where it is. This is the reason I would like to cover the elusive definition of Single Source of Truth in this blog.
System of Record vs. Source of Truth
A System of Record is typically an application (with data persisted in one or more databases) that is the authoritative source for a given data element or piece of information.
For simple data elements, this is clear-cut. For example, the system of record for an employee’s salary would be the HR system. For many business elements, things start to get fuzzy quickly. We live in a non-monolithic world where the attributes of an entity exist across multiple applications.
Let’s explore this further.
Establishing a Golden Record consists of building a composite of what is known by different applications. For example, one application may store the customer number, name and loyalty card balance. Another system may also store the same attributes in addition to the address. A third application may also store some of the same elements and an email address. Each of these systems of record might obtain and store data in different ways, resulting in different formats for the same fields. Each is valid in its own application silo, but how do we combine them into a single, consistent view of the data — a Single Source of Truth about the customer? This is not a trivial challenge from a technical and organizational perspective. Specifically, if a phone number for the customer exists in all three applications, which is the correct one? This is the crux of the problem!
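To make the phone-number conflict concrete, here is a minimal sketch of one common survivorship rule, “the most recently updated source wins”, applied to the composite above. The source names, attributes and timestamps are hypothetical, and real MDM tools offer far richer match-and-merge capabilities; this only illustrates the idea.

```python
from datetime import datetime

# Hypothetical extracts of the same customer from three systems of record.
# Source names, field names and timestamps are illustrative only.
sources = [
    {"source": "loyalty_app", "updated": datetime(2018, 3, 1),
     "name": "J. Smith", "phone": "555-0100", "loyalty_points": 1200},
    {"source": "billing", "updated": datetime(2018, 6, 15),
     "name": "Jane Smith", "phone": "555-0199", "address": "12 Main St"},
    {"source": "crm", "updated": datetime(2018, 5, 2),
     "name": "Jane Smith", "phone": "555-0100", "email": "jane@example.com"},
]

def build_golden_record(records):
    """Merge per-source records into one composite, letting the most
    recently updated source win whenever the same attribute conflicts."""
    golden, last_seen = {}, {}
    for rec in records:
        for field, value in rec.items():
            if field in ("source", "updated"):
                continue
            if field not in golden or rec["updated"] > last_seen[field]:
                golden[field] = value
                last_seen[field] = rec["updated"]
    return golden

print(build_golden_record(sources))
# {'name': 'Jane Smith', 'phone': '555-0199', 'loyalty_points': 1200,
#  'address': '12 Main St', 'email': 'jane@example.com'}
```

The survivorship rule itself is a business decision: trust a designated source per attribute, prefer the most complete value, or route the conflict to a data steward.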
Single Source of Truth is an architectural practice of maintaining data in one location (at least logically), where it is stored as a complete picture. When limited to just reading the data, this is not a major problem, and many options exist to present this unified view to users. However, when updates also need to be handled, multiple sources of truth can exist. A bi-directional update can be a challenge, even with strong Atomicity, Consistency, Isolation, Durability (ACID) properties and isolation levels specified, since the timing of such updates needs to be considered.
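To see why timing matters, consider two systems that both read the shared record and then write back. If we simply keep whichever write arrives last, an older value can silently overwrite a newer one. One common mitigation, sketched below with hypothetical names and version numbers, is optimistic concurrency: a write based on a stale version is rejected and must be retried.

```python
class StaleWriteError(Exception):
    pass

# A toy shared store keyed by customer id; every successful write
# increments the record's version number. Names are illustrative only.
store = {"cust-42": {"version": 3, "phone": "555-0100"}}

def update_phone(cust_id, new_phone, expected_version):
    """Apply the update only if the caller saw the current version;
    otherwise the caller must re-read and retry."""
    record = store[cust_id]
    if record["version"] != expected_version:
        raise StaleWriteError(
            f"record is at v{record['version']}, caller expected v{expected_version}")
    record["phone"] = new_phone
    record["version"] += 1
    return record

# System A and System B both read version 3, then write concurrently.
update_phone("cust-42", "555-0199", expected_version=3)      # A wins; version -> 4
try:
    update_phone("cust-42", "555-0123", expected_version=3)  # B is rejected
except StaleWriteError as err:
    print("System B must re-read and retry:", err)
```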
An additional problem is the semantics and actual meaning of the data element. When the CEO asks “What is the total cost of inventory at our stores?”, determining the quantity on hand may be relatively easy, but its valuation may be subject to interpretation. Do we use the cost price or the sales price? Do we factor in discounts, promotions, shrink, etc.?
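A quick numeric illustration of how the same physical stock can yield several defensible answers, depending on the valuation rule chosen. The figures are made up purely for illustration.

```python
# 1,000 units on hand: the quantity is undisputed, the valuation is not.
qty_on_hand = 1_000
cost_price = 6.50         # what we paid per unit
sales_price = 10.00       # current shelf price per unit
promo_discount = 0.15     # 15% promotion currently running
shrink_rate = 0.02        # expected loss to damage and theft

valuations = {
    "at cost": qty_on_hand * cost_price,
    "at retail": qty_on_hand * sales_price,
    "at promotional retail": qty_on_hand * sales_price * (1 - promo_discount),
    "at cost, net of shrink": qty_on_hand * (1 - shrink_rate) * cost_price,
}

for rule, value in valuations.items():
    print(f"{rule:>24}: ${value:,.2f}")
# Same shelves, four different answers to the CEO's question.
```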
Approaches towards the path of truth
Traditional attempts to create a Single Source of Truth include:
- System of Record — an application which is the authoritative source for a given data record or piece of data.
- Enterprise Service Bus — a logical store maintained via message queues using the Publisher/Subscriber (PubSub) approach (a simplified sketch follows this list).
- Apache Kafka — an open-source distributed event-streaming platform.
- Master Data Management (MDM) — a method or model that enables an enterprise to link all critical data to one ‘master’ file, which provides a single point of reference to the master data. MDM promises to increase collaboration and reduce data silos. This does not necessarily mean that all data is located physically in such a “master”. More on the MDM styles in the next section.
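To illustrate the Publisher/Subscriber idea behind the Enterprise Service Bus (and Kafka) approaches, here is a deliberately simplified, in-memory sketch: a system of record publishes a master-data change event to a topic, and any interested application subscribes to keep its own copy in sync. A real implementation would use a message broker that adds durability, ordering and delivery guarantees; the topic and field names here are hypothetical.

```python
from collections import defaultdict

class SimpleBus:
    """A toy in-memory publish/subscribe bus. A real ESB or Kafka cluster
    provides persistence, ordering and delivery guarantees that this omits."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

bus = SimpleBus()

# The CRM keeps its local copy of the customer current by subscribing.
crm_cache = {}
bus.subscribe("customer-updated",
              lambda e: crm_cache.update({e["customer_id"]: e["changes"]}))

# The billing system, as the system of record for phone, announces a change.
bus.publish("customer-updated",
            {"customer_id": "cust-42", "changes": {"phone": "555-0199"}})

print(crm_cache)  # {'cust-42': {'phone': '555-0199'}}
```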
MDM Styles – Which One is Right for You?
Master data is the consistent and uniform set of identifiers and extended attributes that describe the core entities of an enterprise, such as customers, prospective clients, citizens, suppliers, sites, hierarchies and the chart of accounts.
Master Data Management (MDM) is a technology-enabled discipline in which business and IT teams work together to ensure the uniformity, accuracy, stewardship, semantic consistency, and accountability of their enterprise’s official shared master data assets.
Gartner has identified four implementation styles. They vary in terms of where the data is created and stored, latency, search complexity, etc.
We will briefly describe these as identified by Gartner.
- Consolidation: Also called Analytical MDM. We create a snapshot copy of all relevant data in a warehouse environment, leaving the operational data where it resides.
- Registry: We maintain a central registry of global identifiers and links to the master data in source systems, and hold transformation rules centrally. At runtime, the MDM hub accesses the master data from the source systems and assembles a point-in-time view (see the sketch after this list). This style is popular when data is highly distributed.
- Centralized: Also called “Transactional.” We centralize the creation, validation and storage of the actual master data in a single hub for all purposes. This is very invasive, affects all existing applications, and requires a huge commitment to maintain.
- Coexistence: This is a hybrid approach which combines aspects of all the other styles. We could start with a centralized approach for existing applications and then move to a registry approach later. Gartner expects this style to become pervasive in the future.
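As a concrete, heavily simplified sketch of the registry style described above, the hub below stores only global identifiers and links to the source systems; attribute values are fetched from the sources and assembled into a point-in-time view at request time. The fetch functions, identifiers and field names are hypothetical.

```python
# Hypothetical source-system lookups; in reality these would be API or
# database calls into each application's own system of record.
def fetch_from_loyalty(local_id):
    return {"name": "Jane Smith", "loyalty_points": 1200}

def fetch_from_billing(local_id):
    return {"name": "Jane Smith", "address": "12 Main St", "phone": "555-0199"}

# The registry holds only identity links, never the attribute values themselves.
registry = {
    "global-0042": [
        ("loyalty", "L-881", fetch_from_loyalty),
        ("billing", "B-1234", fetch_from_billing),
    ],
}

def point_in_time_view(global_id):
    """Assemble a federated view at request time from the linked sources."""
    view = {"global_id": global_id}
    for source, local_id, fetch in registry[global_id]:
        for field, value in fetch(local_id).items():
            view.setdefault(field, value)  # first linked source wins in this sketch
    return view

print(point_in_time_view("global-0042"))
# {'global_id': 'global-0042', 'name': 'Jane Smith', 'loyalty_points': 1200,
#  'address': '12 Main St', 'phone': '555-0199'}
```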
Regardless of the style chosen, we must document where master data is maintained and stored, and from where users will access it.
Challenges in the Cloud era
Defining a single version of the truth is hard even when all applications are implemented on-premises. The problem becomes even more complex when different applications live in different cloud environments as SaaS (Software as a Service) or PaaS (Platform as a Service) implementations.
Many organizations do not have an MDM solution, or have attempted to build a central MDM solution and failed. Organizations are dynamic and flexible – an MDM solution built today rarely lasts for many tomorrows. The overhead of governance and the lack of agility lead to this monolithic approach being viewed as bureaucratic red tape. It is likely to be eventually discarded, because no one follows it.
The best option in today’s hybrid environments is to create a Source of Truth by compiling an entity’s attributes from its different Systems of Record, using the registry style or the hybrid style, depending on the organization’s culture and how its data management has evolved. In any case, strict adherence to good data management principles is critical to long-term success.
What can we do as Data Management professionals?
We must balance the “need for speed” with the need to ensure that business users not only get the data fast, but also get the right data. We need to do this without clinging to data management principles that have outlived their usefulness, and instead embrace the more practical approaches outlined in this blog. If attempts at MDM did not work for you in the past, it is time to tailor the approach to your needs rather than discard it completely.
I am not naïve enough to believe that a silver bullet to solve this issue exists, but how can we solve a problem we don’t recognize?
I welcome comments!
References
Gartner ID G00276267: “The Five Vectors of Complexity That Define Your MDM Strategy”, Andrew White, published 27 April 2015, refreshed 4 October 2016.