Theory

Some master data, such as information about companies or persons, is often stored in more than one system. It is important that this information is consistent: when the data is updated in one system, it should be updated in the other systems too. The Nexus Data Sync Engine is a service for synchronizing master data between IT systems.

The basics

Master data

Master data represents "data about the business entities that provide context for business transactions", according to Wikipedia. For example, if the involved systems agree on the master data about a person, such as name and address, a transaction only needs to refer to the person's id.

The business entities that typically have master data associated with them are persons, companies, products and assets.

The challenges

If the master data in one system is updated, then it is essential that it is updated in the other systems too. This seemingly simple task can become immensely complicated.

Which systems are leaders and which are followers?

Different systems might be considered the leader for different attributes. For example, the CRM system is the master for a person's name and legal address, while the economy system is the master for the billing address.
Should data only be updated in the leader system? This makes it harder to correct erroneous data: when a user spots an error in an IT system where it can't be corrected, there is a risk that the user ignores it or forgets to correct it in the leader system.
If we only have one leader system, then it must have all the attributes that should be synced, even if they make no sense in that particular system.
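Attribute-level leadership can be expressed as a mapping from attribute to leader system. The sketch below is a hypothetical illustration, not how the Nexus Data Sync Engine is configured: the attribute names, the system names "crm" and "economy", and the choice to forward non-leader changes as suggested corrections (instead of dropping them) are all assumptions.

```python
from typing import Dict, Tuple

# Hypothetical attribute-level ownership: which system leads each attribute.
ATTRIBUTE_LEADER: Dict[str, str] = {
    "name": "crm",
    "legal_address": "crm",
    "billing_address": "economy",
}


def route_changes(
    source: str, changes: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split an incoming update into changes the source leads and changes it does not.

    Changes to attributes led by `source` are accepted directly. The remaining
    changes are returned separately so they can be forwarded to the leader as
    suggested corrections (an assumption of this sketch) rather than lost.
    """
    accepted: Dict[str, str] = {}
    proposed: Dict[str, str] = {}
    for attribute, value in changes.items():
        if ATTRIBUTE_LEADER.get(attribute) == source:
            accepted[attribute] = value
        else:
            proposed[attribute] = value
    return accepted, proposed
```

For example, `route_changes("economy", {"billing_address": "Box 12", "name": "Anna Svensson"})` accepts the billing address and returns the name change as a proposal for the CRM system.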

Proprietary data models

It is difficult to agree on how the attributes correlate. The "Address" attribute for a person in the CRM system is most probably the legal address, while the "Address" attribute for a customer in another system might be a delivery address or a billing address.
It is very common for the proprietary data model of an IT system to be denormalized for simplicity. Several customer records in one system can refer to the same person record in another system. This means that all those customer records should probably be updated if a person attribute, such as the phone number, is updated.
With denormalized models, it becomes even harder to correlate attributes. If the same physical person has two customer records, should we allow different names, phone numbers and e-mail addresses for the two customers, or should they be kept in sync?
Legacy systems may impose further constraints, such as all text in capital letters or a limited number of characters per field.
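A minimal sketch of the two cases above, assuming that all customer records linked to the same person should be kept in sync (the text leaves this as an open question) and that the legacy target accepts only upper-case text of a fixed maximum length. `CustomerRecord`, `fan_out_phone_update` and `to_legacy_format` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CustomerRecord:
    customer_id: str
    person_id: str  # reference to the person record in another system
    phone: str


def fan_out_phone_update(
    customers: List[CustomerRecord], person_id: str, new_phone: str
) -> List[str]:
    """Apply a person-level phone change to every customer record linked to that person.

    Returns the ids of the customer records that were updated.
    """
    updated: List[str] = []
    for customer in customers:
        if customer.person_id == person_id:
            customer.phone = new_phone
            updated.append(customer.customer_id)
    return updated


def to_legacy_format(text: str, max_length: int) -> str:
    """Adapt a value to a hypothetical legacy field: capital letters only, limited length.

    Note that truncation is lossy, so the value read back from the legacy
    system must not be allowed to overwrite the full value elsewhere.
    """
    return text.upper()[:max_length]
```

With one person linked to two customer records, a single phone change produces two record updates; `to_legacy_format("Main Street 12", 8)` yields `"MAIN STR"`, which illustrates why legacy values are a poor source for syncing back.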

Different quality of data

Some systems provide text fields with no quality control in the user interface and thereby leave quality assurance to the user. This leads to manual mistakes and also to "innovative" usage of the fields, e.g. using different formats or adding extra information that wasn't intended for that field. Sending this data to an IT system that has strict quality control in its user interface, but fewer checks in its data integration, can corrupt the data in the target system.
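One way to protect a strict target is for the synchronization to apply the same checks as the target's user interface before writing. The sketch below assumes two hypothetical rules (e-mail and phone format) chosen for illustration; a record with a non-empty error list would be held back for manual correction instead of being written.

```python
import re
from typing import Callable, Dict, List

# Hypothetical checks mirroring what the target system's user interface enforces.
TARGET_RULES: Dict[str, Callable[[str], bool]] = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "phone": lambda v: re.fullmatch(r"\+?[0-9 ]{6,20}", v) is not None,
}


def validation_errors(record: Dict[str, str]) -> List[str]:
    """Return a description of every field that the target system would reject."""
    errors: List[str] = []
    for field_name, is_valid in TARGET_RULES.items():
        value = record.get(field_name)
        if value is not None and not is_valid(value):
            errors.append(f"{field_name}: {value!r}")
    return errors
```

A free-text value such as `"call after 5"` in the phone field is reported instead of being copied into a system whose users could never have entered it.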

Different interpretation of data changes

A more elaborate example: suppose a company delivers its products to employees at other companies. The company calls these recipients "delivery customers". In the business system, the name of a delivery customer is more or less just the first line of the delivery address. If an employee at a company is replaced by another person, it seems logical to simply change the name of the delivery customer. In the CRM system, on the other hand, the goal is to maintain a close relationship with the contact persons at the companies. There, the change should result in a new contact person, and the old contact person should be marked as terminated. This profoundly different view of the data needs to be understood, or the data will deteriorate.

External data sources

One way to get continuous, reliable data updates is to use an external data source, such as a public record. This data is rightfully seen as reliable, but there is at least one caveat: when data has changed, there is often a delay before the public record is updated, and in the meantime your own system may already have been updated with the latest data. If we blindly accept the data from the public record, your newly updated (correct) data might be overwritten with old data.
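A common guard against this is to compare when each value became valid and only accept the external value if it is newer. The sketch below assumes that both the local system and the public record expose such a timestamp per attribute, which not every public record does; `AttributeValue` and `merge_external` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AttributeValue:
    value: str
    changed_at: datetime  # when this value became valid according to its source


def merge_external(current: AttributeValue, external: AttributeValue) -> AttributeValue:
    """Accept the public-record value only if it is newer than the value we already hold."""
    if external.value == current.value:
        return current  # nothing to do; keep our own timestamp
    if external.changed_at > current.changed_at:
        return external
    return current  # the public record is lagging behind; keep our newer value
```

If a person's new address was registered locally on 10 March but the public record still carries an address dated the previous June, the local value is kept; once the public record catches up with a later date, its value is accepted.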

Master data management

The importance of master data and the challenges associated with it have resulted in a discipline called master data management; see Wikipedia.

There are a number of models for implementing a technology solution for master data management. Some examples:

Source of records
One system is designated as the master and all other systems are followers.
Consolidation
Create a single version of the truth based on data from a number of source systems. The consolidated data is used for reports and analysis. The data is usually not fed back to the source systems.
Coexistence
Data is mastered in source systems and then synchronized with a centralized hub, so data can coexist harmoniously and still offer a single version of the truth, "the golden record".

Choosing a model

In a digital platform you will need a capability for synchronizing master data between IT systems, so we need to pick one of the models above.

The consolidation model does not synchronize data, i.e. it doesn't feed the consolidated data back to the systems, so it is not a valid choice.

Using one of the existing systems (the source of records model) for master data management is very tempting. For instance, when it comes to synchronizing customer data, why not use the CRM system as the master, since it already specializes in keeping customer data correct? This intuitive choice is seldom sustainable:

It will be difficult for the master system to handle all models and attributes that the other systems may require.
This model often requires that changes are only made in this central system, making it cumbersome when you spot an error in the data in one of the client systems.
Combining widely different capabilities in the same IT system will result in conflicting requirements on the system and will make it harder to replace.

We argue that the coexistence model is the natural choice, and we base our master data synchronization capability on that model. A specialized service for data synchronization fits well in a digital business platform architecture. It can hide the complexity of data synchronization from the clients, and it can deal with all kinds of master data. It can work with a high degree of automation, and it has the capacity to handle most of the challenges that we face in this discipline.

We sometimes refer to this model as democratic data synchronization (as opposed to authoritarian): every client has the opportunity to contribute to the quality of the master data. This can be fine-tuned at the attribute level to reflect how much we trust a specific client.
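Attribute-level trust could be sketched as follows. The trust table, the client names and the rule "accept a value if the sending client is trusted at least as much as the client that last set it" are assumptions for illustration, not the Nexus Data Sync Engine's actual algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical trust levels per (client, attribute); higher means more trusted.
# A missing entry means the client may not contribute that attribute.
TRUST: Dict[Tuple[str, str], int] = {
    ("crm", "name"): 3,
    ("crm", "email"): 3,
    ("economy", "billing_address"): 3,
    ("economy", "name"): 1,
    ("webshop", "email"): 2,
}


@dataclass
class GoldenRecord:
    values: Dict[str, str] = field(default_factory=dict)
    set_by: Dict[str, str] = field(default_factory=dict)  # attribute -> client that set it

    def propose(self, client: str, attribute: str, value: str) -> bool:
        """Accept a client's value if the client is trusted at least as much as the current setter."""
        trust = TRUST.get((client, attribute), 0)
        if trust == 0:
            return False  # this client is not allowed to contribute this attribute
        setter = self.set_by.get(attribute)
        current_trust = TRUST.get((setter, attribute), 0) if setter is not None else 0
        if trust >= current_trust:
            self.values[attribute] = value
            self.set_by[attribute] = client
            return True
        return False
```

Here a name entered in the economy system is accepted until the CRM system supplies one, after which the economy system can no longer override it. Accepted values would then be distributed back to the other clients, which is what distinguishes the coexistence model from consolidation.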