Our implementation of the master data synchronization capability is called Nexus Link Data Synchronization Engine or DSE for short.
Features
- Keeps track of old versions of the data for every attribute.
- If a client has changed two attributes and one of the changes is valid and the other is not, the valid change will still be synchronized.
- Provides data validation based on regular expressions and powerful dynamic functions.
- Can handle data mapping between client models and the central normalized model.
- Translates between different data sets for enumerations.
- Understands duplicate records and keeps them all synchronized.
- Can be configured for approving data before accepting changes and before distributing changes.
- Can handle the challenges of a "deaf" client, i.e. a data source that doesn't listen to updates.
- Can handle that a client truncates data, so that the truncated data isn't spread to other systems.
- Loosely coupled fields. Handle fields that are not 100% correlated. Example: The person record has an attribute for e-mail. The customer records for that customer also have an e-mail attribute. 99% of the time they have the same value. DSE understands that changes to the person e-mail should only be reflected on the customer e-mail if they were the same to start with.
- The synchronization process is completely asynchronous. This means that any of the clients and even the synchronization capability can be "down" for days. When the failing system is "up" again, the synchronization process will just continue where it left off, i.e. the synchronization process is very robust.
Concepts
Normalized canonical model
Keys in data exchange
When configuring an entity and its attributes, you can define an attribute to be of type key. When such key attributes are used in communication between the client and the DSE, they can affect how objects are connected or merged as duplicates.
There are three different usages of key attributes in client data and we will introduced them by using the following example:
We have a normalized canonical model for the that looks like this:
When DSE sends information to or receives information from a client it refers to the client key that uniquely identifies the current object. In the DSE, this key is called the primary key for the object.
One important aspect of
When DSE sends or receives data
- Primary key: The client primary key for the current client object (that we are dealing with)
- Secondary key: Another unique key for the current client object (always the same entity). This key always refers to a key in another client. That client can be a "virtual client" like the "Internet" client for email-address.
- Primary reference key: A reference from the current client object to another object at the same client (with the same or different entity). From a client perspective, this is a "foreign key" (DB lingo).
- Secondary reference key: Only relevant when we have a client object that is denormalized (consists of more than one entity). A reference from the current client object and the reference is to a "parent object" at another client (with a different entity).
Database tables
To understand DSE, you must become familiar with its database model. Its database tables can be divided into configuration tables and runtime data tables. The configuration tables are initially set up with the configuration for the specific customer case and rarely change much after that. The runtime data tables are constantly updated as DSE executes data synchronization transactions. In the illustrations below, we distinguish between them by using blue color for configuration tables and yellow color for runtime data tables.
In the illustrations we will only highlight a few columns of each table, in practice there are more columns. For instance you will always find an Id
column which is the primary key (a GUID) and two columns named Created_Utc
and Updated_Utc
which are set when a row is created or updated respectively.
Core
The DSE uses an entity-attribute-value (EAV) model to define master data entities and store versions of master data for those entities. The EAV tables are the four tables that are colored with a slightly darker color in the illustration:
Entity
: The master data entities that we would like to store data for. This table is for both defining the proprietary client entities (IsMd=0
) and the normalized entities (IsMd = 1
).Attribute
: For each entity, these are the attributes that we want to include in data synchronization.Object
: Contains one line for each entity instance that is stored in the database. We only store the normalized entities. Each object has a version associated to it. It is increased whenever a change is made to one or more attributes of the object.Data
: The attribute values for the objects. Both current (IsCurrent=1
) and older versions (IsCurrent=0
). When an attribute changes, a new line is added and the version number of that line is set to the object version. Please note that an attribute can have one line withVersion=1
,IsCurrent=0
and one line withVersion=6
,IsCurrent=1
without any versions in between; this implies that there were other attributes that were updated for the object version 2 up to 5.
The table Data
together with Key
are the most important tables, as they contains all the current and historical data of the master data objects and the connection to and state of the corresponding client objects. Some important columns of the Key
table:
Value
: The value of the client primary key.Version
: Because of normalization, one client object can point to several normalized objects, so there can be more than one object version for one client version. This is the current client version, which through theVersion
table points to theMdVersion
for each involvedObject
.Duplicate_No
: If one client is considered to have duplicates, then all those keys will point to the same object and the keys will have this attribute numbered 1, 2, 3, etc.Checksum
: This column is used to detect if DSE and the client really are in sync. DSE includes the checksum in every update request to the client, so the client can detect if it has newer data than DSE; in that case the client will reject the update and require DSE to make a new read.HasError
: If DSE has problems with updating the client for a specifc key, it will setHasError=1
. Then DSE will stop sending update requests until the object is updated the next time.
KeyType
has the configuration data for a Key
. Some examples:
- The client entity that the
Key
is primary key for. - The normalized entity that we associate the client entity with.
IsSendUpdateAllowed
: If this is set, then DSE will never try to update the corresponding client. This is used for "deaf" clients, i.e. clients that we can only get updates from, never update.IsSendUpdatedPaused
: This functions as a kind of emergency break. While this is set, DSE will temporarily pause all updates for this key type, but as soon as it is unset, DSE will resume from where it was paused.
The Attribute
table has two important support tables:
AttributeAccess
contains the access rights for each client to an attribute.IsAuthority=1
means that in case of a conflict between client updates, this client is considered an authority on this attribute.AttributeConnection
is the mapping configuration between client entity attributes and normalized entity attributes.IsSimple=1
means that the mapping has no associated transformation methods (see below) and can be carried out as a simple copy of the value
Dynamic methods
DSE has an (extendable) set of built in methods that can be called to validate attributes or for advanced mapping (transformation) between client entities and normlized entities. If you want to use them, you configure how you want them to be used.
Method
: References to built in methods in DSE.Parameter
: Defines the parameters for the built in method.MethodCall
: Definitions of calls to built in methods. Can refer to a number of calls that should be carried out sequentially where the out parameters from one call can be used as in parameters for the next call in the sequence.Argument
: The value that should go into a specific parameter. This could be a constant, the value of an attribute or an out parameter from another call.Validation
: Defines methods to call to validate a specific attribute.Transformation
: Defines methods to call to transform data for a specific entity.IsClientToMd
indicates the direction of the transformation (from the client entity to a normalized entity or the other way around).
Other configuration tables
Code
,CodeType
: Translations for enumeration values, e.g. currencies, countries, gender, etc.Option
: Settings for how DSE should behave in certain aspects, e.g. how often batches should be run, etc.Type
: DSE uses GUIDs for internal enumerations such as severity levels, method types, etc. This table is a dictionary for the enumerations.
Other runtime data tables
Approval
: Attribute updates that requires a manual approval. Both pending and history of approved and rejected updates.DataLog
: Log messages for failed validations.KeyQueueItem
: Keys that are waiting for client data to be read or updated are represented here. When this table is empty, then there is no outstanding data sync transactions.KeyRefresh
: Keys that we want to force DSE to read from and/or update to the client. This can be triggered automatically by setting theRefreshAfter_Days
attribute in theKeyType
table or manually by adding records manually.LastData
: Keeps a record of data from "deaf" clients that we should ignore; the data has been updated in the golden record, but we have not been able to inform the client about it.Log
: All log messages from DSE are stored here and the messages are associated with the corresponding key and objectMutex
: Each row here is a semaphore that is used to avoid parallel execution of things that can't be executed in parallel.Process
: Every update event results in a new process GUID. The process GUID is used to group updates and logs that occur during that update process.
Challenges
Unexpected rejections
The fact that the DSE is asynchronous and loosely coupled to the clients means that it is difficult to inform a user if the data they just entered was rejected by the DSE. To update data and later find that it has not changed can be confusing and frustrating if you are not informed why the data was rejected. We have some ideas on how to approach this:
- If the user that made the change could be identified, then we could send an alert to the user about the rejection.
- The client must provide an interface that the DSE can call and then it is up to the client to inform the user.
Create a normalized model
To understand the semantics of all proprietary client models and then create a unified normalized data model is difficult and often requires experience.
Change the configuration
Some configuration changes can be complex, i.e. they may require change of data.
Browse or update the data
DSE uses the entity-attribute-value (EAV) model to define and store versions of master data. It is very good for the intended purpuse, but is not well suited for browsing or updating the master data.
We suggest that you create a new client that has the same data model as the normalized model and let it take part in synchronization. You can then use that client for browsing and updating data.
FAQ
Can you compare the DSE to a version control system for code?
Q: For data code, there is a well known concept called version control. Are there any similarities with the DSE?
A: They have very much in common. There is a central golder record for the files/objects. A developer/client has their own version of the files/objects. When a developer/client has made an update, they commit the change to the central golden record. If two developers/clients changes the same file/object, the version control system will try to merge the changes. If the developers/clients have changed the same line/attribute, there will be a conflict that has to be resolved.
How does the DSE handle simultaneous changes?
Q: Suppose two clients make an update to the same object simultaneously, e.g. the same Person object. How does the DSE handle this potential conflict?
A: The DSE will try to merge the two objects. For the attributes where only one client have made changes, the changes will pass. Changes for the same attribute means that there is a conflict. Version control systems handles conflicts by asking the last developer to push their changes to resolve it manually. The DSE solves the conflicts using a number of rules, in short: Each sync client has an authority level for each attribute and the client with the highest authority level for a specific attribute wins. If the two clients have the same authority level, the first verified change wins (just like a VCS) and the second one will have its value revoked and restored to the value that the first client submitted.
How do you handle the risk of a bad system spreading bad data?
Q: A risk with data synchronization is that a system with no data checks can spread bad data to the other systems.
A: The DSE can be confifgured and programmed to verify the data from the clients. The DSE should have the union of all data checks from all systems, making it the toughest verifier. If a client sends bad data for an attribute, that attribute value will be rejected and the client will have the information restored to the data in the golden record.
How do you handle that a system is unavailable
Q: If a synchronization client, or the DSE itself, is down for an long time, how is that handled?
A: We use a queue with at-least-once guaranteed delivery. This means that if a client succeeds in putting an update message on that queue, it will eventually be handled by the DSE. The same goes for the other direction. The DSE puts GET and PUT requests on that queue and they will eventually be expedited by the synchronization clients. If a system is available, but is throttling incoming requests, the queue will handle this and retry later.
When a client is updated by the DSE, it will send another event. Loop?
Q: If the DSE updates a client with new data, the client will send a new update event. How will the DSE handle that situation.
A: The DSE will GET the data for the object from the client and compare the data with the golden record. If the data is equal, nothing will be done. If there are changes, it will be handle as any normal update event.