Theory

One small but very significant part of integrating IT systems is translating between value sets. The Nexus Value Translator is a microservice for that.

The basics

Two kinds of value sets

There are two kinds of value sets that the Nexus Value Translator handles: dynamic (e.g. identifiers) and static (e.g. enumerations). The dynamic data sets get new values now and then (typically in an automated manner), while the static data sets are configured with their entire value set initially and are then updated infrequently (typically in a manual manner).

Let's dive deeper into the dynamic data sets. For the same entity, e.g. Person, each IT system normally has its own proprietary identification scheme. Each entity object is uniquely identified in the system by a key; the identifying key. An object, e.g. a person, that has one identifying key in one system will most probably be identified by another key in just about every other system. There can also be other non-system schemes for identifying a person, such as social security numbers, e-mail addresses, etc. As new objects are created, new translations are required.

Traditionally, the different systems need to agree on a common identifying scheme, or each system stores an "external id" to reference an id in another system. This is undesirable because the systems become coupled and each system becomes more complex.

Now let's look at the static data sets. Sometimes you need a static set of values for something, e.g. currency codes, person gender, ticket status, etc. For the same set of values, different systems would most probably have different ways of denoting them. There can also be non-system schemes for values, such as ISO standards, e.g. ISO 4217 for currency codes.

A static data set can be expressed as a table. This is an example for currencies and two contexts, ISO 4217 and the system Agresso:

Currency code	ISO 4217	Agresso
Swedish krona	SEK	27
Pound sterling	GBP	12
United States dollar	USD	1

Entity model

The basic entities for value translation are Concept, Context, Instance and Form.

Basic entity model

Let's say that you want to translate currency codes. Then "currency code" is the Concept that you would like to translate. Other examples of concepts would be "person id", "product id" or "person gender".

A specific concept normally has several different value sets that all mean the same thing. Each value set is called a Context and represents the source of the values, i.e. the one that defines the values and creates new ones when needed. For "currency code" one non-system context could be "iso 4217" and system contexts could be "agresso", "ms crm", etc.

Each context has a set of values. A value in a context is called an Instance. For the concept "currency code" and the context "iso 4217", here are some instances: "SEK", "USD", "GBP", "EUR". For the context "system x" the instances might be "1", "23", "17" and "74".

Finally we need something that connects the instances that denote the same value; we call this the Form. Each instance must be connected to a form. All instances that are connected to the same form are considered to denote the same value. If the instances "USD" and "23" are connected to the same form, then they are considered to denote the "same" value.
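
The relationship between instances and forms can be sketched in a few lines of Python. This is only an illustration of the entity model described above, not the service's actual data model; the class and function names are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instance:
    """One value in one context of one concept."""
    concept: str
    context: str
    value: str


@dataclass
class Form:
    """Connects every instance that denotes the same value."""
    instances: set[Instance] = field(default_factory=set)


def same_value(a: Instance, b: Instance, forms: list[Form]) -> bool:
    """Two instances denote the same value when one form contains both."""
    return any(a in form.instances and b in form.instances for form in forms)


# Sample data taken from the text: "USD" and "23" share a form, "SEK" does not.
usd = Instance("currency code", "iso 4217", "USD")
usd_x = Instance("currency code", "system x", "23")
sek = Instance("currency code", "iso 4217", "SEK")
forms = [Form({usd, usd_x}), Form({sek})]

print(same_value(usd, usd_x, forms))  # True
print(same_value(usd, sek, forms))    # False
```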

Vocabulary

Any instance value can be expressed in the global value format, i.e. a format where each value is unique within the digital business platform. That format consists of the concept name + the context name + the instance value. The process where we add the concept name and the context name to the instance value is called decorating. A decorated instance value is sometimes called a concept value. Here is an example of a concept value:

{
    "concept-name": "currency.code",
    "context-name": "iso4217",
    "instance-value": "SEK"
}

Here is another concept value, expressed in a compressed format: (person.id!email!john.doe@example.com).
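
As a rough sketch of decorating, the helpers below build a concept value and convert it to and from the compressed format. The function names are hypothetical. The parser also assumes that concept and context names never contain "!", which the text does not state, so the split stops after two separators and leaves the instance value untouched.

```python
def decorate(concept_name, context_name, instance_value):
    """Build a concept value (a decorated instance value)."""
    return {
        "concept-name": concept_name,
        "context-name": context_name,
        "instance-value": instance_value,
    }


def to_compressed(concept_value):
    """Render a concept value as (concept!context!value)."""
    return "({}!{}!{})".format(
        concept_value["concept-name"],
        concept_value["context-name"],
        concept_value["instance-value"],
    )


def from_compressed(text):
    """Parse (concept!context!value) back into a concept value."""
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("expected a value wrapped in parentheses")
    parts = text[1:-1].split("!", 2)  # at most three parts
    if len(parts) != 3:
        raise ValueError("expected concept!context!value")
    return decorate(*parts)


value = decorate("person.id", "email", "john.doe@example.com")
print(to_compressed(value))  # (person.id!email!john.doe@example.com)
```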

When translating instance values we talk of source context and target context; the source context is the context of the concept value before translation and the target context is the context after the translation.

Case sensitivity

A context can either be case sensitive or not. GUIDs, for instance, are not case sensitive; they consist of hexadecimal digits (0-9, a-f). Values for contexts that are not case sensitive will be converted into lower case before they are handled by the value translator capability.
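
A minimal sketch of that normalization, assuming a per-context flag named case_sensitive (the flag name and the function are made up for this example):

```python
def normalize(value, case_sensitive):
    """Lower-case the value unless its context is case sensitive."""
    return value if case_sensitive else value.lower()


print(normalize("3F2504E0-4F89-11D3-9A0C-0305E82C3301", case_sensitive=False))
# 3f2504e0-4f89-11d3-9a0c-0305e82c3301
```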

Functionality

The service has two main methods: translate and associate.

The translate method is the core of the entire service. When you call this method, you send along a list of concept values that you want translated, each paired with a target context.

The associate method is how we add new instance values for dynamic value sets (i.e. identities). When you associate two instance values in the same concept, they will be connected to the same form and will therefore be considered to be "the same value".
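
To make the two methods concrete, here is a toy in-memory stand-in. It is not the service's API: the class name, the method signatures and the "crm" context are all assumptions. Merging two existing forms when their instances are associated is also an assumption, since the text does not describe that case.

```python
class InMemoryTranslator:
    """Toy stand-in for the service; all names here are hypothetical."""

    def __init__(self):
        self._form_of = {}   # (concept, context, value) -> form id
        self._next_form = 1

    def associate(self, concept, first, second):
        """Connect two (context, value) pairs of one concept to the same form."""
        key_a = (concept, *first)
        key_b = (concept, *second)
        form_a = self._form_of.get(key_a)
        form_b = self._form_of.get(key_b)
        form = form_a if form_a is not None else form_b
        if form is None:
            form = self._next_form
            self._next_form += 1
        if form_b is not None and form_b != form:
            # Assumption: associating instances of two forms merges the forms.
            for key, existing in list(self._form_of.items()):
                if existing == form_b:
                    self._form_of[key] = form
        self._form_of[key_a] = form
        self._form_of[key_b] = form

    def translate(self, requests):
        """Translate (concept, source_context, value, target_context) tuples.

        Returns one entry per request; None means no translation was found.
        """
        results = []
        for concept, source_context, value, target_context in requests:
            form = self._form_of.get((concept, source_context, value))
            match = None
            if form is not None:
                for (c, ctx, v), f in self._form_of.items():
                    if f == form and c == concept and ctx == target_context:
                        match = v
                        break
            results.append(match)
        return results


translator = InMemoryTranslator()
translator.associate("person.id", ("crm", "42"), ("email", "john.doe@example.com"))
print(translator.translate([("person.id", "crm", "42", "email")]))
# ['john.doe@example.com']
```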

Advanced topics

Clients

The translations are centered around translating from one Context to another. One potential problem is to know the contexts involved; in Business Events we want to translate data from a publishing client to a subscribing client. How should the Business Event service know the contexts involved?

The value translator solves this by letting you set up which contexts should be used for a client. You can configure which context to use for each concept that is relevant for the client. If the client context is configured to null, then the default context for the concept is used. If the client context is not defined at all, then no translation is required for this client, i.e. the client will receive a decorated concept value.

Example: If we have a client portal that has been configured with the context email for the concept person.id, then these two concept values are considered the same: (person.id!email!john.doe@example.com) and (person.id!~portal!john.doe@example.com). Note that you replace the context name with the client name and prefix it with a tilde (~).
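
The three configuration cases (explicit context, null, not defined) and the tilde notation can be sketched as follows. The dictionaries, the default contexts and the function names are hypothetical; how the service actually stores client configuration is not described here.

```python
# Hypothetical configuration, for illustration only.
DEFAULT_CONTEXTS = {"person.id": "guid", "currency.code": "iso4217"}
CLIENT_CONTEXTS = {
    "portal": {
        "person.id": "email",    # explicit context for this client
        "currency.code": None,   # null: use the concept's default context
        # "product.id" is not defined: no translation for this client
    }
}


def context_for_client(client, concept):
    """Return the client's context, or None when no translation is required."""
    configured = CLIENT_CONTEXTS.get(client, {})
    if concept not in configured:
        return None
    if configured[concept] is None:
        return DEFAULT_CONTEXTS[concept]
    return configured[concept]


def resolve_context_name(context_name, concept):
    """Replace a '~client' reference with the context configured for the client."""
    if not context_name.startswith("~"):
        return context_name
    resolved = context_for_client(context_name[1:], concept)
    if resolved is None:
        raise LookupError("no context configured for " + context_name)
    return resolved


print(resolve_context_name("~portal", "person.id"))     # email
print(context_for_client("portal", "currency.code"))    # iso4217
print(context_for_client("portal", "product.id"))       # None
```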

Automatic instance generation

We support contexts whose values should be created automatically. The simplest form is increasing integer values, e.g. the values "1", "2", "3", etc.

We also have an advanced model where you configure the characters that you would like to include in the id and how long the value should be. Example: You want the id to be 5 characters long and have upper case letters A-Z and the digits 0-9. This will result in random id strings like "X7BFP", "3PQZI", etc.

We recommend that you make sure that the number of possible values is at least 100 times larger than the expected maximum number of needed values. This is because the values are picked at random: if a picked value already exists, we try again until we find an unused one, and a large value space keeps such collisions rare.
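
The pick-and-retry approach could look roughly like the sketch below. The function name, the max_attempts cap and the use of an in-memory set for existing values are assumptions for the example; the service's real generator is not shown in this document.

```python
import secrets
import string


def generate_id(alphabet, length, existing, max_attempts=100):
    """Pick random ids until one is found that is not already in use."""
    for _ in range(max_attempts):
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        if candidate not in existing:
            existing.add(candidate)
            return candidate
    raise RuntimeError("value space too crowded; use a longer id or a larger set")


alphabet = string.ascii_uppercase + string.digits  # A-Z and 0-9
used = set()
new_id = generate_id(alphabet, 5, used)
print(new_id)  # a random 5-character string
```

The more crowded the value space gets, the more attempts each new id needs, which is why generous headroom is recommended.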

Here is a formula for calculating the number of characters that are needed for your value, including the recommended factor of 100: log(100 * maximum number of needed values)/log(number of different characters in the set), rounded up to the nearest integer. Example: If you need 1,000,000,000 values and you have 32 characters in your set, then the values need to be of length log(100 * 1,000,000,000)/log(32) = 7.3, i.e. length 8. If you have 56 characters in your set, then you need log(100 * 1,000,000,000)/log(56) = 6.3, i.e. length 7.
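
The same calculation can be done with integer arithmetic, which avoids floating-point rounding when the result lands close to a whole number. The function name and the headroom parameter are made up for this sketch. It returns the smallest length whose number of possible values is at least headroom times the number of needed values.

```python
def required_length(max_values, alphabet_size, headroom=100):
    """Smallest id length with at least headroom * max_values possible values."""
    needed = headroom * max_values
    length = 1
    capacity = alphabet_size
    while capacity < needed:
        capacity *= alphabet_size
        length += 1
    return length


print(required_length(1_000_000_000, 32))  # 8
```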

When you choose the characters in your set, please consider if the values are at risk of human errors. Here are some examples to help you in that process.

Use case	Character set	Notes
Copy from print	ABCDEFGHJKLMNPQRSTUVWXYZ abcdefghijkmnpqrstuvwxyz 23456789	To help a person copy an identity from print, similar characters have been omitted. The set has 56 characters.
Repeat by voice	ABCDEFGHJKLMNPQRSTUVWXYZ 23456789	All lowercase characters have been omitted, to avoid having to state whether a character is in upper case or lower case. You could also consider omitting characters that sound similar in your language, such as "M" and "N" in English and Swedish. The set has 32 characters.

Duplicates

For dynamic data sets, we will sometimes discover that two values from the same system are duplicates, i.e. they identify the same object.

If two or more values from the same context are considered to be the same, then we have a problem: when we need to translate to that context, which of the values should we pick? This is configured for the context by deciding whether the first entered value or the latest entered value should be the preferred value, i.e. the value that is returned, for instance, when translating. The default configuration is to use the latest entered value.
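
A small sketch of the preferred-value rule, assuming the duplicates are available in the order they were entered. The function name and the "first"/"latest" setting names are illustrative, not the service's actual configuration keys.

```python
def preferred_value(values_in_entry_order, prefer="latest"):
    """Pick which duplicate to return when translating to this context."""
    if not values_in_entry_order:
        raise ValueError("no values to choose from")
    if prefer == "latest":
        return values_in_entry_order[-1]
    if prefer == "first":
        return values_in_entry_order[0]
    raise ValueError("prefer must be 'first' or 'latest'")


duplicates = ["1017", "2290"]  # two ids in one context for the same object
print(preferred_value(duplicates))                  # 2290
print(preferred_value(duplicates, prefer="first"))  # 1017
```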

Miscellaneous

For manually entered contexts (enumerations) you might find cases where one context lacks values that exist in other contexts. We do have support for this; it will be documented when a customer needs the feature.