In the design of data structures or objects, there are two different kinds of data, namely “Master Data” (german: Stammdaten) and “Variable Data” (german: Bewegungsdaten). The first kind, master data, are data fields that will change seldom over time and can sometimes be used to “identify” an object. The second kind are data fields that capture the current value of an object’s aspect, but are expected to change in the future. If you can categorize your data fields in this manner, think about separating them into different objects.
Let me make an actual example. An application we develop has a central instance (the “center”) that distributes situational data to several operation desks, powered by client applications, named the “clients”. Each client instance is registered in the center to enable the supervision and administration of clients. The data for each client is stored in a ClientInformation object that is mapped to a database relation. Let’s have a look at some of the data fields of ClientInformation:
- int internalIdentifier – the database primary key for the record
- String type – some type of the client application
- String instanceName – the given readable denotation of the operation desk
- String version – the currently installed version of the client application
- Date connectionDate – the last time this client application established a connection
- Date lastActionDate – the last time this client application issued an action command (“was active”)
We can start all kinds of (justified) discussion about primitive obsession, too much information at one place and so on, but for this blog entry, only the categorization in master data and variable data is of interest. My opinion on the example is that the first three data fields (internalIdentifier, type and instanceName) are definitely in the master data category. The last two data fields are clearly variable data, while the version field is something in between. My guts tell me to categorize the version as master data, because it won’t change on a daily schedule.
When separating the two categories of data, the ClientInformation object may turn into a reference holder object only. In this case, the ClientInformation holds two references, one to a new ClientMasterData object (holding internalIdentifier, type, instanceName and version) and another one to a new ClientVariableData object (holding connectionDate and lastActionDate).
A less radical modification would be to let the master data remain in the ClientInformation object and only extract the variable data into a new ClientConnectionData object. If a client connects, only the referenced ClientConnectionData object has to change.
If you separate your master data from the variable data, you can very easily concentrate on the variable data for performance optimizations. This is where the data changes will happen and a tuned storage strategy will pay off. The master data should be designed more carefully concerning the type information, so if we really start the discussion about primitive obsession, I would first tend to the master data fields and argue that the type shouldn’t be a String but an Enum and the version should be a more sophisticated Version type. This could be modelled even with a slow object/relational mapper because the data is only written/read once.
The next time you come across one of your data model objects that contain more than two data fields, have a look at their categorization in master and variable data. Perhaps you can see a good reason to split the object.