The IT architect, Part III: Improve your environment

If you happen to work on a system that scales to the size of an IT landscape, your worst bet is to let it evolve by circumstances. You want to have a plan and act upon that plan. The base for your plan could be a landscape map, which we talked about in the first part of this series. Upon drawing the map, you want to interpret it in order to find the strong points and weak spots. We’ve talked about assessing the map in the second part of this series.

In this blog article, we look at ways to improve our IT landscape towards the goal of overall stability.

Our mission statement

If we want to improve things, we need to know in what aspect the improvement should occur. At the scale of an IT landscape, overall stability is a commonly desired trait. This doesn’t mean rigidity, where you cannot change a thing in the landscape lest the whole thing breaks. It also doesn’t mean that ever part of our landscape needs to be stable itself. Overall stability means that even with the inevitable outage or replacement of a part, the whole system still works. The system is resilient to change and failure, at least resilient enough for the organization working with the system.

If our mission is to improve towards overall stability, we need to work on the relationships between our services (or assets, as we called them earlier, because “service” is a greatly overloaded term) more than we need to work on the service itself.

This doesn’t mean that individual stability of an asset isn’t important. It certainly is, but more often than not, you cannot improve this single value that much. What you can iterate on with recognizable effect is limiting the consequences of lacking individual stability.

Our mantra

The fundamental rule that brings overall stability is the “dependency rule” of the clean architecture that is meant for internal software application architecture. But if we see our IT landscape as one big application (software or not might make less of a difference than thought), we can apply the rule without modification:

All dependencies point towards the center (inside) and never in the opposite direction (outside).

That’s it. You define a center of your map and have all dependencies point towards it. This results in a structure of “rings” around the center that denote different levels of stability. The dependency rule can be rewritten as such:

All dependencies point from the less stable asset to the more stable asset and never in the opposite direction.

If you think of stability only in terms of “service availability” that tells you the percentage of time you can utilize the service without degradation, you’re thinking too short. Stability also means stable interface and stable implementation. You can have a really rock solid ISDN internet connection at the center of your IT landscape map, if your ISP discontinues the technology, the lack of implementation stability will force you to change the asset and hope that all dependent assets (basically your whole map) are not affected by the change.

Planning for obsolescence

Trying to bring the relationships between your assets in congruence with their significance for your IT landscape is the central work of an IT architect. The main question is always: What happens if this asset needs to be replaced?

In IT, there is no such thing as an “eternally working asset”. I’m not well-versed in more physical domains like mechanical engineering to say that this an univeral invariant, but in my field of speciality, everything changes eventually.

If you create an IT landscape where every asset can be replaced with manageable effort and predictable consequences, you’ve created an overall stable system. You can probably improve the availability of parts of it, but you won’t need to overhaul the whole thing over and over again. Your IT landscape is ready to grow, evolve and change, but it does so in a controlled manner and without compromising the mission.

Anti-obsolescence patterns

On your way from your current map to your anticipated one, you’ll recognize recurring patterns that you employ to solve dependency problems or improve the longevity of overall structures. Here are three patterns that have helped me in my endeavours:

Protected variation

If you have more than one implementation for basically the same service (like the example of an internet connection), you probably want the rest of the map to not know about the multiplicity. In this case, you introduce an additional asset that acts as a router between the implementations. Think of the router (or service interface) as a guarding wall for your service implementations. It acts as a “portal” to the real service and can be paper-thin (at least for now). If you want to improve the runtime availability of your service, the router can also act as a load balancer and a circuit breaker. The important rule is that all outside relationships only point to the router, not the actual implementations.

Opinionated interface

If you find an asset that has a lot of incoming dependencies, you’ve found a change risk. If you swap the service for a newer version with a similar, but not quite equivalent interface, you’ll find that you have to adjust lots of dependent assets in oftentimes surprising fashion. You can reduce your surprise by introducing a “portal” interface, just like you did in the protected variations, but without the variations. The portal or “opinionated service” interface offers everything your other assets require of the original service, but nothing more. It captures the “opinion” of your organization towards the service. When you introduce such a portal, it is nothing more than a forwarding service that maybe handles authentication itself. If you plan to swap the service, the portal becomes your requirement list and its new implementation will convert data back and forth.

If you find that your portal gets to big, you could think about multiple portals with their own separated “opinion” about the service, all forwarding to the same “source of truth”:

Now you need to maintain several interface services, but they separate different concerns or contexts into separate assets, which might help with future migrations. Chances are, if there are separate concerns, that they will be provided by separate assets in the future.

Circular portals

The most nasty thing to occur on your map is the ring (see part II of the series). In its smallest form, its just two assets requiring each other:

There are no easy ways out of this situation. But we can make some steps in the right direction and see where it takes us. The first step is introducing buffer assets that act as stand-ins for the real asset:

This doesn’t break the ring yet, but it gives us a chance to do so later. The service interfaces are opinionated and maybe even tailored specifically to the service using it. This reduces the “area of dependency” to its minimum. With a little luck, we find that Service A requires things of Service B that, if isolated, don’t require things of Service A themselves. If that’s the case, we can work on splitting Service B in two parts: One dependent on Service A, but not required by it and one indepedent of Service A, but needed by it. This would break the ring and give us a long chain that is much easier to work with. The problem is: None of this is guaranteed. In the worst case, you’ll still end up with your circular dependency with extra steps and nothing can be done about it.

Conclusion

When you begin to work with your IT landscape map, you begin to transform your assets from what they provide to what others actually require from them. Minimizing the relationships between assets, if not by number then at least by scope, is an appreciable improvement that gives you leeway to make changes in your actual IT setup without compromising the overall structure.

If you accompany your journey towards the best-fitting IT landscape with your map, you always have a plan at hands that you can show to people to form a shared understanding of the current state and the desired outcome. And if you keep old versions of your map in the archives, you can sometimes look back and see how far you’ve come yet.

The IT architect, Part II: Assess your situation

If you want to work on the scale of an IT landscape, you need to have a plan in the form of a map. In the first part of this series, we talked about creating such a map. This blog entry will give you the basic tools to make sense of all the things on it and how to convey meaning to other people while using the map.

The third part will talk about actionable steps that are a result of our interpretation of the map.

Making sense of the map

You’ve drawn the map of all your IT assets and given all the boxes names that you find useful. You’ve asked around to find relationships between your assets, represented by arrows between the boxes. You’ve moved the boxes around a bit to reduce arrow intersections. The map seems to be as “clean” as it can get at the moment.

Now is the time to apply meaning to the structures you see.

Interpreting loners

The first thing you want to look for are boxes without any relationships. These entities don’t interact with other things on your map and are not required by anything, too. Let’s think of them as independent value sources. If this asset brings your organization a describable and current advantage, you’ve found the ideal asset.

An example could be the blue box “L” in our example map. It isn’t coupled to any other asset. Let’s say it is a “customer relationship management” (CRM) system. Remember, boxes are not labeled by their actual implementation (in this case, maybe a vTiger or SugarCRM), but by the value they provide for the organization. If your organization needs a CRM (or benefits from its presence), then you have a “loner”, which is a good thing.

If the CRM stops working, the humans in the organization will be unhappy about it, but the outage itself will be limited to the CRM and not spread to other parts of your IT landscape (given that your map reflects the reality). If the outage lasts longer, your employees will adapt their work processes to circumvent the pothole in your IT. There will be a lot of post-it notes, at least for some time.

If the CRM is updated to a new version, you need to train your employees, but it won’t require other IT entities in your organization to match that update. The CRM can run on ancient hardware and software, as long as the human requirements are met. A loner on your map is a good thing.

Interpreting relicts

If you find a lonely box without a current use case, you’ve found a relict. Be glad that you’ve found it, because relicts tend to remain hidden and not show up on architect maps. If you can make sure that the relict serves no purpose for the organization anymore, you can eliminate it. Removing an asset from the map (and your real IT infrastructure) is a good thing, because you reduce complexity, costs and risks. There is no IT asset without associated costs and risks.

If, for example, the yellow box “P” represents a computer that provides a service that nobody uses anymore, the computer itself is still present in the network and can be used as a stepping stone for malicious itents. Let’s say the computer is a Raspberry Pi that isn’t included in the first tier of workhorse computers, its operating system might be outdated and susceptible to attacks. It doesn’t provide value for the organization anymore, but it increases the organization’s risk.

Revealing this kind of “dead weight” in your IT landscape is a real advantage, because you can cut it out rather easy.

Interpreting rings

A typical structure on your map could be a circular dependency. In its smallest form, it is just two boxes that both depend on each other. The more elaborate ring consists of several boxes that are connected without a clear start and end. This is the worst thing to find.

A ring in your entities means that you have to consider all elements in the ring as one big entity. You cannot modify them independently, neither on the technological level nor on in the temporal dimension. A ring is basically a mexican standoff situation for all included entities. You can also call it a deadlock. Whatever you call it, it is bad news. You probably want to break the ring as soon as possible.

Breaking a ring would warrant its own blog post altogether. A basic starting point might be the Acyclic dependencies principle of software design. You probably need to split at least one of your entities into smaller parts or introduce a new entity. The least favorable move would be to merge all entities into one bigger entity, creating a monolith. You will regret this move when the inevitable modernization pressure rises.

Interpreting chains

If your entities form “deep” dependency lines where A depends on B, B depends on C, C depends on D and so forth, you have discovered a chain. This structure is less troublesome compared to the ring, but worth a worry nonetheless. In terms of operational risk, the chain creates a meta-system with a failure rate that is the sum of the failure rates of the chain elements. To make a long story short, you’ll never get a reliable infrastructure with long chains.

The longer your chains are, the more ripple effects an outage will have on your IT landscape. Remember that a chain always breaks at its weakest link, but this link will bring down the whole line.

You can reduce the length of a chain of entities in your IT landscape by inserting buffer elements like read-only copies of central data sources. But more important is to think (and talk) about why the dependencies are there in the first place. Maybe your data storage strategy is too decentralized and you would gain some favorable dependency structures by pooling data together (essentially creating a data monolith if you overdo it).

Introducing zones

Recognizing the basic shapes on your map is important, but you also have to look at the forest and not only the trees. The basic layout of your boxes already tell you a lot about your IT landscape zones.

A zone on your map is a region of boxes that you can encircle and give it a superordinate name. The basic rule of a zone is that all entities in it should share a common property. The less technology-based this property is, the better is your zoning. A zone for “java web services” or “metal computers” is eventually useful, but won’t stand the test of time. Sooner or later, some java services are replaced by other programming languages and some real machines get virtualized. Do you move them to other zones on your map? What really changed for the users of your IT landscape?

If you concentrate on your users, you might be able to come up with properties that really affect them. Look at this example that takes our initial example and separates it into three zones:

And now, we find a user-oriented name for each zone. In our example, we’ve grouped the entities by user role and are now able to label our zones:

This grouping has the added advantage that the target audience for each modification to the map can be identified nearly immediately. It makes it easier to anticipate the effects of outages or problems and to identify non-cohesive usage of the same tool/entity.

In our example, each box in the “Both” zone is essential to the functioning of the organization. But just because a specific service is used by both other groups doesn’t mean they have overlapping requirements. Maybe it is better for everybody involved to actually divide an entity into two separate boxes in the respective zones, even if both boxes are implemented with the exact same tool/technology at the moment.

Identifying the zones takes your map to the next level. You end up with fewer, but bigger boxes and their dependencies. It’s the same IT landscape, but with less detail. Now you can start your discovery process again.

Conclusion

Your IT landscape map can be interpreted by looking for common structures (like loners, rings and chains) and by defining zones. This allows us to gather a list of problem points that we want to improve. It also allows to evaluate the expectable ramifications of changes to entities in our IT landscape. And there will be changes. The one (and probably only) constant in IT is that all things change.

In the next part of this series, we look at ways to transform the map from the current state towards a better one. Stay tuned!

The IT architect, Part I: Map your assets

When I’m tasked with commenting on a software architecture, my first step is to request or draw a map of all distinguishable elements of the software system and give them relationships to each other. This inevitably results in a boxes-and-arrows type of diagram that serves as a base for all future communication about the subject. Having a shared representation about a system is a great way to pinpoint discussions and focus on a particular area without forgetting the rest completely.

When I’m tasked with commenting on IT infrastructure, my first step is to request or draw a map of all distinguishable elements of the IT architecture (or “IT landscape”, a term that I actually prefer because it conveys better that a lot of things on this scale happen unplanned) and give them relationships to each other. Once again, we are drawing boxes and connecting them with arrows.

Being able to rely on this map is an essential base for all communication about IT architecture. And if you know how to read the map, it directs your efforts of consolidating your IT architecture nearly intuitively.

In this blog entry, we talk about drawing the map. The second part goes into interpretation of the map, the third part emphasizes actionable steps based on the map and our interpretation of it. Based on questions and discussion, there might even be a fourth part, but that’s not planned yet.

Your initial boxes

Beginning the IT architecture map is easy: Draw a box and give it a name. The name should correspond to an element of your work environment that is distinguishable from other elements. Note how I don’t say “system” or “service” or “server”. For an IT architecture, these words describe an implementation, a particular manifestation of the architecture. They don’t belong on the map (or this map). If you cannot see the difference yet, think about the floor plan of a house. It doesn’t tell you about the material the house is made of and you can use the same floor plan for a wooden cabin or a marble mansion (barring some pesky statics limitations that I don’t have a clue of). In our IT architecture map, each box represents a “thing” that will at the latest get a name the minute it stops working.

There will be boxes in your IT architecture map that don’t relate to anything else. That’s fine and not a problem, as long as the box relates to humans. If you cannot find a meaningful relationship between the box and humans or other boxes, you’ve found a relict. This is in fact one of the hardest tasks in IT architecture analysis, so congratulations!

Adding relationships

Every other box interacts with its environment in some manner. Again, the concrete implementation of that interaction is not important for our map. For our current view on the landscape, it makes no difference if a software system uses HTTP calls to a server or a computer tranfers bytes over RS232 wire to an appliance box. The fact that one box relies on the availability of another box is all that matters. That’s the essence of our arrows: Box 1 requires box 2 to be “online” in order to perform its duties. Without box 2, the functionality offered by box 1 will be limited, down to a point where it is no longer useful to others. Our arrows denote dependencies between boxes. If you happen to be a software developer: we don’t talk about code dependencies here. Also, even if closely related, we don’t mean format or protocol dependencies. We just state that if box 2 “goes down”, box 1 will follow closely.

This is the base for a rule of thumb about dependency arrows: Don’t draw them bidirectionally. Each arrow has one clear direction (like box 1 –> box 2). If you find that box 2 also depends on box 1, you should draw two arrows in opposite directions. As a preview for the interpretation step: This dependency cycle is a sore spot in your current architecture. It means that your two boxes appear as one to the outside. It means that you cannot replace one part without the other. The replaceability of single boxes is an important aspect of your landscape’s health.

Making it readable

When you’ve placed your boxes and drawn the arrows, it’s time to improve on the map’s layout. A guideline for the layout is that arrows shouldn’t intersect each other. Another guideline is that boxes that are semantically related should be near each other on the map. These two requirements alone often result in a lot of movement and experiments. You might want to use a software that allows for these experiments without much effort.

You’ll recognize a fitting layout when you see it. The map corresponds to your internal landscape representation enough to be useful in discussions. It might look like this real example:

First thing you’ll notice it that the names are replaced by denotations with zero meaning. In a real map, the box “C” might be named “time tracking” and box “D” could be labelled “issue tracking”. The name should indicate the responsibility of the element/box. You can also add the current implementation of that responsibility, if that makes things clearer. In our example, box “D”, indicating “issue tracking” might have “(JIRA)” added to the description. Just be aware that your organization probably needs another issue tracking system in that place even if JIRA falls out of favour. Following your arrows backwards, you’ll know which other elements of your landscape will be affected by this replacement. More on that in the next part about interpreting the map.

Evolving the map

Another thing you probably scoffed at is the intersecting arrows in the example. The map’s author came up with this layout as the best representation when the map had fewer boxes. With each subsequently added box, another arrow or two tried to reach the “center”. The intersections are a direct consequence of the emergence of a “center”. This is an important finding of your map: Being able to identify your map’s center and deduce meaning from it. To spoiler a bit: If your center is “time tracking” and “issue tracking”, you probably charge money per hour to solve other people’s problems.

Conclusion

You’ve probably seen how drawing an IT landscape map can benefit your organization and your discussions about its present and future. One thing you should keep in mind is that the map should reflect the current state and not your desired state of your organization’s IT architecture. That’s what will be addressed in part 3 of this series. Stay tuned!

Want to read more? Head over to part II of this series.