An important aspect for systems, which are backed by a database storage, is to maintain data integrity. Most relational databases offer the possibility to define constraints in order to maintain data integrity, usually referential integrity and entity integrity. Typical constraints are foreign key constraints, not-null constraints, unique constraints and primary key constraints.
SQL also provides the CHECK constraint, which allows you to specify a condition on each row in a table:
ALTER TABLE table_name ADD CONSTRAINT constraint_name CHECK ( predicate )
CHECK (AGE >= 18)
However, these check constraints are limited. They can’t be defined on views, they can’t refer to columns in other tables and they can’t include subqueries.
In order to monitor data integrity on a higher level that is closer to the business rules of the domain, we have deployed a technique that we call health checks in some of our applications.
These health checks are database queries, which check that certain constraints are met in accordance with the business rules. The queries are usually designed to return an empty result set on success and to return the faulty data records otherwise.
The health checks are run periodically. For example, we use a Jenkins job to trigger the health checks of one of our web applications every couple of hours. In this case we don’t directly query the database, but the application does and returns the success or failure states of the health checks in the response of a HTTP GET request.
This way we can detect problems in the stored data in a timely manner and take countermeasures. Of course, if the application is bug free these health checks should never fail, and in fact they rarely do. We mostly use the health checks as an addition to regression tests after a bug fix, to ensure and monitor that the unwanted state in the data will never happen again in the future.
One thought on “Monitoring data integrity with health checks”
I’m glad I found your article! It seems that the current monitoring literature mostly focuses on capturing errors that happen at runtime (http codes, application logs). Have you found any other resources talking about business-data-based monitoring? Cheers!