We are developing and running a distributed system which is deployed on-site at our client. Everything was running smoothly for years only some minor hick-ups related to network infrastructure problems occurred over time. Then one day our client told us the scheduled database backups were not working anymore. We immediately checked the database, all installed firewall programs and the like on that Windows 7 server machine. The Postgresql database was running and our local and remote application components were able to connect. Strangely though, neither pgAdmin nor psql or even telnet were able to make a connection locally to the database!! Adding more oddity we did not change or update any part of the system at the time things stopped working. Remote access to the database was working though leaving us even more confused. To sum up the situation:
- Some applications can connect to the database locally, others cannot
- Remote access to the database works without problems for all applications, even those that cannot connect locally on the server
- We did not change any of these applications, neither client side nor server side
- All firewalls were disabled and the problem persisted over reboots
The explanation
So we talked to our client again and depicted our complete analysis pinning the date of the breakage to a moment when we evidently did not change anything. Suddenly it struck him like lightning when he remembered that there was an automatic update of an anti-virus program. He removed the software from the machine and everything worked again as expected. Even reinstalling the anti-virus program did not break the system again. It was only this misbehaving automatic update somewhere in time that killed some part of our system in a most odd way…