Over the past year, we have been moving towards a new, broader architecture for our website and application hosting. Previously, we had at most four application servers that each hosted several websites and databases. This model has become unsustainable and does not support our principles of a distributed architecture and high performance. We've since been moving to a new model consisting of separate server instances for each website, database or client, depending on the needs of each project. We've chosen to launch these server instances on separate providers, again supporting our distributed architecture goals. However, this means that we have many more systems and servers to monitor on a daily basis, which becomes a full-time task.
We've been evaluating several options for more effective centralized, automated monitoring. There are many third-party options available for centralized log management and server monitoring, but each comes with its own costs and limitations. While many of these were very attractive, none offered the balance of features and flexibility that we wanted. As we looked closely at the required pieces, we realized that our software platform already had the building blocks needed to create our own monitoring system, one that would remove the limitations imposed by the third-party solutions (e.g. rate of monitoring) and give us a tremendously flexible way to set up the monitoring.
Introducing Mesh Monitoring
For a while, we were trying to think of a fun and unique name for a new application that would be the basis of our monitoring system, but then it occurred to us that we already had the perfect application for this – and it's called "System". This application currently looks after the event handling and background task processing for all websites and databases. We began designing the data model needed for our monitoring functionality, and it came together quickly.
Within about six hours of work, we had a fully working "System Monitoring" feature built into the System application. Since System already runs as part of every site and database that we have online, they can all monitor one another: mesh monitoring!
Mesh monitoring is the concept that a collection of disparate systems can monitor each other, and that's exactly what our solution provides. Rather than having one external centralized system monitor everything, we can set up cross-system monitoring. Of course, we can also set up a single, centralized monitor. This is the beauty of our solution: it gives us the flexibility to set up monitoring in whatever way best suits our needs, rather than adapting our business to the design of a third-party monitoring solution.
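To make the difference between the two layouts concrete, here is a minimal sketch in Python. It is not the System application's actual data model. The function names, the node names, and the ring-style assignment (each node watches the next few nodes in the list) are all assumptions chosen purely for illustration.

```python
from typing import Dict, List, Set


def mesh_assignments(nodes: List[str], watchers_per_node: int = 2) -> Dict[str, List[str]]:
    """Map each node to the peers it monitors, arranged as a ring (hypothetical layout)."""
    n = len(nodes)
    if n < 2:
        raise ValueError("a mesh needs at least two nodes")
    # A node never monitors itself, so it can watch at most n - 1 peers.
    k = min(watchers_per_node, n - 1)
    return {
        node: [nodes[(i + step) % n] for step in range(1, k + 1)]
        for i, node in enumerate(nodes)
    }


def central_assignments(nodes: List[str], monitor: str) -> Dict[str, List[str]]:
    """Map a single monitor to every other node; all other nodes watch nothing."""
    return {
        node: [peer for peer in nodes if peer != monitor] if node == monitor else []
        for node in nodes
    }


def surviving_watchers(assignments: Dict[str, List[str]], down: Set[str]) -> Dict[str, List[str]]:
    """For each node that is down, list the watchers that are still up to report it."""
    return {
        target: sorted(
            watcher
            for watcher, targets in assignments.items()
            if target in targets and watcher not in down
        )
        for target in sorted(down)
    }


# Hypothetical node names, for illustration only.
nodes = ["site-a", "site-b", "db-a", "db-b"]
mesh = mesh_assignments(nodes, watchers_per_node=2)
central = central_assignments(nodes, monitor="site-a")

print(surviving_watchers(mesh, {"db-a"}))       # two peers can still report the outage
print(surviving_watchers(central, {"site-a"}))  # nobody is left to report the monitor's outage
```

In this sketch, the mesh layout leaves every node with watchers that survive its own outage, while the centralized layout has nothing watching the monitor itself. Both layouts come from the same assignment table, which mirrors the flexibility described above.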
Now we can sleep at night knowing that our growing network of servers is being monitored on a regular basis. If any one of these servers goes offline, another one can notify us. This is the same resiliency that the internet itself is built on.