Once our platform alerts you that one of your Uptime Monitors is down, you should quickly start investigating what is causing the downtime.
1. Check the Location Fail Log
A good place to start is your monitor’s Location Fail Log.
The Location Fail Log contains the exact errors our monitoring locations encountered when checking your website or server. For example, for a PING Uptime Monitor it contains the failed PING samples taken while checking your server; for a Website Uptime Monitor it contains the exact errors encountered when trying to reach your website, such as a bad HTTP code or a timeout.
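When a Website Uptime Monitor reports a bad HTTP code or a timeout, it can help to reproduce the check yourself from another machine. The Python sketch below uses only the standard library to make one request and label the outcome in similar terms. It is an approximation, not how our monitoring locations actually perform their checks; the function name `check_website`, the 10-second default timeout, and the result labels are illustrative assumptions.

```python
import socket
import urllib.error
import urllib.request


def check_website(url: str, timeout: float = 10.0) -> str:
    """Fetch `url` once and describe the result (hypothetical helper, illustrative labels)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"OK (HTTP {resp.status})"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500 or 503.
        # HTTPError subclasses URLError, so it must be caught first.
        return f"Bad HTTP code: {err.code}"
    except urllib.error.URLError as err:
        # DNS failures, refused connections and connect timeouts land here.
        if isinstance(err.reason, socket.timeout):
            return "Timeout"
        return f"Connection error: {err.reason}"
    except socket.timeout:
        # The connection opened, but the response did not arrive in time.
        return "Timeout"
    except OSError as err:
        # Other socket-level failures, e.g. the server dropping the connection.
        return f"Connection error: {err}"
```

For example, calling `check_website("https://example.com/")` from a machine outside your network tells you whether the error the monitor saw is reproducible from there.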
2. Check the Network Diagnostics
To confirm whether the outage is network-related, you can also check the Network Diagnostics.
These diagnostics contain network debugging samples, such as MTR and PING runs toward your monitored target, taken right after our platform declares your Uptime Monitor offline. If the downtime is indeed network-related, the MTR samples should make it easy to spot where the problem lies.
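A common rule of thumb when reading MTR output is that packet loss matters only if it starts at some hop and continues all the way to the target; loss that appears at one intermediate hop and then disappears usually means that router is simply rate-limiting its own ICMP replies. The Python sketch below applies that heuristic to hop data copied out of an MTR sample. The `(host, loss %)` input format, the function name, the 10% threshold, and the example addresses (documentation ranges) are assumptions for illustration, not the format our Network Diagnostics use.

```python
from typing import List, Optional, Tuple

Hop = Tuple[str, float]  # (hop address or hostname, packet loss in percent)


def first_persistent_loss_hop(hops: List[Hop], threshold: float = 10.0) -> Optional[int]:
    """Return the 1-based hop number where loss begins and persists to the target, or None."""
    if not hops or hops[-1][1] < threshold:
        return None  # the target itself answers fine, so there is no persistent loss
    start = len(hops) - 1
    # Walk back from the target while each earlier hop still shows loss.
    while start > 0 and hops[start - 1][1] >= threshold:
        start -= 1
    return start + 1


hops = [
    ("192.168.1.1", 0.0),
    ("198.51.100.1", 0.0),
    ("198.51.100.9", 40.0),  # loss here, but the next hop is clean: likely ICMP rate limiting
    ("203.0.113.1", 0.0),
    ("203.0.113.5", 60.0),   # loss starts here and carries through to the target
    ("203.0.113.10", 62.0),
]
print(first_persistent_loss_hop(hops))  # 5
```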
Note also that the Network Diagnostics are collected from every monitoring location you’ve selected for this Uptime Monitor, so if the network issue is isolated to certain parts of the world (e.g. problems with a transit provider), you should be able to notice that as well.
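Comparing results across locations is a simple way to tell a global outage from a regional one. In the Python sketch below, the location names and the reachable/unreachable input are hypothetical stand-ins for whatever you read off the per-location diagnostics, and the interpretations in the comments are tendencies rather than certainties.

```python
from typing import Dict, List, Tuple


def outage_scope(reachable: Dict[str, bool]) -> Tuple[str, List[str]]:
    """Classify an outage from per-location results (True means the target was reachable)."""
    failed = sorted(loc for loc, ok in reachable.items() if not ok)
    if not failed:
        return "none", failed
    if len(failed) == len(reachable):
        # Every location failed: more likely a problem at the server or close to it.
        return "global", failed
    # Only some locations failed: more likely a routing or transit issue on those paths.
    return "regional", failed


print(outage_scope({"London": True, "New York": False, "Singapore": False}))
# ('regional', ['New York', 'Singapore'])
```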
3. Take the appropriate action
Whether the downtime is network-related or server-related, you should act fast to minimize it for your website or server, and for your clients.
Here are some possible scenarios to consider:
- If the downtime is server-related, you could open a support ticket with your datacenter or hosting provider so they can look into the issue.
- If only the website is down but the server is still responsive, look into possible causes such as slow scripts, plugins, or themes, and unindexed or unoptimized database tables. You should also check for high server resource usage, and consider monitoring it continuously with our Server Monitor: https://docs.hetrixtools.com/category/server-monitor/
- If the outage is network-related, contact your upstream provider and send them the MTR samples from the Network Diagnostics; these will help them debug the issue.
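For the case where the website is down but the server still responds, a quick first look at server load can be taken from a shell on the server itself. The Python sketch below compares the 1-minute load average with the number of CPUs; the one-task-per-CPU threshold is a common rule of thumb rather than a fixed limit, the function names are hypothetical, and `os.getloadavg()` exists only on Unix-like systems. It is a one-off snapshot, not a substitute for continuous monitoring.

```python
import os


def is_overloaded(load_1min: float, cpus: int, per_cpu_threshold: float = 1.0) -> bool:
    """Rule of thumb: more than ~1 waiting task per CPU means work is piling up."""
    # On Linux the load average also counts tasks blocked on disk I/O,
    # so a high value can point to I/O pressure rather than CPU alone.
    return load_1min / max(cpus, 1) > per_cpu_threshold


def load_snapshot() -> str:
    # os.getloadavg() is only available on Unix-like systems.
    one, five, fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    state = "overloaded" if is_overloaded(one, cpus) else "ok"
    return f"load {one:.2f} / {five:.2f} / {fifteen:.2f} on {cpus} CPUs: {state}"


print(load_snapshot())
```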
4. Prepare for future downtime
Once you’ve handled the current outage, prepare for future ones. It may sound harsh, but outages are part of the hosting industry, and what matters is how well prepared you are when they happen. Being prepared and acting fast will help you minimize future downtime, and can save you money and protect your reputation with your clients. Use this outage to learn what you can improve.
For instance, if you run a website on a VPS or dedicated server, it’s a good idea to monitor both the website’s uptime and the resource usage of the server it runs on. The website may still load within the 10-15 second timeout window while the server is under heavy CPU or network usage, which is something you’ll want to be aware of.
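To spot the "up but struggling" case from the outside, you can time a request and flag responses that succeed but take longer than you’d like. In this Python sketch, the function name and the `slow_after` threshold of 3 seconds are illustrative assumptions, and the 15-second `timeout` simply mirrors the upper end of the window mentioned above. Note that `urlopen`’s `timeout` applies to individual blocking socket operations, not to the total request time.

```python
import time
import urllib.request
from typing import Tuple


def timed_fetch(url: str, timeout: float = 15.0, slow_after: float = 3.0) -> Tuple[int, float, str]:
    """Fetch `url`, returning (HTTP status, seconds taken, "slow" or "fast")."""
    start = time.perf_counter()
    # HTTP error codes, timeouts and refused connections propagate as exceptions.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # include the body download in the measurement
        status = resp.status
    elapsed = time.perf_counter() - start
    verdict = "slow" if elapsed > slow_after else "fast"
    return status, elapsed, verdict
```

A result such as a 200 status marked "slow" would still pass an uptime check with a 10-15 second timeout, which is exactly the situation where server resource monitoring tells you more.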
Another way to be better prepared is to use multiple notification methods rather than just one (e.g. email). Our platform can notify you or your staff through a large number of services: https://docs.hetrixtools.com/category/contact-lists/
We recommend using at least 2-3 of them in a contact list, so that important outage notifications reliably reach you or your colleagues when it matters most.