[July 1, 2018]
On 30 June 2018, our uptime monitoring system sent out an unusually high number of alerts due to multiple DNS failures around the world involving the Google and CloudFlare DNS servers.
The issues began at 10:56AM (EST), when most of our 12 uptime monitoring locations started failing to resolve hostnames served by CloudFlare nameservers. The root cause seems to have been that CloudFlare nameservers were intermittently failing to resolve queries coming from Google’s DNS servers.
The issue appeared to stop at 11:32AM (EST) but resumed at 12:01PM (EST) and continued intensively until 01:48PM (EST), at which point it became more isolated (rather than global), affecting only certain locations such as Singapore, San Francisco and London.
CloudFlare acknowledged the issue on their status page and started investigating it at 06:10PM (EST), several hours after it had begun. They declared it resolved at 10:09PM (EST), even though we were still seeing some DNS failures from London and Singapore at that time.
These DNS failures across most of our monitoring locations caused alerts to be sent out for some uptime monitors that use CloudFlare nameservers. You can check these errors in your Location Fail Log, where they appear as “Error 28: Resolving timed out after X milliseconds”.
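If you export your fail log and want to pick out the entries caused by this kind of DNS timeout, a minimal sketch along these lines may help. It assumes each log entry contains the message text exactly as quoted above; the function name `dns_timeout_ms` and the sample entries are hypothetical and not part of our product.

```python
import re
from typing import Optional

# Matches the message quoted above; the surrounding log line layout is an assumption.
DNS_TIMEOUT_RE = re.compile(r"Error 28: Resolving timed out after (\d+) milliseconds")


def dns_timeout_ms(log_line: str) -> Optional[int]:
    """Return the timeout in milliseconds if the line is a DNS resolve timeout, else None."""
    match = DNS_TIMEOUT_RE.search(log_line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Hypothetical sample entries, for illustration only.
    samples = [
        "London - Error 28: Resolving timed out after 5000 milliseconds",
        "Singapore - some other failure",
    ]
    for line in samples:
        print(line, "->", dns_timeout_ms(line))
```

Entries for which the function returns a number are likely related to DNS resolution rather than to your server itself.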
Unfortunately, unlike the last similar incident 2 months ago, which was isolated to just 2 of our locations, this incident affected most of our monitoring locations, resulting in a much larger number of affected monitors and users.
During these DNS issues, our uptime monitoring system registered 300% more outages than normally expected for this time of day.
We estimate that around 10% of our currently active uptime monitors were affected by alerts that were a direct result of these DNS issues.
This issue only affected uptime monitors whose domains use CloudFlare’s nameservers. No other monitors were affected.
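To judge whether one of your monitored domains falls into the affected group, you can look at its NS records (for example with `dig NS yourdomain.com +short`) and check them against CloudFlare's naming pattern. The sketch below assumes CloudFlare-assigned nameservers end in `.ns.cloudflare.com`; domains using custom (vanity) nameservers on CloudFlare would not match. The function name and the sample hostnames are hypothetical.

```python
from typing import Iterable

# Assumption: CloudFlare-assigned nameservers follow the <name>.ns.cloudflare.com pattern.
CLOUDFLARE_NS_SUFFIX = ".ns.cloudflare.com"


def uses_cloudflare_nameservers(nameservers: Iterable[str]) -> bool:
    """Return True if any of the given NS hostnames looks like a CloudFlare nameserver."""
    for ns in nameservers:
        # Normalize: trim whitespace, drop the trailing root dot, ignore case.
        cleaned = ns.strip().rstrip(".").lower()
        if cleaned.endswith(CLOUDFLARE_NS_SUFFIX):
            return True
    return False


if __name__ == "__main__":
    # Illustrative hostnames only; substitute the NS records of your own domain.
    print(uses_cloudflare_nameservers(["amy.ns.cloudflare.com.", "bob.ns.cloudflare.com."]))
    print(uses_cloudflare_nameservers(["ns1.example.net."]))
```

The NS lookup itself is left out on purpose, since the Python standard library has no direct call for NS queries.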
[Update 01 July 2018] Another short wave of CloudFlare DNS failures just hit, starting at 09:39PM (EST) and lasting until 10:09PM (EST). As described above, if your monitored website/hostname uses CloudFlare nameservers, you might have received downtime notifications about this.
And an update on the matter from the CloudFlare CEO:
A few tweets describing the incident from a user’s perspective: