A DNS outage impacts Microsoft Azure and Microsoft 365 services
While it doesn't appear that the outage will last long, reports suggest that the incident's impact is widespread.
A DNS outage affected Microsoft customers worldwide on October 29, preventing them from accessing Microsoft Azure and Microsoft 365 services.
"Starting at approximately 16:00 UTC, we began experiencing DNS issues resulting in availability degradation of some services. Customers may experience issues accessing the Azure Portal. We have taken action that is expected to address the portal access issues here shortly," according to the Microsoft Azure status page.
On its service health status blog, Microsoft said that "users may be unable to access the Microsoft 365 admin center and see delays when accessing other Microsoft 365 services", including Microsoft Entra, Microsoft Purview, Microsoft Defender, Microsoft Power Apps and Microsoft Intune functions, as well as issues with add-ins and network connectivity in Outlook.
The company says it is investigating the issues causing the disruption, and further notes that it has "halted the rollout of the impacting configuration change", and is "continuing our efforts to route service traffic away from the affected infrastructure, where the change was already applied, to recover service availability as quickly as possible."
But while it doesn't appear that the outage will last long, reports suggest that the incident's impact is widespread. Those affected include the Dutch railway system, which is reportedly experiencing issues with its online travel planning platforms and ticket machines.
What caused the problem and how did Microsoft fix the issue?
The root cause was an accidental configuration change inside Azure Front Door, the system that helps deliver internet traffic quickly and securely. The change caused a large number of the network's key points, called "nodes," to fail or load incorrectly, according to an update on Microsoft's Azure status page.
Because so many nodes failed, the remaining healthy ones were overloaded with traffic, which made problems such as slow speeds and errors even worse globally. To fix it, Microsoft stopped all further changes, restored a previous safe configuration, and gradually brought systems back online to avoid crashes.
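The overload effect described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions and not a description of how Azure Front Door actually works: traffic is assumed to be split evenly across healthy nodes, every node is assumed to have the same capacity, and the function names and all numbers are hypothetical.

```python
def simulate_cascade(total_traffic, nodes, failed, capacity):
    """Return how many nodes stay healthy once the overload settles.

    Toy assumption: traffic is shared evenly, and whenever the per-node
    share exceeds capacity, one more node drops out.
    """
    healthy = nodes - failed
    while healthy > 0 and total_traffic / healthy > capacity:
        healthy -= 1  # an overloaded node fails, raising the share for the rest
    return healthy


def staged_recovery(total_traffic, nodes, capacity, step):
    """Bring nodes back in batches, admitting only the traffic they can carry."""
    stages = []
    healthy = 0
    while healthy < nodes:
        healthy = min(nodes, healthy + step)
        admitted = min(total_traffic, healthy * capacity)
        stages.append((healthy, admitted))
    return stages


if __name__ == "__main__":
    # Hypothetical fleet: 20 nodes, each able to carry 80 units, 1000 units of traffic.
    print(simulate_cascade(1000, nodes=20, failed=5, capacity=80))
    print(simulate_cascade(1000, nodes=20, failed=8, capacity=80))
    print(staged_recovery(1000, nodes=20, capacity=80, step=5))
```

In this model, losing 5 of 20 nodes is absorbed by the rest, while losing 8 pushes every remaining node over capacity and the fleet collapses. The staged recovery shows why bringing systems back gradually helps: admitted traffic never exceeds what the currently healthy nodes can carry.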
In total, the outage of Microsoft's cloud network lasted about eight and a half hours, affecting services such as the Azure Portal, parts of Microsoft Entra ID, Azure SQL Database, and many others.
Last week, Amazon’s cloud service suffered a 15-hour outage that affected 11 million users globally and took scores of websites and apps offline, showing just how fragile modern digital infrastructure really is.