A single point of failure triggered the recent AWS outage, company confirms
Two weeks earlier, an outage at Amazon Web Services (AWS) disrupted dozens of major online platforms worldwide, temporarily taking down services such as Amazon, Alexa, Ring, Snapchat, Reddit, Fortnite, ChatGPT, and the Epic Games Store. The incident spread globally and lasted roughly two-thirds of a day before AWS restored operations, after addressing what it called a series of "cascading" system failures.
According to Amazon, the disruption began at 11:48 PM PDT on October 19 (10:48 AM on October 20 in the UAE), when users started experiencing widespread latency and connection errors in the US-EAST-1 region (Northern Virginia). The company identified the problem as a failure of DNS resolution for the DynamoDB API endpoint in that region.
Engineers repaired the failure in about 11 hours, but it took roughly 16 hours for all impacted services to fully recover, prolonging the disruption across time zones and countries.
This type of outage is not unusual among large cloud providers. However, its scope and reach raised tough questions about how a disruption of this scale could still happen on a supposedly distributed network.
According to Ars Technica, the outage stemmed from a software bug in a DNS management component used by DynamoDB, one of AWS's main database systems. The flaw created a race condition (a timing conflict between overlapping processes) that allowed an outdated DNS plan to overwrite a newer one, erasing the IP addresses for a critical regional endpoint.
That single error effectively took DynamoDB offline, which in turn caused widespread disruptions across the many dependent services that relied on it for data access, authentication, or routing.
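To make the mechanism concrete, here is a minimal, purely hypothetical Python sketch (not AWS's actual automation) of how a DNS "plan" applier can lose records when two workers race: a delayed, outdated plan applied without a version check overwrites the newer one, while a version-guarded variant would reject it.

```python
import threading

# Purely hypothetical illustration; this is not AWS's DNS automation.
# A "plan" maps a service endpoint to the IP addresses it should resolve to.
ENDPOINT = "dynamodb.region-1.example.com"
dns_records = {ENDPOINT: ["10.0.0.1", "10.0.0.2"]}
applied_version = 0
lock = threading.Lock()

def apply_plan_unsafe(version, addresses):
    """Apply a plan blindly: whichever worker finishes last wins,
    even if its plan is older. This is the shape of the race."""
    dns_records[ENDPOINT] = addresses

def apply_plan_guarded(version, addresses):
    """Reject stale plans by comparing version numbers under a lock."""
    global applied_version
    with lock:
        if version <= applied_version:
            return  # stale plan: ignore it
        applied_version = version
        dns_records[ENDPOINT] = addresses

# A newer plan (v2) lands first; a delayed, outdated plan (v1) arrives later.
apply_plan_unsafe(2, ["10.0.1.5", "10.0.1.6"])
apply_plan_unsafe(1, [])   # the stale plan wipes the endpoint's addresses
print(dns_records)         # {'dynamodb.region-1.example.com': []}
```

A version check of this kind is one common way to rule out such races; the sketch is only meant to show why an unordered "last write wins" applier can leave an endpoint with no addresses at all.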
Data from Ookla's Downdetector revealed the scale of the disruption: the platform logged over 17 million user reports from 60 countries. The most affected services included Snapchat, Roblox, Reddit, and Amazon's own retail and Ring services, but government websites, financial institutions, and educational tools also went down because of the same regional failure.
US-EAST-1 is Amazon's oldest and most heavily used data center region and serves as the backbone of AWS's cloud infrastructure. While the region's name ties it to a specific part of the United States, many global services still depend on it, creating an obvious weak point for an internet that was originally envisioned as decentralized.
After fixing the immediate error, Amazon moved to address the underlying vulnerability: the company suspended DynamoDB DNS automation worldwide and outlined plans to redesign the flawed system to prevent similar race conditions.
Although limited in duration, this outage serves as a stark reminder of the importance of diversifying infrastructure. AWS's data center footprint is among the most geographically distributed in the world, yet the incident showed that physical diversity alone is insufficient; software redundancy is just as essential, as the 2024 CrowdStrike outage, caused by a single bad update to security software, made painfully clear.
This event further reinforces the need for sovereign clouds, which have become an essential part of recent digital transformation efforts in the UAE and Saudi Arabia. Regional governments have long championed the idea, and the outage underscored its value: critical services in both countries saw little to no disruption, largely because most of them are hosted and operated independently on local servers.
When the internet's precursors emerged in the 1970s, nearly all connected devices sat in a handful of research centers and universities. Despite its small size, the network was relatively diverse at that time. Contrary to what you might expect, the growth in users did not make the internet more diverse; it had the opposite effect, driving industry consolidation, and local servers hosted by individuals and companies gave way to more centralized solutions. Today, that centralization deserves reconsideration by both sovereign governments and service providers.
Since online services are no longer a luxury, especially for businesses, disruptions should be treated as seriously as failures of electricity or other critical infrastructure. Experts therefore recommend that providers adopt multi-region architectures, diversify their dependencies, and run realistic disaster simulations to better prevent cascading failures. Reports indicate that AWS is already implementing these strategies, but only time will tell how effective they prove.
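As a rough illustration of what a multi-region architecture means at the client level, the sketch below (with hypothetical endpoint URLs) tries a primary region and falls back to a secondary one when requests fail; real deployments would add health checks, data replication, and DNS- or load-balancer-level routing on top of this.

```python
import urllib.request
import urllib.error

# Hypothetical regional endpoints for illustration only; a real service would
# use its own URLs and keep replicated data behind each region.
REGION_ENDPOINTS = [
    "https://api.us-east-1.example.com/health",   # primary region
    "https://api.eu-west-1.example.com/health",   # secondary region
]

def fetch_with_failover(path="", timeout=2):
    """Try each regional endpoint in order and return the first success.

    A single-region client simply fails when its region is down; this
    version degrades gracefully by moving on to the next region.
    """
    last_error = None
    for endpoint in REGION_ENDPOINTS:
        try:
            with urllib.request.urlopen(endpoint + path, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc  # record the failure and try the next region
    raise RuntimeError(f"all regions unavailable: {last_error}")

if __name__ == "__main__":
    try:
        print(fetch_with_failover())
    except RuntimeError as err:
        print(err)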

















