A single file doubling in size caused the recent Cloudflare blackout

Ali Wadi Hasan

Earlier this week, Cloudflare experienced one of its worst outages to date. Many users, along with the company’s own engineers, initially thought it was a cyberattack, specifically a large-scale DDoS. Cloudflare later explained that the problem stemmed from an internal change that caused a critical file to grow beyond its allowed size. It is the second major internet infrastructure outage in a matter of weeks, highlighting the risks of growing internet centralization.

Cloudflare CEO Matthew Prince issued a detailed apology and explanation after the incident, which disrupted access to many popular platforms, including Uber, ChatGPT, McDonald’s, League of Legends, X, and more. MENA TECH was also affected by the outage and was unavailable for hours.

Because Cloudflare provides protection against DDoS attacks and other security threats for many other sites, the company’s first assumption was that it was dealing with a major cyber incident when servers began failing around 15:20 UAE time on Tuesday afternoon. The pattern of the outages also contributed to that belief, as services went down and came back up over a two-hour span before failing consistently around 17:00.

Cloudflare eventually identified the main cause as an incorrect change to a database permission. The mistake led to a single file doubling in size and exceeding its set limit. Because that file was crucial to Cloudflare’s bot manager, the failure quickly spread to the services that depend on it.
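
To illustrate the kind of failure described above, here is a minimal sketch of a size check that rejects an oversized file and keeps the last known good version instead of crashing. It is not Cloudflare's code: the limit value, the `BotManagerConfig` class, and every other name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical limit on the number of entries the file may contain.
MAX_ENTRIES = 200


class FileTooLargeError(Exception):
    """Raised when a candidate file exceeds the configured limit."""


def validate_entries(entries, limit=MAX_ENTRIES):
    """Return a copy of the entries, or raise if there are too many."""
    if len(entries) > limit:
        raise FileTooLargeError(
            f"{len(entries)} entries exceeds the limit of {limit}"
        )
    return list(entries)


@dataclass
class BotManagerConfig:
    """Hypothetical holder for the currently active file contents."""

    active: list = field(default_factory=list)

    def load(self, candidate, limit=MAX_ENTRIES):
        """Adopt the candidate if valid; otherwise keep the previous one."""
        try:
            self.active = validate_entries(candidate, limit)
            return True
        except FileTooLargeError:
            # Fall back to the last known good file rather than failing.
            return False


config = BotManagerConfig()
good_file = [f"entry_{i}" for i in range(120)]

# A normal-sized file is accepted.
accepted_first = config.load(good_file)

# The same file with every entry duplicated (double the size) is rejected,
# and the previously loaded version stays active.
accepted_doubled = config.load(good_file + good_file)
```

The design choice in this sketch is to treat an oversized file as bad input and continue serving with the previous version, so a single faulty update does not take down everything that reads it.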

The outages appeared sporadically at first because it took time for the problematic file to propagate across Cloudflare’s network. Later, engineers resolved the issue by reverting to an earlier version of the file at 20:30, and all services were restored by midnight.

Prince called the event the company’s most significant failure since a major 2019 outage and pledged to review the systems involved to prevent a recurrence. Still, the incident is the latest example of how a small error can cause widespread disruption.

In late October, a malfunction in a single database server caused a major Amazon Web Services outage, taking ChatGPT, Fortnite, Reddit, Amazon, and other high-profile platforms offline. One of the most severe examples occurred in July 2024, when a faulty CrowdStrike security update triggered widespread Blue Screen of Death errors across Windows systems globally, impacting broadcasters, transportation providers, and numerous businesses.

MENA TECH – The leading Arabic-language media platform for technology and business
Copyright © 2025 MenaTech. All rights reserved.