The Brittle Web Why Everything Broke (and Why It Didn’t Have To)

October 25, 2025

💥 The Brittle Web: Why Everything Broke (and Why It Didn’t Have To)

Every time the internet hiccups, headlines make it sound like the web itself is collapsing.
But here’s the uncomfortable truth — the web isn’t brittle by design.
It’s brittle because we cut corners, over-optimized for cost, and treated resilience as optional.

🧨 Why Everything Went Down

When us-east-1 sneezes, half the internet catches a cold.
It’s not entirely AWS’s fault — it’s ours.

Too many organizations put everything in a single region to “save money.”
Few thought about fallback regions, cross-region replication, or disaster recovery drills.
And when major platforms like Confluence or GitHub went dark, some companies couldn’t even access their own recovery plans — because the tools holding those plans were part of the outage.

That’s not resilience. That’s wishful thinking wrapped in a cost-optimization spreadsheet.

🧭 What Real Resilience Looks Like

True resilience doesn’t mean your system never fails — it means it recovers gracefully when it does.

Fallback Regions: You can fail over automatically or quickly with minimal disruption.
Disaster Recovery Drills: You test the plan, not just document it.
Multi-Regional Deployments: You accept that “high availability” in one region isn’t high availability at all.

Let’s touch on multi-cloud for a second.
My take? It’s usually overkill.
Running workloads across clouds sounds great on a slide deck, but in practice, it’s complex, expensive, and doubles your operational burden.
If you build truly resilient, multi-region systems in a single cloud, you get 99% of the benefit without 200% of the cost.

💭 The Questions Worth Asking

Before you throw money at redundancy, stop and ask yourself:

Is it really that big of a deal if this service goes down?
Measure “care” in business terms — cost, forgone revenue, or SLA impact.
Is there ROI in going multi-region or multi-cloud?
Not everything needs global failover. Be intentional.
Do you have the engineering maturity to maintain it?
Multi-region isn’t a checkbox. It’s a practice.
Would you even know something broke?
Solid alerting and observability are your best insurance policies.

💡 My Suggestions

If you want to build systems that bend without breaking:

Be redundant where it makes sense — guided by cost and business value, not fear.
Keep your DR plans with your code — not in a tool that might go down with you.
Have regular “what if” conversations — ask, “Do we care if this crashes?” and answer honestly.

Resilience isn’t a budget line item. It’s a mindset.

⚙️ The Takeaway

The web isn’t fragile. It’s just been engineered like it’s disposable.
If we start treating reliability as a feature — tested, maintained, and budgeted for — outages won’t feel like doomsday.

So next time us-east-1 goes dark, ask yourself:

Do we even blink?

🧩 Engineer in the Loop is where I share my thoughts on AI, cloud, and engineering culture — for developers who build with models (and resilience) in the mix.