
Why Does Availability Alone Fail CDN Routing Decisions?

Shana Vernon
CDN Routing
February 18, 2026

CDN availability is a binary check, but CDN routing is a quality decision. If you steer traffic based on “is this PoP up?” you only prove it can answer a health probe. You do not prove it can serve a real user fast and reliably right now. 

A PoP can be “available” while overloaded, stuck behind a bad ISP path, dropping packets, or painfully slow on cache misses. Users experience that as “your site is broken,” even if your monitoring says everything is green.

The fix is to treat availability as a safety gate, then route using CDN performance signals like network latency, error rate, packet loss, and capacity.

Why Availability Alone Fails CDN Routing Decisions

Availability answers a narrow question: can the edge respond at all? Most checks are ping, TCP, TLS, or a synthetic HTTP fetch. That is useful, but it compresses reality into up or down.

Your users don’t live in that world. They live in “fast enough” or “I’m refreshing the page.”

Availability-based decisions miss problems that are common and user-visible:

  • Overload and queueing: the edge responds, but slowly.
  • Partial degradation: some nodes or paths fail, so issues look random.
  • Upstream pain: cache misses fetch from a mid-tier or origin route that is struggling.
  • ISP-specific routing: one network gets a clean path, another gets a detour.

I think of it like this: availability tells you the shop is open. It does not tell you whether there is a two-hour line.

CDN Architecture Makes “Up” An Incomplete Signal

A modern CDN architecture is a chain. Even if the front door works, the rest of the chain can be hurting.

A real request often goes:

  1. User connects to an edge PoP
  2. Edge checks cache
  3. Miss triggers a parent, mid-tier, or origin fetch
  4. Response is streamed back to the user

Many health checks mostly validate step 1, and sometimes a tiny cached object in step 2. Real traffic hits steps 3 and 4 constantly, especially for dynamic pages and long-tail assets.

So you can route to an “available” PoP where every cache miss is slow because the upstream path is congested or your origin is rate-limiting. The edge stays up, but the experience still collapses.
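To put rough numbers on that, here is a small sketch of how miss latency drags down the time to first byte a user sees on average. The function name and the sample figures are made up for illustration; plug in whatever your own telemetry reports.

```python
def expected_ttfb_ms(hit_ratio, hit_ttfb_ms, miss_ttfb_ms):
    """Blend hit and miss TTFB by how often each one happens."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ttfb_ms + (1.0 - hit_ratio) * miss_ttfb_ms


# Hypothetical PoP: cache hits are quick, misses wait on a congested upstream.
healthy_upstream = expected_ttfb_ms(hit_ratio=0.75, hit_ttfb_ms=20, miss_ttfb_ms=200)
slow_upstream = expected_ttfb_ms(hit_ratio=0.75, hit_ttfb_ms=20, miss_ttfb_ms=1500)

print(healthy_upstream, slow_upstream)
```

In both cases a probe that fetches one small cached object would see something close to the 20 ms hit path, so the PoP looks identical from the health check's point of view.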

Gray Failures Beat Availability Checks

Most incidents are not full outages. They are gray failures: the system is technically responding, but the experience is degraded for a meaningful slice of users.

Availability-only routing is blind here because nothing crosses the “down” threshold.

Common patterns:

  • Tail latency spikes where p50 looks fine but p95 is painful
  • Packet loss or jitter that causes stalls and retries
  • Peering issues that affect only certain ISPs or regions
  • Hot spots where one PoP is saturated and nearby PoPs are calm

This is why CDN performance has to be measured like a user feels it, not like a server answers a probe.
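A quick way to see the tail problem is to compute percentiles over a batch of latency samples. This is a minimal sketch using the nearest-rank method; the sample values are invented, and `percentile` is a helper defined here, not part of any CDN API.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical sample: 90 requests are fast, 10 hit a congested path.
latencies_ms = [40] * 90 + [900] * 10

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)

print(f"p50={p50} ms, p95={p95} ms")
```

Every one of those 100 requests got a response, so an availability check stays green. The median looks healthy too. Only the p95 shows that one user in ten is having a bad time.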

How It Works

If you only ask “is it up,” traffic can still be routed into every situation below.

| Situation | Edge “Available”? | What Users Feel | What Availability Misses |
|---|---|---|---|
| PoP overloaded at peak | Yes | Slow loads, timeouts | Queueing, saturation |
| Bad ISP path to that PoP | Yes | Random stalls, high RTT | Path quality per ISP |
| Packet loss on transit | Yes | Retries, buffering | Loss and retransmits |
| Slow origin fetch on misses | Yes | High TTFB on some requests | Upstream dependency health |
| Partial node degradation | Yes | Flaky, inconsistent errors | Cluster variance |

What Better CDN Routing Uses Instead

Keep CDN availability, but demote it to a guardrail. First exclude PoPs that are truly unhealthy, then rank the remaining ones by user-relevant signals.

The signals that usually matter most:

  • network latency (including p95, not just average)
  • connection setup time (TCP, TLS, QUIC)
  • edge error rate (timeouts, resets, 5xx)
  • packet loss and retransmits
  • cache hit ratio plus origin fetch latency on misses
  • PoP headroom (CPU, bandwidth, queue depth)

Where do you get those signals? Ideally from a mix of real user monitoring (what browsers and apps actually saw), targeted synthetic probes from multiple regions, and internal PoP telemetry. I’ll trust a “real users on ISP X are spiking” alert far more than a single green health check from one vantage point.
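Here is one way the “guardrail first, then rank” idea could look in code. Treat it as a sketch under assumptions: `PopStats`, `cost`, and `choose_pop` are names I made up, and the penalty weights are placeholders you would tune against your own traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopStats:
    name: str
    available: bool        # result of the plain health check
    p95_latency_ms: float
    error_rate: float      # fraction of requests, 0..1
    packet_loss: float     # fraction of packets, 0..1
    utilization: float     # fraction of capacity in use, 0..1


def cost(pop):
    """Lower is better. The weights are illustrative assumptions, not tuned values."""
    failure_penalty = 1.0 + 10.0 * (pop.error_rate + pop.packet_loss)
    headroom_penalty = 1.0 + 5.0 * max(0.0, pop.utilization - 0.8)
    return pop.p95_latency_ms * failure_penalty * headroom_penalty


def choose_pop(pops):
    """Availability is only the gate; performance picks among the survivors."""
    eligible = [p for p in pops if p.available]
    if not eligible:
        return None
    return min(eligible, key=cost)


pops = [
    PopStats("pop-a", True, 300.0, 0.01, 0.00, 0.50),   # up, but slow tail
    PopStats("pop-b", True, 80.0, 0.00, 0.00, 0.60),    # up and healthy
    PopStats("pop-c", False, 20.0, 0.00, 0.00, 0.10),   # fast on paper, but down
    PopStats("pop-d", True, 60.0, 0.05, 0.02, 0.98),    # up, lossy, nearly saturated
]

best = choose_pop(pops)
print(best.name if best else "no eligible PoP")
```

Note that `pop-d` has the lowest raw latency among the available PoPs, yet it loses once errors, loss, and headroom are counted. An availability-only router could not tell `pop-a`, `pop-b`, and `pop-d` apart.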

This is where CDN routing becomes personal: the “best” PoP depends on the user’s network, location, and current internet conditions.
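As a rough illustration of why one global ranking is not enough, the sketch below picks a best PoP per ISP from a table of p95 latencies. The ISP and PoP names and the numbers are hypothetical; in practice the table would come from real user monitoring grouped by network.

```python
def best_pop_per_isp(p95_by_isp_pop):
    """Map each ISP to the PoP with the lowest p95 latency seen from that ISP."""
    best = {}
    for (isp, pop), p95_ms in p95_by_isp_pop.items():
        if isp not in best or p95_ms < best[isp][1]:
            best[isp] = (pop, p95_ms)
    return {isp: pop for isp, (pop, _) in best.items()}


# Hypothetical p95 latencies in ms, keyed by (ISP, PoP).
p95_table = {
    ("isp-a", "fra"): 40.0,
    ("isp-a", "ams"): 70.0,
    ("isp-b", "fra"): 400.0,   # isp-b takes a detour to reach fra
    ("isp-b", "ams"): 60.0,
}

steering = best_pop_per_isp(p95_table)
print(steering)
```

Both PoPs are “up” for both networks, but the right answer differs per ISP: `fra` for `isp-a`, `ams` for `isp-b`.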

Practical CDN Strategies That Actually Work

Here are CDN strategies that prevent “available but awful” routing without making your system fragile.

  1. Gate With Availability, Then Steer With Performance
    Availability removes broken targets. Performance chooses the best target. If you blend everything into one fuzzy score, you can accidentally send traffic to a barely-alive PoP.
  2. Optimize For The Tail, Not The Median
    Users complain about p95 and p99. Route using tail latency and real error rates, not just “it responded once.”
  3. Add Stability Controls
    Fast switching can cause flapping and herding. Use hysteresis, cooldowns, and gradual shifts so routing changes are calm and reversible.
  4. Respect Capacity And ISP Reality
    A PoP near saturation is a future incident, even if it is up. Also, do not assume one map fits all: ISP-aware steering avoids sending certain networks into consistently bad paths.
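The stability controls in point 3 can be sketched as a small selector that only switches when the improvement is large enough and a cooldown has passed. `StickySelector`, its thresholds, and the cost inputs are assumptions for illustration. It expects costs only for PoPs that already passed the availability gate, and it does not cover gradual traffic shifting.

```python
class StickySelector:
    """Keeps the current PoP unless a rival is clearly better (hysteresis)
    and enough time has passed since the last switch (cooldown)."""

    def __init__(self, min_improvement=0.15, cooldown_s=60.0):
        self.min_improvement = min_improvement  # required relative cost reduction
        self.cooldown_s = cooldown_s
        self.current = None
        self.last_switch = None

    def select(self, costs, now):
        """costs: dict of PoP name -> cost (lower is better). now: time in seconds."""
        if not costs:
            raise ValueError("no eligible PoPs to choose from")
        best = min(costs, key=costs.get)
        if self.current is None or self.current not in costs:
            # No current target, or it failed the availability gate: move immediately.
            self._switch(best, now)
        elif best != self.current:
            cooled_down = now - self.last_switch >= self.cooldown_s
            clearly_better = costs[best] <= costs[self.current] * (1 - self.min_improvement)
            if cooled_down and clearly_better:
                self._switch(best, now)
        return self.current

    def _switch(self, pop, now):
        self.current = pop
        self.last_switch = now


selector = StickySelector(min_improvement=0.15, cooldown_s=60.0)
history = [
    selector.select({"fra": 100, "ams": 110}, now=0),    # first pick
    selector.select({"fra": 100, "ams": 95}, now=120),   # 5% better: not enough
    selector.select({"fra": 100, "ams": 70}, now=130),   # 30% better: switch
    selector.select({"fra": 50, "ams": 70}, now=150),    # better, but still cooling down
    selector.select({"fra": 50}, now=155),               # ams dropped out: switch now
]
print(history)
```

The cooldown is skipped on purpose when the current PoP fails the gate. Stability controls should slow down optimization, not failover.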

Do that, and availability stops being your decision-maker and becomes what it should be: the minimum bar, while routing focuses on delivering a good experience.
