Reliability & TCO

Rate limiting that won't take your API down

Set your sustained load and how hard it spikes. Rate limiting has to hold at peak, because catching the spike is its whole job, so we size the Redis fleet you'd run for that peak, the engineer who babysits it, and the rewrite it needs past ~50,000 rps per key. Then we show what Ratelimitly does instead: absorb spikes to 1M rps at sub-millisecond p99, isolated so it can't take your service down, often with no app code to change.

Requests per month: 1B (slider scale: 1M, 100M, 10B, 1T)

≈ 385 rps sustained · 3,850 rps at peak

Max spike (×): 10× → 3,850 rps at peak

Regions: 1

Engineer time on DIY (FTE): 0.5 FTE babysitting Redis
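The sustained and peak figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming a 30-day month and rounding down to a whole rps; those two conventions are assumptions chosen because they reproduce the 385 rps shown, and the function name is illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month


def load_profile(requests_per_month: int, spike_multiplier: int) -> tuple[int, int]:
    """Return (sustained_rps, peak_rps), rounding sustained down to a whole rps."""
    sustained = requests_per_month // SECONDS_PER_MONTH
    return sustained, sustained * spike_multiplier


sustained, peak = load_profile(1_000_000_000, 10)
print(sustained, peak)  # 385 3850
```

The limiter has to be sized for the second number, not the first.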


The point

DIY rate limiting shares fate with your stack. A bad Redis deploy, a noisy neighbor, an OOM — and either the limiter fails (overload storms in) or your API stalls waiting on it.

Ratelimitly runs outside that blast radius: isolated, in-kernel, answered over UDP. It can't take your service down.

And it absorbs spikes: a Redis fleet sized for your average falls over exactly when the burst hits. The per-request price is a rounding error next to surviving that.

And at 385 rps it's $3,324/mo cheaper than DIY.

Same workload, three ways to run it (lower bar = cheaper):

DIY Redis / Valkey

$8,301/mo

2 nodes, provisioned for 3,850 rps peak: $1,800
Cross-AZ bandwidth (149 GB/mo): $1
Engineer time: $6,500

⚠ A TCP round-trip per check. By Little's Law, in-flight = λ·W; as load nears capacity, requests queue, so W (and p99) climbs steeply and the in-flight count climbs with it.
⚠ Share it with another feature and one failure cascades into the other — teams end up paying for dedicated instances.
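To see why W grows so sharply near capacity, here is a minimal sketch using the textbook M/M/1 queue, where mean time in system is W = S ÷ (1 − ρ) for service time S and utilization ρ = λ·S. M/M/1 is an assumption chosen for illustration, not a model of any particular Redis deployment. The 20 µs service time is the figure from the method notes, and the network round-trip is ignored.

```python
def mm1_time_in_system_us(arrival_rps: float, service_us: float) -> float:
    """Mean time in system W (in µs) for an M/M/1 queue: W = S / (1 - rho)."""
    rho = arrival_rps * service_us / 1_000_000  # utilization, dimensionless
    if rho >= 1:
        raise ValueError("arrival rate at or above capacity: queue grows without bound")
    return service_us / (1 - rho)


def in_flight(arrival_rps: float, w_us: float) -> float:
    """Little's Law: L = lambda * W, with W given in microseconds."""
    return arrival_rps * w_us / 1_000_000


for rps in (3_850, 25_000, 45_000, 49_500):
    w = mm1_time_in_system_us(rps, service_us=20)
    print(f"{rps:>6} rps  W = {w:8.1f} µs  in-flight = {in_flight(rps, w):6.2f}")
```

Under this model, W is twice the service time at 50% utilization and about a hundred times the service time at 99%.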

Ratelimitly

$4,976/mo Pro · HA (2 instances) included

✓ Handles 385 rps today — up to 1M rps per key with no change. A single key is serialized, so beyond 1M you partition the keyspace across queues/instances (each key ≤ 1M) — linear scale, no rewrite.
✓ Sub-millisecond p99 over UDP, in the kernel via XDP — even at 1M rps.
✓ Isolated. It can't take your API down, and your API can't take it down.
✓ Drops in at the reverse proxy (nginx & friends) — often zero app code. Retrofit rate limiting onto anything, even closed source.
✓ HA included. Every plan runs 2 instances — high availability is built into the price, not a paid add-on.
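As an illustration of what partitioning the keyspace means, here is a minimal sketch that maps each rate-limit key to one of several partitions with a stable hash, so every check for a given key lands on the same serialized counter. The function name, the CRC32 choice, and the partition count are assumptions for illustration; how Ratelimitly actually assigns keys to queues or instances is not described here.

```python
import zlib


def partition_for(key: str, partitions: int) -> int:
    """Map a rate-limit key to a partition index using a stable hash (CRC32)."""
    if partitions < 1:
        raise ValueError("partitions must be >= 1")
    return zlib.crc32(key.encode("utf-8")) % partitions


# Hypothetical keys; each one always maps to the same partition.
for key in (f"tenant-{i}" for i in range(8)):
    print(key, "->", partition_for(key, partitions=4))
```

A single hot key still lives on one partition, which is why the ceiling is stated per key while aggregate throughput grows with the number of partitions.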

Ratelimitly On-prem

$1,966/mo Starter · up to 100k rps

2 servers (2/region · your hardware): $1,800
License (Starter, ≤ 100k rps): $166

✓ On your hardware — no egress, lowest latency, traffic never leaves your network.
✓ No on-call, no rewrite — same XDP engine we run, supported under the license.
✓ Rate-capped from $2,000/server/yr — bump the cap with a license key, up to 64M rps/box.

License + your servers · air-gap friendly. Talk to us

Never interrupts your service

Isolated by design. A shared Redis can fail and take rate limiting with it — or the reverse. Ratelimitly runs outside your stack's blast radius — with HA (2 instances) included on every plan.

Scales to 1M rps per key

Redis tops out near 50k rps per key (then a C rewrite); past ~500k rps you're into accelerated-networking R&D: kernel bypass, specialized knowledge, a year-plus of work. We did that with XDP: a million rps on one key, day one. Partition the keyspace to go beyond.

Sub-millisecond p99

~1 µs of in-kernel service over UDP keeps p99 under a millisecond even at full tilt (Little's Law: tiny W ⇒ tiny in-flight). Redis adds a TCP round-trip to every check.

No developer required

Deploy at the reverse proxy and limit traffic to systems that don't support it — including closed-source products you can't change.

Correct by design

No read-modify-write races, no lost counters, no double-spend — the failure modes a homemade Redis limiter is full of.
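The read-modify-write race is easy to reproduce. The sketch below is a minimal illustration that uses an in-memory dict as a stand-in for a shared store (it is not a real Redis client), with the interleaving of two clients written out by hand so the outcome is deterministic. The class and method names are hypothetical.

```python
class FakeStore:
    """In-memory stand-in for a shared counter store (hypothetical, not a Redis client)."""

    def __init__(self) -> None:
        self.data: dict[str, int] = {}

    def get(self, key: str) -> int:
        return self.data.get(key, 0)

    def set(self, key: str, value: int) -> None:
        self.data[key] = value

    def incr(self, key: str) -> int:
        """Atomic increment: the read and the write happen as one step."""
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]


# Racy limiter: two clients each read, then each write back "read + 1".
racy = FakeStore()
seen_by_a = racy.get("user:42")      # client A reads 0
seen_by_b = racy.get("user:42")      # client B reads 0 before A writes
racy.set("user:42", seen_by_a + 1)   # A writes 1
racy.set("user:42", seen_by_b + 1)   # B overwrites with 1: one request is lost
racy_count = racy.get("user:42")

# Atomic limiter: each client increments in a single step.
atomic = FakeStore()
atomic.incr("user:42")
atomic.incr("user:42")
atomic_count = atomic.get("user:42")

print(racy_count, atomic_count)  # 1 2
```

With Redis itself, the usual fix is an atomic command such as INCR, or a Lua script, rather than separate GET and SET calls.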

Built for spikes

Absorbs bursts to 1M rps with nothing to pre-provision. A Redis fleet sized for your average falls over when the spike hits — and the spike is when rate limiting matters most.

Method — the laws behind the numbers

Little's Law (L = λ·W). A serialized resource handles one check at a time, so its max throughput is 1 ÷ service time. A Redis check ≈ 20 µs ⇒ ~50,000 rps/key; our in-kernel XDP check ≈ 1 µs ⇒ 1M rps/key. The edge is just less service time (no syscall, TCP, or copy). In-flight L = λ·W, so sub-ms W also means far less concurrency to hold the same rate.
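The two relations in that paragraph can be checked directly. This is a minimal sketch; the function names are illustrative, and the 1 ms and 100 µs response times in the in-flight examples are assumed values picked only to show the scaling.

```python
def per_key_ceiling_rps(service_us: float) -> float:
    """Max throughput of a serialized resource: 1 / service time."""
    return 1_000_000 / service_us


def in_flight(arrival_rps: float, w_us: float) -> float:
    """Little's Law: L = lambda * W, with W given in microseconds."""
    return arrival_rps * w_us / 1_000_000


print(per_key_ceiling_rps(20))   # 50000.0   (Redis check of about 20 µs)
print(per_key_ceiling_rps(1))    # 1000000.0 (in-kernel check of about 1 µs)
print(in_flight(50_000, 1_000))  # 50.0 requests in flight at W = 1 ms
print(in_flight(50_000, 100))    # 5.0 requests in flight at W = 100 µs
```

At the same arrival rate, a tenfold drop in W means a tenfold drop in the concurrency the caller has to hold open.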

Universal Scalability Law: X(N) = N ÷ (1 + α(N−1) + βN(N−1)), where N is the node count, α the contention (serialized) fraction, and β the coherency cost. One exact counter is serialized (α→1) and a distributed one adds coherency (β>0) ⇒ X(N) ≤ 1 (no better than a single node) and retrograde: more Redis nodes cannot raise a single key's throughput. Independent keys have α,β ≈ 0 ⇒ X(N) ≈ N, so aggregate throughput scales by partitioning the keyspace: one pipeline per NIC queue for us.
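Plugging numbers into the formula makes the two cases concrete. This is a minimal sketch; β = 0.01 for the distributed single counter is an assumed value chosen only to show the retrograde shape, not a measured one.

```python
def usl_throughput(n: int, alpha: float, beta: float) -> float:
    """Relative throughput X(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


for n in (1, 2, 4, 8):
    single_key = usl_throughput(n, alpha=1.0, beta=0.01)  # one serialized, distributed counter
    independent = usl_throughput(n, alpha=0.0, beta=0.0)  # independent keys
    print(f"N={n}: single key {single_key:.2f}x, independent keys {independent:.2f}x")
```

With α = 1 the single-key curve never exceeds one node's throughput and drifts downward as N grows, while the independent-key curve is exactly N.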

Two physical walls: ~50,000 rps/key (serialization), and ~500,000 rps, where a conventional kernel network stack saturates on packet rate and you have to bypass it (XDP at the driver level, or DPDK in user space).
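The two walls amount to a simple threshold check on peak load. This is a minimal sketch; the constant and function names are illustrative, both thresholds are the approximate figures quoted above, and treating the boundaries as inclusive is an assumption.

```python
SERIALIZATION_WALL_RPS = 50_000  # approximate per-key ceiling at ~20 µs per check
PACKET_RATE_WALL_RPS = 500_000   # approximate point where a conventional network stack saturates


def diy_regime(peak_rps: int) -> str:
    """Classify which wall, if any, a DIY limiter has hit at this peak load."""
    if peak_rps <= SERIALIZATION_WALL_RPS:
        return "below both walls"
    if peak_rps <= PACKET_RATE_WALL_RPS:
        return "past the serialization wall"
    return "past the packet-rate wall"


for peak in (3_850, 120_000, 1_000_000):
    print(f"{peak:>9} rps: {diy_regime(peak)}")
```

The example workload's 3,850 rps peak sits below both walls.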

Per-key ceiling from Little's Law: 1 s ÷ 20 µs Redis service = 50,000 rps; our XDP path ≈ 1 µs ⇒ 1M rps. DIY = 1 region × (1 primary + 1 replica) at about $1/hr per node ($1,800/mo for the pair), cross-AZ egress at a per-GB rate that rounds to $0 (149 GB/mo comes to about $1) on a Redis payload ~5× our compact binary, 0.5 engineer FTE at $13,000/mo, plus a 4-month C rewrite (amortized) once a region's peak (10× sustained) crosses the per-key ceiling, or ~12 months of accelerated-networking work past ~500,000 rps. Ratelimitly is priced from the public plans, HA (2 instances) included; DIY pays for its replicas separately. Estimates for comparison, not a quote.
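The monthly totals can be re-added from the line items shown on the page. This is a minimal sketch that takes the displayed, rounded figures as inputs; the variable names are illustrative. No rewrite cost is included because the 3,850 rps peak is below the per-key ceiling.

```python
ENGINEER_MONTHLY_COST = 13_000  # per FTE, from the method notes
ENGINEER_FTE = 0.5

diy = {
    "redis_nodes": 1_800,       # 2 nodes, as listed
    "cross_az_bandwidth": 1,    # 149 GB/mo, as listed
    "engineer_time": ENGINEER_FTE * ENGINEER_MONTHLY_COST,
}
diy_total = sum(diy.values())

ratelimitly_pro = 4_976         # Pro plan, HA included
on_prem_total = 1_800 + 166     # your servers + Starter license

print(diy_total)                    # 8301.0
print(diy_total - ratelimitly_pro)  # 3325.0
print(on_prem_total)                # 1966
```

The displayed line items give a $3,325 gap, one dollar more than the $3,324 quoted on the page; the difference presumably comes from the page working with unrounded inputs, which is an assumption.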