Engineering Guide · Performance

Redis QR Code Redirects at Sub-50ms

A QR code is a promise. Someone points a phone at a poster, a bus stop, a product label — and they expect to land somewhere instantly. Every extra millisecond of redirect latency is a chance for them to second-guess and pocket the phone. Here's how the Qrius redirect pipeline is built to stay fast under load.

Why redirect speed actually matters

Mobile browsers are impatient. If a QR redirect takes more than a second, some phones will show a spinner — which is enough visual friction to cause pogo-sticking: the user closes the tab and goes back to whatever they were doing. That's a lost conversion that never shows up in your scan analytics because the destination page never loaded.

The redirect itself is a 302 with a Location header. There's no HTML, no JSON, no payload. The only thing standing between the scan and the destination page is your server's ability to look up the slug and return that header fast. Redis is the obvious lever here.

Two-layer architecture: hot cache + cold storage

The design is straightforward: Redis sits in front of Postgres. On every redirect request, we check Redis first. If the slug's there, we skip the database entirely — no query, no ORM overhead, no round-trip latency. A cache hit resolves in ~5ms. A miss falls through to Prisma and costs ~50ms.

For most QR codes, the traffic distribution is heavily skewed — a small number of active codes get the vast majority of scans. Redis's in-memory lookup handles that pattern well. The cold path (Prisma) exists for first-time hits and for codes that haven't been scanned recently enough to stay warm.

Redis (hot): qr:slug:{slug} · ~5ms · TTL 300s
Postgres via Prisma (cold): findBySlug() · ~50ms · source of truth

Cache key design and TTL choices

The primary cache key format is qr:slug:{slug} with a 300-second TTL. Five minutes. Not an hour, not a day. Here's the thinking: QR codes can be paused or have their destination updated at any time. If you cache for an hour and someone updates their campaign URL, the old URL keeps serving for up to 60 minutes. Five minutes keeps staleness tolerable while still letting a single cache fill absorb thousands of scans.

There's also a legacy key format — redirect:{slug} — that gets a 1-hour TTL. It's kept for backward compatibility with older integrations that set entries in that format. The handler checks both keys, with qr:slug: taking priority. Eventually the legacy format will be deprecated.

// Cache key constants
const QR_CACHE_PREFIX = 'qr:slug:'
const QR_CACHE_TTL    = 300    // 5 minutes
const LEGACY_PREFIX   = 'redirect:'
const LEGACY_TTL      = 3600   // 1 hour (being phased out)

function cacheKey(slug: string): string {
  return `${QR_CACHE_PREFIX}${slug}`
}
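The dual-key read and the primary-key write might look like the sketch below. This is not the actual cacheService implementation: the QRData shape and the KvClient interface are assumptions, though any ioredis client satisfies KvClient since ioredis exposes get() and setex() with these signatures.

```typescript
// Sketch only: QRData and KvClient are assumed shapes, not the real codebase.
type QRData = { id: string; destinationUrl: string; isActive: boolean; isPaused: boolean }

interface KvClient {
  get(key: string): Promise<string | null>
  setex(key: string, seconds: number, value: string): Promise<unknown>
}

const QR_CACHE_PREFIX = 'qr:slug:'
const QR_CACHE_TTL = 300
const LEGACY_PREFIX = 'redirect:'

// Read: check the primary key first, fall back to the legacy key.
async function getCachedQRCode(redis: KvClient, slug: string): Promise<QRData | null> {
  const raw =
    (await redis.get(`${QR_CACHE_PREFIX}${slug}`)) ??
    (await redis.get(`${LEGACY_PREFIX}${slug}`))
  return raw ? (JSON.parse(raw) as QRData) : null
}

// Write: only the primary key gets new entries; the legacy format is read-only here.
async function cacheQRCode(redis: KvClient, slug: string, data: QRData): Promise<void> {
  await redis.setex(`${QR_CACHE_PREFIX}${slug}`, QR_CACHE_TTL, JSON.stringify(data))
}
```

Writing only the primary key is what lets the legacy format drain away on its own: once nothing sets redirect:* entries anymore, they expire and the fallback read stops mattering.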

Slug validation runs before any cache or DB lookup: the regex /^[a-z0-9-]{3,50}$/ rejects anything that isn't a valid slug. This blocks Redis injection attempts and keeps garbage out of the cache entirely.

The redirect handler, step by step

Here's the full flow. It's intentionally linear — each step either short-circuits with a response or passes through to the next. The important parts are the cache-then-DB pattern and the fire-and-forget logging at the end.

// Fastify route handler — simplified
fastify.get('/r/:slug', async (request, reply) => {
  const { slug } = request.params
  const start = Date.now()

  // 1. Validate slug format
  if (!/^[a-z0-9-]{3,50}$/.test(slug)) {
    return reply.code(404).send({ error: 'Not found' })
  }

  // 2. Check Redis cache
  let qrData = await cacheService.getCachedQRCode(slug)
  const cacheHit = qrData !== null

  // 3. Cache miss — query Postgres, warm the cache
  if (!qrData) {
    qrData = await qrService.findBySlug(slug)
    if (!qrData) return reply.code(404).send({ error: 'Not found' })

    // Fire-and-forget cache write — don't await
    cacheService.cacheQRCode(slug, qrData).catch(() => {})
  }

  // 4. Check status flags
  if (!qrData.isActive || qrData.isPaused) {
    return reply.code(404).send({ error: 'QR code inactive' })
  }

  const duration = Date.now() - start

  // 5. Dev headers
  reply.header('X-Cache', cacheHit ? 'HIT' : 'MISS')
  reply.header('X-Response-Time', `${duration}ms`)

  // 6. Fire-and-forget scan logging — non-blocking
  scanService.logScanAsync(qrData.id, request).catch(() => {})
  qrService.incrementScans(qrData.id).catch(() => {})

  // 7. Redirect
  return reply.redirect(302, qrData.destinationUrl)
})

The .catch(() => {}) on the fire-and-forget calls isn't laziness — it's intentional. If Redis is down, the scan log fails silently. That's acceptable. An unhandled rejection that bubbles up would be much worse.

Why fire-and-forget logging is correct here

The redirect and the scan log are two different concerns with different latency requirements. The user needs the redirect immediately. The analytics database can wait 100ms. Coupling them with await would add 20–40ms of Postgres write latency to every single scan. That's a bad trade.

The tradeoff is worth stating honestly: if your Node.js process crashes between the 302 and the scan write, that scan event disappears. You won't see it in analytics. For a QR campaign, missing 0.1% of scan events due to process restarts is tolerable. Adding 50ms of latency to every scan is not.

Both scanService.logScanAsync() and qrService.incrementScans() are kicked off just before the reply is sent and never awaited. They're not in a queue or a worker thread; they're plain async functions that run to completion in the background while the event loop processes the next request.
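The pattern can be factored into a tiny helper. fireAndForget is a hypothetical name, not something in the codebase, but it captures exactly what the handler's .catch(() => {}) calls do, with an optional hook for counting failures:

```typescript
// Hypothetical helper: swallow the rejection so nothing reaches the
// process-level unhandledRejection handler; optionally record the failure.
function fireAndForget(task: Promise<unknown>, onError?: (err: unknown) => void): void {
  task.catch(err => onError?.(err))
}

// Usage mirroring step 6 of the handler (scanService and logger are assumed names):
// fireAndForget(scanService.logScanAsync(qrData.id, request), err => logger.warn(err))
```

The onError hook is the one refinement over a bare .catch(() => {}): silently dropping scan writes is fine, but never counting the drops means you can't tell 0.1% loss from 10% loss.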

ioredis retry strategy

We're using ioredis v5.4.1. Out of the box, ioredis retries failed connections indefinitely, which is the wrong behavior for a redirect path: we'd rather give up fast and fall through to Postgres than let commands queue behind a dead connection, especially when Redis restarts briefly during a deploy. Here's the configuration we use:

import Redis from 'ioredis'

const redis = new Redis({
  host: process.env.REDIS_HOST ?? '127.0.0.1',
  port: 6379,
  lazyConnect: true,
  retryStrategy(count) {
    if (count > 3) return null           // give up after 3 attempts, fall through to Postgres
    return Math.min(50 * count, 2000)    // 50ms, 100ms, 150ms (the 2s cap is defensive; never reached here)
  },
})

Returning null from retryStrategy tells ioredis to stop retrying and emit an error. That error surfaces in our Prometheus metrics and our structured logs, and the handler falls through to the Postgres path rather than hanging.

The lazyConnect: true option means ioredis won't attempt a connection until the first command is issued. That's useful during local development when Redis might not be running: the service starts cleanly instead of failing at boot with a connection error.

The KEYS vs SCAN gotcha for bulk invalidation

If you ever need to invalidate all cached QR codes — say, after a bulk destination update — you might reach for KEYS qr:slug:*. Don't.

KEYS is O(n) and blocks the Redis event loop for the entire scan duration. On a Redis instance with 100,000 keys, a KEYS * call can stall the server for several hundred milliseconds — during which every other command queues up behind it. You'll see latency spikes across every service sharing that Redis instance.

// ❌ Don't do this in production
const keys = await redis.keys('qr:slug:*')
await redis.del(...keys)

// ✓ Use SCAN with a cursor instead
async function bulkInvalidateQRCache(redis: Redis): Promise<number> {
  let cursor = '0'
  let deleted = 0

  do {
    const [nextCursor, keys] = await redis.scan(
      cursor,
      'MATCH', 'qr:slug:*',
      'COUNT', 100          // process 100 keys per iteration
    )
    cursor = nextCursor

    if (keys.length > 0) {
      await redis.del(...keys)
      deleted += keys.length
    }
  } while (cursor !== '0')

  return deleted
}

SCAN iterates in batches and yields control back to the event loop between each batch. It's not instantaneous — a full sweep of a large keyspace takes longer than a single KEYS call — but it keeps Redis responsive throughout. For the vast majority of invalidation needs (a single QR update), just DEL qr:slug:{slug} directly.

Monitoring with Prometheus

Knowing the p50 is fast isn't enough. You need to know what p95 and p99 look like, because that's where cache misses, DB slow queries, and connection pool exhaustion show up. We use a Prometheus histogram with these buckets:

// Prometheus histogram — redirect request duration
const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code', 'cache'],
  buckets: [0.001, 0.005, 0.010, 0.050, 0.100, 0.500],
  //          1ms    5ms   10ms   50ms  100ms  500ms
})

// In the redirect handler, after sending the reply:
httpRequestDuration
  .labels('GET', '/r/:slug', '302', cacheHit ? 'HIT' : 'MISS')
  .observe(duration / 1000)

The bucket boundaries are chosen to match our two performance targets: cache hits should land in the 5ms bucket, cache misses should land in the 50ms bucket. If you start seeing significant mass above 100ms on the HIT label, something is wrong — Redis latency spiking, network congestion, or the event loop being blocked.
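To make the bucket claim concrete: Prometheus histogram buckets are cumulative, so an observation increments every bucket whose upper bound it fits under (the `le` semantics). A pure illustration, not part of the handler:

```typescript
// Bucket boundaries from the histogram above, in seconds.
const BUCKETS = [0.001, 0.005, 0.010, 0.050, 0.100, 0.500]

// Returns the cumulative buckets an observation falls into.
function bucketsHit(seconds: number): number[] {
  return BUCKETS.filter(le => seconds <= le)
}

// A 4ms cache hit increments every bucket from 5ms up;
// a 45ms cache miss increments only 50ms, 100ms, and 500ms.
```

This is why "significant mass above 100ms on the HIT label" is visible at all: a slow hit simply stops incrementing the small buckets, and the gap between bucket counts is the signal.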

Redis itself is checked in the /readyz endpoint via redis.ping(). If ping fails, the health check returns 503, which pulls the instance out of the load balancer rotation before it starts returning degraded responses.
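The readiness logic reduces to one small function. The route wiring in the comment is an assumed shape, not the actual handler; the ping-to-status mapping is the substance:

```typescript
// Maps a ping outcome to the HTTP status /readyz should return.
// `ping` abstracts redis.ping(); ioredis resolves it with 'PONG'.
async function readinessStatus(ping: () => Promise<string>): Promise<number> {
  try {
    await ping()
    return 200
  } catch {
    return 503   // load balancer pulls this instance from rotation
  }
}

// Hypothetical Fastify wiring:
// fastify.get('/readyz', async (_req, reply) =>
//   reply.code(await readinessStatus(() => redis.ping())).send())
```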

Wrapping up

The redirect path is the most latency-sensitive part of a QR platform — and also the simplest. Redis in front of Postgres, a sensible TTL, fire-and-forget logging, and a histogram to catch regressions. None of this is novel. The value is in doing it correctly and not cutting corners (don't block the reply on scan logging, don't use KEYS on a large keyspace).

If you want the full OpenAPI spec for the Qrius redirect and QR management endpoints, the API docs have every request and response shape documented. And if you want to run QR codes without managing infrastructure, sign up free — the redirect pipeline described here is what runs behind every Qrius QR code.

Common questions

How fast is a Redis QR code redirect?

On a cache hit, around 5ms end-to-end. On a cache miss, the Prisma query adds 40–50ms. The 300-second TTL means most production traffic stays warm, so you'll spend the majority of your time in the fast path.

What cache key format should I use?

Namespace your keys: qr:slug:{slug} rather than just the slug. It prevents collisions in a shared Redis instance and makes SCAN-based invalidation patterns clean and predictable.

Should scan logging block the 302 redirect?

No. Log asynchronously and don't await it before sending the reply. The tiny risk of losing a scan event on process crash is worth the 20–40ms you save on every single scan.

KEYS or SCAN for bulk cache invalidation?

SCAN, always. KEYS blocks the Redis event loop for the full duration of the scan, which degrades every other operation hitting that instance. SCAN iterates in small batches and keeps the server responsive.

Skip the infrastructure work

This redirect pipeline is what runs behind every Qrius QR code. Sub-50ms redirects, real-time scan analytics, GDPR-safe by default. Free to start.