Building a Privacy-First, Serverless Analytics Engine with Next.js and Redis


Tags: Next.js · Redis · Analytics · Web Dev · Architecture

When it comes to website analytics, the default choice is often Google Analytics or a paid privacy-focused alternative. I decided to build a custom solution to better understand the infrastructure requirements of real-time data processing and to maintain full control over the data lifecycle.

Architecture Overview

The system follows a decoupled architecture: tracking logic resides at the edge, and the dashboard functions as a consumer of aggregated Redis data.

graph TD
    User([User Visit]) --> Buffer[Client-side Event Buffer]
    Buffer --> Edge[Next.js API /api/track - Batched]
    Edge --> Aggregator[Server-side Aggregator]
    Aggregator --> Redis{Upstash Redis}
    
    subgraph "Data Structures"
    Redis --> Events[List: visits:events - Rolling Buffer]
    Redis --> HLL[HyperLogLog: unique:visits - Cardinality]
    Redis --> Daily[Hash: visits:daily:date - Aggregates]
    end
    
    Dashboard[Next.js Dashboard] --> Redis

Architecture Decision Record (ADR)

To maintain clarity on design choices, I documented the following key decisions:

| Feature | Choice | Rationale |
| --- | --- | --- |
| Hot path storage | Redis | Required sub-millisecond writes for tracking and low-latency reads for the dashboard. |
| Counting unique IDs | HyperLogLog | Traditional sets grow linearly with traffic; HLL maintains a constant ~12 KB footprint. |
| Data retention | Rolling buffer | Storing every raw event indefinitely is cost-prohibitive; a fixed-size List preserves recent context. |
| Privacy | Salted hashing | Ensures PII (IP addresses) is never stored, making the system GDPR-compliant by design. |
| Write strategy | Buffer & flush | Events are batched on the client (10 s intervals) to minimize Redis request counts and avoid rate limits. |

Technical Challenges & Evolutions

Building this system from scratch revealed several non-obvious hurdles that required significant architectural pivots:

  1. The "Request Price" Challenge:

    • Problem: Each visitor interaction (scroll, click, pageview) originally triggered individual Redis commands. In a serverless/Upstash environment, this consumed the free tier quota rapidly.
    • Fix: Transitioned to a Client-side Event Buffer. By grouping events and sending them in "micro-batches" every 10 seconds, Redis command volume was reduced by ~80% without losing data fidelity.
  2. Type-Conflict & Migration Errors:

    • Problem: Moving from standard Sets (for unique visitor IDs) to HyperLogLog caused WRONGTYPE errors on existing keys, leading to persistent 500 API failures.
    • Fix: I cleared the legacy keys and implemented a Log-based Backfill. Since the system maintains a rolling buffer of raw events, I was able to re-process historical traffic and reconstruct the new HLL structures safely.
  3. The Pipeline Limit:

    • Problem: When fetching data for the dashboard, a single massive Redis Pipeline would sometimes exceed REST API payload limits or timeout.
    • Fix: Implemented Chunked Pipeline Execution, splitting large data fetches into smaller batches of 500 commands to ensure stability.
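The chunked-pipeline fix from point 3 can be sketched as a small helper. This is a minimal illustration, not the exact implementation: it assumes an Upstash-style client where pipeline methods are queued by name, and commands are represented as `[method, ...args]` tuples.

```javascript
// Execute a large list of Redis commands in chunks (default 500)
// so no single pipeline exceeds REST payload limits or times out.
// Each command is a tuple like ['hgetall', 'visits:daily:2024-05-01'].
async function chunkedPipeline(redis, commands, chunkSize = 500) {
  const results = [];
  for (let i = 0; i < commands.length; i += chunkSize) {
    const pipe = redis.pipeline();
    for (const [method, ...args] of commands.slice(i, i + chunkSize)) {
      pipe[method](...args); // queue the command on this chunk's pipeline
    }
    results.push(...(await pipe.exec())); // one round-trip per chunk
  }
  return results;
}
```

Results come back in the original command order, so callers can treat the chunked version as a drop-in replacement for one giant pipeline.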

Step-by-Step Implementation Guide

If you're looking to build something similar, here's how to structure the core components of the "Buffer & Flush" analytics engine.

Step 1: Client-side Event Buffering

Most tracking scripts send an HTTP request immediately for every click or scroll. This is expensive and slow. Instead, use a memory buffer in your _app.js or a custom React Hook.

let eventBuffer = [];
let flushTimeout = null;

function trackEvent(pathname, data = {}) {
  eventBuffer.push({ pathname, timestamp: new Date().toISOString(), ...data });

  // Flush immediately for pageviews; buffer other events for up to 10s
  if (data.type === 'pageview') {
    flushEvents();
  } else if (!flushTimeout) {
    flushTimeout = setTimeout(flushEvents, 10000);
  }
}

function flushEvents() {
  // Reset the timer so the next buffered event schedules a fresh flush
  if (flushTimeout) {
    clearTimeout(flushTimeout);
    flushTimeout = null;
  }
  if (eventBuffer.length === 0) return;
  const payload = JSON.stringify(eventBuffer);
  eventBuffer = []; // Clear buffer BEFORE sending to prevent duplicates

  // Use sendBeacon for more reliability on page exit; keepalive fetch as fallback
  if (navigator.sendBeacon) {
    navigator.sendBeacon('/api/track', new Blob([payload], { type: 'application/json' }));
  } else {
    fetch('/api/track', { method: 'POST', body: payload, keepalive: true });
  }
}

// Flush whatever is buffered when the user navigates away
if (typeof window !== 'undefined') {
  window.addEventListener('pagehide', flushEvents);
}

Step 2: Server-side Command Aggregation

On the server, your /api/track endpoint shouldn't just dump events into a database. It should aggregate them first. If one user clicks three links in a 10-second window, you should send one command to Redis, not three.

// pages/api/track.js
// Assumes local helper modules for the Redis client and salted IP hashing
import { redis } from '../../lib/redis';
import { hashIP } from '../../lib/privacy';

export default async function handler(req, res) {
  if (req.method !== 'POST') return res.status(405).end();

  const events = Array.isArray(req.body) ? req.body : [req.body];
  const pipe = redis.pipeline();
  const counters = {};

  // The whole batch comes from one client, so hash the IP once per request
  const ipHash = hashIP(req.headers['x-forwarded-for']);
  pipe.pfadd('visits:unique:all', ipHash); // HyperLogLog for unique visitors

  // Pre-aggregate increments locally: N events on a page become one INCRBY
  for (const e of events) {
    const key = `visits:pages:${e.pathname}`;
    counters[key] = (counters[key] || 0) + 1;
  }

  // Execute one batched write
  Object.entries(counters).forEach(([key, val]) => pipe.incrby(key, val));
  await pipe.exec();

  res.status(200).json({ ok: true });
}

Step 3: Fast Rollups with Redis

To keep the dashboard fast, we use Redis Daily Hashes for geo and referrer data. Instead of counting raw logs every time the dashboard loads, we "roll up" the data instantly during the write phase using HINCRBY.
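A sketch of that write-time rollup is below. The `visits:daily:` key prefix follows the architecture diagram; the exact field names (`country:`, `referrer:`, `device:`), the event shape, and the 90-day TTL are illustrative assumptions.

```javascript
// Roll up geo/referrer/device dimensions into one per-day hash at write time,
// so the dashboard reads a single HGETALL instead of scanning raw events.
function rollupEvent(pipe, event) {
  const day = new Date(event.timestamp).toISOString().slice(0, 10); // e.g. "2024-05-01"
  const key = `visits:daily:${day}`;
  pipe.hincrby(key, `country:${event.country || 'unknown'}`, 1);
  pipe.hincrby(key, `referrer:${event.referrer || 'direct'}`, 1);
  pipe.hincrby(key, `device:${event.device || 'desktop'}`, 1);
  pipe.expire(key, 60 * 60 * 24 * 90); // keep 90 days of rollups (assumed retention)
}
```

Because HINCRBY is atomic, concurrent serverless invocations can roll up into the same daily hash without any locking.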

Step 4: The Dashboard

The final dashboard consumes these aggregated keys using a single Redis pipeline, rendering metrics in milliseconds without any complex SQL joins or heavy query engines.
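That read path might look roughly like this. The key names follow the earlier diagram; the Upstash-style client and the fixed 7-day window are assumptions for illustration.

```javascript
// Fetch all dashboard metrics in a single pipeline round-trip:
// one PFCOUNT for uniques plus one HGETALL per day of rollups.
async function getDashboardData(redis, days = 7) {
  const pipe = redis.pipeline();
  pipe.pfcount('visits:unique:all'); // unique visitors via HyperLogLog

  const dates = Array.from({ length: days }, (_, i) =>
    new Date(Date.now() - i * 86400000).toISOString().slice(0, 10)
  );
  for (const date of dates) pipe.hgetall(`visits:daily:${date}`);

  const [uniques, ...daily] = await pipe.exec();
  return { uniques, daily: Object.fromEntries(dates.map((d, i) => [d, daily[i]])) };
}
```

One round-trip for a week of metrics is what keeps the dashboard render in the millisecond range.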


Dashboard Metrics

The dashboard provides a breakdown of traffic across several dimensions:

[Screenshot: analytics dashboard]

  • Geography: Visit distribution by country and city.
  • Referrer: Traffic sources (e.g., social media, GitHub, or search).
  • Device: Distribution between mobile and desktop users.

Inspirations & Similar Approaches

This architecture drew inspiration from several established patterns in the web analytics community:

  • Plausible Analytics: Their focus on simplicity and privacy over deep behavioral tracking served as a primary model for this project.
  • Tinybird (Real-time Analytics): Their approach to using ingestion pipelines informed how I structured the Redis rollups.
  • Redis "Fast Counter" Pattern: Using HINCRBY for atomic daily aggregates is a well-documented strategy for high-performance dashboards.

Privacy Considerations

IP addresses were hashed with a rotating salt, meaning the original data cannot be recovered from the database. This approach ensures that privacy is built into the architecture from the start.
