When it comes to website analytics, the default choice is often Google Analytics or a paid privacy-focused alternative. I decided to build a custom solution to better understand the infrastructure requirements of real-time data processing and to maintain full control over the data lifecycle.
## Architecture Overview

The system followed a decoupled architecture where the tracking logic resided at the edge, and the dashboard functioned as a consumer of aggregated Redis data.

```mermaid
graph TD
    User([User Visit]) --> Buffer[Client-side Event Buffer]
    Buffer --> Edge[Next.js API /api/track - Batched]
    Edge --> Aggregator[Server-side Aggregator]
    Aggregator --> Redis{Upstash Redis}
    subgraph "Data Structures"
        Redis --> Events[List: visits:events - Rolling Buffer]
        Redis --> HLL[HyperLogLog: unique:visits - Cardinality]
        Redis --> Daily[Hash: visits:daily:date - Aggregates]
    end
    Dashboard[Next.js Dashboard] --> Redis
```
## Architecture Decision Record (ADR)
To maintain clarity on design choices, I documented the following key decisions:
| Feature | Choice | Rationale |
|---|---|---|
| Hot Path Storage | Redis | Required sub-millisecond writes for tracking and low-latency reads for the dashboard. |
| Counting Unique IDs | HyperLogLog | Traditional sets grow linearly with traffic; HLL maintains a constant 12KB footprint. |
| Data Retention | Rolling Buffer | Storing every raw event indefinitely is cost-prohibitive. A fixed-size List preserves recent context. |
| Privacy | Salted Hashing | Ensures PII (IP addresses) is never stored, making the system GDPR-compliant by design. |
| Write Strategy | Buffer & Flush | Batched events from the client (10s intervals) to minimize Redis request counts and avoid rate limits. |
## Technical Challenges & Evolutions
Building this system from scratch revealed several non-obvious hurdles that required significant architectural pivots:
- **The "Request Price" Challenge:**
  - Problem: Each visitor interaction (scroll, click, pageview) originally triggered individual Redis commands. In a serverless/Upstash environment, this consumed the free-tier quota rapidly.
  - Fix: Transitioned to a client-side event buffer. By grouping events and sending them in "micro-batches" every 10 seconds, Redis command volume was reduced by ~80% without losing data fidelity.
- **Type-Conflict & Migration Errors:**
  - Problem: Moving from standard Sets (for unique visitor IDs) to HyperLogLog caused `WRONGTYPE` errors on existing keys, leading to persistent 500 API failures.
  - Fix: I cleared the legacy keys and implemented a log-based backfill. Since the system maintains a rolling buffer of raw events, I was able to re-process historical traffic and reconstruct the new HLL structures safely.
- **The Pipeline Limit:**
  - Problem: When fetching data for the dashboard, a single massive Redis pipeline would sometimes exceed REST API payload limits or time out.
  - Fix: Implemented chunked pipeline execution, splitting large data fetches into smaller batches of 500 commands to ensure stability.
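The chunked execution above can be sketched as follows. The `chunkCommands` helper, the `[command, ...args]` tuple format, and the client shape are my assumptions, not the exact code from the original system; `redis` is assumed to be an `@upstash/redis`-style client whose pipeline exposes one method per Redis command.

```javascript
// Split a flat list of pipeline commands into fixed-size chunks (500 by default).
function chunkCommands(commands, size = 500) {
  const chunks = [];
  for (let i = 0; i < commands.length; i += size) {
    chunks.push(commands.slice(i, i + size));
  }
  return chunks;
}

// Execute each chunk as its own pipeline so no single REST call grows too large.
// `redis` is assumed to be an @upstash/redis-style client (illustrative).
async function execInChunks(redis, commands) {
  const results = [];
  for (const batch of chunkCommands(commands)) {
    const pipe = redis.pipeline();
    for (const [cmd, ...args] of batch) pipe[cmd](...args);
    results.push(...(await pipe.exec()));
  }
  return results;
}
```

Each chunk still travels as one request, so the round-trip savings of pipelining are mostly preserved while staying under the payload limit.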
## Step-by-Step Implementation Guide
If you're looking to build something similar, here's how to structure the core components of the "Buffer & Flush" analytics engine.
### Step 1: Client-side Event Buffering
Most tracking scripts send an HTTP request immediately for every click or scroll. This is expensive and slow. Instead, use a memory buffer in your _app.js or a custom React Hook.
```javascript
let eventBuffer = [];
let flushTimeout = null;

function trackEvent(pathname, data = {}) {
  eventBuffer.push({ pathname, timestamp: new Date().toISOString(), ...data });
  // Flush immediately for pageviews, buffer others for 10s
  if (data.type === 'pageview') {
    flushEvents();
  } else if (!flushTimeout) {
    flushTimeout = setTimeout(flushEvents, 10000);
  }
}

function flushEvents() {
  // Reset the timer so the next buffered event schedules a fresh flush
  clearTimeout(flushTimeout);
  flushTimeout = null;
  if (eventBuffer.length === 0) return;
  const payload = JSON.stringify(eventBuffer);
  eventBuffer = []; // Clear buffer BEFORE sending to prevent duplicates
  // Use sendBeacon for more reliability on page exit
  if (navigator.sendBeacon) {
    navigator.sendBeacon('/api/track', new Blob([payload], { type: 'application/json' }));
  } else {
    fetch('/api/track', { method: 'POST', body: payload, keepalive: true });
  }
}
```
### Step 2: Server-side Command Aggregation
On the server, your /api/track endpoint shouldn't just dump events into a database. It should aggregate them first. If one user clicks three links in a 10-second window, you should send one command to Redis, not three.
```javascript
// pages/api/track.js
import { redis } from '../../lib/redis'; // Upstash client (path is illustrative)
import { hashIP } from '../../lib/hash'; // salted SHA-256 helper (path is illustrative)

export default async function handler(req, res) {
  const events = Array.isArray(req.body) ? req.body : [req.body];
  const pipe = redis.pipeline();
  const counters = {};

  // Unique visitors via HyperLogLog: one PFADD per request is enough,
  // since every event in the batch shares the same client IP
  const ipHash = hashIP(req.headers['x-forwarded-for']);
  pipe.pfadd('visits:unique:all', ipHash);

  for (const e of events) {
    // Pre-aggregate increments locally so each key gets a single INCRBY
    const key = `visits:pages:${e.pathname}`;
    counters[key] = (counters[key] || 0) + 1;
  }

  // Execute one batched write
  Object.entries(counters).forEach(([key, val]) => pipe.incrby(key, val));
  await pipe.exec();

  res.status(204).end();
}
```
### Step 3: Fast Rollups with Redis
To keep the dashboard fast, we use Redis Daily Hashes for geo and referrer data. Instead of counting raw logs every time the dashboard loads, we "roll up" the data instantly during the write phase using HINCRBY.
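A minimal sketch of that write-time rollup, assuming a key scheme of `visits:daily:YYYY-MM-DD` with `geo:`/`ref:` field prefixes; the exact key and field names in the original system may differ.

```javascript
// Build the daily Hash key, e.g. "visits:daily:2024-05-01" (date format assumed).
function dailyKey(date = new Date()) {
  return `visits:daily:${date.toISOString().slice(0, 10)}`;
}

// Roll one event into the day's Hash during the write phase.
// `pipe` is a Redis pipeline; field names are illustrative.
function rollupEvent(pipe, event, date = new Date()) {
  const key = dailyKey(date);
  if (event.country) pipe.hincrby(key, `geo:${event.country}`, 1);
  if (event.referrer) pipe.hincrby(key, `ref:${event.referrer}`, 1);
  pipe.hincrby(key, 'total', 1);
}
```

Because `HINCRBY` is atomic per field, concurrent serverless invocations can roll up into the same Hash without coordination.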
### Step 4: The Dashboard
The final dashboard consumes these aggregated keys using a single Redis pipeline, rendering metrics in milliseconds without any complex SQL joins or heavy query engines.
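As a sketch, the dashboard's data loader might look like this; the key names mirror the earlier steps and `redis` is assumed to be an Upstash-style client, so treat the shapes as illustrative rather than the original implementation.

```javascript
// One pipeline round-trip fetches every aggregate the dashboard renders.
async function loadDashboard(redis, dateStr) {
  const pipe = redis.pipeline();
  pipe.pfcount('visits:unique:all');       // unique visitors (HyperLogLog)
  pipe.hgetall(`visits:daily:${dateStr}`); // geo/referrer rollups (daily Hash)
  pipe.lrange('visits:events', 0, 49);     // 50 most recent raw events (List)
  const [uniques, daily, recent] = await pipe.exec();
  return { uniques, daily, recent };
}
```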
## Dashboard Metrics
The dashboard provides a breakdown of traffic across several dimensions:

- Geography: Visit distribution by country and city.
- Referrer: Traffic sources (e.g., social media, GitHub, or search).
- Device: Distribution between mobile and desktop users.
## Inspirations & Similar Approaches
This architecture drew inspiration from several established patterns in the web analytics community:
- Plausible Analytics: Their focus on simplicity and privacy over deep behavioral tracking served as a primary model for this project.
- Tinybird (Real-time Analytics): Their approach to using ingestion pipelines informed how I structured the Redis rollups.
- Redis "Fast Counter" Pattern: Using `HINCRBY` for atomic daily aggregates is a well-documented strategy for high-performance dashboards.
## Privacy Considerations
IP addresses were hashed with a rotating salt, meaning the original data cannot be recovered from the database. This approach ensures that privacy is built into the architecture from the start.
## References & Further Reading
- Redis HyperLogLog Documentation - Technical details on cardinality estimation.
- Upstash: Building Analytics with Redis - A reference guide for serverless data patterns.
- GDPR and IP Anonymization - Best practices for privacy-first tracking.
- Next.js Middleware - Exploring edge-side processing for lower latency.

