EKS Telemetry

Automated observability for microservices at scale.

Tech Stack

AWS, EKS, Prometheus, Grafana, Kubernetes

The Problem

Observability for a 67-service microservices application on AWS EKS was managed manually, an error-prone process that left critical services unmonitored.

What I Built

Designed and deployed an automated telemetry stack using the Prometheus Operator and Grafana on EKS. Auto-discovery ensures that every new service is tracked and covered by alerts as soon as it is deployed.
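As a rough illustration of how auto-discovery can work with the Prometheus Operator, the sketch below builds a ServiceMonitor manifest that selects Services by label across all namespaces. The resource name, the `monitoring: enabled` label, the `metrics` port name, and the 30s interval are assumptions made for this example, not details of the deployed system.

```python
import json


def build_service_monitor(name, namespace, match_labels, port="metrics", interval="30s"):
    """Return a Prometheus Operator ServiceMonitor manifest as a plain dict.

    All argument values are illustrative; the real label and port names
    depend on how the services are annotated.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Watch Services in every namespace, not just the monitor's own.
            "namespaceSelector": {"any": True},
            # Any Service carrying these labels is picked up automatically.
            "selector": {"matchLabels": dict(match_labels)},
            # Scrape the named Service port at the given interval.
            "endpoints": [{"port": port, "path": "/metrics", "interval": interval}],
        },
    }


if __name__ == "__main__":
    manifest = build_service_monitor(
        name="all-services",               # hypothetical name
        namespace="monitoring",            # hypothetical namespace
        match_labels={"monitoring": "enabled"},  # hypothetical opt-in label
    )
    # kubectl accepts JSON as well as YAML, so this output can be applied directly.
    print(json.dumps(manifest, indent=2))
```

With a selector like this, a new service only needs the agreed label on its Service object to be scraped, provided the Prometheus resource is configured to select this ServiceMonitor.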

Architecture & Approach

Used Helm charts and Terraform to deploy a highly available OpenTelemetry (OTel), Prometheus, and Grafana pipeline. Implemented service-level objectives (SLOs) and automated alerting based on the four SRE "Golden Signals": latency, traffic, errors, and saturation.
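To make the SLO-based alerting concrete, here is a minimal sketch of an error-budget burn-rate check on the "errors" signal. The 99.9% target in the usage example and the 14.4 threshold (a commonly cited value for a one-hour window against a 30-day SLO) are assumptions for illustration; the actual targets and alert rules used in this project are not shown here.

```python
def burn_rate(error_count, total_count, slo_target):
    """Return how fast the error budget is being consumed.

    A value of 1.0 means errors arrive at exactly the rate the SLO allows;
    higher values mean the budget will run out early.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_count == 0:
        return 0.0  # no traffic, so no budget is being spent
    error_budget = 1.0 - slo_target          # allowed fraction of failed requests
    observed_error_rate = error_count / total_count
    return observed_error_rate / error_budget


def should_page(short_window_burn, long_window_burn, threshold=14.4):
    """Page only when both a short and a long window exceed the threshold.

    Requiring both windows filters out brief spikes while still reacting
    quickly to sustained problems.
    """
    return short_window_burn >= threshold and long_window_burn >= threshold


if __name__ == "__main__":
    # Hypothetical request counts for one service.
    short = burn_rate(error_count=30, total_count=1_000, slo_target=0.999)
    long_ = burn_rate(error_count=900, total_count=50_000, slo_target=0.999)
    print(f"short-window burn: {short:.1f}, long-window burn: {long_:.1f}")
    print("page on-call:", should_page(short, long_))
```

In a Prometheus setup the same ratio would normally be expressed as recording and alerting rules rather than application code; the Python version only shows the arithmetic.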

Impact & Results

Achieved 100% monitoring coverage across all 67 microservices.

Reduced cloud infrastructure costs by 20% through resource optimization identified via telemetry.

Automated disaster recovery alerts, reducing downtime by 30% during regional outages.

Key Decisions & Tradeoffs

Selected Prometheus over managed, cloud-provider-specific monitoring tools to avoid vendor lock-in and to give developers a standardized interface across environments.