Observability: Logs, Metrics & Traces
Observability is the ability to understand what a system is doing internally from its external outputs. The three pillars are logs (what happened), metrics (how much and how fast), and distributed traces (where time was spent across services). A system with all three lets you move from alert to root cause without guessing. Logs should be structured JSON, one object per event, so that they are machine-readable and filterable by field. Metrics are exposed as counters, gauges, and histograms; request rate, error rate, and latency percentiles (p50, p95, p99) are the minimum set. Distributed tracing assigns every request a trace ID that propagates across service boundaries; each service records a span, and the trace view shows the full call tree with timing. OpenTelemetry is the vendor-neutral standard for all three: instrument once, then export to any compatible backend.
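
The three signals described above can be sketched with nothing but the Python standard library. This is an illustration under stated assumptions, not the OpenTelemetry API: the names new_trace_id, log_line, Span, Metrics, percentile, and handle_request, and the service names "checkout" and "charge-card", are all hypothetical. The percentile function uses the simple nearest-rank method on raw samples, whereas real metric backends usually estimate percentiles from histogram buckets.

```python
import json
import math
import time
import uuid


def new_trace_id() -> str:
    # Hypothetical helper: 32 hex characters, the same width as a W3C trace ID.
    return uuid.uuid4().hex


def log_line(trace_id: str, service: str, message: str, **fields) -> str:
    # Structured log: one JSON object per event, filterable by any key.
    record = {"trace_id": trace_id, "service": service, "message": message, **fields}
    return json.dumps(record, sort_keys=True)


class Span:
    # One timed unit of work; spans that share a trace_id form the call tree.
    def __init__(self, trace_id, name, parent=None):
        self.trace_id = trace_id
        self.name = name
        self.parent = parent
        self.span_id = uuid.uuid4().hex[:16]
        self.duration_ms = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.duration_ms = (time.perf_counter() - self._start) * 1000.0
        return False  # never swallow exceptions raised inside the span


def percentile(samples, p):
    # Nearest-rank percentile: the smallest sample that has at least
    # p percent of all samples at or below it.
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


class Metrics:
    def __init__(self):
        self.requests = 0        # counter: only ever goes up
        self.errors = 0          # counter: only ever goes up
        self.latencies_ms = []   # raw samples standing in for a histogram

    def observe(self, latency_ms, ok=True):
        self.requests += 1
        if not ok:
            self.errors += 1
        self.latencies_ms.append(latency_ms)

    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0


def handle_request(metrics, trace_id=None):
    # Reuse the caller's trace ID if one was propagated, else start a new trace.
    trace_id = trace_id or new_trace_id()
    with Span(trace_id, "checkout") as root:
        with Span(trace_id, "charge-card", parent=root) as child:
            pass  # the downstream call would happen here
    metrics.observe(root.duration_ms)
    return log_line(
        trace_id,
        "checkout",
        "request handled",
        span_id=root.span_id,
        child_span=child.name,
        duration_ms=round(root.duration_ms, 3),
    )


if __name__ == "__main__":
    metrics = Metrics()
    print(handle_request(metrics))
    print("p99 latency (ms):", percentile(metrics.latencies_ms, 99))
```

The point of the sketch is the shared trace_id: the same value appears in the log line and on every span, so one ID found in an alert or a log search leads straight to the matching trace. In a real deployment the OpenTelemetry SDK would generate and propagate these IDs and export all three signals, rather than hand-written classes like these.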