
How eBPF Changed Observability: Zero-Instrumentation Tracing Explained

Deep dive into how eBPF lets you trace HTTP flows, database queries, and TCP connections with zero code changes — and why that matters when you're debugging a production incident at 3 AM.

The standard advice for distributed tracing is: add the SDK, instrument your services, deploy, and observe. Reasonable advice. But if you've ever tried to retrofit OpenTelemetry into a 40-service microservices architecture owned by six different teams, you know how that story ends — six months later, three services are instrumented, two teams are blocked on a dependency review, and nobody has traced the service that's actually causing your P99 latency spikes.

eBPF takes a different approach entirely. Instead of asking application code to emit telemetry, it watches what the kernel sees — every syscall, every network packet, every file descriptor open and close — and reconstructs traces from that. No SDK. No code change. No dependency on your team's deployment schedule.

What is eBPF, actually?

eBPF stands for “extended Berkeley Packet Filter.” The name is historically accurate and practically useless for understanding what it does today.

In 2025, eBPF is a way to run sandboxed programs inside the Linux kernel — without writing a kernel module, without rebooting, and without risking a kernel panic from a bad pointer dereference. The kernel verifies eBPF programs before running them: no unbounded loops, no unsafe memory access, no calls to arbitrary kernel functions. Once verified, the program is JIT-compiled and attached to a kernel event — a network socket, a syscall entry point, a tracepoint, a kprobe.

From there, it runs at kernel speed, with access to the full context of whatever event it's attached to: the process ID, the network connection, the file being opened, the arguments to the syscall. It can write structured data to a ring buffer that userspace reads in batches, so there is no userspace round trip per event and no ptrace-style stop-the-process cost. For most workloads, the overhead is small enough to be hard to measure in production.
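To make the "structured data in a ring buffer" idea concrete, here is a minimal userspace sketch in Python. The record layout (field order, the 64-byte payload cap, the names `EVENT_FORMAT` and `WriteEvent`) is an assumption made up for illustration, not the layout of any particular agent. It only shows how a reader might decode one fixed-size record after pulling it from the buffer.

```python
import struct
from typing import NamedTuple

# Hypothetical record layout (little-endian, no padding):
#   u32 pid, u32 tid, u32 fd, u64 timestamp_ns, u32 data_len, 64 payload bytes
EVENT_FORMAT = "<IIIQI64s"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)


class WriteEvent(NamedTuple):
    pid: int
    tid: int
    fd: int
    timestamp_ns: int
    data: bytes


def decode_event(raw: bytes) -> WriteEvent:
    """Decode one fixed-size record read from the (hypothetical) ring buffer."""
    if len(raw) != EVENT_SIZE:
        raise ValueError(f"expected {EVENT_SIZE} bytes, got {len(raw)}")
    pid, tid, fd, ts, data_len, payload = struct.unpack(EVENT_FORMAT, raw)
    # The kernel side copies at most 64 bytes; trust data_len only up to that cap.
    data_len = min(data_len, len(payload))
    return WriteEvent(pid, tid, fd, ts, payload[:data_len])


if __name__ == "__main__":
    sample = struct.pack(EVENT_FORMAT, 4242, 4243, 7, 1_000_000, 14, b"GET / HTTP/1.1")
    print(decode_event(sample))
```

A fixed-size record keeps the kernel side simple, at the cost of truncating long payloads. That trade-off is common, but the exact cap here is arbitrary.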

How it reconstructs HTTP traces

Here's the concrete mechanism. When your application makes an outbound HTTP request, it calls write() on a socket file descriptor. An eBPF program attached to that syscall sees:

  • The process ID and thread ID making the call
  • The file descriptor number
  • A pointer to the buffer being written (the raw HTTP bytes)
  • The timestamp

From the buffer, the eBPF program can parse the HTTP verb, path, and headers, including traceparent (the W3C Trace Context header) if it's there. On the receive side, it does the same for the response. Now it has a span: start time, end time, source PID, destination socket.
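The parsing step can be sketched in a few lines of Python. This is a userspace illustration of the logic only, under the assumption that the captured buffer holds the start of an HTTP/1.x request. A real eBPF program would do the equivalent in verifier-friendly C or hand the bytes to a userspace parser. The names `parse_http_request` and `HttpRequestInfo` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# W3C Trace Context: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")
HTTP_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "PATCH", "OPTIONS"}


class HttpRequestInfo(NamedTuple):
    method: str
    path: str
    trace_id: Optional[str]
    parent_span_id: Optional[str]


def parse_http_request(buf: bytes) -> Optional[HttpRequestInfo]:
    """Return verb, path, and trace context from raw request bytes, or None."""
    head = buf.split(b"\r\n\r\n", 1)[0]
    lines = head.decode("latin-1").split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or parts[0] not in HTTP_METHODS or not parts[2].startswith("HTTP/"):
        return None  # not an HTTP/1.x request line
    method, path, _version = parts
    trace_id = parent_span_id = None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "traceparent":
            match = TRACEPARENT_RE.match(value.strip())
            if match:
                trace_id, parent_span_id = match.group(2), match.group(3)
            break
    return HttpRequestInfo(method, path, trace_id, parent_span_id)


if __name__ == "__main__":
    request = (
        b"GET /api/orders/42 HTTP/1.1\r\n"
        b"Host: orders\r\n"
        b"traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01\r\n\r\n"
    )
    print(parse_http_request(request))
```

When the header is absent, the span can still be built from timing and socket data alone. It simply cannot be stitched into a trace that an SDK started elsewhere.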

A companion kprobe on tcp_connect tells it the remote IP and port. Cross-referencing with the Kubernetes node agent maps that IP to a pod name and service. The result: a distributed trace across services, reconstructed entirely from kernel events, without a single line of SDK code in either service.
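The cross-referencing step amounts to a lookup from remote IP to pod metadata. The sketch below assumes the node agent exposes that mapping as a plain dictionary. The `Span` fields, the `enrich` helper, and the sample IP and pod names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Span:
    src_pid: int
    remote_ip: str
    remote_port: int
    start_ns: int
    end_ns: int
    dest_pod: Optional[str] = None
    dest_service: Optional[str] = None

    @property
    def duration_ms(self) -> float:
        return (self.end_ns - self.start_ns) / 1e6


def enrich(span: Span, pod_index: Dict[str, Tuple[str, str]]) -> Span:
    """Attach pod and service names when the remote IP is a known pod IP."""
    match = pod_index.get(span.remote_ip)
    if match is not None:
        span.dest_pod, span.dest_service = match
    return span


if __name__ == "__main__":
    # Assumed shape: pod IP -> (pod name, service name).
    pod_index = {"10.0.3.17": ("orders-7c9f-abcde", "orders-svc")}
    span = Span(src_pid=4242, remote_ip="10.0.3.17", remote_port=8080,
                start_ns=1_000_000, end_ns=13_500_000)
    print(enrich(span, pod_index), span.duration_ms)
```

Spans whose remote IP is not a pod IP (an external API, for instance) keep the raw address, which is still enough to group calls by destination.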

Database queries, too

The same mechanism works for database protocols. PostgreSQL wire protocol over TCP is parsed by the eBPF program to extract the query text from the Parse or Query message. MySQL, Redis RESP, MongoDB wire protocol — each has a recognizable binary structure that an eBPF parser can identify from the first few bytes of a write syscall.
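As an illustration for PostgreSQL, both the simple Query message ('Q') and the extended-protocol Parse message ('P') start with a one-byte type and a four-byte big-endian length that counts itself but not the type byte. The Python sketch below extracts the query text from either one. It is a userspace approximation of the parsing logic, and the function name `extract_pg_query` is hypothetical.

```python
import struct
from typing import Optional


def extract_pg_query(buf: bytes) -> Optional[str]:
    """Pull the SQL text out of a PostgreSQL Query ('Q') or Parse ('P') message."""
    if len(buf) < 5:
        return None
    msg_type = buf[0:1]
    (length,) = struct.unpack("!I", buf[1:5])
    if length < 4:
        return None
    # Slicing tolerates a capture that was truncated before the message ended.
    body = buf[5:1 + length]
    if msg_type == b"Q":
        # Body is a single null-terminated query string.
        query, _, _ = body.partition(b"\x00")
    elif msg_type == b"P":
        # Body is: statement name, query string (both null-terminated), then parameter types.
        _name, sep, rest = body.partition(b"\x00")
        if not sep:
            return None
        query, _, _ = rest.partition(b"\x00")
    else:
        return None
    return query.decode("utf-8", errors="replace")


if __name__ == "__main__":
    sql = b"SELECT * FROM orders WHERE user_id = $1"
    parse_msg = b"P" + struct.pack("!I", 4 + 1 + len(sql) + 1 + 2) + b"\x00" + sql + b"\x00" + b"\x00\x00"
    print(extract_pg_query(parse_msg))
```

Other message types return None here. A fuller parser would also track Bind and Execute to associate parameters and timings with the statement.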

This is how SaviourOps can show you “checkout-svc called SELECT * FROM orders WHERE user_id = ? 847 times in 200ms” without you ever having configured a database driver to emit traces. The kernel saw every byte sent to port 5432. The eBPF agent parsed it.
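A count like the one quoted above can be produced by normalizing literals out of each captured query and tallying per service within a time window. The sketch below is one plausible way to do that, not a description of SaviourOps internals. The regular expression is deliberately crude, and the event tuple shape is an assumption.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches single-quoted strings and bare integers; leaves $1-style placeholders alone.
_LITERAL_RE = re.compile(r"'(?:[^']|'')*'|(?<![\w$])\d+(?!\w)")


def normalize(sql: str) -> str:
    """Collapse whitespace and replace literals with '?' so similar queries group together."""
    return _LITERAL_RE.sub("?", " ".join(sql.split()))


def count_queries(
    events: Iterable[Tuple[str, int, str]],  # (service, timestamp_ns, sql)
    window_start_ns: int,
    window_ns: int,
) -> Counter:
    """Count normalized queries per service inside [window_start, window_start + window)."""
    counts: Counter = Counter()
    window_end_ns = window_start_ns + window_ns
    for service, ts_ns, sql in events:
        if window_start_ns <= ts_ns < window_end_ns:
            counts[(service, normalize(sql))] += 1
    return counts


if __name__ == "__main__":
    events = [
        ("checkout-svc", i * 1_000, f"SELECT * FROM orders WHERE user_id = {i}")
        for i in range(5)
    ]
    for (service, query), n in count_queries(events, 0, 200_000_000).items():
        print(f"{service} called {query} {n} times")
```

A high count for one normalized query in a short window is the classic signature of an N+1 query pattern.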

What eBPF does not replace

eBPF is not magic. There are things it cannot do that OpenTelemetry SDK instrumentation handles well.

Business-level spans

eBPF sees HTTP calls and DB queries. It cannot see that a particular HTTP call is 'ProcessPayment' unless your code names the route that way. Custom span names, attributes on spans, and business metrics require SDK code.

Encrypted payloads (TLS)

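Probes at the syscall layer see the bytes the application hands to the kernel. With TLS, the library has already encrypted those bytes in userspace, so by the time write() is called the payload is ciphertext. Tracing TLS traffic requires uprobes on the TLS library itself (OpenSSL, BoringSSL), placed where the data is still plaintext. That is more complex, but possible.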
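One practical consequence is that a syscall-level parser needs to recognize when it is looking at ciphertext so it can skip protocol parsing. The sketch below checks the five-byte TLS record header (content type, legacy version, length). It is a heuristic written for illustration, and the function name `looks_like_tls_record` is hypothetical.

```python
# TLS record content types: change_cipher_spec, alert, handshake, application_data.
TLS_CONTENT_TYPES = {20, 21, 22, 23}
# Upper bound on a TLS 1.2 ciphertext record body: 2^14 + 2048 bytes.
MAX_TLS_RECORD_LEN = 2**14 + 2048


def looks_like_tls_record(buf: bytes) -> bool:
    """Heuristically decide whether a write buffer starts with a TLS record header."""
    if len(buf) < 5:
        return False
    content_type, major, minor = buf[0], buf[1], buf[2]
    length = int.from_bytes(buf[3:5], "big")
    return (
        content_type in TLS_CONTENT_TYPES
        and major == 3          # SSL 3.0 and every TLS version use major version 3
        and minor <= 3          # the record layer never advertises more than 0x0303
        and length <= MAX_TLS_RECORD_LEN
    )


if __name__ == "__main__":
    print(looks_like_tls_record(b"\x17\x03\x03\x00\x20" + b"\x00" * 32))  # application data
    print(looks_like_tls_record(b"GET / HTTP/1.1\r\n"))                    # plaintext HTTP
```

When the check passes, the agent can still record timing and byte counts for the connection, even though verb, path, and headers stay hidden without a library-level uprobe.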

Application logs

Kernel-level tracing doesn't give you application log lines. You still need log collection (stdout/stderr from pods, or file-based). eBPF traces tell you what the network saw; logs tell you what the application thought.

Custom metrics

Business metrics (active subscriptions, payment success rate, queue depth) are application-domain concepts. They require the application to expose them. eBPF gives you system metrics; the SDK gives you business metrics.

This is why SaviourOps supports both: eBPF for zero-effort baseline coverage across every service, OpenTelemetry (OTLP) for the services where you want deeper application-layer context. Most teams start with eBPF for the immediate coverage, then selectively add OTLP instrumentation to the 20% of services that generate 80% of their incidents.

The incident response case

The practical value of eBPF shows up most clearly during incidents involving services you've never instrumented.

In a typical microservices environment, some fraction of services are well-instrumented and some are not. The legacy payment processor. The third-party webhook handler. The internal tool someone wrote three years ago that nobody wants to touch. During an incident, these are often the first place to look — and the last place that has any telemetry.

With eBPF, every service on the cluster is automatically emitting trace data from the moment the agent is deployed. When the incident fires, you have network-level trace data for every service in the call chain — even the ones with zero SDK instrumentation. The trace may be less detailed than a fully-instrumented service, but it tells you latency, error rate, and call volume, which is usually enough to identify the broken link in the chain.
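Latency, error rate, and call volume fall out of even the barest span data. The Python sketch below assumes each observed call carries a service name, a duration, and an HTTP status code. The `ObservedCall` type, the choice of treating 5xx as errors, and the nearest-rank P99 are illustrative choices.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class ObservedCall(NamedTuple):
    service: str
    duration_ms: float
    status: int


def summarize(calls: Iterable[ObservedCall]) -> Dict[str, Dict[str, float]]:
    """Compute call volume, error rate, and nearest-rank P99 latency per service."""
    by_service: Dict[str, List[ObservedCall]] = defaultdict(list)
    for call in calls:
        by_service[call.service].append(call)

    summary: Dict[str, Dict[str, float]] = {}
    for service, group in by_service.items():
        durations = sorted(c.duration_ms for c in group)
        rank = max(1, math.ceil(0.99 * len(durations)))  # nearest-rank percentile
        errors = sum(1 for c in group if c.status >= 500)
        summary[service] = {
            "calls": len(group),
            "error_rate": errors / len(group),
            "p99_ms": durations[rank - 1],
        }
    return summary


if __name__ == "__main__":
    calls = [
        ObservedCall("legacy-payments", 10.0, 200),
        ObservedCall("legacy-payments", 12.0, 200),
        ObservedCall("legacy-payments", 11.0, 500),
        ObservedCall("legacy-payments", 900.0, 200),
    ]
    print(summarize(calls))
```

Comparing these three numbers across every hop in the call chain is usually how the slow or failing service stands out.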

That's the 3 AM value proposition: you don't get to choose which service has the problem. eBPF makes sure you have at least baseline visibility into all of them.

Kernel version requirements

eBPF support in the kernel has been incrementally improving since Linux 3.18. For production-grade observability (ring buffers, BTF, and CO-RE, short for Compile Once, Run Everywhere), you want kernel 5.8 or later. Most major cloud providers now ship 5.10+ or 5.15+ as their default kernel.

GKE defaults to Container-Optimized OS with kernel 5.15+. EKS defaults to Amazon Linux 2 (5.10+) and AL2023 (6.1+). AKS defaults to Ubuntu 22.04 (5.15+). If you're on any of these managed Kubernetes offerings with a reasonably recent node image, you have full eBPF support.
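A quick way to check a node against the 5.8 threshold is to parse the kernel release string. The sketch below is a heuristic only: some distributions backport eBPF features to older version numbers, so it also looks for the BTF file at /sys/kernel/btf/vmlinux. The helper names are hypothetical.

```python
import os
import platform
import re
from typing import Optional, Tuple

MIN_KERNEL = (5, 8)  # threshold named in the text above


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings like '5.15.0-1051-azure'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_meets_minimum(release: Optional[str] = None) -> bool:
    """Compare a release string (default: the running kernel) against MIN_KERNEL."""
    return parse_kernel_release(release or platform.release()) >= MIN_KERNEL


def has_btf() -> bool:
    """BTF type information for the running kernel is exposed at this path when enabled."""
    return os.path.exists("/sys/kernel/btf/vmlinux")


if __name__ == "__main__":
    print("kernel:", platform.release())
    print("meets 5.8 minimum:", kernel_meets_minimum())
    print("BTF available:", has_btf())
```

Running this on one node per node pool is usually enough, since nodes in a pool share an image.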

The agent runs as a DaemonSet — one pod per node — with the necessary kernel capabilities (CAP_BPF, CAP_PERFMON). No kernel modules. No host OS changes. Deploy the Helm chart, wait 60 seconds, and traces start flowing.

Try it

See eBPF tracing in your cluster

SaviourOps deploys in under 5 minutes. The free tier includes 5 GB/month of ingestion — enough to see eBPF traces across your whole cluster for most teams.
