SaviourOps

What’s new

Changelog

Every improvement, fix, and new capability — in the order we shipped it. Follow along on GitHub for commit-level detail.

April 8, 2025

v0.9.0

AI Incident Correlation Engine

The AI correlation engine is now generally available. It groups related alerts into a single incident, surfaces a probable root cause, and links the relevant traces and logs automatically. In internal testing, it reduced alert noise by up to 70% on busy clusters.

  • [New] AI correlation engine: groups related alerts into incidents using causal graph analysis. Reduces alert noise by up to 70% in our internal testing.
  • [New] Incident timeline view: auto-generated chronological reconstruction of events from logs, traces, and metric anomalies.
  • [New] Root cause hypothesis panel: the AI surfaces up to 3 ranked hypotheses with supporting evidence from your telemetry.
  • [Improved] eBPF agent startup time reduced by 40%; kernel probes attach in under 200 ms on modern kernels (5.15+).
  • [Improved] Ingestion pipeline throughput improved to 2M events/sec per node (up from 1.2M). Backpressure handling is now adaptive.
  • [Fixed] Memory leak in the eBPF userspace buffer when kernel perf ring buffers fill under sustained high load.
  • [Fixed] Alert state machine could get stuck in 'firing' if the metrics backend returned a transient empty series.
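
To illustrate the class of bug behind the 'firing' fix above, here is a minimal sketch of an alert evaluator that treats an empty series as "no data" instead of as a signal. It is not the SaviourOps implementation: the names AlertState, Alert, evaluate, and max_empty_evals are hypothetical, and the grace period of three evaluations is an assumed value.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class AlertState(Enum):
    OK = "ok"
    FIRING = "firing"
    NO_DATA = "no_data"


@dataclass
class Alert:
    threshold: float
    state: AlertState = AlertState.OK
    empty_evals: int = 0
    max_empty_evals: int = 3  # hypothetical grace period, in evaluations

    def evaluate(self, series: Sequence[float]) -> AlertState:
        """Advance the state machine using one evaluation window of samples."""
        if not series:
            # An empty result is counted, not interpreted: the previous state
            # is kept until the gap lasts max_empty_evals evaluations.
            self.empty_evals += 1
            if self.empty_evals >= self.max_empty_evals:
                self.state = AlertState.NO_DATA
            return self.state
        # Real samples reset the gap counter and always re-decide the state,
        # so a single empty window cannot pin the alert in FIRING.
        self.empty_evals = 0
        self.state = (
            AlertState.FIRING if max(series) > self.threshold else AlertState.OK
        )
        return self.state
```

In this sketch a transient gap leaves the alert in its previous state, the next non-empty window resolves it normally, and a sustained gap moves it to an explicit NO_DATA state.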

March 18, 2025

v0.8.2

On-Call Scheduling Overhaul

On-call management gets a complete scheduling engine rewrite. Rotations now support multi-layer overrides, follow-the-sun configs, and a real-time schedule preview before you publish. We also shipped Slack-native incident commands.
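
As a rough illustration of what a follow-the-sun config expresses, the sketch below picks the on-call team for a given UTC instant from shifts defined in local time. Everything here is an assumption for illustration: Shift, SHIFTS, on_call, the team names, and the shift hours are hypothetical and not the SaviourOps schedule format. Fixed UTC offsets are used to keep the example self-contained; a real schedule would more likely use IANA timezones so that daylight saving changes are handled.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Shift:
    team: str
    tz: timezone  # fixed offset for simplicity (assumption, see lead-in)
    start: time   # local start, inclusive
    end: time     # local end, exclusive


# Hypothetical three-region rotation covering all 24 hours.
SHIFTS = [
    Shift("apac", timezone(timedelta(hours=5, minutes=30)), time(9, 0), time(17, 0)),
    Shift("emea", timezone(timedelta(hours=1)), time(12, 30), time(20, 30)),
    Shift("amer", timezone(timedelta(hours=-8)), time(11, 30), time(19, 30)),
]


def on_call(shifts: List[Shift], now_utc: datetime) -> Optional[str]:
    """Return the team whose local shift window contains now_utc, if any."""
    for shift in shifts:
        local = now_utc.astimezone(shift.tz).time()
        if shift.start <= shift.end:
            active = shift.start <= local < shift.end
        else:
            # The window wraps past local midnight.
            active = local >= shift.start or local < shift.end
        if active:
            return shift.team
    return None
```

Because each window is inclusive at the start and exclusive at the end, the handoff instant belongs to exactly one team.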

  • [New] Follow-the-sun rotation support: define shifts by timezone and SaviourOps hands off automatically at shift boundaries.
  • [New] Slack integration v2: acknowledge, escalate, or resolve incidents directly from Slack without opening the dashboard.
  • [New] Schedule override UI: drag-and-drop overrides on the calendar view with conflict detection.
  • [New] On-call health score: weekly digest showing MTTA, MTTR, and alert volume per engineer to identify toil concentration.
  • [Improved] Escalation policy editor redesigned. Multi-step escalations with per-step delays and fallback contacts.
  • [Improved] PagerDuty import tool now handles nested teams and custom escalation policies without manual cleanup.
  • [Fixed] Edge case where simultaneous acknowledgement from two engineers could leave an incident in a split-brain acknowledged/unacknowledged state.
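
One common way to avoid the kind of split-brain state described in the last fix is to make acknowledgement an atomic check-and-set, so that exactly one caller wins and later callers see the recorded result. The sketch below shows that pattern in-process with a lock. It is an illustration only: Incident, acknowledge, and acknowledged_by are hypothetical names, and a distributed service would need the equivalent guarantee from its datastore (for example a conditional update).

```python
import threading
from typing import Optional


class Incident:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.acknowledged_by: Optional[str] = None

    def acknowledge(self, engineer: str) -> bool:
        """Record the acknowledgement; return True only for the first caller."""
        with self._lock:
            # The check and the write happen under one lock, so two
            # simultaneous calls cannot both observe "unacknowledged".
            if self.acknowledged_by is not None:
                return False
            self.acknowledged_by = engineer
            return True
```

A caller that gets False can read acknowledged_by to show who already holds the incident.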

February 24, 2025

v0.8.0

eBPF Agent: Kubernetes Auto-Discovery

The eBPF agent now understands Kubernetes natively — it reads pod labels, namespace metadata, and service topology from the kube-apiserver and attaches that context to every trace span and network flow automatically. Zero YAML changes required.

  • [New] Kubernetes auto-discovery: eBPF agent reads pod/service/namespace labels and enriches all telemetry automatically. No annotation changes needed.
  • [New] Network flow topology map: visualize east-west traffic between services derived from eBPF socket probes, with no service mesh required.
  • [New] HTTP/2 and gRPC tracing support in the eBPF agent (previously HTTP/1.1 only).
  • [New] OpenTelemetry Collector compatibility: SaviourOps now accepts OTLP/gRPC and OTLP/HTTP on standard ports. Drop-in replacement for any OTLP-compatible collector.
  • [Improved] eBPF agent CPU overhead reduced by 18% through batched BPF map reads and reduced context switch frequency.
  • [Improved] Helm chart now supports topology spread constraints and PodDisruptionBudget for production deployments.
  • [Fixed] eBPF agent crashed on kernel 6.6+ due to a renamed struct field in the socket filter program. Now tested against kernels 5.4 through 6.8.
  • [Fixed] Trace context propagation dropped on HTTP requests with non-standard capitalization of the traceparent header.
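
HTTP header names are case-insensitive, which is why a lookup keyed on the exact string "traceparent" can miss valid requests. The sketch below shows a case-insensitive extraction of the W3C Trace Context traceparent header. It is a simplified illustration, not the agent's code: extract_traceparent is a hypothetical helper, and validation is limited to the four-field layout plus the all-zero ID checks.

```python
import re
from typing import Mapping, Optional, Tuple

# version "-" trace-id "-" parent-id "-" trace-flags, all lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def extract_traceparent(
    headers: Mapping[str, str],
) -> Optional[Tuple[str, str, str, str]]:
    """Return (version, trace_id, parent_id, flags), or None if absent/invalid."""
    for name, value in headers.items():
        # Compare names case-insensitively: "TraceParent" and "TRACEPARENT"
        # refer to the same header as "traceparent".
        if name.lower() != "traceparent":
            continue
        match = _TRACEPARENT_RE.match(value.strip())
        if match is None:
            return None
        version, trace_id, parent_id, flags = match.groups()
        # All-zero trace and parent IDs are invalid in W3C Trace Context.
        if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
            return None
        return version, trace_id, parent_id, flags
    return None
```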

January 30, 2025

v0.7.1

Log Explorer and Query Performance

Log Explorer ships with full-text search, structured field filtering, and a live-tail mode. Under the hood we rewrote the ClickHouse query layer to push filter predicates closer to storage — p99 log query latency dropped by 60% on large datasets.

  • [New] Log Explorer: full-text search with AND/OR/NOT operators, structured field filtering, and saved searches.
  • [New] Live-tail mode: stream log lines in real-time from the browser, filtered by service, severity, or arbitrary fields.
  • [New] Log-to-trace linking: every log line with a trace_id is automatically linked to the parent span, so you can click through without copy-pasting IDs.
  • [Improved] ClickHouse query layer rewritten to push filter predicates to the storage layer. p99 query latency on 10B+ row datasets down from 4.2 s to 1.7 s.
  • [Improved] Ingestion pipeline now automatically parses JSON log bodies and promotes top-level keys to indexed columns.
  • [Improved] API rate limit errors now return Retry-After headers and a structured JSON error body with request_id for easier debugging.
  • [Fixed] Dashboard time-range picker incorrectly adjusted timestamps for users in UTC+5:30 and similar half-hour offset timezones.
  • [Fixed] Self-hosted installer failed silently if the target host had less than 4 GB of RAM instead of surfacing a pre-flight error.
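
For API clients, the rate limit change above means a retry delay and a request_id can be read straight from the error response. The sketch below shows one way a client might do that. Several details are assumptions rather than documented behavior: that rate limit errors use HTTP status 429, that Retry-After is given in whole seconds, and that request_id is a top-level string in the JSON body. parse_rate_limit and DEFAULT_DELAY_S are hypothetical names.

```python
import json
from typing import Mapping, Optional, Tuple

DEFAULT_DELAY_S = 1.0  # hypothetical fallback if Retry-After is missing or unparsable


def parse_rate_limit(
    status: int, headers: Mapping[str, str], body: str
) -> Optional[Tuple[float, Optional[str]]]:
    """Return (seconds_to_wait, request_id) for a 429 response, else None."""
    if status != 429:  # assumed status code for rate limit errors
        return None
    # Header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after", "").strip()
    # Only the delta-seconds form is handled here; the HTTP-date form of
    # Retry-After falls back to the default delay.
    delay = float(raw) if raw.isascii() and raw.isdigit() else DEFAULT_DELAY_S
    request_id = None
    try:
        payload = json.loads(body)
        if isinstance(payload, dict) and isinstance(payload.get("request_id"), str):
            request_id = payload["request_id"]
    except json.JSONDecodeError:
        pass  # a non-JSON body simply yields no request_id
    return delay, request_id
```

A caller would sleep for the returned delay before retrying and include the request_id in logs or support tickets.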

January 6, 2025

v0.7.0

Public Beta Launch

SaviourOps is now in public beta. This release includes the foundational observability pipeline, alerting engine, basic on-call scheduling, and the eBPF agent for zero-instrumentation Linux and Kubernetes deployments.

  • [New] Public beta open. Sign up at saviourops.com. The free tier includes 5 GB/month of ingestion.
  • [New] eBPF agent: zero-instrumentation observability for Linux processes and Kubernetes workloads.
  • [New] Alerting engine with multi-condition rules, severity levels, and notification routing to email, Slack, and PagerDuty.
  • [New] Basic on-call scheduling with rotation management and escalation policies.
  • [New] Self-hosted deployment option via Helm chart and Docker Compose.