SLOs and Alerting

Purpose

This document defines the first practical health targets for the supported self-hosted topology.

These are self-hosted operator targets, not cloud-SaaS promises.

What Matters Most

For the supported topology, the most important questions are:

  • is the node up?
  • is Postgres reachable?
  • are projections current?
  • can operators read traces and workflows?
  • can admins still request replay when needed?

Target Signals

Availability

Target:

  • /api/healthz should be reachable and return 200 during normal operation

Operator expectation:

  • any sustained 5xx or connection failure on /api/healthz is page-worthy even in a single-node install
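A "sustained" failure is easiest to reason about as a run of consecutive failed checks rather than a single blip. A minimal watch loop could look like the sketch below; the URL, poll interval, and failure threshold are assumptions to tune per install, not documented values.

```python
import time
import urllib.request

HEALTHZ_URL = "http://localhost:8080/api/healthz"  # assumed local bind; adjust per install
FAILURES_BEFORE_ALERT = 3  # "sustained" = several consecutive failures, not one blip


def healthz_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True when the endpoint answers 200; False on 5xx or connection failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False


def watch(url: str = HEALTZ if False else HEALTHZ_URL, interval: float = 30.0) -> None:
    """Poll healthz and emit an alert line once failures are sustained."""
    consecutive = 0
    while True:
        if healthz_ok(url):
            consecutive = 0
        else:
            consecutive += 1
            if consecutive >= FAILURES_BEFORE_ALERT:
                print(f"ALERT: {url} failing for {consecutive} consecutive checks")
        time.sleep(interval)
```

On a single-node install this can run from cron or a systemd timer; anything that turns the printed alert line into a notification is enough to start with.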

Projection Freshness

Target:

  • normal steady-state operation should keep pending_events = 0 or near-zero for active tenants
  • stale projections should self-correct quickly after brief write bursts

Operator watch threshold:

  • any projection that remains stale for more than a few poll cycles deserves investigation
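"Stale for more than a few poll cycles" can be checked mechanically by keeping the last few snapshots of pending_events per projection. The helper below is a sketch; the snapshot shape (a dict of projection name to pending_events) and the three-cycle threshold are assumptions, not a documented response format.

```python
STALE_CYCLES_THRESHOLD = 3  # "more than a few poll cycles"; tune per install


def stale_projections(history: list[dict[str, int]],
                      threshold: int = STALE_CYCLES_THRESHOLD) -> list[str]:
    """Given per-cycle snapshots of {projection_name: pending_events},
    return projections whose pending_events stayed non-zero for the last
    `threshold` cycles. Brief bursts that self-correct never trigger."""
    if len(history) < threshold:
        return []
    recent = history[-threshold:]
    names = set(recent[0])
    return sorted(
        name for name in names
        if all(cycle.get(name, 0) > 0 for cycle in recent)
    )
```

A projection that dips back to zero in any recent cycle is treated as a normal write burst; only projections stuck above zero for the whole window are flagged for investigation.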

Warm API Latency

Using the current Phase 8 local-hosting capture as a baseline:

  • GET /api/v1/traces/:trace_id should remain comfortably sub-100 ms p95 on the supported warm local profile
  • GET /api/v1/projections/health should remain comfortably sub-150 ms p95 on the same profile
  • POST /api/v1/events should remain comfortably sub-50 ms p95 on the same profile
  • POST /api/v1/admin/replays acceptance should remain comfortably sub-50 ms p95 on the same profile

These are intentionally conservative compared to the current measured numbers.
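Checking these targets only needs a p95 over collected latency samples. The sketch below uses the nearest-rank method; the threshold table restates the targets above in milliseconds, and the sample-collection side is left to whatever probe the operator already runs.

```python
import math

# Warm-profile p95 targets from the section above, in milliseconds.
WARM_P95_TARGETS_MS = {
    "GET /api/v1/traces/:trace_id": 100,
    "GET /api/v1/projections/health": 150,
    "POST /api/v1/events": 50,
    "POST /api/v1/admin/replays": 50,
}


def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of latency samples in milliseconds."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]


def check_targets(samples: dict[str, list[float]]) -> dict[str, bool]:
    """Map each measured route to True when its p95 meets the warm target."""
    return {
        route: p95(samples[route]) <= limit
        for route, limit in WARM_P95_TARGETS_MS.items()
        if route in samples
    }
```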

Replay Safety

Target:

  • replay acceptance remains cheap
  • rebuild and catch-up durations remain predictable enough to reason about maintenance windows

Operator watch threshold:

  • replay or rebuild taking materially longer than the recorded benchmark profile on the same data scale should trigger investigation
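"Materially longer than the recorded benchmark profile" can be made concrete with a simple multiplier over the benchmark duration at the same data scale. The factor below is an assumption for illustration; pick one that matches how much variance the install normally shows.

```python
REGRESSION_FACTOR = 1.5  # assumed cutoff for "materially longer"; tune per install


def replay_regressed(observed_s: float, benchmark_s: float,
                     factor: float = REGRESSION_FACTOR) -> bool:
    """Flag a replay or rebuild run that took materially longer than the
    recorded benchmark at the same data scale."""
    return observed_s > benchmark_s * factor
```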

First Alerting Rules

The first self-hosted alert posture should be simple:

  • alert if /api/healthz fails repeatedly
  • alert if any projection health response shows open failures
  • alert if projection lag stays non-zero for a sustained period
  • alert if replay jobs remain queued or running unexpectedly long
  • alert if workflow backlog or escalation counts suddenly spike

For a small single-node deployment, these can start as operator-watch rules rather than automated paging. The surfaces to watch are:

  • /api/healthz
  • GET /api/v1/projections/health
  • the operator console projection health cards
  • the operator console replay panel
  • workflow inbox counts and escalation indicators
  • local logs with request_id, trace_id, tenant_id, projection, job_id, and workflow_id

What Is Still Out Of Scope

Phase 8 does not require:

  • hosted multi-tenant SLO segmentation
  • external pager integrations
  • fleet-wide SRE dashboards

The target is one operator being able to tell whether the self-hosted node is healthy.