API Security

Purpose

This document captures the Phase 5 security boundary for the Phoenix-backed DecisionGraph service.

The public API is multi-tenant, authenticated, and intentionally conservative around replay and rebuild controls.

Trust Boundary

The first HTTP service boundary assumes:

clients authenticate with static service-account bearer tokens
every versioned API request includes x-tenant-id
tenant authorization happens before controller logic
Phoenix controllers stay thin and delegate service logic to dg_api

The unversioned route:

GET /api/healthz

is a deployment health check, not a tenant-scoped product API.

Route Risk Levels

Lowest risk:

GET /api/v1/traces/:trace_id
GET /api/v1/graph/context
GET /api/v1/graph/edges
GET /api/v1/precedents
GET /api/v1/projections/health

Medium risk:

POST /api/v1/events

Highest risk:

POST /api/v1/admin/replays
GET /api/v1/admin/replays/:job_id
POST /api/v1/admin/replays/:job_id/cancel

Auth Model

Every versioned route requires:

Authorization: Bearer <token>
x-tenant-id: <tenant>

Service accounts are configured under:

beam/config/dev.exs
beam/config/test.exs

Each account carries:

roles
tenant_ids
permissions

Role checks gate route families:

reader for read APIs
writer for ingestion
admin for replay controls

Permission checks harden sensitive admin actions further:

projection_replay for catch-up replay, replay status, and replay cancel
projection_rebuild for rebuild creation, rebuild status, and rebuild cancel
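
The layered checks above can be sketched as a single pure function. This is a minimal illustration, not the service's actual plug: the module name AuthSketch, the string encoding of roles and permissions, the :read/:write/:admin family atoms, and the {:error, :forbidden} return value are all assumptions made for the example.

```elixir
defmodule AuthSketch do
  # Hypothetical mapping from route family to the role named in this document.
  @required_role %{read: "reader", write: "writer", admin: "admin"}

  # Hypothetical mapping from admin action to the extra permission it needs.
  @required_permission %{replay: "projection_replay", rebuild: "projection_rebuild"}

  # `account` is assumed to be a map with :roles, :tenant_ids, and :permissions lists.
  def authorize(account, tenant_id, family, action \\ nil) do
    role = Map.get(@required_role, family)
    permission = Map.get(@required_permission, action)

    cond do
      # The account must be allowed to use the requested tenant.
      tenant_id not in account.tenant_ids -> {:error, :forbidden}
      # The role gates the whole route family; unknown families are denied.
      role not in account.roles -> {:error, :forbidden}
      # Admin actions also need the matching permission; unknown actions are denied.
      family == :admin and permission not in account.permissions -> {:error, :forbidden}
      true -> :ok
    end
  end
end
```

In this sketch an account holding admin and projection_replay can start or cancel a replay but is still refused a rebuild, which is the extra hardening the permission layer is meant to provide.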

Tenant Isolation

Tenant isolation is enforced twice:

the auth plug rejects accounts that are not allowed to use the requested tenant
replay status and cancel lookups are scoped to the requested tenant before any result is returned

That second rule matters because replay jobs are addressed by job_id. A caller that knows another tenant's job ID should still receive not_found, so the response never confirms that a job exists outside the caller's tenant.
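
A tenant-scoped lookup of this kind might look like the sketch below. The in-memory map stands in for whatever storage the service really uses, and ReplayLookupSketch, fetch_job/3, and the job fields are hypothetical names chosen for illustration.

```elixir
defmodule ReplayLookupSketch do
  # `jobs` is a map of job_id => job, standing in for the real replay job store.
  def fetch_job(jobs, tenant_id, job_id) do
    case Map.fetch(jobs, job_id) do
      # Only a job owned by the requested tenant is returned.
      {:ok, %{tenant_id: ^tenant_id} = job} -> {:ok, job}
      # A missing job and another tenant's job look identical to the caller.
      _ -> {:error, :not_found}
    end
  end
end
```

Matching on the tenant inside the lookup, rather than checking ownership after the job is loaded, keeps the "exists but not yours" case from leaking through a different error code.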

Replay Safeguards

Replay and rebuild routes enforce these safeguards:

admin role is required
an explicit permission is required: projection_replay for replay actions, projection_rebuild for rebuild actions
rebuild can be disabled per environment through :dg_api, :admin_controls
a human-readable reason is required by default for replay and rebuild requests
operator metadata is persisted into replay run metadata
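
The rebuild switch and the reason requirement can be sketched as a small guard. This is an illustration under assumptions: ReplayGuardSketch, check/3, the :replay/:rebuild mode atoms, and the error atoms are hypothetical, and the controls are passed in as a keyword list here, whereas the service would presumably read them from the :dg_api, :admin_controls application environment. Role and permission checks are left out to keep the example short.

```elixir
defmodule ReplayGuardSketch do
  # Mirrors the documented defaults; callers may override per environment.
  @defaults [allow_rebuild: false, require_reason: true]

  def check(mode, reason, controls \\ []) when mode in [:replay, :rebuild] do
    controls = Keyword.merge(@defaults, controls)
    allow_rebuild? = Keyword.fetch!(controls, :allow_rebuild)
    require_reason? = Keyword.fetch!(controls, :require_reason)

    cond do
      # Rebuild is refused unless the environment explicitly enables it.
      mode == :rebuild and not allow_rebuild? -> {:error, :rebuild_disabled}
      # A missing or whitespace-only reason is rejected when reasons are required.
      require_reason? and blank?(reason) -> {:error, :reason_required}
      true -> :ok
    end
  end

  defp blank?(nil), do: true
  defp blank?(reason) when is_binary(reason), do: String.trim(reason) == ""
end
```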

The global defaults live in:

beam/config/config.exs

Current defaults:

allow_rebuild: false
require_reason: true

The development and test configs override allow_rebuild to true so local flows remain usable.
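
As a rough illustration, the defaults and the per-environment override could be expressed as follows. The exact layout of the real config files is an assumption; only the :dg_api, :admin_controls key and the two option names come from this document.

```elixir
# beam/config/config.exs (assumed shape of the global default)
import Config

config :dg_api, :admin_controls,
  allow_rebuild: false,
  require_reason: true

# beam/config/dev.exs and beam/config/test.exs (assumed override; each file
# would start with its own `import Config`)
config :dg_api, :admin_controls, allow_rebuild: true
```

Because config/3 deep-merges keyword lists, an override written this way would change only allow_rebuild and leave require_reason at its default.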

Audit Capture

Sensitive admin actions emit audit records through:

logger entries with api_action, account_id, job_id, request_id, and tenant_id
telemetry events under [:decision_graph, :api, :admin, :audit]

Current audited actions:

replay/rebuild start
replay cancel

Audit metadata is also written into replay run metadata where available:

reason
request_id
requesting account identity and roles
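
The two audit channels can be sketched together. AuditSketch, its function names, the account map shape, and the %{count: 1} measurement are assumptions for illustration; the event name and the logger metadata keys are the ones listed above. The :telemetry library is a third-party dependency, so the sketch calls it only when it is loaded.

```elixir
defmodule AuditSketch do
  require Logger

  @event [:decision_graph, :api, :admin, :audit]
  @log_keys [:api_action, :account_id, :job_id, :request_id, :tenant_id]

  # Builds the audit metadata for one admin action.
  def metadata(action, account, tenant_id, job_id, request_id, reason) do
    %{
      api_action: action,
      account_id: account.id,
      roles: account.roles,
      tenant_id: tenant_id,
      job_id: job_id,
      request_id: request_id,
      reason: reason
    }
  end

  # Emits a logger entry and, when :telemetry is available, a telemetry event.
  def emit(meta) do
    Logger.info("admin audit", meta |> Map.take(@log_keys) |> Map.to_list())

    if Code.ensure_loaded?(:telemetry) do
      # apply/3 avoids a compile-time reference to the optional dependency.
      apply(:telemetry, :execute, [@event, %{count: 1}, meta])
    end

    :ok
  end
end
```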

Rate Limiting

The API uses simple fixed-window rate limiting backed by ETS, with counters keyed by:

API scope
service-account ID
tenant ID
current minute window

Configured buckets:

read
write
admin

This is intentionally basic Phase 5 protection. It is enough to throttle accidental bursts and deter low-effort abuse, but it is not a replacement for upstream gateway controls.
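
A fixed-window counter of this kind can be sketched in a few lines of ETS. The module name, table name, and the limit argument are hypothetical, and the clock is injectable only so the behaviour is easy to exercise; the key layout follows the four components listed above.

```elixir
defmodule RateLimitSketch do
  @table :rate_limit_sketch

  # Creates the counter table; call once before using allow?/5.
  def init do
    :ets.new(@table, [:named_table, :public, :set, write_concurrency: true])
    :ok
  end

  # Returns true while the caller is within `limit` requests for the current minute.
  def allow?(scope, account_id, tenant_id, limit, now_seconds \\ System.system_time(:second)) do
    window = div(now_seconds, 60)
    key = {scope, account_id, tenant_id, window}

    # Atomically increments the counter, inserting {key, 0} first if it is missing.
    count = :ets.update_counter(@table, key, {2, 1}, {key, 0})
    count <= limit
  end
end
```

Note that this sketch never deletes counters for past windows, and a fixed window admits up to twice the limit across a window boundary. Both are acceptable for basic protection but are reasons to keep gateway-level controls in front of the service.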

Threat Notes

This Phase 5 boundary does not yet provide:

token rotation workflows
signed request bodies
IP allowlists
per-endpoint quota policies
durable audit exports
external identity provider integration

These items remain future hardening work. Until they land, shared environments should treat the BEAM service as an authenticated internal platform service rather than an internet-exposed public API.