Three planes, one signal bus
Stateless ingest, bounded reasoning, idempotent action. Every plane scales independently, and every tenant lives in its own slice of the data layer.
Ingestion plane
Stateless FastAPI workers accept signals from 88+ sources and push them onto Redis Streams.
- FastAPI / async / asyncpg
- Webhook normalization + JWT tenant extraction
- Bounded async dispatcher, Semaphore(64)
- Fingerprint deduplication via claim_once
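The dispatcher and dedup bullets above can be sketched together. This is a minimal illustration, not the product's implementation: the in-memory `_seen` set stands in for whatever shared store backs `claim_once`, and `fingerprint`, `dispatch`, and the handler signature are hypothetical names.

```python
import asyncio
import hashlib
import json

MAX_IN_FLIGHT = 64  # mirrors Semaphore(64) from the list above

_seen: set[str] = set()   # assumption: in-memory stand-in for a shared dedup store
_DUPLICATE = object()     # sentinel marking signals that were already claimed


def fingerprint(tenant_id: str, signal: dict) -> str:
    """Stable hash of the tenant plus the signal's canonical JSON form."""
    canonical = json.dumps(signal, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{tenant_id}:{canonical}".encode()).hexdigest()


def claim_once(key: str) -> bool:
    """Return True only for the first caller that claims this key."""
    if key in _seen:
        return False
    _seen.add(key)
    return True


async def dispatch(tenant_id, signals, handler, limit=MAX_IN_FLIGHT):
    """Run handler over signals, skipping duplicates, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def run(signal):
        # No await between check and claim, so this is atomic within one event loop.
        if not claim_once(fingerprint(tenant_id, signal)):
            return _DUPLICATE
        async with sem:
            return await handler(signal)

    results = await asyncio.gather(*(run(s) for s in signals))
    return [r for r in results if r is not _DUPLICATE]
```

Across multiple workers the claim would need to live in a shared store with an atomic set-if-absent operation; the in-process set only shows the shape of the check.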
Reasoning plane
LangGraph orchestrator routes signals through classify, diagnose, and confidence checks.
- LangGraph state machine, 4 stages
- IncidentAgent (RAG+LLM) + CMDBAgent
- Heuristic fallback when the LLM errors
- LiteLLM-backed: bring your own model
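The heuristic fallback can be pictured as a thin wrapper around the model call. The sketch below is an assumption about the shape of that wrapper: `Diagnosis`, `heuristic_diagnose`, the keyword rules, and the confidence values are all hypothetical, and `llm_call` is whatever callable wraps the configured model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Diagnosis:
    summary: str
    confidence: float
    source: str  # "llm" or "heuristic"


def heuristic_diagnose(signal: dict) -> Diagnosis:
    """Keyword rules used only when the model call fails (hypothetical rules)."""
    text = signal.get("message", "").lower()
    if "disk" in text:
        return Diagnosis("Possible disk pressure", 0.4, "heuristic")
    if "timeout" in text:
        return Diagnosis("Possible upstream timeout", 0.4, "heuristic")
    return Diagnosis("Unclassified signal", 0.1, "heuristic")


def diagnose(signal: dict, llm_call: Callable[[dict], Diagnosis]) -> Diagnosis:
    """Try the model first; on any error, return a heuristic result instead of raising."""
    try:
        return llm_call(signal)
    except Exception:
        return heuristic_diagnose(signal)
```

Tagging each result with its `source` keeps heuristic answers distinguishable from model answers downstream, which matters if later stages weigh confidence.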
Action plane
Celery workers execute remediation, sync tickets, and write resolutions back to the graph.
- Celery 5 + Beat for periodic tasks
- Two-way Jira / ServiceNow sync
- 48-hour Resolution Quality Gate
- Audit log emitted on every action
The orchestrator
LangGraph state machine
Four stages, every transition observable in the dashboard. Errors fall back to a heuristic investigation rather than a user-facing failure.
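As a rough illustration of the final routing step, the sketch below maps a confidence score to one of three outcomes. The source only mentions a single per-tenant threshold, so the `clarify_margin` band is an assumption introduced here to separate "ask" from "route to a human"; `route` is a hypothetical name and this is plain Python, not LangGraph code.

```python
def route(confidence: float, threshold: float, clarify_margin: float = 0.2) -> str:
    """Pick the next stage from a confidence score.

    threshold       per-tenant bar for acting without a human
    clarify_margin  assumed width of the band below the bar where the
                    system asks a question instead of escalating
    """
    if confidence >= threshold:
        return "execute"
    if confidence >= threshold - clarify_margin:
        return "clarify"
    return "escalate"
```

For example, with a tenant threshold of 0.8, a score of 0.9 acts, 0.7 asks for clarification, and 0.3 goes to a human.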
- classify: intent + severity + confidence
- incident_or_cmdb: branch on signal kind
- confidence_check: compare vs. tenant threshold
- execute / clarify / escalate: act, ask, or route to humans
Data layer
Three stores, one tenant boundary
Postgres 15
Tenant config, audit log, triage queue
- ~17 tenant tables with RLS
- asyncpg driver
- JSONB knowledge_files in tenant_rag_settings
Neo4j 5
Knowledge graph for CIs and Issues
- 1536-dim vector index
- Lucene full-text indexes
- Hybrid search with RRF
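Reciprocal rank fusion (RRF) merges ranked lists by summing `1 / (k + rank)` for each item across the lists. The sketch below shows the standard formula in plain Python; the function name, the example IDs, and the choice of `k = 60` (a commonly used default) are assumptions, not details taken from the product.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists; higher fused score means a better combined rank."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


vector_hits = ["a", "b", "c"]    # e.g. results from the vector index
fulltext_hits = ["b", "d", "a"]  # e.g. results from the full-text index
fused = rrf([vector_hits, fulltext_hits])
```

Because RRF uses only ranks, it avoids having to normalize vector similarity scores against full-text relevance scores, which live on different scales.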
Redis 7
Signal bus, cache, rate limit, dedup
- Streams (alexus:signals)
- PubSub for cross-worker events
- Per-tenant key prefixing
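Per-tenant key prefixing usually comes down to one helper that every cache, rate-limit, and dedup call goes through. The key layout below (`tenant:<id>:<parts>`) and the `tenant_key` name are assumptions for illustration; the actual layout is not given in the source.

```python
def tenant_key(tenant_id: str, *parts: str) -> str:
    """Build a Redis key namespaced to one tenant.

    Rejects empty IDs and IDs containing ':' so one tenant cannot
    construct a key that lands in another tenant's namespace.
    """
    if not tenant_id or ":" in tenant_id:
        raise ValueError("tenant_id must be non-empty and contain no ':'")
    return ":".join(("tenant", tenant_id, *parts))
```

Called as `tenant_key("acme", "dedup", "abc123")`, this yields `tenant:acme:dedup:abc123`.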
Tenant isolation, end to end
JWT extraction at the edge sets the tenant context for every downstream call. From there, isolation is enforced at four layers — application, database, cache, and graph — with credential encryption keyed per tenant.
- tenant_id from JWT.
- _tenant_token filter on every query.
