# Real-time event stream
## Outcome
The realtime SSE feed keeps operator dashboards live without polling, and when a tenant reports stale data you can diagnose subscriber state and drop reasons in under a minute.
## Prerequisites
- `metrics.read` (or an appropriate Prometheus scrape token) for health probes.
- `PLATFORM_ADMIN` for forced disconnects.
## Best-effort guarantee
Every event is also persisted by its source domain (claims, denials, remittance batches, authorizations). The realtime path is a UI nicety — operators reload to recover from any missed delivery.
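The client-side consequence of this guarantee can be sketched as follows. The class and callback names here are illustrative, not the actual `ui-common` API: apply realtime events opportunistically, and refetch from the domain API on every (re)connect so a missed delivery can never leave the view permanently stale.

```typescript
// Sketch of a best-effort consumer (hypothetical names): realtime events
// refine the view, but the domain tables stay the source of truth.
type DomainEvent = { type: string; payload: unknown };

class BestEffortView<T> {
  private state: T;

  constructor(
    initial: T,
    private apply: (state: T, ev: DomainEvent) => T,
    private refetch: () => T, // authoritative read from the domain tables
  ) {
    this.state = initial;
  }

  onEvent(ev: DomainEvent): void {
    // Opportunistic update; dropped events are tolerated.
    this.state = this.apply(this.state, ev);
  }

  onReconnect(): void {
    // Anything missed while disconnected is recovered by re-reading.
    this.state = this.refetch();
  }

  current(): T {
    return this.state;
  }
}
```

The same `refetch` path is what a manual page reload exercises, which is why "operators reload to recover" is a complete mitigation.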
## Health checks
```bash
# 1. Process-level subscriber count.
curl -sH "Authorization: Bearer $METRICS_BEARER" \
  https://rcm-core.<env>/metrics \
  | grep -E '^rcm_realtime_subscribers '
# Example: rcm_realtime_subscribers{tenant_id="acme"} 7

# 2. Dispatch volume; should track the underlying domain emit rate.
curl -sH "Authorization: Bearer $METRICS_BEARER" \
  https://rcm-core.<env>/metrics \
  | grep -E '^rcm_realtime_events_dispatched_total'

# 3. Drop reasons. A spike on `permission` likely means a deploy
# landed a new event type without seeding the matching scope.
curl -sH "Authorization: Bearer $METRICS_BEARER" \
  https://rcm-core.<env>/metrics \
  | grep -E '^rcm_realtime_dropped_total'
```
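When scripting these probes, the per-tenant gauge can be pulled out of the Prometheus text exposition with a small parser. This helper is a sketch; only the metric and label names above are real:

```typescript
// Hypothetical helper: extract tenant_id -> value for
// rcm_realtime_subscribers from Prometheus text-format output.
function subscriberCounts(metricsText: string): Map<string, number> {
  const out = new Map<string, number>();
  const re = /^rcm_realtime_subscribers\{tenant_id="([^"]+)"\}\s+(\d+(?:\.\d+)?)$/;
  for (const line of metricsText.split("\n")) {
    const m = re.exec(line.trim());
    if (m) out.set(m[1], Number(m[2]));
  }
  return out;
}
```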
## Force-disconnect a tenant during a deploy
The hub's `close()` is invoked from rcm-core's graceful shutdown path, so a rolling deploy already drains every subscriber with `event: server_shutdown`. To eject a single tenant outside a deploy:
1. Bounce the rcm-core replicas behind that tenant's affinity ring:

   ```bash
   # Kubernetes
   kubectl rollout restart deploy/rcm-core-<region>

   # Azure Container Apps
   az containerapp revision restart
   ```

   The supervisor re-picks up jobs, but every realtime subscriber is forced to reconnect against a fresh hub.

2. Confirm via `rcm_realtime_subscribers{tenant_id="<slug>"}` that the gauge dropped to zero, then came back up as clients reconnect.
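The drain behavior can be pictured with a minimal in-memory hub. The real `EventStreamHub` lives in rcm-core; the subscriber and frame shapes below are assumptions for illustration:

```typescript
// Toy model of the shutdown drain: close() notifies every subscriber
// with a server_shutdown frame, then drops them all.
type SseFrame = { event: string; data?: string };

class EventStreamHub {
  private subscribers = new Map<string, (f: SseFrame) => void>();

  subscribe(id: string, send: (f: SseFrame) => void): void {
    this.subscribers.set(id, send);
  }

  get size(): number {
    return this.subscribers.size;
  }

  // Invoked from graceful shutdown: tell clients to reconnect, then drain.
  close(): void {
    for (const send of this.subscribers.values()) {
      send({ event: "server_shutdown" });
    }
    this.subscribers.clear();
  }
}
```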
## Operator escalation: "tenant reports stale dashboards"
1. Check `rcm_realtime_subscribers{tenant_id="<slug>"}`: zero means the client never connected; non-zero means the wire is up.
2. Check `rcm_realtime_dropped_total{reason="permission",tenant_id=...}`: non-zero indicates the user's JWT lacks the scope for the event family (deepest cause: a stale role).
3. Check `rcm_realtime_dropped_total{reason="backpressure"}`: sustained drops mean the client TCP socket has been gone for a while; the hub eventually evicts the dead subscriber, but a flapping network can blip the gauge.

Realtime is best-effort. If the operator needs the data, they can refresh: every event is reflected in the underlying domain table.
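The decision tree above, written out as a hypothetical triage helper (the function and field names are invented for illustration; the values would come from the scrapes in "Health checks"):

```typescript
// Sketch only: encodes the escalation checklist as a function.
interface StaleDashboardSignals {
  subscribers: number;       // rcm_realtime_subscribers{tenant_id=...}
  permissionDrops: number;   // rcm_realtime_dropped_total{reason="permission",...}
  backpressureDrops: number; // rcm_realtime_dropped_total{reason="backpressure"}
}

function triageStaleDashboard(s: StaleDashboardSignals): string {
  if (s.subscribers === 0) return "client never connected";
  if (s.permissionDrops > 0) return "JWT missing scope (check role config)";
  if (s.backpressureDrops > 0) return "dead/flapping client socket";
  return "wire looks healthy; ask operator to refresh";
}
```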
## Adding a new event family
1. Pick the event type string and emit it via `MonolithEventPublisher` from the source module.
2. Add a row to the catalog (`apps/rcm-core/src/modules/realtime/event-bus-catalog.ts`) with the matching `requiredPermission` (mirror the REST guard) and a sensible `defaultUiSeverity`. Tray-only events should be `silent`.
3. Mirror the row in `packages/ui-common/src/realtime/event-types.ts` with a human label.
4. No frontend work is needed past the catalog row: `RealtimeProvider` automatically routes the toast + tray entry.
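A catalog row presumably looks something like the following. This shape is inferred from the field names above; the real type in `event-bus-catalog.ts` may differ, and the example event type and permission are invented:

```typescript
// Assumed catalog-row shape (sketch, not the real type).
type UiSeverity = "info" | "warning" | "error" | "silent";

interface CatalogRow {
  eventType: string;
  requiredPermission: string; // mirror the REST guard for the same resource
  defaultUiSeverity: UiSeverity;
}

// Hypothetical example row for a tray-only event.
const denialWorklistUpdated: CatalogRow = {
  eventType: "denial.worklist.updated",
  requiredPermission: "denials.read",
  defaultUiSeverity: "silent", // tray-only: no toast
};
```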
## Synthetic batch events
- `batch.completed` is emitted by the `BatchEventSynthesizer` when a `(tenantId, tradingPartnerId, ISA-13)` bucket goes quiet for 5 seconds. Tune via `realtimeBatchQuietWindowMs` if a clearinghouse legitimately submits in long pauses.
- `batch.failed` is not auto-derived. Failure call-sites that want operator visibility call `createServer(...).realtimeSynthesizer.recordBatchFailure({...})` with a `reason` string and an optional `sourceId`/`message`. The hub forwards through the normal catalog path, so RBAC + drop counters apply.
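The quiet-window derivation behind `batch.completed` can be sketched with an injected clock. This is a simplification; the real `BatchEventSynthesizer` internals are not shown here:

```typescript
// Sketch: a bucket "completes" once no event has landed for quietWindowMs.
// The clock is passed in so the logic is deterministic and testable.
class QuietWindow {
  private lastSeen = new Map<string, number>();

  constructor(private quietWindowMs: number) {}

  // Called for every event in a (tenantId, tradingPartnerId, ISA-13) bucket;
  // any activity resets that bucket's window.
  record(bucketKey: string, nowMs: number): void {
    this.lastSeen.set(bucketKey, nowMs);
  }

  // Returns buckets whose window has elapsed and forgets them (emit-once).
  sweep(nowMs: number): string[] {
    const completed: string[] = [];
    for (const [key, t] of this.lastSeen) {
      if (nowMs - t >= this.quietWindowMs) {
        completed.push(key);
        this.lastSeen.delete(key);
      }
    }
    return completed;
  }
}
```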
## Multi-replica fan-out
Each replica owns its own `EventStreamHub`. Cross-replica delivery relies on Service Bus already publishing every event to every replica's subscription, so an event emitted on replica A reaches clients connected to replicas A, B, and C.
If you observe asymmetric delivery (clients on one replica missing events that others see), check the Service Bus subscription for that replica first — the local hub is the output of that subscription, not the fan-out mechanism.
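The topology reads as: the bus duplicates each event into every replica's subscription, and each local hub only serves its own connected clients. A toy model, with invented class names standing in for Service Bus and the per-replica hub:

```typescript
// Toy fan-out model: Bus stands in for Service Bus; each ReplicaHub
// only ever delivers to its own clients.
class ReplicaHub {
  received: string[] = [];
  deliver(eventType: string): void {
    this.received.push(eventType); // would write SSE frames to local clients
  }
}

class Bus {
  private subscriptions: ReplicaHub[] = [];
  subscribe(hub: ReplicaHub): void {
    this.subscriptions.push(hub);
  }
  // Every subscription gets its own copy of every event.
  publish(eventType: string): void {
    for (const hub of this.subscriptions) hub.deliver(eventType);
  }
}
```

If one replica's clients miss events, the break is between `publish` and that replica's `deliver`, i.e. its Service Bus subscription, not the hub.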
## Validation
| Check | Expected |
|---|---|
| `rcm_realtime_subscribers` | Non-zero during active sessions |
| `rcm_realtime_dropped_total{reason="permission"}` | Stable; a spike implies missing role config |
| Catalog row + UI label | Both updated for new event type |
## Cross-references
- Distributed tracing for cross-service event correlation.