
Real-time event stream

Outcome

The realtime SSE feed keeps operator dashboards live without polling, and when a tenant reports stale data you can diagnose subscriber state and drop reasons in under a minute.

Prerequisites

  • metrics.read (or appropriate Prometheus scrape token) for health probes.
  • PLATFORM_ADMIN for forced disconnects.

Best-effort guarantee

Every event is also persisted by its source domain (claims, denials, remittance batches, authorizations). The realtime path is a UI nicety — operators reload to recover from any missed delivery.

Health checks

# 1. Process-level subscriber count.
curl -sH "Authorization: Bearer $METRICS_BEARER" \
  https://rcm-core.<env>/metrics \
  | grep -E '^rcm_realtime_subscribers '
# Example: rcm_realtime_subscribers{tenant_id="acme"} 7

# 2. Dispatch volume — should track the underlying domain emit rate.
curl -sH "Authorization: Bearer $METRICS_BEARER" \
  https://rcm-core.<env>/metrics \
  | grep -E '^rcm_realtime_events_dispatched_total'

# 3. Drop reasons. A spike on `permission` likely means a deploy
# landed a new event type without seeding the matching scope.
curl -sH "Authorization: Bearer $METRICS_BEARER" \
  https://rcm-core.<env>/metrics \
  | grep -E '^rcm_realtime_dropped_total'

Force-disconnect a tenant during a deploy

The hub's close() is invoked from rcm-core's graceful shutdown path, so a rolling deploy already drains every subscriber with event: server_shutdown. To eject a single tenant outside a deploy:
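The drain behavior can be pictured with a toy model. This is an illustrative sketch, not the real EventStreamHub: `ToyHub` and its members are made-up names, and only the `server_shutdown` event and the "drain every subscriber on close" behavior come from the text above.

```typescript
// Illustrative model of the shutdown drain (not the real EventStreamHub):
// close() pushes a final server_shutdown frame to every subscriber, then
// forgets them all, so clients reconnect against the replacement replica.
type Send = (frame: string) => void;

class ToyHub {
  private subscribers = new Map<string, Send>();

  subscribe(id: string, send: Send): void {
    this.subscribers.set(id, send);
  }

  get size(): number {
    return this.subscribers.size;
  }

  close(): void {
    for (const send of this.subscribers.values()) {
      // SSE framing: named event plus an empty JSON payload.
      send("event: server_shutdown\ndata: {}\n\n");
    }
    this.subscribers.clear();
  }
}
```

Because close() runs on the graceful-shutdown path, a rolling deploy already produces exactly this drain on every replica without operator action.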

  1. Bounce the rcm-core replicas behind that tenant's affinity ring:

    # Kubernetes
    kubectl rollout restart deploy/rcm-core-<region>

    # Azure Container Apps
    az containerapp revision restart -n rcm-core -g <resource-group> --revision <revision>

    The supervisor picks its jobs back up, but every realtime subscriber is forced to reconnect against a fresh hub.

  2. Confirm via rcm_realtime_subscribers{tenant_id="<slug>"} that the gauge dropped to zero, then back up as clients reconnect.

Operator escalation — "tenant reports stale dashboards"

  1. Check rcm_realtime_subscribers{tenant_id="<slug>"} — zero means the client never connected; non-zero means the wire is up.

  2. Check rcm_realtime_dropped_total{reason="permission",tenant_id=...} — non-zero indicates the user's JWT lacks the scope for the event family (deepest cause: stale role).

  3. Check rcm_realtime_dropped_total{reason="backpressure"} — sustained drops mean the client's TCP socket has been gone for a while; the hub eventually evicts the dead subscriber, but a flapping network can make the gauge blip.

  4. Realtime is best-effort. If the operator needs the data, they can refresh — every event is reflected in the underlying domain table.
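The first three triage steps reduce to grepping one tenant's series out of the /metrics dump. A minimal sketch, assuming the metric names shown earlier; `tenantRealtimeLines` is an illustrative helper, not part of rcm-core:

```typescript
// Illustrative helper (not part of rcm-core): pull one tenant's realtime
// gauge and drop-counter lines out of a Prometheus text-exposition dump.
function tenantRealtimeLines(metricsText: string, tenant: string): string[] {
  return metricsText
    .split("\n")
    .filter(
      (line) =>
        /^rcm_realtime_(subscribers|dropped_total)/.test(line) &&
        line.includes(`tenant_id="${tenant}"`),
    );
}

// Canned scrape: a zero subscriber gauge plus permission drops points at a
// stale role, per steps 1-2 above.
const sample = [
  'rcm_realtime_subscribers{tenant_id="acme"} 0',
  'rcm_realtime_dropped_total{reason="permission",tenant_id="acme"} 12',
  'rcm_realtime_subscribers{tenant_id="other"} 3',
].join("\n");
console.log(tenantRealtimeLines(sample, "acme"));
```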

Adding a new event family

  1. Pick the event type string and emit it via MonolithEventPublisher from the source module.

  2. Add a row to the catalog (apps/rcm-core/src/modules/realtime/event-bus-catalog.ts) with the matching requiredPermission (mirror the REST guard) and a sensible defaultUiSeverity. Tray-only events should be silent.

  3. Mirror the row in packages/ui-common/src/realtime/event-types.ts with a human label.

  4. No frontend work is needed beyond the catalog row: RealtimeProvider automatically routes the toast + tray entry.
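Steps 2 and 3 each amount to one small row. A hedged sketch of what they might look like: the entry shape, the `denial.appealed` event, and its scope are made-up examples, and the real types in event-bus-catalog.ts and event-types.ts may differ.

```typescript
// Hypothetical shapes -- the real catalog entry type in
// apps/rcm-core/src/modules/realtime/event-bus-catalog.ts may differ.
type UiSeverity = "info" | "warning" | "error" | "silent";

interface CatalogEntry {
  eventType: string;
  requiredPermission: string; // mirror the REST guard's scope
  defaultUiSeverity: UiSeverity; // "silent" for tray-only events
}

// Step 2: backend catalog row for a hypothetical denial.appealed event.
const denialAppealed: CatalogEntry = {
  eventType: "denial.appealed",
  requiredPermission: "denials.read",
  defaultUiSeverity: "info",
};

// Step 3: mirrored human label (packages/ui-common/src/realtime/event-types.ts).
const eventLabels: Record<string, string> = {
  "denial.appealed": "Denial appealed",
};
```

Mirroring requiredPermission with the REST guard keeps the `permission` drop counter meaningful: an event dropped on the realtime path would also have been a 403 on the REST path.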

Synthetic batch events

  • batch.completed is emitted by the BatchEventSynthesizer when a (tenantId, tradingPartnerId, ISA-13) bucket goes quiet for 5 seconds. Tune via realtimeBatchQuietWindowMs if a clearinghouse legitimately submits with long pauses.
  • batch.failed is not auto-derived. Failure call-sites that want operator visibility call createServer(...).realtimeSynthesizer.recordBatchFailure({...}) with a reason string and an optional sourceId / message. The hub forwards through the normal catalog path so RBAC + drop counters apply.

Multi-replica fan-out

Each replica owns its own EventStreamHub. Cross-replica delivery relies on Service Bus already publishing every event to every replica's subscription, so an event emitted on replica A reaches clients connected to replicas A, B, and C.

If you observe asymmetric delivery (clients on one replica missing events that others see), check the Service Bus subscription for that replica first — the local hub is the output of that subscription, not the fan-out mechanism.
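A toy model makes that diagnosis rule concrete: the hub is only the consumer of its replica's subscription, so a missing event on one replica implicates that subscription, not the hub. All types below are illustrative stand-ins, not rcm-core or Azure SDK types.

```typescript
// Toy fan-out model (illustrative stand-ins, not real rcm-core/SDK types):
// the topic duplicates every event into one subscription per replica, and
// each replica's hub only drains its own subscription.
type BusEvent = { type: string; tenantId: string };

class FakeSubscription {
  private handlers: Array<(e: BusEvent) => void> = [];
  onMessage(h: (e: BusEvent) => void): void { this.handlers.push(h); }
  deliver(e: BusEvent): void { for (const h of this.handlers) h(e); }
}

class FakeTopic {
  private subs: FakeSubscription[] = [];
  createSubscription(): FakeSubscription {
    const s = new FakeSubscription();
    this.subs.push(s);
    return s;
  }
  publish(e: BusEvent): void { for (const s of this.subs) s.deliver(e); }
}

// Each "replica" wires its local hub to its own subscription.
const topic = new FakeTopic();
const seenByReplica: Record<string, BusEvent[]> = { A: [], B: [] };
for (const name of ["A", "B"]) {
  const sub = topic.createSubscription();
  sub.onMessage((e) => seenByReplica[name].push(e)); // stand-in for hub dispatch
}

// An event "emitted on replica A" goes through the topic, so B sees it too.
topic.publish({ type: "claim.accepted", tenantId: "acme" });
```

In this model, a dead subscription for replica B would leave seenByReplica.B empty while A keeps receiving, which is exactly the asymmetric-delivery symptom described above.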

Validation

Check                                            Expected
rcm_realtime_subscribers                         Non-zero during active sessions
rcm_realtime_dropped_total{reason="permission"}  Stable; a spike implies missing role config
Catalog row + UI label                           Both updated for new event type

Next

6.6 — Bulk charge entry