1.8 Phase 1 Troubleshooting

Quick reference for common bootstrap-time failures. The full operations runbook (OPERATIONS_RUNBOOK.md in the repo) covers steady-state issues.

Database

| Symptom | Cause | Resolution |
| --- | --- | --- |
| psql: connection refused from your laptop | Firewall rule missing for your public IP | Re-add the rule via az postgres flexible-server firewall-rule create. |
| permission denied to create extension citext | Role lacks Postgres admin privilege | GRANT azure_pg_admin TO rcm_master_admin; then re-run migrations. |
| Migration X is "stuck" | knex_migrations_lock.is_locked = 1 left over from a prior crash | Confirm no migration is currently running, then UPDATE knex_migrations_lock SET is_locked = 0; and re-run. |
| Master migration is idempotent but tries to re-run | Missing row in knex_migrations | Add the row manually if the migration was applied out-of-band. |
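
For the firewall case, the following is a minimal sketch of how the rule could be re-added from a script. The resource group, server name, and rule name are placeholders, not values from this deployment; the function only builds the az argument list and validates the IP, so nothing is executed until you pass the result to your own runner.

```python
import ipaddress
import shlex


def firewall_rule_command(resource_group: str, server: str,
                          rule_name: str, public_ip: str) -> list[str]:
    """Build the az CLI arguments that allow a single IPv4 address."""
    # Validate early so a typo fails here rather than inside the az CLI.
    ip = ipaddress.IPv4Address(public_ip)
    return [
        "az", "postgres", "flexible-server", "firewall-rule", "create",
        "--resource-group", resource_group,
        "--name", server,            # server name (placeholder)
        "--rule-name", rule_name,    # firewall rule name (placeholder)
        "--start-ip-address", str(ip),
        "--end-ip-address", str(ip),
    ]


if __name__ == "__main__":
    # All names below are hypothetical; 203.0.113.7 is a documentation address.
    cmd = firewall_rule_command("rg-example", "pg-example", "laptop", "203.0.113.7")
    print(shlex.join(cmd))
```

Using the same address for the start and end of the range keeps the rule scoped to one machine.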

Key Vault

| Symptom | Cause | Resolution |
| --- | --- | --- |
| Cannot find secret X from container app at startup | Managed identity missing role | Assign Key Vault Secrets User to the app's principal. |
| 403 Forbidden for your CLI session | RBAC propagation delay | Wait 1–2 min, retry. |
| Secret is present but app reads undefined | Env-var name mismatch | Check the app's KV_SECRET_* env mapping in the container app config. |
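
The "wait and retry" advice for RBAC propagation can be scripted. This is a sketch only: it assumes your own wrapper around the Key Vault call raises PermissionError on a 403, which is a stand-in chosen for illustration and not what any particular SDK raises. The attempt count and delay are also illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_forbidden(action: Callable[[], T],
                       attempts: int = 5,
                       delay_seconds: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call action(), retrying while it raises PermissionError.

    PermissionError is a hypothetical stand-in for a 403 response.
    The sleep function is injectable so the helper can be tested quickly.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except PermissionError:
            if attempt == attempts:
                raise  # still forbidden after the last attempt
            sleep(delay_seconds)
    raise ValueError("attempts must be at least 1")
```

With the defaults, the helper waits up to two minutes in total (four pauses of 30 seconds), which covers the 1–2 minute window above. If the call still fails after that, the cause is more likely a missing role assignment than propagation delay.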

Service Bus

| Symptom | Cause | Resolution |
| --- | --- | --- |
| Worker doesn't pick up jobs | Wrong namespace name in app config | Set the SERVICE_BUS_NAMESPACE env var to the actual namespace name. |
| Unauthorized from app to bus | Managed identity missing role | Assign Azure Service Bus Data Receiver and/or Azure Service Bus Data Sender to the app's identity on the namespace. |
| Messages stuck in DLQ | Visible from Service Bus Explorer in the portal | See OPERATIONS_RUNBOOK.md §16 (pg-boss supervisor). Note that the platform uses both pg-boss for tenant jobs and Service Bus for cross-tenant events. |
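
For the Unauthorized case, the sketch below builds the two az role assignment create commands scoped to the namespace. The subscription ID, resource group, namespace, and principal ID are placeholders you would supply. Whether the app needs the Receiver role, the Sender role, or both depends on what it does with the bus, so treat the pair as an assumption.

```python
def service_bus_role_commands(subscription_id: str, resource_group: str,
                              namespace: str, principal_id: str) -> list[list[str]]:
    """Build az commands granting data-plane roles on one namespace."""
    # Resource ID of the Service Bus namespace, used as the assignment scope.
    scope = (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.ServiceBus/namespaces/{namespace}"
    )
    roles = ["Azure Service Bus Data Receiver", "Azure Service Bus Data Sender"]
    return [
        [
            "az", "role", "assignment", "create",
            "--assignee-object-id", principal_id,
            # A managed identity is a service principal in Microsoft Entra ID.
            "--assignee-principal-type", "ServicePrincipal",
            "--role", role,
            "--scope", scope,
        ]
        for role in roles
    ]
```

Scoping to the namespace rather than the resource group keeps the grant as narrow as the table's resolution describes.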

DNS / TLS

| Symptom | Cause | Resolution |
| --- | --- | --- |
| Cert renewal not happening | Custom-domain validation broken | Re-issue via Static Web App settings. |
| *.rcm.medsuite.com returns 404 | Front Door wildcard route not configured | Recheck 1.6 step 4. |
| api.rcm returns 502 intermittently | Container app cold-start | Set min replicas ≥ 1 for prod. |

Auth

| Symptom | Cause | Resolution |
| --- | --- | --- |
| Bootstrap script: duplicate key on email | Ran twice | Use --reset-password, or delete the row and rerun. |
| Login returns 401 with correct password | is_active = false | UPDATE security.platform_user SET is_active = true WHERE email = .... |
| Login succeeds, but /platform/tenants returns 403 | Missing PLATFORM_ADMIN role assignment | Insert into security.platform_role_assignment. |

Where to look first

If you're truly stuck, escalate to engineering with: the failing endpoint URL, the Container App revision ID, and the last 200 lines of logs.
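
To make escalations consistent, the three items can be bundled into one text block. This is a rough sketch; the function name and report layout are illustrative and not an established team format. It assumes you have already captured the log output as a string, for example from the portal's log stream.

```python
def escalation_report(endpoint_url: str, revision_id: str,
                      log_text: str, max_lines: int = 200) -> str:
    """Combine the endpoint, revision ID, and log tail into one report."""
    lines = log_text.splitlines()
    # Keep only the most recent max_lines lines of the captured log.
    tail = lines[-max_lines:] if max_lines > 0 else []
    header = [
        f"Failing endpoint: {endpoint_url}",
        f"Container App revision: {revision_id}",
        f"Last {len(tail)} log lines:",
    ]
    return "\n".join(header + tail)
```

Trimming to the last 200 lines in code avoids pasting an entire log stream into a ticket.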