Backup and restore one tenant
Outcome
A logical pg_dump of one tenant, and a documented restore path that goes to a
parallel database first — never overwriting the live one.
Prerequisites
PLATFORM_ADMIN.pg_dumpandpg_restoreon PATH at the source server's major version.- Key Vault read access for the tenant secret.
- Brief
read_onlywindow OK (or pre-arranged with the customer for longer dumps).
When to use which backup
| Purpose | Tool |
|---|---|
| Disaster recovery, server-wide rollback | Azure Backup PITR (Postgres Flexible Server) — see PITR restore |
| Move one tenant to another server | move-tenant |
| Single-tenant export (auditor request, pre-destructive change, slug change) | pg_dump — this chapter |
Backup flow
Steps — backup
Resolve the tenant connection string from Key Vault.
TENANT=acmeSECRET_NAME="tenant-db-${TENANT}"CONN=$(az keyvault secret show --vault-name $KV --name $SECRET_NAME \--query value -o tsv)Flip the tenant to
read_only(recommended for dumps longer than a few seconds).pnpm --filter @rcm/rcm-core tenant-status \--slug $TENANT --to read_only --reason "pg_dump snapshot"Run
pg_dumpin the custom format. Excludepg_boss.*to keep the artifact small — the queue tables don't carry PHI worth restoring later.pg_dump --dbname="$CONN" \--format=custom \--exclude-schema=pg_boss \--file="${TENANT}-$(date -u +%Y%m%dT%H%M%SZ).dump"Restore the tenant to
active.pnpm --filter @rcm/rcm-core tenant-status \--slug $TENANT --to active --reason "pg_dump complete"
Steps — restore
The restore always lands in a parallel database. Never restore over the live one.
Provision a sibling tenant DB with a
-restore-<timestamp>slug:pnpm --filter @rcm/rcm-core provision-tenant \--slug acme-restore-20260427t1500z \--display-name "Acme (restore)" \--admin-email ops@medsuite.example(Or provision via the UI and skip this step if you'd rather restore in place — see step 4.)
Run
pg_restoreagainst the new tenant's connection string.RESTORE_CONN=$(az keyvault secret show --vault-name $KV \--name tenant-db-acme-restore-20260427t1500z --query value -o tsv)pg_restore --dbname="$RESTORE_CONN" \--no-owner --clean --if-exists \./acme-20260427T1500Z.dumpValidate by spot-checking key PHI tables and confirming row counts match the source. The same parity probe used by move-tenant covers the canonical canary tables:
SELECT(SELECT count(*) FROM identity.organization) AS org_count,(SELECT count(*) FROM security.app_user) AS user_count,(SELECT count(*) FROM members.member) AS member_count;Cut over (only if you're rebuilding the live tenant from this dump): update
identity.tenant.db_config_refto the restored Key Vault secret. The next resolver cache miss picks up the new DB. For an immediate cutover, flip the tenantread_onlyfirst, restart the rcm-core fleet, then flipactiveagain.Retain the original (now-stale) DB for at least 7 days before drop, in case you need to roll back.
Validation
| Check | Expected |
|---|---|
| Dump file size | Reasonable for tenant volume; not 0 bytes |
pg_restore exit code | 0 |
| Parity probe row counts | Match source within expected drift |
Restored DB readable via psql | Yes; key tables populate |
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
pg_dump: error: query failed: ERROR: permission denied for schema X | Tenant secret is the app role, not the admin role | Use the server-level admin connection string (resolve db_server.admin_secret_ref). |
pg_restore: error: could not execute query: ERROR: relation "..." already exists | Restore target already has the schema | Pass --clean --if-exists (already in the example). |
| Dump file is huge | pg_boss jobs not excluded, or member detail tables hot | Confirm --exclude-schema=pg_boss. For large per-tenant volumes, consider PITR instead. |
| Read traffic still served from old DB after cutover | Resolver pool caching | Rolling-restart the fleet, or wait for TENANT_METADATA_CACHE_TTL_MS (default 30 s). |