
Deployability — container images, Bicep IaC, and DR drill

Outcome

You can build the four service images, deploy them with the Bicep IaC, and run a per-tenant point-in-time restore (PITR) drill against Postgres Flexible Server.

Prerequisites

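  • The PLATFORM_ADMIN role, plus Azure RBAC on the target subscription that permits creating role assignments (for example Owner or User Access Administrator), because the Bicep deploys Microsoft.Authorization/roleAssignments.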
  • docker buildx for image work; az CLI for Bicep + PITR; gh for workflow dispatch.

Container images

The .github/workflows/build-images.yml workflow builds, scans, and smoke-tests the images on every pull request and on every push to main.

App | Dockerfile | Base image | Port
rcm-core | apps/rcm-core/Dockerfile | node:22-bookworm-slim | 3000
edi-gateway | apps/edi-gateway/Dockerfile | node:22-bookworm-slim | 3011
rcm-app | apps/rcm-app/Dockerfile | nginx:1.27-alpine | 8080
edi-app | apps/edi-app/Dockerfile | nginx:1.27-alpine | 8080
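
The table maps onto one docker buildx invocation per image. The sketch below only prints those commands. It assumes the repository root is the build context and that images are tagged <registry>/<app>:<tag> from the IMAGE_REGISTRY and IMAGE_TAG variables that build-images.sh honors. The naming scheme, the fallback values, and the function names are hypothetical and are not taken from the script.

```shell
# Sketch only: print one docker buildx command per image in the table above.
# Assumptions (hypothetical, not read from build-images.sh): the repo root is the
# build context, the image name equals the app name, and "local" / "dev" are the
# fallbacks used when IMAGE_REGISTRY / IMAGE_TAG are unset.

APPS=(rcm-core edi-gateway rcm-app edi-app)

# Compose <registry>/<app>:<tag> from IMAGE_REGISTRY and IMAGE_TAG.
image_ref() {
  local app="$1"
  printf '%s/%s:%s\n' "${IMAGE_REGISTRY:-local}" "$app" "${IMAGE_TAG:-dev}"
}

# Print (not run) the build command for every app. --load puts the result in
# the local Docker image store so a smoke step can start it.
print_build_commands() {
  local app
  for app in "${APPS[@]}"; do
    printf 'docker buildx build -f apps/%s/Dockerfile -t %s --load .\n' \
      "$app" "$(image_ref "$app")"
  done
}
```

For example, IMAGE_REGISTRY=myregistry.azurecr.io IMAGE_TAG=abc123 print_build_commands prints four commands. The INSTALL_CHROMIUM variant is not modeled here.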

Local helpers

Script | Purpose
ops/scripts/build-images.sh | Builds all four images with docker buildx. Honors IMAGE_TAG and IMAGE_REGISTRY. With INSTALL_CHROMIUM=true, the rcm-core image also bundles Playwright Chromium (about 200 MB larger) for the dashboard-email path.
ops/scripts/smoke-images.sh | Boots Postgres plus the four images on a private network and curls each health endpoint until it returns 200 or times out. The CI smoke job uses it.
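
The smoke job's "curl until 200 or timeout" loop reduces to a small polling function. This is an illustrative sketch, not the contents of smoke-images.sh. The one-second poll interval, the 2-second cap per attempt, and the function name are assumptions.

```shell
# Sketch: poll a URL until it returns HTTP 200 or the timeout elapses.
# Returns 0 on a 200, 1 on timeout. Interval and per-attempt cap are assumptions.
wait_for_200() {
  local url="$1" timeout_s="${2:-60}"
  local deadline=$(( SECONDS + timeout_s ))
  local code
  while (( SECONDS < deadline )); do
    # -s hides progress, -o discards the body, -w prints only the status code.
    # --max-time bounds each attempt so a hung connection cannot eat the budget.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url" || true)
    if [[ "$code" == "200" ]]; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example, wait_for_200 http://localhost:3000/health 90 would wait up to 90 seconds for rcm-core. This assumes its port is published to the host and its health path is /health (both hypothetical).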

The Trivy scan in CI fails the workflow on any HIGH or CRITICAL vulnerability that has a fix available. SARIF reports are uploaded to the repository's GitHub Security tab.
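
Here is a sketch of the two Trivy invocations such a gate typically needs, printed rather than run. It assumes the plain Trivy CLI. The workflow may use a wrapper action instead, and the report file name is hypothetical. Splitting report and gate means the SARIF report is written in full regardless of the gate's outcome.

```shell
# Sketch only: print the two Trivy commands for one image.
# Assumes the plain Trivy CLI; the default report file name is hypothetical.
trivy_commands() {
  local image="$1" report="${2:-trivy-results.sarif}"
  # 1) Report: every finding, as SARIF for the GitHub Security tab. Exit code 0.
  printf 'trivy image --format sarif --output %s %s\n' "$report" "$image"
  # 2) Gate: only HIGH/CRITICAL findings that have a fix; exit 1 if any remain.
  printf 'trivy image --severity HIGH,CRITICAL --ignore-unfixed --exit-code 1 %s\n' "$image"
}
```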

Bicep IaC

The IaC lives under ops/iac/: one main.bicep plus one module per resource type under modules/. There are three parameter files (dev, stage, prod); only dev is deploy-eligible today.
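
Because only dev is deploy-eligible, a deploy wrapper can refuse the other parameter files before calling Azure. This is a minimal sketch. The ops/iac/parameters/<env>.bicepparam layout is inferred from the parameters/stage.bicepparam path this page mentions. The function name and exit codes are hypothetical.

```shell
# Sketch: map an environment to its parameter file, refusing anything but dev.
# Path layout ops/iac/parameters/<env>.bicepparam is an inferred assumption.
param_file_for() {
  local env="$1"
  case "$env" in
    dev)
      printf 'ops/iac/parameters/%s.bicepparam\n' "$env" ;;
    stage|prod)
      # Parameter files exist but are not deploy-eligible yet.
      echo "refusing: $env is not deploy-eligible" >&2
      return 2 ;;
    *)
      echo "unknown environment: $env" >&2
      return 1 ;;
  esac
}
```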

Key invariants

  • The Postgres admin password is a @secure() parameter on main.bicep. In dev it is bound via readEnvironmentVariable('POSTGRES_ADMIN_PASSWORD', ''). Because the fallback is an empty string, an unset variable yields an empty password rather than failing at parameter evaluation. In stage and prod, the binding must be switched to a Key Vault getSecret() reference (see the example pattern in parameters/stage.bicepparam).
  • The Log Analytics shared key is fetched inside modules/container-app-env.bicep through an existing resource reference and listKeys(). It is never exported as a module output.
  • Container Apps managed identities are system-assigned. They are granted Key Vault Secrets User on the vault the deployment creates, via Microsoft.Authorization/roleAssignments declared in the module. No out-of-band RBAC step is required.
  • The IaC creates only the master Postgres Flexible Server. Tenant DBs are created at runtime by pnpm rcm-core provision-tenant, which writes the new server's connection string into Key Vault under tenant-db-<slug>.
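
The tenant-db-<slug> convention only works if the full name stays within Key Vault's secret-name rules: 1 to 127 characters, alphanumerics and hyphens only. Below is a sketch of a check a caller could run before reading or writing the secret. This page does not specify the slug format, so the function is hypothetical and enforces only Key Vault's own rule.

```shell
# Sketch: derive and validate the Key Vault secret name for a tenant DB.
# Key Vault secret names allow only 0-9, a-z, A-Z and '-', 1 to 127 characters.
tenant_secret_name() {
  local slug="$1"
  local name="tenant-db-${slug}"
  local re='^[0-9A-Za-z-]{1,127}$'
  if [[ -z "$slug" || ! "$name" =~ $re ]]; then
    echo "invalid tenant slug for a Key Vault secret name: '$slug'" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```

The result can feed az keyvault secret show --vault-name <vault> --name "$(tenant_secret_name acme)" --query value -o tsv.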

CI gates

  • The bicep-lint job in build-images.yml runs az bicep build against main.bicep and every parameter file on every PR that touches ops/iac/**.
  • Deploy gate: .github/workflows/deploy-dev.yml is triggered only by workflow_dispatch and takes two inputs, image_tag and apply. The whatif job always runs; the apply job runs only when the operator passes apply=true. Both jobs authenticate with OIDC federated credentials. Required secrets: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID, and DEV_POSTGRES_ADMIN_PASSWORD.
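
The gate logic can be sketched locally as a function that always emits the what-if command and emits the create command only for apply=true. The sketch prints commands instead of running them. It makes these assumptions:

  • The dev parameter file is ops/iac/parameters/dev.bicepparam.
  • The resource group is rg-rcm-dev (taken from the DR sample).
  • The image tag reaches Bicep through a hypothetical IMAGE_TAG environment variable read with readEnvironmentVariable().

It also fails fast on an empty POSTGRES_ADMIN_PASSWORD, since the '' fallback would otherwise let an empty password through.

```shell
# Sketch of the deploy-dev gate; prints commands, does not run them.
# Assumptions (hypothetical): param file path, rg-rcm-dev default, IMAGE_TAG
# read by dev.bicepparam, and DEV_POSTGRES_ADMIN_PASSWORD exported as
# POSTGRES_ADMIN_PASSWORD by the workflow.
deploy_dev_plan() {
  local image_tag="$1" apply="${2:-false}"
  local rg="${RESOURCE_GROUP:-rg-rcm-dev}"
  local params="ops/iac/parameters/dev.bicepparam"

  # readEnvironmentVariable('POSTGRES_ADMIN_PASSWORD', '') returns '' when the
  # variable is unset, so refuse here instead of deploying an empty password.
  if [[ -z "${POSTGRES_ADMIN_PASSWORD:-}" ]]; then
    echo "POSTGRES_ADMIN_PASSWORD is empty; refusing to plan" >&2
    return 1
  fi

  # whatif job: runs unconditionally.
  echo "IMAGE_TAG=$image_tag az deployment group what-if --resource-group $rg --parameters $params"
  # apply job: only when the operator passed apply=true.
  if [[ "$apply" == "true" ]]; then
    echo "IMAGE_TAG=$image_tag az deployment group create --resource-group $rg --parameters $params"
  fi
}
```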

DR drill against Postgres Flexible PITR

The per-tenant DR drill exercises Azure's point-in-time restore against a single tenant database. Targets: RTO ≤ 1 hour; RPO ≤ 5 minutes (the PITR resolution).
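
Flexible Server PITR always restores into a new server and leaves the source untouched. The sketch below shows the underlying az call. It prints the command rather than running it, mirroring the drill's dry-run default. The restored-server naming scheme and the example server names are hypothetical; dr-drill.sh derives its own name from --environment.

```shell
# Sketch: compose the PITR restore call; printed, not run.
# The "<source>-restore-<env>" naming scheme is a hypothetical example.
pitr_restore_cmd() {
  local rg="$1" source_server="$2" env="$3" target_time="$4"
  # PITR creates a NEW server; the source server keeps serving traffic.
  local restored="${source_server}-restore-${env}"
  printf 'az postgres flexible-server restore --resource-group %s --name %s --source-server %s --restore-time %s\n' \
    "$rg" "$restored" "$source_server" "$target_time"
}
```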

Inputs

Flag | Purpose
--tenant-id <uuid> | The tenant to restore.
--target-time '2026-04-25T20:00:00Z' | Desired PITR target, in UTC. Must fall within the server's configured PITR (backup-retention) window; 35 days is the default here and also Azure's maximum.
--environment dev|stage|prod | Drives the restored server's name and the environment column of the report row.
--apply | Execute against Azure. Without it the script runs in --dry-run mode (the default), which lets CI exercise the report shape end to end.
--skip-rotate | Skip the post-restore PHI data-encryption key (DEK) rotation.
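
--target-time can be checked locally before any Azure call. The sketch below assumes GNU date (on macOS, gdate from coreutils) and the 35-day window above. "now" is a parameter so the check is deterministic in tests. The function name and exit codes are hypothetical.

```shell
# Sketch: validate a PITR target against the retention window.
# Requires GNU date (-d). Exit 0 = OK, 1 = outside window, 2 = unparseable.
pitr_target_ok() {
  local target="$1" now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}" window_days="${3:-35}"
  local t n
  t=$(date -u -d "$target" +%s 2>/dev/null) || { echo "unparseable target: $target" >&2; return 2; }
  n=$(date -u -d "$now" +%s 2>/dev/null) || { echo "unparseable now: $now" >&2; return 2; }
  # The target must not be in the future...
  if (( t > n )); then
    echo "target is in the future" >&2
    return 1
  fi
  # ...and must be no older than the retention window.
  if (( n - t > window_days * 86400 )); then
    echo "target is outside the ${window_days}-day PITR window" >&2
    return 1
  fi
  return 0
}
```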

Sample command

DATABASE_MASTER_URL=postgresql://... \
  ops/scripts/dr-drill.sh \
    --tenant-id 1234... \
    --environment dev \
    --target-time '2026-04-25T20:00:00Z' \
    --resource-group rg-rcm-dev \
    --apply |
DATABASE_MASTER_URL=postgresql://... \
  pnpm --filter @rcm/rcm-core dr-drill-report
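
In a plain a | b pipeline the exit status is that of b alone. A failed restore piped into a successful report step can therefore look like success. The sketch below is a wrapper that surfaces both statuses via PIPESTATUS. The DRILL and REPORT override variables are a hypothetical hook for testing; their defaults mirror the sample command.

```shell
# Sketch: run drill | report and fail if EITHER side fails.
# DRILL / REPORT overrides are a hypothetical test hook.
run_drill_and_report() {
  local drill="${DRILL:-ops/scripts/dr-drill.sh}"
  local report="${REPORT:-pnpm --filter @rcm/rcm-core dr-drill-report}"
  # $report is word-split on purpose (a simple command with arguments).
  "$drill" "$@" | $report
  # Capture both exit codes immediately; the next command resets PIPESTATUS.
  local statuses=("${PIPESTATUS[@]}")
  if (( statuses[0] != 0 )); then
    echo "drill step exited ${statuses[0]}" >&2
    return "${statuses[0]}"
  fi
  return "${statuses[1]}"
}
```

Export DATABASE_MASTER_URL once, then pass the drill flags: run_drill_and_report --tenant-id 1234... --environment dev --target-time '2026-04-25T20:00:00Z' --resource-group rg-rcm-dev --apply.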

Rehearsal cadence

Cadence | Action
Quarterly | Drill against one synthetic tenant on dev. Record the row in identity.dr_drill.
Annually | Full prod tenant restore drill, scheduled out of band with the affected customer.
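
A missed quarterly drill is easy to detect from the latest started_at in identity.dr_drill. The sketch below assumes GNU date and that started_at is passed as an ISO 8601 UTC string. The 92-day threshold approximating one quarter is an assumption, as is the function name.

```shell
# Sketch: succeed (exit 0) when the last drill is older than max_days.
# Requires GNU date; 92 days ~ one quarter is an assumed threshold.
drill_overdue() {
  local last_started_at="$1" now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}" max_days="${3:-92}"
  local last n
  last=$(date -u -d "$last_started_at" +%s) || return 2
  n=$(date -u -d "$now" +%s) || return 2
  # Whole days elapsed since the last drill, compared with the cadence.
  (( (n - last) / 86400 > max_days ))
}
```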

Reading the audit

SELECT environment, started_at, rto_seconds, rpo_seconds, outcome
FROM identity.dr_drill
ORDER BY started_at DESC
LIMIT 10;

Rollback

Surface | Rollback
App revisions | az containerapp revision activate flips traffic back. The previous revision stays warm for 1 hour after a new deploy.
Bicep changes | Re-deploy the previous parameter file. Run az deployment group what-if first.
Postgres | PITR back to the moment before the bad change; see PITR restore.
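
Below is a sketch of an app-revision rollback, printed rather than run. It assumes the app runs in multiple-revision mode, where traffic is split by explicit weights, so the known-good revision is both activated and given 100% of traffic. App, resource group, and revision names are placeholders.

```shell
# Sketch: print the commands that roll an app back to a known-good revision.
# Assumes multiple-revision mode; all names are placeholders.
rollback_revision_cmds() {
  local app="$1" rg="$2" revision="$3"
  # Make sure the known-good revision is active (it may still be warm).
  echo "az containerapp revision activate --name $app --resource-group $rg --revision $revision"
  # Pin all ingress traffic to it.
  echo "az containerapp ingress traffic set --name $app --resource-group $rg --revision-weight $revision=100"
}
```

az containerapp revision list --name <app> --resource-group <rg> -o table lists candidate revisions.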

Validation

Check | Expected
Trivy scan in CI | Clean for fixable HIGH/CRITICAL vulnerabilities
whatif job in deploy-dev.yml | Planned changes match operator intent
Latest identity.dr_drill row | outcome=SUCCESS
Measured RTO | ≤ 1 hour
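
The table can be checked mechanically against the latest identity.dr_drill row. The sketch below applies the 1-hour RTO and the 5-minute RPO target stated for the drill. It assumes rto_seconds and rpo_seconds are whole numbers, and the function name is hypothetical.

```shell
# Sketch: evaluate one identity.dr_drill row against the drill targets.
# Inputs: outcome, rto_seconds, rpo_seconds (integers assumed).
# Targets: RTO <= 3600 s (1 hour), RPO <= 300 s (5 minutes).
drill_row_passes() {
  local outcome="$1" rto_s="$2" rpo_s="$3"
  [[ "$outcome" == "SUCCESS" ]] || { echo "outcome=$outcome" >&2; return 1; }
  (( rto_s <= 3600 )) || { echo "RTO ${rto_s}s exceeds 1 hour" >&2; return 1; }
  (( rpo_s <= 300 )) || { echo "RPO ${rpo_s}s exceeds 5 minutes" >&2; return 1; }
}
```

One way to feed it: read -r outcome rto rpo < <(psql "$DATABASE_MASTER_URL" -At -F ' ' -c "SELECT outcome, rto_seconds, rpo_seconds FROM identity.dr_drill ORDER BY started_at DESC LIMIT 1"), then drill_row_passes "$outcome" "$rto" "$rpo".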
