Shard tenants across servers and rebalance live
Outcome
Tenant databases are placed across multiple Postgres Flexible Servers according to capacity. Existing tenants can be moved between servers without data loss.
Prerequisites
- `PLATFORM_ADMIN`.
- More than one `identity.db_server` row, or you're about to add the second.
- For moves: `pg_dump` and `pg_restore` on PATH at the server's major version.
- Both source and target server admin secrets resolvable via Key Vault.
Concepts
capacity_hint — the soft cap
`identity.db_server.capacity_hint` is the soft upper bound on active + read-only tenants on one server. Set it from the connection budget:

```
capacity_hint = floor(server_max_connections / TENANT_POOL_MAX / safety_factor)
```

The recommended `safety_factor = 2` leaves headroom for connection spikes (background migrations, impersonation, ad-hoc tooling).

Example: a server with `max_connections=500`, default `TENANT_POOL_MAX=5`, safety factor 2 → `capacity_hint = 50`.

`capacity_hint = NULL` means unbounded; only acceptable for the dev server. Offboarding tenants do not count against capacity.
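If you'd rather derive the number on the server than do the arithmetic by hand, here is a minimal sketch; the literals `5` and `2.0` stand in for `TENANT_POOL_MAX` and the safety factor:

```sql
-- Derive capacity_hint from the live max_connections setting.
-- 5 = TENANT_POOL_MAX, 2.0 = safety_factor (plugged in as literals).
SELECT floor(current_setting('max_connections')::int / 5 / 2.0) AS capacity_hint;
```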
Placement algorithm
`provision-tenant` without `--db-server` calls the placement service:

- Select all `identity.db_server` with `is_active = true`, joined with a live non-offboarding tenant count.
- Drop any row where `current_tenant_count >= capacity_hint`.
- Pick the lowest count. Tie-break by `name` ASC, so the result is deterministic.

If the operator passes `--db-server <name>`, placement only considers that server. If it's at capacity, provisioning fails with `E_NO_CAPACITY` rather than silently falling back. Operators who name a server should hear "that server is full" and act.
Add a new server
1. Stand up the Postgres Flexible Server in the desired region (Azure Portal or `az postgres flexible-server create`).
2. Store the admin connection string in Key Vault:

   ```bash
   az keyvault secret set \
     --vault-name rcm-kv-prod \
     --name db-admin-use2 \
     --value 'postgres://admin:***@pgbouncer-use2.internal:6432/postgres'
   ```

3. Insert the `identity.db_server` row:

   ```sql
   INSERT INTO identity.db_server
     (name, pgbouncer_host, pgbouncer_port, admin_secret_ref, region, capacity_hint, is_active)
   VALUES
     ('pg-us-east-2', 'pgbouncer-use2.internal', 6432, 'db-admin-use2', 'eastus2', 50, true);
   ```
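Before provisioning against the new server, a quick sanity check (assuming direct access to the master catalog):

```sql
-- Confirm the new row is active and visible to placement.
SELECT name, region, capacity_hint, is_active
FROM identity.db_server
WHERE name = 'pg-us-east-2';
```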
Preview placement (place-tenant)
```bash
pnpm --filter @rcm/rcm-core place-tenant
pnpm --filter @rcm/rcm-core place-tenant --preferred-server pg-us-east-2
pnpm --filter @rcm/rcm-core place-tenant --format json
```
| Exit | Meaning |
|---|---|
| 0 | Placement succeeded; prints the chosen server and alternatives |
| 1 | `no_capacity` — every active server at `capacity_hint` |
| 2 | `preferred_not_found` or `preferred_at_capacity` |
| 3 | No active `db_server` rows |
| 4 | Argument error or master-connection failure |
Rebalance: move-tenant
UI surface: Platform Admin → Tenants → <tenant> → Metrics → Move tenant drives the same flow. The dialog previews `decidePlacement` for the picked target, exposes `dryRun`/`dropSource`/`noReadOnlyGate`, and polls `GET /platform/jobs/:jobId` for `move_started`/`dry_run`/`move_complete` phases.

The CLI below is the canonical break-glass path: it requires only Postgres super-user access plus `pg_dump`/`pg_restore` on PATH, while the UI requires the rcm-core process to have those binaries on PATH. Both paths share the same library, audit events, and rollback semantics.
Move flow
Steps
1. Dry-run first:

   ```bash
   pnpm --filter @rcm/rcm-core move-tenant \
     --slug acme --target-db-server pg-us-east-2 --dry-run
   ```

2. Run the move:

   ```bash
   pnpm --filter @rcm/rcm-core move-tenant \
     --slug acme --target-db-server pg-us-east-2
   ```

3. Confirm with the parity probe (the move runs it automatically; you can re-run it by hand if you want belt + braces):

   ```sql
   SELECT
     (SELECT count(*) FROM identity.organization) AS org_count,
     (SELECT count(*) FROM security.app_user)     AS user_count,
     (SELECT count(*) FROM members.member)        AS member_count;
   ```

4. After 7+ days, drop the source DB if you've confirmed nothing rolled back. Default behaviour leaves the source around; pass `--drop-source` on the move only if you're certain.
Rollback semantics
| Failure point | Behaviour |
|---|---|
| Before Key Vault flip | Auto-rolls back: drops target DB, restores tenant status, writes `TENANT_MOVE_FAILED`. Source untouched. |
| Between Key Vault flip and master update | Documented-ambiguous: loud log + `E_POST_FLIP_INCONSISTENT`. Manual reconciliation: re-run the master update, OR flip Key Vault back to the source URL. Source is always retained by default. |
| After master update | Move complete. Roll back by running the move in reverse. |
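For the ambiguous window, a hypothetical reconciliation sketch. The master table and column names (`identity.tenant`, `db_server_id`, `slug`) are assumptions; verify against the real schema before running anything:

```sql
-- Hypothetical manual reconciliation after a post-flip failure.
-- identity.tenant, db_server_id, slug and db_server.id are assumed names.
-- Option A: re-run the master update so it matches the Key Vault flip.
UPDATE identity.tenant
SET db_server_id = (SELECT id FROM identity.db_server WHERE name = 'pg-us-east-2')
WHERE slug = 'acme';
-- Option B (instead): flip the Key Vault secret back to the source URL
-- and leave the master row pointing at the source.
```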
Per-server monitoring
| Metric | Meaning |
|---|---|
| `rcm_tenants_on_server{server}` | Non-offboarding tenant count per server |
| `rcm_db_server_capacity_hint{server}` | Soft cap (0 = NULL = unbounded) |
| `rcm_connection_budget_remaining_per_server{server}` | max(0, server_connection_limit − tenant_count × tenant_pool_max) |
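To eyeball fill ratio straight from the master catalog rather than Prometheus, a sketch under the same assumed tenant-table shape as above (`identity.tenant` and its columns are not confirmed by this doc):

```sql
-- Hypothetical fill-ratio query; identity.tenant and its status /
-- db_server_id columns are assumed; adjust to the real schema.
SELECT s.name,
       count(t.*) FILTER (WHERE t.status <> 'offboarding') AS tenant_count,
       s.capacity_hint,
       round(count(t.*) FILTER (WHERE t.status <> 'offboarding')::numeric
             / NULLIF(s.capacity_hint, 0), 2) AS fill_ratio
FROM identity.db_server s
LEFT JOIN identity.tenant t ON t.db_server_id = s.id
WHERE s.is_active
GROUP BY s.name, s.capacity_hint
ORDER BY fill_ratio DESC NULLS LAST;
```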
| Alert | When | Action |
|---|---|---|
| Server fill ratio > 0.9 for 1 h | A server is filling up | Capacity-planning ticket; raise `capacity_hint` after a budget check, or stand up another server. |
| `rcm_connection_budget_remaining_per_server` < 20 sustained | Connection budget tight | Either raise `max_connections` on the server (Azure Portal restart) or move tenants off. |
Troubleshooting
| Symptom / `error_code` | Cause | Fix |
|---|---|---|
| `E_NO_CAPACITY` | All active servers at `capacity_hint` | Add a server (above) or raise `capacity_hint` after a connection-budget check. |
| `E_BINARY_MISSING` | `pg_dump` or `pg_restore` not on PATH | Install `postgresql-client` matching the server major version. |
| `E_POST_FLIP_INCONSISTENT` | Failure between KV flip and master update | Re-run the master update OR flip KV back. Source is always retained. |
| Move stuck at "row count parity check" | Source had concurrent writes despite `read_only` | Confirm `read_only` propagated (cache TTL); re-run with a longer status-cache wait. |