| 1 | +# Multi-Study Support — Phase 2: Per-Study Access Control |
| 2 | + |
| 3 | +## Prerequisites |
| 4 | + |
| 5 | +Phase 1 ([multistudy_phase1.md](multistudy_phase1.md)) must be complete: `study` metadata flows from user-facing APIs through to job metadata and launcher plumbing. |
| 6 | + |
| 7 | +## Introduction |
| 8 | + |
| 9 | +Phase 2 adds per-study access control on top of Phase 1's plumbing. The existing role model (`project_admin`, `org_admin`, `lead`, `member`) is unchanged — the only difference is that roles become per-study instead of global. No new roles are introduced. |
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +## Core Idea |
| 14 | + |
| 15 | +Today, a user's role is global (baked into the X.509 cert). Phase 2 makes the same role **per-study**: a user can be `lead` in one study and `member` in another. |
| 16 | + |
| 17 | +The existing authorization rules (`authorization.json`) stay exactly as they are. The only new layer is a **study filter**: before evaluating existing RBAC, the server checks whether the resource belongs to the user's active study. If not, the resource is invisible. |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## How Authorization Works Today |
| 22 | + |
| 23 | +### `authorization.json` |
| 24 | + |
| 25 | +Each deployment ships an `authorization.json` that maps roles to permissions. The default policy: |
| 26 | + |
| 27 | +```json |
| 28 | +{ |
| 29 | + "format_version": "1.0", |
| 30 | + "permissions": { |
| 31 | + "project_admin": "any", |
| 32 | + "org_admin": { |
| 33 | + "submit_job": "none", |
| 34 | + "clone_job": "none", |
| 35 | + "manage_job": "o:submitter", |
| 36 | + "download_job": "o:submitter", |
| 37 | + "view": "any", |
| 38 | + "operate": "o:site", |
| 39 | + "shell_commands": "o:site", |
| 40 | + "byoc": "none" |
| 41 | + }, |
| 42 | + "lead": { |
| 43 | + "submit_job": "any", |
| 44 | + "clone_job": "n:submitter", |
| 45 | + "manage_job": "n:submitter", |
| 46 | + "download_job": "n:submitter", |
| 47 | + "view": "any", |
| 48 | + "operate": "o:site", |
| 49 | + "shell_commands": "o:site", |
| 50 | + "byoc": "any" |
| 51 | + }, |
| 52 | + "member": { |
| 53 | + "view": "any" |
| 54 | + } |
| 55 | + } |
| 56 | +} |
| 57 | +``` |
| 58 | + |
| 59 | +Permission conditions: `"any"` = unrestricted, `"none"` = denied, `"n:submitter"` = only if user is the submitter, `"o:submitter"` = only if user is in the same org as the submitter, `"o:site"` = only if user is in the same org as the target site. |
| 60 | + |
| 61 | +### Authorization Flow |
| 62 | + |
| 63 | +1. A `Person(name, org, role)` is constructed from the user's X.509 cert |
| 64 | +2. An `AuthzContext(right, user, submitter)` wraps the request |
| 65 | +3. The `Authorizer` evaluates the policy: look up `permissions[person.role][right]` and check the condition against the context |
| 66 | + |
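| 67 | +**Within this flow, Phase 2 changes only step 1**: the `role` used to construct `Person` can come from the per-study mapping instead of the cert. Steps 2 and 3 are untouched, and `authorization.json` is unchanged. The study filter described under Authorization Enforcement runs before this flow. |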
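
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the actual `Authorizer` implementation: the `Person` and `AuthzContext` names come from the flow above, while the `site_org` field, the `is_authorized` helper, and the choice to deny any right missing from a role's entry are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Trimmed copy of the default policy shown earlier in this document.
POLICY = {
    "project_admin": "any",
    "org_admin": {"submit_job": "none", "manage_job": "o:submitter", "view": "any"},
    "lead": {"submit_job": "any", "manage_job": "n:submitter", "view": "any"},
    "member": {"view": "any"},
}


@dataclass(frozen=True)
class Person:
    name: str
    org: str
    role: str


@dataclass(frozen=True)
class AuthzContext:
    right: str
    user: Person
    submitter: Optional[Person] = None
    site_org: Optional[str] = None  # hypothetical: org of the target site


def is_authorized(ctx: AuthzContext, policy: dict = POLICY) -> bool:
    """Look up permissions[role][right] and check the condition."""
    perms = policy.get(ctx.user.role)
    if perms is None:
        return False  # unknown role
    # A role may map to a single condition ("any") or to a per-right table.
    # Assumption: a right that is not listed is treated as denied.
    cond = perms if isinstance(perms, str) else perms.get(ctx.right, "none")
    if cond == "any":
        return True
    if cond == "n:submitter":
        return ctx.submitter is not None and ctx.user.name == ctx.submitter.name
    if cond == "o:submitter":
        return ctx.submitter is not None and ctx.user.org == ctx.submitter.org
    if cond == "o:site":
        return ctx.site_org is not None and ctx.user.org == ctx.site_org
    return False  # "none" or an unrecognized condition
```

Because the role is just a field on `Person`, swapping the cert role for a per-study role leaves the evaluation logic untouched.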
| 68 | + |
| 69 | +--- |
| 70 | + |
| 71 | +## Role Resolution |
| 72 | + |
| 73 | +No new roles. The existing four roles are reused: |
| 74 | + |
| 75 | +| Role | Capabilities (per `authorization.json`, unchanged) | |
| 76 | +|------|-----| |
| 77 | +| `project_admin` | All operations (`"any"`) | |
| 78 | +| `org_admin` | Manage/download own-org jobs, view all, operate own-org sites | |
| 79 | +| `lead` | Submit jobs, clone/manage/download own jobs, view all, operate own-org sites, BYOC | |
| 80 | +| `member` | View only | |
| 81 | + |
| 82 | +**Resolution order:** |
| 83 | +1. If `project.yml` has a `studies:` section AND the user has a mapping for the active study → use that role |
| 84 | +2. Else if active study is `default` → fall back to cert-embedded role (legacy compatibility) |
| 85 | +3. Otherwise → deny |
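
The resolution order above might look like this in code. It is a sketch under assumptions: `resolve_role` is a hypothetical helper, and `studies` is assumed to be the parsed `studies:` section of `project.yml`, with each study holding an `admins` mapping from user name to role.

```python
from typing import Optional

DEFAULT_STUDY = "default"


def resolve_role(
    user: str,
    cert_role: str,
    active_study: str,
    studies: Optional[dict],
) -> Optional[str]:
    """Return the effective role for the active study, or None to deny."""
    # 1. A per-study mapping wins when one exists for this user.
    if studies:
        mapped = (studies.get(active_study) or {}).get("admins", {}).get(user)
        if mapped is not None:
            return mapped
    # 2. The default study falls back to the cert-embedded role.
    if active_study == DEFAULT_STUDY:
        return cert_role
    # 3. Anything else is denied.
    return None
```

Returning `None` rather than raising keeps the deny decision with the caller, which can then treat the study as invisible to that user.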
| 86 | + |
| 87 | +### What the participant `role` means |
| 88 | + |
| 89 | +The `role` field on admin participants in `project.yml` serves two purposes: |
| 90 | +- It is baked into the X.509 cert at provisioning time (identity + authentication) |
| 91 | +- It is the **effective role for the `default` study** and for deployments without a `studies:` section |
| 92 | + |
| 93 | +When a `studies:` section is present, the per-study role overrides the cert role for that study. The cert role still applies to the `default` study as a fallback. |
| 94 | + |
| 95 | +Example: a user with `role: lead` in their participant entry and `member` in the `cancer-research` study mapping is `lead` when operating in the `default` study but `member` when operating in `cancer-research`. |
| 96 | + |
| 97 | +--- |
| 98 | + |
| 99 | +## Provisioning: `project.yml` |
| 100 | + |
| 101 | +Minimal addition: a `studies:` section that maps study names to enrolled sites and per-user role overrides. Everything else stays as-is. |
| 102 | + |
| 103 | +```yaml |
| 104 | +# Existing sections unchanged |
| 105 | +participants: |
| 106 | + - name: server1.example.com |
| 107 | + type: server |
| 108 | + org: nvidia |
| 109 | + - name: hospital-a |
| 110 | + type: client |
| 111 | + org: org_a |
| 112 | + - name: hospital-b |
| 113 | + type: client |
| 114 | + org: org_b |
| 115 | + - name: admin@nvidia.com |
| 116 | + type: admin |
| 117 | + org: nvidia |
| 118 | + role: project_admin # cert role; effective role for "default" study |
| 119 | + - name: trainer@org_a.com |
| 120 | + type: admin |
| 121 | + org: org_a |
| 122 | + role: lead # cert role; effective role for "default" study |
| 123 | + |
| 124 | +# New section — per-study role overrides |
| 125 | +studies: |
| 126 | + cancer-research: |
| 127 | + sites: [hospital-a, hospital-b] |
| 128 | + admins: |
| 129 | + trainer@org_a.com: lead # same as cert role here, but explicit |
| 130 | + |
| 131 | + multiple-sclerosis: |
| 132 | + sites: [hospital-a] |
| 133 | + admins: |
| 134 | + trainer@org_a.com: member # overrides cert role for this study |
| 135 | +``` |
| 136 | + |
| 137 | +- If `studies:` is absent, the system behaves exactly as today (single-tenant, cert roles only). |
| 138 | +- Sites listed under a study must reference existing client-type participants. |
| 139 | +- Admins listed under a study must reference existing admin-type participants. |
| 140 | +- A user not listed under a study has no access to that study (except `default`, which falls back to cert role). |
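
The reference rules above could be checked at provisioning time along these lines. This is an illustrative sketch only: `validate_studies` is a hypothetical helper that takes an already-parsed `project.yml` as a dict, and the check that each mapped role is one of the four existing roles is an assumption based on the "no new roles" rule rather than a stated requirement.

```python
VALID_ROLES = {"project_admin", "org_admin", "lead", "member"}


def validate_studies(project: dict) -> list:
    """Return a list of error strings; an empty list means the section is valid."""
    errors = []
    # Index participant names by type (server / client / admin).
    by_type = {}
    for p in project.get("participants", []):
        by_type.setdefault(p["type"], set()).add(p["name"])
    clients = by_type.get("client", set())
    admins = by_type.get("admin", set())

    # An absent studies: section is valid (legacy single-tenant behavior).
    for study, spec in (project.get("studies") or {}).items():
        for site in spec.get("sites") or []:
            if site not in clients:
                errors.append(f"{study}: site {site!r} is not a client participant")
        for user, role in (spec.get("admins") or {}).items():
            if user not in admins:
                errors.append(f"{study}: admin {user!r} is not an admin participant")
            if role not in VALID_ROLES:  # assumption: only existing roles allowed
                errors.append(f"{study}: unknown role {role!r} for {user!r}")
    return errors
```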
| 141 | + |
| 142 | +--- |
| 143 | + |
| 144 | +## Authorization Enforcement |
| 145 | + |
| 146 | +Two layers, evaluated in order for every command: |
| 147 | + |
| 148 | +1. **Study filter** (new): Does the target resource (job, client) belong to the user's active study? If no → invisible. |
| 149 | +2. **RBAC policy** (existing, unchanged): Construct `Person` with the resolved per-study role, evaluate `authorization.json` as today. |
| 150 | + |
| 151 | +The session's active study (set at session start via `--study` or the `study` API parameter) determines which study filter applies. The study is carried by the authenticated server session and preserved in the session token, so subsequent commands do not need to resend it. |
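
As a rough illustration of the first layer, the study filter for job listings could be as small as the sketch below. The `job_study` and `visible_jobs` helpers are hypothetical, and job metadata is assumed to be a dict with an optional `study` key; jobs without the key are treated as belonging to `default`, matching the legacy-job rule in the Migration section.

```python
DEFAULT_STUDY = "default"


def job_study(job_meta: dict) -> str:
    """Study a job belongs to; legacy jobs without the field map to 'default'."""
    return job_meta.get("study") or DEFAULT_STUDY


def visible_jobs(jobs: list, active_study: str) -> list:
    """Layer 1: drop every job outside the session's active study.

    Layer 2 (the existing RBAC check) then runs only on what remains.
    """
    return [job for job in jobs if job_study(job) == active_study]
```

Filtering before the RBAC check means a job in another study is never evaluated against the policy at all, which is what makes it invisible rather than merely forbidden.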
| 152 | + |
| 153 | +--- |
| 154 | + |
| 155 | +## Job Scheduler |
| 156 | + |
| 157 | +When a `studies:` section is present in `project.yml`: |
| 158 | + |
| 159 | +1. **Site filtering**: Only schedule jobs to sites enrolled in the job's study |
| 160 | +2. **Validation**: `deploy_map` sites must be a subset of the study's enrolled sites |
| 161 | + |
| 162 | +No quota or priority changes. |
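
The two scheduler rules could be sketched as below. Both helpers are hypothetical, and `studies` is again assumed to be the parsed `studies:` section. The source does not say how the `default` study is handled when `studies:` is present, so this sketch simply treats any study without an entry as having no enrolled sites.

```python
def unenrolled_deploy_sites(study: str, deploy_sites, studies: dict) -> set:
    """Validation: return deploy_map sites that are NOT enrolled in the study.

    An empty result means deploy_map is a subset of the enrolled sites.
    """
    enrolled = set((studies.get(study) or {}).get("sites") or [])
    return set(deploy_sites) - enrolled


def schedulable_sites(study: str, candidate_sites, studies) -> list:
    """Site filtering: keep only candidates enrolled in the job's study."""
    if not studies:
        # No studies: section, so behave as today (no filtering).
        return list(candidate_sites)
    enrolled = set((studies.get(study) or {}).get("sites") or [])
    return [site for site in candidate_sites if site in enrolled]
```

Running the validation at submission time and the filter at scheduling time would reject a bad `deploy_map` early instead of leaving the job stuck in the queue.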
| 163 | + |
| 164 | +--- |
| 165 | + |
| 166 | +## Runtime Isolation |
| 167 | + |
| 168 | +### Kubernetes |
| 169 | + |
| 170 | +The K8s launcher reads `study` from job metadata and resolves study-specific workspace volumes: |
| 171 | +- Workspace volume resolved by `(study, client)` tuple |
| 172 | +- Each job pod mounts only its study's data volume |
| 173 | + |
| 174 | +### Docker |
| 175 | + |
| 176 | +The Docker launcher reads `study` from job metadata and mounts the corresponding host directory (e.g., `/data/<study>/`) as the workspace volume. |
| 177 | + |
| 178 | +--- |
| 179 | + |
| 180 | +## When to Use Multi-Study vs. Separate Deployments |
| 181 | + |
| 182 | +Multi-study is an operational convenience feature for shared-trust environments. If you need stronger isolation, use separate deployments on separate hardware or VMs. |
| 183 | + |
| 184 | +### Use multi-study when |
| 185 | + |
| 186 | +- One or more client sites participate in multiple studies and re-provisioning them for each study is operationally costly |
| 187 | +- All studies operate within the same operational trust boundary |
| 188 | +- You accept software-enforced isolation: study separation relies on correct authorization logic, session management, and launcher volume configuration |
| 189 | + |
| 190 | +### Use separate deployments when |
| 191 | + |
| 192 | +- You need stronger isolation than a shared multi-study deployment can provide |
| 193 | +- A problem in one study must not be able to affect another study |
| 194 | +- Studies have non-overlapping participants (no shared client sites), so separate deployments add little operational cost |
| 195 | + |
| 196 | +### Security Assumptions That Change |
| 197 | + |
| 198 | +With a single multi-study deployment, the following are shared across all studies: |
| 199 | + |
| 200 | +| Resource | Separate deployments | Multi-study | |
| 201 | +|----------|----------------------|-------------| |
| 202 | +| Server runtime | Separate per deployment | Shared | |
| 203 | +| Client runtime at each site | Separate per deployment | Shared | |
| 204 | +| PKI / cert root | Independent | Shared | |
| 205 | +| `project_admin` blast radius | One deployment | All studies in that deployment | |
| 206 | +| Isolation mechanism | Separate hardware/VM deployment boundary | Shared software authz and launcher configuration | |
| 207 | + |
| 208 | +A bug in authorization, session handling, or launcher volume configuration can affect multiple studies in the same deployment. |
| 209 | + |
| 210 | +### Summary |
| 211 | + |
| 212 | +Use multi-study to reduce operational overhead when client sites participate in multiple studies under a shared trust boundary. Use separate deployments when you need stronger isolation. |
| 213 | + |
| 214 | +--- |
| 215 | + |
| 216 | +## Migration / Backward Compatibility |
| 217 | + |
| 218 | +1. **No `studies:` section** → system behaves as today, single-tenant, cert roles only. |
| 219 | +2. **`studies:` section present** → per-study role enforcement enabled; the `default` study falls back to cert roles for compatibility. |
| 220 | +3. **Legacy jobs** (no `study` field) → treated as `default` study (Phase 1 behavior, unchanged). |
| 221 | +4. **No data migration** — job store layout is unchanged (`jobs/<uuid>/`). |
| 222 | + |
| 223 | +--- |
| 224 | + |
| 225 | +## Design Decisions |
| 226 | + |
| 227 | +| # | Question | Decision | |
| 228 | +|---|----------|----------| |
| 229 | +| D1 | Can clients participate in multiple studies? | **Yes.** Listed under multiple studies in `project.yml`. Data isolation via launcher (K8s PVs / Docker mounts). | |
| 230 | +| D2 | New roles needed? | **No.** Existing `project_admin` / `org_admin` / `lead` / `member` reused per-study. | |
| 231 | +| D3 | Study lifecycle management? | **Deferred.** Studies defined at provisioning time in `project.yml`. | |
| 232 | +| D4 | Per-study quotas? | **Deferred.** Rely on K8s-level resource controls. | |
| 233 | +| D5 | How do launchers know which volume to mount? | Job metadata carries the study; the launcher resolves the volume from that. | |
| 234 | +| D6 | What does the participant `role` mean when `studies:` exists? | It is baked into the cert and serves as the effective role for the `default` study. Per-study mappings override it for non-default studies. | |