NVIDIA
diff --git a/docs/design/multistudy_phase1.md b/docs/design/multistudy_phase1.md
Lines changed: 117 additions & 0 deletions
diff --git a/docs/design/multistudy_phase2.md b/docs/design/multistudy_phase2.md
Lines changed: 234 additions & 0 deletions
diff --git a/docs/user_guide/admin_guide/deployment/operation.rst b/docs/user_guide/admin_guide/deployment/operation.rst
Lines changed: 3 additions & 0 deletions
diff --git a/nvflare/apis/job_def.py b/nvflare/apis/job_def.py
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,117 @@
+# Multi-Study Support — Phase 1: Study Plumbing
+
+## Introduction
+
+FLARE currently operates as a single-tenant system. Every authorized admin can see and act on every job. There is no data segregation between different collaborations running on the same infrastructure.
+
+Phase 1 introduces a **study** concept as lightweight metadata plumbing. Every job carries a `study` name (defaulting to `"default"`). The study flows from user-facing APIs into job metadata and runtime launchers, so K8s deployments can mount study-specific workspace volumes as soon as the launcher TODO is implemented, with no access-control, provisioning, or job-store changes.
+
+See [multistudy_phase2.md](multistudy_phase2.md) for the full multi-tenancy design (access control, study registry, job-store partitioning, etc.).
+
+> **Note:** Multi-study is an operational enhancement for shared-trust environments.
+> It does not provide the same isolation guarantees as separate deployments.
+> See [When to Use Multi-Study vs. Separate Deployments](multistudy_phase2.md#when-to-use-multi-study-vs-separate-deployments)
+> in [multistudy_phase2.md](multistudy_phase2.md).
+
+### Design Principles
+
+1. **Backward compatible** — a `default` study preserves current single-tenant behavior; legacy jobs missing a `study` field are treated as `default` on read
+2. **Phased rollout** — Phase 1 delivers plumbing only; access-control enforcement is deferred to Phase 2
+3. **Minimal footprint** — no authorization, no provisioning, no job-store layout changes
+
+---
+
+## Scope
+
+1. `study: str = "default"` parameter on `ProdEnv`, `Session`, `new_session`, `new_secure_session`, `new_insecure_session`.
+2. `Session` carries the active study context; `list_jobs` inherits it and returns only jobs in that study.
+3. Study is passed through to job metadata at submission time, with syntax validation before persistence.
+4. Clone preserves the source job's study (not the session's study).
+5. `K8sJobLauncher` reads `study` from job metadata and selects the corresponding study workspace volume (TODO in code).
+6. `DockerJobLauncher` unchanged; TODO marker for future study-aware settings resolution.
+7. Admin console (`fl_admin.sh`) accepts `--study` at launch time; the study is established when the admin session logs in and then inherited by `submit_job` and `list_jobs`.
+8. No changes to authorization, job store paths, `project.yml` schema, scheduler, or provisioning.
+
+---
+
+## User Experience
+
+### Data Scientist (Recipe API)
+
+The recipe is unchanged. The study is specified via `ProdEnv`:
+
+```python
+env = ProdEnv(
+    startup_kit_location=args.startup_kit_location,
+    study="cancer-research",
+)
+run = recipe.execute(env)
+```
+
+If `study` is omitted, it defaults to `"default"`.
+
+### Admin (FLARE API)
+
+The `Session` gains a study context:
+
+```python
+sess = new_secure_session(
+    username="admin@org_a.com",
+    startup_kit_location="./startup",
+    study="cancer-research",
+)
+jobs = sess.list_jobs()        # only jobs in cancer-research
+sess.submit_job("./my_job")   # tagged to cancer-research
+```
+
+The study is session-scoped, not a per-command filter. For HCI/admin sessions, the study is sent during login, stored in the authenticated server session, and preserved in the session token so later commands inherit the same active study.
+
+### Admin Console
+
+```
+$ ./startup/fl_admin.sh --study cancer-research
+
+> list_jobs
+... only shows jobs in cancer-research ...
+```
+
+If `--study` is omitted, the admin terminal uses `default`.
+
+---
+
+## Data Model
+
+### Job Metadata
+
+`study` is a first-class field on every job (`JobMetaKey.STUDY`). It is set at submission time from the session's active study (a cloned job keeps the source job's study instead) and is immutable after creation.
+
+The study value is syntactically validated at the API layer (client-side) and again on the server before persistence. The regex pattern is `^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$`: 1–63 characters of lowercase alphanumerics and hyphens, starting and ending with an alphanumeric.
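The validation rule above can be sketched as follows. This is a minimal illustration only: the helper name `validate_study_name` and the choice of `ValueError` are assumptions, not names taken from this PR.

```python
import re

# Pattern quoted from the design text: lowercase alphanumerics and hyphens,
# 1-63 characters, starting and ending with an alphanumeric.
STUDY_NAME_PATTERN = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")


def validate_study_name(study):
    """Return the study name unchanged if it is valid, else raise ValueError.

    Hypothetical helper for illustration; not the PR's actual function.
    """
    # fullmatch() is used so that a trailing newline cannot slip past "$".
    if not isinstance(study, str) or not STUDY_NAME_PATTERN.fullmatch(study):
        raise ValueError(f"invalid study name: {study!r}")
    return study


def is_valid_study_name(study):
    """Boolean convenience wrapper around validate_study_name()."""
    try:
        validate_study_name(study)
    except ValueError:
        return False
    return True
```

Under this sketch the client-side and server-side checks could share the same compiled pattern, which keeps the two validation points from drifting apart.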
+
+### Session Transport
+
+For `flare_api.Session` and the admin terminal, `study` is not passed on each command. It is established at session creation/login time, stored on the server-side authenticated session, and carried in the signed session token so recreated sessions keep the same active study.
+
+### Legacy Jobs
+
+Jobs created before Phase 1 have no `study` field. `get_job_meta_study()` returns `"default"` for these jobs, so they appear in the `default` study transparently.
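The legacy fallback and the session-scoped `list_jobs` filtering can be illustrated together. In this sketch `get_job_meta_study` mirrors the function added to `job_def.py` in this PR (with the `"study"` key inlined), while `filter_jobs_by_study` and the short job IDs are hypothetical and exist only for the example.

```python
DEFAULT_STUDY = "default"


def get_job_meta_study(meta):
    # Missing, empty, or non-string study values all fall back to the default.
    if not isinstance(meta, dict):
        return DEFAULT_STUDY
    study = meta.get("study")
    if isinstance(study, str) and study:
        return study
    return DEFAULT_STUDY


def filter_jobs_by_study(job_metas, active_study):
    """Hypothetical helper: keep only jobs in the session's active study."""
    return [m for m in job_metas if get_job_meta_study(m) == active_study]


jobs = [
    {"job_id": "j1"},                               # legacy job, no study field
    {"job_id": "j2", "study": "cancer-research"},
    {"job_id": "j3", "study": ""},                  # empty value also falls back
]

default_ids = [m["job_id"] for m in filter_jobs_by_study(jobs, DEFAULT_STUDY)]
cancer_ids = [m["job_id"] for m in filter_jobs_by_study(jobs, "cancer-research")]
# default_ids is ["j1", "j3"]; cancer_ids is ["j2"]
```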
+
+### Default Study Constant
+
+`DEFAULT_STUDY = "default"` in `nvflare.apis.job_def` — single source of truth for the default value.
+
+---
+
+## What This Enables
+
+- Data scientists tag jobs with a study and get physical data isolation on K8s as soon as the K8s launcher TODO is implemented.
+- Admin/API sessions operate inside one active study context, so Phase 2 authz can validate study access when sessions enter a study.
+- Legacy single-tenant deployments are unaffected — everything defaults to `"default"`.
+
+## What This Does NOT Do
+
+- No access control — any user can submit to any valid study name
+- No job store partitioning (`jobs/<uuid>/` path unchanged)
+- No `project.yml` parsing or `StudyRegistry`
+- No Docker launcher behavior change yet
+- No `set_study` / `list_studies` admin commands
+- Subprocess launcher unchanged (single-tenant/trusted only)
@@ -0,0 +1,234 @@
+# Multi-Study Support — Phase 2: Per-Study Access Control
+
+## Prerequisites
+
+Phase 1 ([multistudy_phase1.md](multistudy_phase1.md)) must be complete: `study` metadata flows from user-facing APIs through to job metadata and launcher plumbing.
+
+## Introduction
+
+Phase 2 adds per-study access control on top of Phase 1's plumbing. The existing role model (`project_admin`, `org_admin`, `lead`, `member`) is unchanged — the only difference is that roles become per-study instead of global. No new roles are introduced.
+
+---
+
+## Core Idea
+
+Today, a user's role is global (baked into the X.509 cert). Phase 2 makes the same role **per-study**: a user can be `lead` in one study and `member` in another.
+
+The existing authorization rules (`authorization.json`) stay exactly as they are. The only new layer is a **study filter**: before evaluating existing RBAC, the server checks whether the resource belongs to the user's active study. If not, the resource is invisible.
+
+---
+
+## How Authorization Works Today
+
+### `authorization.json`
+
+Each deployment ships an `authorization.json` that maps roles to permissions. The default policy:
+
+```json
+{
+  "format_version": "1.0",
+  "permissions": {
+    "project_admin": "any",
+    "org_admin": {
+      "submit_job": "none",
+      "clone_job": "none",
+      "manage_job": "o:submitter",
+      "download_job": "o:submitter",
+      "view": "any",
+      "operate": "o:site",
+      "shell_commands": "o:site",
+      "byoc": "none"
+    },
+    "lead": {
+      "submit_job": "any",
+      "clone_job": "n:submitter",
+      "manage_job": "n:submitter",
+      "download_job": "n:submitter",
+      "view": "any",
+      "operate": "o:site",
+      "shell_commands": "o:site",
+      "byoc": "any"
+    },
+    "member": {
+      "view": "any"
+    }
+  }
+}
+```
+
+Permission conditions:
+
+- `"any"`: unrestricted
+- `"none"`: denied
+- `"n:submitter"`: only if the user is the submitter
+- `"o:submitter"`: only if the user is in the same org as the submitter
+- `"o:site"`: only if the user is in the same org as the target site
+
+### Authorization Flow
+
+1. A `Person(name, org, role)` is constructed from the user's X.509 cert
+2. An `AuthzContext(right, user, submitter)` wraps the request
+3. The `Authorizer` evaluates the policy: look up `permissions[person.role][right]` and check the condition against the context
+
+**Phase 2 changes only step 1**: the `role` used to construct `Person` can come from the per-study mapping instead of the cert. Steps 2 and 3 are untouched. `authorization.json` is unchanged.
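The three-step flow can be sketched in a few lines. The `Person` dataclass and `is_authorized` function below are simplified stand-ins written for this example, not the real `Person`, `AuthzContext`, or `Authorizer` classes. Treating a right that is not listed for a role as denied is an assumption, inferred from `member` being described as view-only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Simplified stand-in for the Person built from the X.509 cert.
    name: str
    org: str
    role: str


# Subset of the default policy shown above.
PERMISSIONS = {
    "project_admin": "any",
    "org_admin": {"submit_job": "none", "manage_job": "o:submitter", "view": "any"},
    "lead": {
        "submit_job": "any",
        "manage_job": "n:submitter",
        "view": "any",
        "operate": "o:site",
    },
    "member": {"view": "any"},
}


def is_authorized(user, right, submitter=None, site_org=None):
    """Look up permissions[role][right] and check the condition."""
    role_policy = PERMISSIONS.get(user.role)
    if role_policy is None:
        return False                      # unknown role
    if role_policy == "any":
        return True                       # role-wide grant (project_admin)
    # Assumption: a right not listed for the role is denied.
    condition = role_policy.get(right, "none")
    if condition == "any":
        return True
    if condition == "n:submitter":
        return submitter is not None and user.name == submitter.name
    if condition == "o:submitter":
        return submitter is not None and user.org == submitter.org
    if condition == "o:site":
        return site_org is not None and user.org == site_org
    return False                          # "none" or unrecognized condition
```

Because Phase 2 only changes which `role` is placed on `Person`, a function of this shape would not need to change; only its caller would pass a per-study role.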
+
+---
+
+## Role Resolution
+
+No new roles. The existing four roles are reused:
+
+| Role | Capabilities (per `authorization.json`, unchanged) |
+|------|-----|
+| `project_admin` | All operations (`"any"`) |
+| `org_admin` | Manage/download own-org jobs, view all, operate own-org sites |
+| `lead` | Submit jobs; clone/manage/download own jobs; view all; operate own-org sites |
+| `member` | View only |
+
+**Resolution order:**
+1. If `project.yml` has a `studies:` section AND the user has a mapping for the active study → use that role
+2. Else if active study is `default` → fall back to cert-embedded role (legacy compatibility)
+3. Otherwise → deny
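The resolution order above can be sketched as a single function. The name `resolve_role` and the dictionary shape of `studies` (study name mapped to a dict with an `admins` mapping) are assumptions for illustration, not part of this design's API.

```python
from typing import Optional

DEFAULT_STUDY = "default"


def resolve_role(user, cert_role, active_study, studies=None) -> Optional[str]:
    """Return the effective role for the active study, or None to deny.

    `studies` is the parsed `studies:` section, or None/empty if absent
    (hypothetical in-memory shape).
    """
    # 1. A per-study mapping for this user wins if one exists.
    if studies:
        study_spec = studies.get(active_study) or {}
        mapped = (study_spec.get("admins") or {}).get(user)
        if mapped:
            return mapped
    # 2. The default study falls back to the cert-embedded role.
    if active_study == DEFAULT_STUDY:
        return cert_role
    # 3. Anything else is denied.
    return None
```

Returning `None` for the deny case keeps the caller's check explicit; a real implementation might raise an authorization error instead.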
+
+### What the participant `role` means
+
+The `role` field on admin participants in `project.yml` serves two purposes:
+- It is baked into the X.509 cert at provisioning time (identity + authentication)
+- It is the **effective role for the `default` study** and for deployments without a `studies:` section
+
+When a `studies:` section is present, the per-study role overrides the cert role for that study. The cert role still applies to the `default` study as a fallback.
+
+Example: a user with `role: lead` in their participant entry and `member` in the `cancer-research` study mapping is `lead` when operating in the `default` study but `member` when operating in `cancer-research`.
+
+---
+
+## Provisioning: `project.yml`
+
+Minimal addition: a `studies:` section that maps study names to enrolled sites and per-user role overrides. Everything else stays as-is.
+
+```yaml
+# Existing sections unchanged
+participants:
+  - name: server1.example.com
+    type: server
+    org: nvidia
+  - name: hospital-a
+    type: client
+    org: org_a
+  - name: hospital-b
+    type: client
+    org: org_b
+  - name: admin@nvidia.com
+    type: admin
+    org: nvidia
+    role: project_admin    # cert role; effective role for "default" study
+  - name: trainer@org_a.com
+    type: admin
+    org: org_a
+    role: lead             # cert role; effective role for "default" study
+
+# New section — per-study role overrides
+studies:
+  cancer-research:
+    sites: [hospital-a, hospital-b]
+    admins:
+      trainer@org_a.com: lead        # same as cert role here, but explicit
+
+  multiple-sclerosis:
+    sites: [hospital-a]
+    admins:
+      trainer@org_a.com: member      # overrides cert role for this study
+```
+
+- If `studies:` is absent, the system behaves exactly as today (single-tenant, cert roles only).
+- Sites listed under a study must reference existing client-type participants.
+- Admins listed under a study must reference existing admin-type participants.
+- A user not listed under a study has no access to that study (except `default`, which falls back to cert role).
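The two reference rules above (sites must be client participants, admins must be admin participants) could be checked at provisioning time along these lines. This is a sketch under assumptions: `validate_studies` is a hypothetical name, and `project` is assumed to be the already-parsed `project.yml` as a plain dictionary.

```python
def validate_studies(project):
    """Return a list of error strings; an empty list means the section is valid."""
    participants = project.get("participants") or []
    clients = {p["name"] for p in participants if p.get("type") == "client"}
    admins = {p["name"] for p in participants if p.get("type") == "admin"}

    errors = []
    # An absent `studies:` section is valid (single-tenant behavior).
    for study, spec in (project.get("studies") or {}).items():
        spec = spec or {}
        for site in spec.get("sites") or []:
            if site not in clients:
                errors.append(f"study {study!r}: site {site!r} is not a client participant")
        for admin in spec.get("admins") or {}:
            if admin not in admins:
                errors.append(f"study {study!r}: admin {admin!r} is not an admin participant")
    return errors
```

Collecting all errors instead of failing on the first one lets an operator fix a misconfigured `project.yml` in a single pass.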
+
+---
+
+## Authorization Enforcement
+
+Two layers, evaluated in order for every command:
+
+1. **Study filter** (new): Does the target resource (job, client) belong to the user's active study? If no → invisible.
+2. **RBAC policy** (existing, unchanged): Construct `Person` with the resolved per-study role, evaluate `authorization.json` as today.
+
+The session's active study (set at session start via `--study` or the `study` API parameter) determines which study filter applies. The study is carried by the authenticated server session and preserved in the session token, so subsequent commands do not need to resend it.
+
+---
+
+## Job Scheduler
+
+When a `studies:` section is present in `project.yml`:
+
+1. **Site filtering**: Only schedule jobs to sites enrolled in the job's study
+2. **Validation**: `deploy_map` sites must be a subset of the study's enrolled sites
+
+No quota or priority changes.
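A rough sketch of the two scheduler rules follows. It assumes the simple `deploy_map` form in which each app maps to a list of site names, that `@ALL` narrows to the study's enrolled sites, and that the `server` entry is exempt from the enrollment check. All three are assumptions made for this example, and `resolve_study_sites` is a hypothetical name.

```python
ALL_SITES = "@ALL"
SERVER_SITE_NAME = "server"


def resolve_study_sites(deploy_map, enrolled_sites):
    """Return the client sites a job may be scheduled to within its study.

    Raises ValueError if deploy_map names a client site that is not enrolled.
    """
    enrolled = set(enrolled_sites)
    requested = set()
    for sites in deploy_map.values():
        for site in sites:
            if site == ALL_SITES:
                # Assumption: "@ALL" means all sites enrolled in the study.
                requested |= enrolled
            elif site != SERVER_SITE_NAME:
                requested.add(site)

    outside = requested - enrolled
    if outside:
        raise ValueError(f"sites not enrolled in the job's study: {sorted(outside)}")
    return requested
```

Rejecting the job at validation time, before scheduling, would surface a mismatched `deploy_map` to the submitter instead of leaving the job waiting for sites it can never reach.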
+
+---
+
+## Runtime Isolation
+
+### Kubernetes
+
+The K8s launcher reads `study` from job metadata and resolves study-specific workspace volumes:
+- Workspace volume resolved by `(study, client)` tuple
+- Each job pod mounts only its study's data volume
+
+### Docker
+
+The Docker launcher reads `study` from job metadata and mounts the corresponding host directory (e.g., `/data/<study>/`) as the workspace volume.
+
+---
+
+## When to Use Multi-Study vs. Separate Deployments
+
+Multi-study is an operational convenience feature for shared-trust environments. If you need stronger isolation, use separate deployments on separate hardware or VMs.
+
+### Use multi-study when
+
+- One or more client sites participate in multiple studies and re-provisioning them for each study is operationally costly
+- All studies operate within the same operational trust boundary
+- You accept software-enforced isolation: study separation relies on correct authorization logic, session management, and launcher volume configuration
+
+### Use separate deployments when
+
+- You need stronger isolation than a shared multi-study deployment can provide
+- A problem in one study must not be able to affect another study
+- Studies have non-overlapping participants (no shared client sites), so separate deployments add little operational cost
+
+### Security Assumptions That Change
+
+With a single multi-study deployment, the following are shared across all studies:
+
+| Resource | Separate deployments | Multi-study |
+|----------|----------------------|-------------|
+| Server runtime | Separate per deployment | Shared |
+| Client runtime at each site | Separate per deployment | Shared |
+| PKI / cert root | Independent | Shared |
+| `project_admin` blast radius | One deployment | All studies in that deployment |
+| Isolation mechanism | Separate hardware/VM deployment boundary | Shared software authz and launcher configuration |
+
+A bug in authorization, session handling, or launcher volume configuration can affect multiple studies in the same deployment.
+
+### Summary
+
+Use multi-study to reduce operational overhead when client sites participate in multiple studies under a shared trust boundary. Use separate deployments when you need stronger isolation.
+
+---
+
+## Migration / Backward Compatibility
+
+1. **No `studies:` section** → system behaves as today, single-tenant, cert roles only.
+2. **`studies:` section present** → per-study role enforcement enabled; the `default` study falls back to cert roles for compatibility.
+3. **Legacy jobs** (no `study` field) → treated as `default` study (Phase 1 behavior, unchanged).
+4. **No data migration** — job store layout is unchanged (`jobs/<uuid>/`).
+
+---
+
+## Design Decisions
+
+| # | Question | Decision |
+|---|----------|----------|
+| D1 | Can clients participate in multiple studies? | **Yes.** Listed under multiple studies in `project.yml`. Data isolation via launcher (K8s PVs / Docker mounts). |
+| D2 | New roles needed? | **No.** Existing `project_admin` / `org_admin` / `lead` / `member` reused per-study. |
+| D3 | Study lifecycle management? | **Deferred.** Studies defined at provisioning time in `project.yml`. |
+| D4 | Per-study quotas? | **Deferred.** Rely on K8s-level resource controls. |
+| D5 | How do launchers know which volume to mount? | Job metadata carries the study; the launcher resolves the volume from that. |
+| D6 | What does the participant `role` mean when `studies:` exists? | It is baked into the cert and serves as the effective role for the `default` study. Per-study mappings override it for non-default studies. |
@@ -13,6 +13,9 @@ Admin command prompt
 After running ``fl_admin.sh``, log in by following the prompt and entering the name of the participant that the admin
 package was provisioned for (or for poc mode, "admin" as the name and password).
 
+To scope the terminal to a study, launch it with ``fl_admin.sh --study cancer-research``. If ``--study`` is omitted,
+the admin terminal uses the ``default`` study for that terminal session.
+
 Typing "help" or "?" will display a list of the commands and a brief description for each. Typing "? " before a command
 like "? check_status" or "?ls" will provide additional details for the usage of a command. Provided below is a list of
 commands shown as examples of how they may be run with a description.
 
@@ -21,6 +21,7 @@
 # this is treated as all online sites in job deploy_map
 ALL_SITES = "@ALL"
 SERVER_SITE_NAME = "server"
+DEFAULT_STUDY = "default"
 
 
 class RunStatus(str, Enum):
@@ -76,6 +77,7 @@ class JobMetaKey(str, Enum):
     CUSTOM_PROPS = "custom_props"
     EDGE_METHOD = "edge_method"
     JOB_CLIENTS = "job_clients"  # clients that participated the job
+    STUDY = "study"
 
     def __repr__(self):
         return self.value
@@ -214,6 +216,15 @@ def is_valid_job_id(jid: str) -> bool:
     return val.hex == jid.replace("-", "")
 
 
+def get_job_meta_study(meta: dict) -> str:
+    if not isinstance(meta, dict):
+        return DEFAULT_STUDY
+    study = meta.get(JobMetaKey.STUDY.value)
+    if isinstance(study, str) and study:
+        return study
+    return DEFAULT_STUDY
+
+
 def get_custom_prop(meta: dict, prop_key: str, default=None):
     props = meta.get(JobMetaKey.CUSTOM_PROPS)
     if not props: