datum-cloud
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 70 additions & 0 deletions
diff --git a/cmd/search/indexer/command.go b/cmd/search/indexer/command.go
Lines changed: 8 additions & 1 deletion
diff --git a/cmd/search/manager/command.go b/cmd/search/manager/command.go
Lines changed: 49 additions & 0 deletions
diff --git a/config/base/controller-manager/deployment.yaml b/config/base/controller-manager/deployment.yaml
Lines changed: 6 additions & 0 deletions
diff --git a/config/base/resource-indexer/deployment.yaml b/config/base/resource-indexer/deployment.yaml
Lines changed: 3 additions & 0 deletions
diff --git a/config/overlays/controller-manager/core-control-plane/patches/deployment.yaml b/config/overlays/controller-manager/core-control-plane/patches/deployment.yaml
Lines changed: 7 additions & 0 deletions
diff --git a/config/overlays/controller-manager/core-control-plane/rbac/role.yaml b/config/overlays/controller-manager/core-control-plane/rbac/role.yaml
Lines changed: 8 additions & 0 deletions
diff --git a/config/overlays/resource-indexer/core-control-plane/patches/deployment.yaml b/config/overlays/resource-indexer/core-control-plane/patches/deployment.yaml
Lines changed: 5 additions & 0 deletions
diff --git a/config/samples/policy_v1alpha1_dnszone_index_policy.yaml b/config/samples/policy_v1alpha1_dnszone_index_policy.yaml
Lines changed: 23 additions & 0 deletions
diff --git a/config/samples/policy_v1alpha1_domain_index_policy.yaml b/config/samples/policy_v1alpha1_domain_index_policy.yaml
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,70 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+> **Context Optimization**: This file is structured for efficient agent usage. The "Agent Routing" section defines what context each agent needs. When spawning subagents, pass only relevant sections—not the entire file. Sections marked `<!-- reference -->` are lookup tables; don't include them in agent prompts unless specifically needed.
+
+
+## Agent Routing
+
+**MANDATORY: All implementation work MUST be performed by subagents.** Never directly edit code, configuration, or documentation in the parent conversation. Instead, always delegate to the appropriate specialized agent from the table below. The parent conversation should only coordinate agents, pass context between them, and communicate results to the user.
+
+Do NOT ask the user which agent to use - pick the appropriate one based on what files or features are being modified.
+
+| Task Type | Agent | When to Use |
+|-----------|-------|-------------|
+| UI/Frontend | `datum-platform:frontend-dev` | React, TypeScript, CSS, anything in `ui/` directory |
+| Go Backend | `datum-platform:api-dev` | Go code in `cmd/`, `internal/`, `pkg/` directories |
+| Infrastructure | `datum-platform:sre` | Kustomize, Dockerfile, CI/CD, `config/` directory, `.infra/` for deployment |
+| Tests | `datum-platform:test-engineer` | Writing or fixing Go tests |
+| Code Review | `datum-platform:code-reviewer` | After implementation, before committing |
+| Documentation | `datum-platform:tech-writer` | README, docs/, guides, API documentation |
+| Architecture | `Plan` | Designing new features or significant refactors |
+| Exploration | `Explore` | Understanding codebase structure or finding code |
+
+**Key principles:**
+- **Always use subagents** — never write code, edit files, or run build/test commands directly in the parent conversation
+- Use agents proactively without being asked
+- For multi-step tasks, use the appropriate agent for each step (launch independent agents in parallel when possible)
+- After making code changes, always use `code-reviewer` to validate
+- For UI changes, run `npm run build` and `npm run test:e2e` to verify
+- **Always test infrastructure changes in a test environment before opening a PR** - Deploy to the test-infra KIND cluster (`task test-infra:cluster-up`) and verify resources work correctly before pushing changes to staging/production repos
+- **Use Telepresence for debugging staging issues** - When investigating bugs that only reproduce in staging, intercept the service and run it locally with `task test-infra:telepresence:intercept SERVICE=<name>`. See "Remote Debugging with Telepresence" section.
+
+### Agent Context Requirements
+
+Each agent only needs specific context. When spawning agents, pass minimal relevant info in prompts—don't repeat the entire CLAUDE.md:
+
+| Agent | Required Context | Skip (don't include in prompt) |
+|-------|-----------------|--------------------------------|
+| `frontend-dev` | UI commands, file paths in `ui/` | Go architecture, ClickHouse, NATS, data pipeline |
+| `api-dev` | Go patterns, API resource types, key directories | UI commands, dev environment setup, migrations |
+| `sre` | Config structure, build commands, deployment | Code architecture details, CEL patterns |
+| `test-engineer` | Test commands, package being tested | Full architecture, deployment, UI |
+| `Explore` | Key directories, architecture overview | Build commands, dev setup, deployment |
+| `code-reviewer` | Architecture, multi-tenancy model, conventions | Dev environment, build commands |
+| `tech-writer` | API resources, architecture overview | Implementation details, build commands |
+
+### Agent Output Guidelines
+
+Agents should return **concise summaries** to minimize context bloat in the parent conversation:
+
+| Agent | Return | Don't Return |
+|-------|--------|--------------|
+| `Explore` | File paths + 1-line descriptions | Full file contents, extensive code quotes |
+| `api-dev` | What was changed + file paths | Full diffs, unchanged code |
+| `frontend-dev` | Components modified + any build errors | Full file contents |
+| `code-reviewer` | Numbered findings list with file:line refs | Full code blocks for context |
+| `test-engineer` | Pass/fail summary + failure messages only | Full test output, passing test details |
+| `sre` | Changed manifests + deployment notes | Full YAML contents |
+
+### Multi-Step Task Decomposition
+
+For complex tasks, decompose to minimize per-agent context:
+
+1. **Explore first** (use `model: "haiku"`): Find relevant files → return only paths
+2. **Plan if needed**: Design approach → return bullet points only
+3. **Implement** (use `model: "sonnet"`): Work on specific files identified in step 1
+4. **Review**: Check only the changed files
+
+**Critical**: Pass only what's needed between steps. Don't re-explore what's already known.
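Reviewer note: the Agent Routing table in CLAUDE.md is essentially a lookup from file location to agent name. A minimal Go sketch of that lookup follows, assuming simple prefix matching on repository-relative paths and a `_test.go` suffix rule for tests. Neither rule is stated in CLAUDE.md, and `agentFor` and the `internal/tenant/registry_test.go` path are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// routes mirrors the directory hints in the Agent Routing table.
// Matching by path prefix is an assumption made for this sketch.
var routes = []struct {
	prefix string
	agent  string
}{
	{"ui/", "datum-platform:frontend-dev"},
	{"cmd/", "datum-platform:api-dev"},
	{"internal/", "datum-platform:api-dev"},
	{"pkg/", "datum-platform:api-dev"},
	{"config/", "datum-platform:sre"},
	{".infra/", "datum-platform:sre"},
	{"docs/", "datum-platform:tech-writer"},
}

// agentFor returns the agent for a repository-relative path,
// or "" when no row of the table applies.
func agentFor(path string) string {
	// Assumption: Go test files go to the test engineer regardless of directory.
	if strings.HasSuffix(path, "_test.go") {
		return "datum-platform:test-engineer"
	}
	for _, r := range routes {
		if strings.HasPrefix(path, r.prefix) {
			return r.agent
		}
	}
	return ""
}

func main() {
	paths := []string{
		"cmd/search/manager/command.go",
		"config/base/controller-manager/deployment.yaml",
		"internal/tenant/registry_test.go", // hypothetical file name
	}
	for _, p := range paths {
		fmt.Printf("%s -> %s\n", p, agentFor(p))
	}
}
```

Applied to this PR, the two `command.go` files would route to `api-dev` and everything under `config/` to `sre`.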
@@ -46,6 +46,9 @@ type ResourceIndexerOptions struct {
 	BatchSize                 int
 	FlushInterval             time.Duration
 	BatchMaxConcurrentUploads int
+
+	// Multi-tenancy settings.
+	EnableMultiTenancy bool
 }
 
 // NewResourceIndexerOptions creates a new ResourceIndexerOptions with default values.
@@ -65,6 +68,7 @@ func NewResourceIndexerOptions() *ResourceIndexerOptions {
 		MeilisearchMaxRetries:      3,
 		MeilisearchRetryDelay:      500 * time.Millisecond,
 		BatchMaxConcurrentUploads:  100,
+		EnableMultiTenancy:         false,
 	}
 }
 
@@ -89,6 +93,9 @@ func (o *ResourceIndexerOptions) AddFlags(fs *pflag.FlagSet) {
 	fs.IntVar(&o.MeilisearchMaxRetries, "meilisearch-max-retries", o.MeilisearchMaxRetries, "The maximum number of retries for transient Meilisearch errors.")
 	fs.DurationVar(&o.MeilisearchRetryDelay, "meilisearch-retry-delay", o.MeilisearchRetryDelay, "The base delay between Meilisearch retries.")
 	fs.IntVar(&o.BatchMaxConcurrentUploads, "batch-max-concurrent-uploads", o.BatchMaxConcurrentUploads, "The maximum number of concurrent uploads to Meilisearch.")
+
+	// Multi-tenancy
+	fs.BoolVar(&o.EnableMultiTenancy, "enable-multi-tenancy", o.EnableMultiTenancy, "Enable multi-tenant mode to index resources from all project control planes.")
 }
 
 // Validate checks if the resource indexer options are valid.
@@ -295,7 +302,7 @@ func Run(o *ResourceIndexerOptions, ctx context.Context) error {
 	auditBatcher.Start(ctx)
 	reindexBatcher.Start(ctx)
 
-	auditIdx := indexer.NewIndexer(auditConsumer, indexPolicyCache, auditBatcher)
+	auditIdx := indexer.NewIndexer(auditConsumer, indexPolicyCache, auditBatcher, o.EnableMultiTenancy)
 	reindexIdx := indexer.NewReindexConsumer(reindexJSConsumer, reindexPolicyCache, reindexBatcher)
 
 	klog.Info("Starting audit indexer and re-index consumer...")
 
@@ -3,6 +3,7 @@ package manager
 import (
 	"context"
 	"crypto/tls"
+	"errors"
 	"fmt"
 	"os"
 	"time"
@@ -12,6 +13,7 @@ import (
 	"github.com/spf13/cobra"
 	"github.com/spf13/pflag"
 	"go.miloapis.net/search/internal/indexer"
+	"go.miloapis.net/search/internal/tenant"
 	"go.miloapis.net/search/pkg/apis/search/install"
 	"k8s.io/apimachinery/pkg/runtime"
 	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
@@ -59,6 +61,10 @@ type ControllerManagerOptions struct {
 	NatsTLSCA          string
 	NatsTLSCert        string
 	NatsTLSKey         string
+
+	// Multi-tenancy settings.
+	EnableMultiTenancy   bool
+	ProjectLabelSelector string
 }
 
 // NewControllerManagerOptions creates a new ControllerManagerOptions with default values
@@ -77,6 +83,7 @@ func NewControllerManagerOptions() *ControllerManagerOptions {
 		MeilisearchDomain:          "http://meilisearch.meilisearch-system.svc.cluster.local:7700",
 		NatsURL:                    "nats://nats.nats-system.svc.cluster.local:4222",
 		NatsReindexSubject:         "reindex.all",
+		EnableMultiTenancy:         false,
 	}
 }
 
@@ -107,6 +114,10 @@ func (o *ControllerManagerOptions) AddFlags(fs *pflag.FlagSet) {
 	fs.StringVar(&o.NatsTLSCA, "nats-tls-ca", o.NatsTLSCA, "The path to the NATS TLS CA file.")
 	fs.StringVar(&o.NatsTLSCert, "nats-tls-cert", o.NatsTLSCert, "The path to the NATS TLS certificate file.")
 	fs.StringVar(&o.NatsTLSKey, "nats-tls-key", o.NatsTLSKey, "The path to the NATS TLS key file.")
+
+	// Multi-tenancy
+	fs.BoolVar(&o.EnableMultiTenancy, "enable-multi-tenancy", o.EnableMultiTenancy, "Enable multi-tenant mode to index resources from all project control planes.")
+	fs.StringVar(&o.ProjectLabelSelector, "project-label-selector", o.ProjectLabelSelector, "Label selector to filter which projects are indexed (empty = all projects).")
 }
 
 // Validate validates the options
@@ -243,6 +254,43 @@ func Run(o *ControllerManagerOptions, ctx context.Context) error {
 
 	reindexPub := indexer.NewReindexPublisher(js, o.NatsReindexSubject)
 
+	// Build TenantRegistry based on deployment mode.
+	var registry tenant.TenantRegistry
+	if o.EnableMultiTenancy {
+		// Create a PolicyCache backed by the manager's shared informer cache.
+		// requireReadyCondition=true ensures only fully-initialized policies
+		// (index created, attributes synced) are included in the cache.
+		policyCache, err := indexer.NewPolicyCache(mgr.GetCache(), true)
+		if err != nil {
+			setupLog.Error(err, "unable to create policy cache")
+			os.Exit(1)
+		}
+		if err := policyCache.RegisterHandlers(ctx); err != nil {
+			setupLog.Error(err, "unable to register policy cache handlers")
+			os.Exit(1)
+		}
+
+		// ProjectWatcher handles tenant lifecycle: on disengagement it purges all
+		// tenant documents from each index.
+		projectWatcher := tenant.NewProjectWatcher(policyCache, searchSDK)
+
+		multiRegistry := tenant.NewMultiTenantRegistry(
+			cfg,
+			dynamicClient,
+			o.ProjectLabelSelector,
+			projectWatcher.OnTenantEngaged,
+			projectWatcher.OnTenantDisengaged,
+		)
+		go func() {
+			if err := multiRegistry.Run(ctx); err != nil && !errors.Is(err, context.Canceled) {
+				setupLog.Error(err, "MultiTenantRegistry stopped unexpectedly")
+			}
+		}()
+		registry = multiRegistry
+	} else {
+		registry = tenant.NewSingleTenantRegistry(dynamicClient)
+	}
+
 	if err = (&policycontroller.ResourceIndexPolicyReconciler{
 		Client:           mgr.GetClient(),
 		Scheme:           mgr.GetScheme(),
@@ -251,6 +299,7 @@ func Run(o *ControllerManagerOptions, ctx context.Context) error {
 		DynamicClient:    dynamicClient,
 		RESTMapper:       mgr.GetRESTMapper(),
 		ReindexPublisher: reindexPub,
+		TenantRegistry:   registry,
 	}).SetupWithManager(mgr); err != nil {
 		setupLog.Error(err, "unable to create controller", "controller", "ResourceIndexPolicy")
 		os.Exit(1)
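Reviewer note: the hunk above picks a registry implementation from a flag and runs the multi-tenant one in a goroutine, treating `context.Canceled` as a normal shutdown. The sketch below isolates that pattern. `Registry`, `singleRegistry`, `multiRegistry`, `build`, and `unexpected` are hypothetical stand-ins: the real `tenant.TenantRegistry` interface and its constructors are not shown in this diff, so their methods are assumed.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Registry is a hypothetical stand-in for tenant.TenantRegistry.
type Registry interface {
	Mode() string
}

type singleRegistry struct{}

func (singleRegistry) Mode() string { return "single" }

type multiRegistry struct{}

func (multiRegistry) Mode() string { return "multi" }

// Run blocks until ctx is cancelled and then returns ctx.Err(),
// which is how a long-running watcher commonly behaves.
func (multiRegistry) Run(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// unexpected reports whether err deserves an error log entry.
// context.Canceled only signals an orderly shutdown.
func unexpected(err error) bool {
	return err != nil && !errors.Is(err, context.Canceled)
}

// build mirrors the if/else in Run: the multi-tenant registry is started
// in the background, the single-tenant one needs no goroutine.
func build(ctx context.Context, multi bool, done chan<- error) Registry {
	if !multi {
		return singleRegistry{}
	}
	m := multiRegistry{}
	go func() { done <- m.Run(ctx) }()
	return m
}

// demo starts a multi-tenant registry, cancels it, and reports the mode
// and whether the resulting error would be logged.
func demo() (string, bool) {
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan error, 1)
	reg := build(ctx, true, done)
	cancel()
	return reg.Mode(), unexpected(<-done)
}

func main() {
	mode, logged := demo()
	fmt.Println(mode, logged)
}
```

In the diff, an unexpected error from `multiRegistry.Run` is only logged, so the manager keeps running with a stopped registry. Whether the process should exit instead is a design decision for the authors.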
 
@@ -48,6 +48,8 @@ spec:
         - --nats-tls-cert=$(NATS_TLS_CERT)
         - --nats-tls-key=$(NATS_TLS_KEY)
         - --leader-elect-resource-namespace=$(LEADER_ELECT_RESOURCE_NAMESPACE)
+        - --enable-multi-tenancy=$(ENABLE_MULTI_TENANCY)
+        - --project-label-selector=$(PROJECT_LABEL_SELECTOR)
         env:
         - name: POD_NAMESPACE
           valueFrom:
@@ -77,6 +79,10 @@ spec:
           value: ""
         - name: LEADER_ELECT_RESOURCE_NAMESPACE
           value: ""
+        - name: ENABLE_MULTI_TENANCY
+          value: "false"
+        - name: PROJECT_LABEL_SELECTOR
+          value: ""
         - name: MEILISEARCH_API_KEY
           valueFrom:
             secretKeyRef:
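Reviewer note: the Deployment passes both settings as `--flag=$(ENV_VAR)`, so the binary sees whatever string Kubernetes substitutes. The sketch below shows how those arguments parse. It uses the standard library `flag` package rather than `github.com/spf13/pflag` (which the PR uses) purely to stay dependency-free. The `options` type and `parse` helper are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// options holds the two settings the manager Deployment passes through.
type options struct {
	EnableMultiTenancy   bool
	ProjectLabelSelector string
}

// parse registers the flags on a fresh FlagSet and parses args.
// NOTE: stdlib flag stands in for pflag here; the flag names match the diff.
func parse(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output
	fs.BoolVar(&o.EnableMultiTenancy, "enable-multi-tenancy", false,
		"Enable multi-tenant mode.")
	fs.StringVar(&o.ProjectLabelSelector, "project-label-selector", "",
		"Label selector for projects (empty = all projects).")
	err := fs.Parse(args)
	return o, err
}

func main() {
	// Base manifest: ENABLE_MULTI_TENANCY="false", PROJECT_LABEL_SELECTOR="".
	base, err := parse([]string{"--enable-multi-tenancy=false", "--project-label-selector="})
	fmt.Printf("base:    %+v err=%v\n", base, err)

	// core-control-plane overlay: ENABLE_MULTI_TENANCY="true".
	overlay, err := parse([]string{"--enable-multi-tenancy=true", "--project-label-selector="})
	fmt.Printf("overlay: %+v err=%v\n", overlay, err)

	// If the variable were not defined, Kubernetes would leave the reference
	// unexpanded and the boolean flag would fail to parse.
	_, err = parse([]string{"--enable-multi-tenancy=$(ENABLE_MULTI_TENANCY)"})
	fmt.Printf("unexpanded: err=%v\n", err)
}
```

This is why the base manifest defines both environment variables with defaults: the `$(...)` references then always resolve, and overlays only need to override the values.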
 
@@ -28,6 +28,7 @@ spec:
         - --nats-tls-cert=$(NATS_TLS_CERT)
         - --nats-tls-key=$(NATS_TLS_KEY)
         - --meilisearch-domain=$(MEILISEARCH_DOMAIN)
+        - --enable-multi-tenancy=$(ENABLE_MULTI_TENANCY)
         env:
         - name: NATS_URL
           value: "nats://nats.nats-system.svc.cluster.local:4222"
@@ -47,6 +48,8 @@ spec:
           value: "AUDIT_EVENTS"
         - name: MEILISEARCH_DOMAIN
           value: "http://meilisearch.meilisearch-system.svc.cluster.local:7700"
+        - name: ENABLE_MULTI_TENANCY
+          value: "false"
         - name: MEILISEARCH_API_KEY
           valueFrom:
             secretKeyRef:
 
@@ -7,3 +7,10 @@ spec:
     spec:
       serviceAccountName: search-controller-manager
       automountServiceAccountToken: true
+      containers:
+      - name: manager
+        env:
+        - name: ENABLE_MULTI_TENANCY
+          value: "true"
+        - name: PROJECT_LABEL_SELECTOR
+          value: ""
@@ -12,6 +12,14 @@ rules:
   - get
   - list
   - watch
+- apiGroups:
+  - resourcemanager.miloapis.com
+  resources:
+  - projects
+  verbs:
+  - get
+  - list
+  - watch
 - apiGroups:
   - search.miloapis.com
   resources:
 
@@ -7,3 +7,8 @@ spec:
     spec:
       serviceAccountName: resource-indexer
       automountServiceAccountToken: true
+      containers:
+      - name: indexer
+        env:
+        - name: ENABLE_MULTI_TENANCY
+          value: "true"
@@ -0,0 +1,23 @@
+apiVersion: search.miloapis.com/v1alpha1
+kind: ResourceIndexPolicy
+metadata:
+  name: dnszone-index-policy
+spec:
+  targetResource:
+    group: dns.networking.miloapis.com
+    version: v1alpha1
+    kind: DNSZone
+
+  conditions:
+    - name: has-name
+      expression: "metadata.name != ''"
+
+  fields:
+    - path: ".metadata.name"
+      searchable: true
+    - path: ".metadata.namespace"
+      searchable: true
+    - path: ".spec.domainName"
+      searchable: true
+    - path: ".spec.dnsZoneClassName"
+      searchable: true
@@ -0,0 +1,29 @@
+apiVersion: search.miloapis.com/v1alpha1
+kind: ResourceIndexPolicy
+metadata:
+  name: domain-index-policy
+spec:
+  targetResource:
+    group: networking.datumapis.com
+    version: v1alpha
+    kind: Domain
+
+  conditions:
+    - name: has-name
+      expression: "metadata.name != ''"
+
+  fields:
+    - path: ".metadata.name"
+      searchable: true
+    - path: ".metadata.namespace"
+      searchable: true
+    - path: ".spec.domainName"
+      searchable: true
+    - path: ".status.apex"
+      searchable: true
+    - path: ".status.nameservers[0].hostname"
+      searchable: true
+    - path: ".status.registration.registrar.name"
+      searchable: true
+    - path: ".status.registration.registry.name"
+      searchable: true
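Reviewer note: the sample policies select fields with paths such as `.status.nameservers[0].hostname`. The sketch below shows how a path of that shape could be resolved against a decoded object. The indexer's actual path evaluator is not part of this diff, so `lookup` is a hypothetical helper that handles only `.key` segments with an optional `[n]` index, and the object returned by `sampleDomain` contains made-up values.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// lookup resolves a dotted path with optional "[n]" indexes against
// JSON-like data (map[string]any and []any). It returns false when any
// segment is missing or has the wrong shape.
func lookup(obj any, path string) (any, bool) {
	cur := obj
	for _, seg := range strings.Split(strings.TrimPrefix(path, "."), ".") {
		key, idx, hasIdx := seg, 0, false
		if open := strings.Index(seg, "["); open >= 0 && strings.HasSuffix(seg, "]") {
			n, err := strconv.Atoi(seg[open+1 : len(seg)-1])
			if err != nil {
				return nil, false
			}
			key, idx, hasIdx = seg[:open], n, true
		}
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		if cur, ok = m[key]; !ok {
			return nil, false
		}
		if hasIdx {
			list, ok := cur.([]any)
			if !ok || idx < 0 || idx >= len(list) {
				return nil, false
			}
			cur = list[idx]
		}
	}
	return cur, true
}

// sampleDomain returns an illustrative Domain-like object (invented values).
func sampleDomain() map[string]any {
	return map[string]any{
		"metadata": map[string]any{"name": "example", "namespace": "default"},
		"spec":     map[string]any{"domainName": "example.com"},
		"status": map[string]any{
			"nameservers": []any{
				map[string]any{"hostname": "ns1.example.net"},
				map[string]any{"hostname": "ns2.example.net"},
			},
		},
	}
}

func main() {
	for _, p := range []string{
		".spec.domainName",
		".status.nameservers[0].hostname",
		".status.registration.registrar.name", // absent in the sample object
	} {
		v, ok := lookup(sampleDomain(), p)
		fmt.Printf("%-40s found=%v value=%v\n", p, ok, v)
	}
}
```

Under this reading, `[0]` indexes only the first nameserver, so additional nameservers would not be searchable through that field.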