Commit ec65ab5

feat: Implement multi-tenant search capabilities, including tenant-aware indexing, querying, and deletion.
1 parent 2c16c64 commit ec65ab5

26 files changed, +1909 -29 lines changed

CLAUDE.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+> **Context Optimization**: This file is structured for efficient agent usage. The "Agent Routing" section defines what context each agent needs. When spawning subagents, pass only relevant sections, not the entire file. Sections marked `<!-- reference -->` are lookup tables; don't include them in agent prompts unless specifically needed.
+
+
+## Agent Routing
+
+**MANDATORY: All implementation work MUST be performed by subagents.** Never directly edit code, configuration, or documentation in the parent conversation. Instead, always delegate to the appropriate specialized agent from the table below. The parent conversation should only coordinate agents, pass context between them, and communicate results to the user.
+
+Do NOT ask the user which agent to use - pick the appropriate one based on what files or features are being modified.
+
+| Task Type | Agent | When to Use |
+|-----------|-------|-------------|
+| UI/Frontend | `datum-platform:frontend-dev` | React, TypeScript, CSS, anything in `ui/` directory |
+| Go Backend | `datum-platform:api-dev` | Go code in `cmd/`, `internal/`, `pkg/` directories |
+| Infrastructure | `datum-platform:sre` | Kustomize, Dockerfile, CI/CD, `config/` directory, `.infra/` for deployment |
+| Tests | `datum-platform:test-engineer` | Writing or fixing Go tests |
+| Code Review | `datum-platform:code-reviewer` | After implementation, before committing |
+| Documentation | `datum-platform:tech-writer` | README, docs/, guides, API documentation |
+| Architecture | `Plan` | Designing new features or significant refactors |
+| Exploration | `Explore` | Understanding codebase structure or finding code |
+
+**Key principles:**
+- **Always use subagents** - never write code, edit files, or run build/test commands directly in the parent conversation
+- Use agents proactively without being asked
+- For multi-step tasks, use the appropriate agent for each step (launch independent agents in parallel when possible)
+- After making code changes, always use `code-reviewer` to validate
+- For UI changes, run `npm run build` and `npm run test:e2e` to verify
+- **Always test infrastructure changes in a test environment before opening a PR** - Deploy to the test-infra KIND cluster (`task test-infra:cluster-up`) and verify resources work correctly before pushing changes to staging/production repos
+- **Use Telepresence for debugging staging issues** - When investigating bugs that only reproduce in staging, intercept the service and run it locally with `task test-infra:telepresence:intercept SERVICE=<name>`. See "Remote Debugging with Telepresence" section.
+
+### Agent Context Requirements
+
+Each agent only needs specific context. When spawning agents, pass minimal relevant info in prompts; don't repeat the entire CLAUDE.md:
+
+| Agent | Required Context | Skip (don't include in prompt) |
+|-------|-----------------|--------------------------------|
+| `frontend-dev` | UI commands, file paths in `ui/` | Go architecture, ClickHouse, NATS, data pipeline |
+| `api-dev` | Go patterns, API resource types, key directories | UI commands, dev environment setup, migrations |
+| `sre` | Config structure, build commands, deployment | Code architecture details, CEL patterns |
+| `test-engineer` | Test commands, package being tested | Full architecture, deployment, UI |
+| `Explore` | Key directories, architecture overview | Build commands, dev setup, deployment |
+| `code-reviewer` | Architecture, multi-tenancy model, conventions | Dev environment, build commands |
+| `tech-writer` | API resources, architecture overview | Implementation details, build commands |
+
+### Agent Output Guidelines
+
+Agents should return **concise summaries** to minimize context bloat in the parent conversation:
+
+| Agent | Return | Don't Return |
+|-------|--------|--------------|
+| `Explore` | File paths + 1-line descriptions | Full file contents, extensive code quotes |
+| `api-dev` | What was changed + file paths | Full diffs, unchanged code |
+| `frontend-dev` | Components modified + any build errors | Full file contents |
+| `code-reviewer` | Numbered findings list with file:line refs | Full code blocks for context |
+| `test-engineer` | Pass/fail summary + failure messages only | Full test output, passing test details |
+| `sre` | Changed manifests + deployment notes | Full YAML contents |
+
+### Multi-Step Task Decomposition
+
+For complex tasks, decompose to minimize per-agent context:
+
+1. **Explore first** (use `model: "haiku"`): Find relevant files → return only paths
+2. **Plan if needed**: Design approach → return bullet points only
+3. **Implement** (sonnet): Work on specific files identified in step 1
+4. **Review**: Check only the changed files
+
+**Critical**: Pass only what's needed between steps. Don't re-explore what's already known.

cmd/search/indexer/command.go

Lines changed: 7 additions & 0 deletions
@@ -46,6 +46,9 @@ type ResourceIndexerOptions struct {
 	BatchSize int
 	FlushInterval time.Duration
 	BatchMaxConcurrentUploads int
+
+	// Multi-tenancy settings.
+	MultiTenant bool
 }
 
 // NewResourceIndexerOptions creates a new ResourceIndexerOptions with default values.
@@ -65,6 +68,7 @@ func NewResourceIndexerOptions() *ResourceIndexerOptions {
 		MeilisearchMaxRetries: 3,
 		MeilisearchRetryDelay: 500 * time.Millisecond,
 		BatchMaxConcurrentUploads: 100,
+		MultiTenant: false,
 	}
 }
 
@@ -89,6 +93,9 @@ func (o *ResourceIndexerOptions) AddFlags(fs *pflag.FlagSet) {
 	fs.IntVar(&o.MeilisearchMaxRetries, "meilisearch-max-retries", o.MeilisearchMaxRetries, "The maximum number of retries for transient Meilisearch errors.")
 	fs.DurationVar(&o.MeilisearchRetryDelay, "meilisearch-retry-delay", o.MeilisearchRetryDelay, "The base delay between Meilisearch retries.")
 	fs.IntVar(&o.BatchMaxConcurrentUploads, "batch-max-concurrent-uploads", o.BatchMaxConcurrentUploads, "The maximum number of concurrent uploads to Meilisearch.")
+
+	// Multi-tenancy
+	fs.BoolVar(&o.MultiTenant, "multi-tenant", o.MultiTenant, "Enable multi-tenant mode to index resources from all project control planes.")
 }
 
 // Validate checks if the resource indexer options are valid.
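
Reviewer note: the hunks above follow an options pattern in which the struct default doubles as the flag default. The sketch below illustrates that pattern in isolation. It uses the standard library `flag` package in place of `pflag` so that it runs without third-party modules, and `indexerOptions`, `newIndexerOptions`, `addFlags`, and `parseIndexerArgs` are hypothetical names, not identifiers from this commit.

```go
package main

import (
	"flag"
	"fmt"
)

// indexerOptions mirrors only the field added in this hunk. The real
// ResourceIndexerOptions struct carries many more settings.
type indexerOptions struct {
	MultiTenant bool
}

// newIndexerOptions returns options with multi-tenant mode off by default.
func newIndexerOptions() *indexerOptions {
	return &indexerOptions{MultiTenant: false}
}

// addFlags registers --multi-tenant, using the current field value as the
// flag default so that the constructor stays the single source of defaults.
func (o *indexerOptions) addFlags(fs *flag.FlagSet) {
	fs.BoolVar(&o.MultiTenant, "multi-tenant", o.MultiTenant,
		"Enable multi-tenant mode to index resources from all project control planes.")
}

// parseIndexerArgs builds a fresh flag set and parses args into new options.
func parseIndexerArgs(args []string) (*indexerOptions, error) {
	o := newIndexerOptions()
	fs := flag.NewFlagSet("indexer", flag.ContinueOnError)
	o.addFlags(fs)
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return o, nil
}

func main() {
	for _, args := range [][]string{{}, {"--multi-tenant"}, {"--multi-tenant=false"}} {
		o, err := parseIndexerArgs(args)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%v -> MultiTenant=%t\n", args, o.MultiTenant)
	}
}
```

Because the default is `false`, existing single-tenant deployments should keep their behaviour unless the flag is set explicitly.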

cmd/search/manager/command.go

Lines changed: 52 additions & 0 deletions
@@ -3,6 +3,7 @@ package manager
 import (
 	"context"
 	"crypto/tls"
+	"errors"
 	"fmt"
 	"os"
 	"time"
@@ -12,6 +13,7 @@ import (
 	"github.com/spf13/cobra"
 	"github.com/spf13/pflag"
 	"go.miloapis.net/search/internal/indexer"
+	"go.miloapis.net/search/internal/tenant"
 	"go.miloapis.net/search/pkg/apis/search/install"
 	"k8s.io/apimachinery/pkg/runtime"
 	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
@@ -59,6 +61,10 @@ type ControllerManagerOptions struct {
 	NatsTLSCA string
 	NatsTLSCert string
 	NatsTLSKey string
+
+	// Multi-tenancy settings.
+	MultiTenant bool
+	ProjectLabelSelector string
 }
 
 // NewControllerManagerOptions creates a new ControllerManagerOptions with default values
@@ -77,6 +83,7 @@ func NewControllerManagerOptions() *ControllerManagerOptions {
 		MeilisearchDomain: "http://meilisearch.meilisearch-system.svc.cluster.local:7700",
 		NatsURL: "nats://nats.nats-system.svc.cluster.local:4222",
 		NatsReindexSubject: "reindex.all",
+		MultiTenant: false,
 	}
 }
 
@@ -107,6 +114,10 @@ func (o *ControllerManagerOptions) AddFlags(fs *pflag.FlagSet) {
 	fs.StringVar(&o.NatsTLSCA, "nats-tls-ca", o.NatsTLSCA, "The path to the NATS TLS CA file.")
 	fs.StringVar(&o.NatsTLSCert, "nats-tls-cert", o.NatsTLSCert, "The path to the NATS TLS certificate file.")
 	fs.StringVar(&o.NatsTLSKey, "nats-tls-key", o.NatsTLSKey, "The path to the NATS TLS key file.")
+
+	// Multi-tenancy
+	fs.BoolVar(&o.MultiTenant, "multi-tenant", o.MultiTenant, "Enable multi-tenant mode to index resources from all project control planes.")
+	fs.StringVar(&o.ProjectLabelSelector, "project-label-selector", o.ProjectLabelSelector, "Label selector to filter which projects are indexed (empty = all projects).")
 }
 
 // Validate validates the options
@@ -243,6 +254,46 @@ func Run(o *ControllerManagerOptions, ctx context.Context) error {
 
 	reindexPub := indexer.NewReindexPublisher(js, o.NatsReindexSubject)
 
+	// Build TenantRegistry based on deployment mode.
+	var registry tenant.TenantRegistry
+	if o.MultiTenant {
+		// Create a PolicyCache backed by the manager's shared informer cache.
+		// requireReadyCondition=true ensures the ProjectWatcher only bootstraps
+		// policies that are fully initialised (index created, attributes synced).
+		policyCache, err := indexer.NewPolicyCache(mgr.GetCache(), true)
+		if err != nil {
+			setupLog.Error(err, "unable to create policy cache")
+			os.Exit(1)
+		}
+		if err := policyCache.RegisterHandlers(ctx); err != nil {
+			setupLog.Error(err, "unable to register policy cache handlers")
+			os.Exit(1)
+		}
+
+		// ProjectWatcher handles tenant lifecycle: on engagement it bootstraps
+		// ready policies for the new project; on disengagement it purges all
+		// tenant documents from each index.
+		// bootstrapFunc is nil for MVP; the policy controller will re-trigger
+		// indexing on its next reconcile, making bootstrap best-effort.
+		projectWatcher := tenant.NewProjectWatcher(policyCache, searchSDK, nil)
+
+		multiRegistry := tenant.NewMultiTenantRegistry(
+			cfg,
+			dynamicClient,
+			o.ProjectLabelSelector,
+			projectWatcher.OnTenantEngaged,
+			projectWatcher.OnTenantDisengaged,
+		)
+		go func() {
+			if err := multiRegistry.Run(ctx); err != nil && !errors.Is(err, context.Canceled) {
+				setupLog.Error(err, "MultiTenantRegistry stopped unexpectedly")
+			}
+		}()
+		registry = multiRegistry
+	} else {
+		registry = tenant.NewSingleTenantRegistry(dynamicClient)
+	}
+
 	if err = (&policycontroller.ResourceIndexPolicyReconciler{
 		Client: mgr.GetClient(),
 		Scheme: mgr.GetScheme(),
@@ -251,6 +302,7 @@ func Run(o *ControllerManagerOptions, ctx context.Context) error {
 		DynamicClient: dynamicClient,
 		RESTMapper: mgr.GetRESTMapper(),
 		ReindexPublisher: reindexPub,
+		TenantRegistry: registry,
 	}).SetupWithManager(mgr); err != nil {
 		setupLog.Error(err, "unable to create controller", "controller", "ResourceIndexPolicy")
 		os.Exit(1)

config/base/controller-manager/deployment.yaml

Lines changed: 6 additions & 0 deletions
@@ -48,6 +48,8 @@ spec:
             - --nats-tls-cert=$(NATS_TLS_CERT)
             - --nats-tls-key=$(NATS_TLS_KEY)
             - --leader-elect-resource-namespace=$(LEADER_ELECT_RESOURCE_NAMESPACE)
+            - --multi-tenant=$(MULTI_TENANT)
+            - --project-label-selector=$(PROJECT_LABEL_SELECTOR)
           env:
             - name: POD_NAMESPACE
               valueFrom:
@@ -77,6 +79,10 @@ spec:
               value: ""
             - name: LEADER_ELECT_RESOURCE_NAMESPACE
               value: ""
+            - name: MULTI_TENANT
+              value: "false"
+            - name: PROJECT_LABEL_SELECTOR
+              value: ""
             - name: MEILISEARCH_API_KEY
               valueFrom:
                 secretKeyRef:
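
Reviewer note: the manifest passes both settings as `--flag=$(VAR)`, relying on Kubernetes to substitute container environment variables into `args` before the process starts. The sketch below approximates that substitution and then parses the result, to show why the `=` form matters for the boolean flag. It is a simplification under stated assumptions: `expandArg` and `parseManagerArgs` are hypothetical helpers, the `$$(VAR)` escape form is ignored, and the standard library `flag` package stands in for `pflag`.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"regexp"
)

// varRef matches $(NAME) references. Simplification: the $$(NAME) escape
// form that Kubernetes also supports is not handled here.
var varRef = regexp.MustCompile(`\$\(([A-Za-z_][A-Za-z0-9_]*)\)`)

// expandArg substitutes $(NAME) with env[NAME]. Names missing from env are
// left unchanged, as Kubernetes does for unresolved references.
func expandArg(arg string, env map[string]string) string {
	return varRef.ReplaceAllStringFunc(arg, func(ref string) string {
		name := varRef.FindStringSubmatch(ref)[1]
		if v, ok := env[name]; ok {
			return v
		}
		return ref
	})
}

// parseManagerArgs expands each argument and parses the two multi-tenancy flags.
func parseManagerArgs(args []string, env map[string]string) (multiTenant bool, selector string, err error) {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the demo output
	fs.BoolVar(&multiTenant, "multi-tenant", false, "enable multi-tenant mode")
	fs.StringVar(&selector, "project-label-selector", "", "label selector for projects")

	expanded := make([]string, 0, len(args))
	for _, a := range args {
		expanded = append(expanded, expandArg(a, env))
	}
	err = fs.Parse(expanded)
	return multiTenant, selector, err
}

func main() {
	args := []string{
		"--multi-tenant=$(MULTI_TENANT)",
		"--project-label-selector=$(PROJECT_LABEL_SELECTOR)",
	}
	base := map[string]string{"MULTI_TENANT": "false", "PROJECT_LABEL_SELECTOR": ""}
	overlay := map[string]string{"MULTI_TENANT": "true", "PROJECT_LABEL_SELECTOR": ""}

	for name, env := range map[string]map[string]string{"base": base, "overlay": overlay} {
		mt, sel, err := parseManagerArgs(args, env)
		fmt.Printf("%s: multiTenant=%t selector=%q err=%v\n", name, mt, sel, err)
	}

	// With MULTI_TENANT undefined the literal "$(MULTI_TENANT)" reaches the
	// flag parser, which cannot read it as a boolean.
	_, _, err := parseManagerArgs(args[:1], map[string]string{})
	fmt.Println("undefined variable:", err)
}
```

This suggests that any overlay which drops the `MULTI_TENANT` env entry would also need to drop the matching argument, since an unresolved reference would otherwise fail flag parsing at startup.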

config/overlays/controller-manager/core-control-plane/patches/deployment.yaml

Lines changed: 7 additions & 0 deletions
@@ -7,3 +7,10 @@ spec:
     spec:
       serviceAccountName: search-controller-manager
       automountServiceAccountToken: true
+      containers:
+        - name: manager
+          env:
+            - name: MULTI_TENANT
+              value: "true"
+            - name: PROJECT_LABEL_SELECTOR
+              value: ""

config/overlays/controller-manager/core-control-plane/rbac/role.yaml

Lines changed: 8 additions & 0 deletions
@@ -12,6 +12,14 @@ rules:
   - get
   - list
   - watch
+- apiGroups:
+  - resourcemanager.miloapis.com
+  resources:
+  - projects
+  verbs:
+  - get
+  - list
+  - watch
 - apiGroups:
   - search.miloapis.com
   resources:
