Commit 8a6ccc2

Merge pull request #72 from datum-cloud/70-implement-multi-tenant-search-support
feat: implement multi-tenant search support
2 parents 0bb17e8 + 814dbeb commit 8a6ccc2

30 files changed (+1982, -43 lines)

CLAUDE.md

Lines changed: 70 additions & 0 deletions
```diff
@@ -0,0 +1,70 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+> **Context Optimization**: This file is structured for efficient agent usage. The "Agent Routing" section defines what context each agent needs. When spawning subagents, pass only relevant sections—not the entire file. Sections marked `<!-- reference -->` are lookup tables; don't include them in agent prompts unless specifically needed.
+
+
+## Agent Routing
+
+**MANDATORY: All implementation work MUST be performed by subagents.** Never directly edit code, configuration, or documentation in the parent conversation. Instead, always delegate to the appropriate specialized agent from the table below. The parent conversation should only coordinate agents, pass context between them, and communicate results to the user.
+
+Do NOT ask the user which agent to use - pick the appropriate one based on what files or features are being modified.
+
+| Task Type | Agent | When to Use |
+|-----------|-------|-------------|
+| UI/Frontend | `datum-platform:frontend-dev` | React, TypeScript, CSS, anything in `ui/` directory |
+| Go Backend | `datum-platform:api-dev` | Go code in `cmd/`, `internal/`, `pkg/` directories |
+| Infrastructure | `datum-platform:sre` | Kustomize, Dockerfile, CI/CD, `config/` directory, `.infra/` for deployment |
+| Tests | `datum-platform:test-engineer` | Writing or fixing Go tests |
+| Code Review | `datum-platform:code-reviewer` | After implementation, before committing |
+| Documentation | `datum-platform:tech-writer` | README, docs/, guides, API documentation |
+| Architecture | `Plan` | Designing new features or significant refactors |
+| Exploration | `Explore` | Understanding codebase structure or finding code |
+
+**Key principles:**
+- **Always use subagents** — never write code, edit files, or run build/test commands directly in the parent conversation
+- Use agents proactively without being asked
+- For multi-step tasks, use the appropriate agent for each step (launch independent agents in parallel when possible)
+- After making code changes, always use `code-reviewer` to validate
+- For UI changes, run `npm run build` and `npm run test:e2e` to verify
+- **Always test infrastructure changes in a test environment before opening a PR** - Deploy to the test-infra KIND cluster (`task test-infra:cluster-up`) and verify resources work correctly before pushing changes to staging/production repos
+- **Use Telepresence for debugging staging issues** - When investigating bugs that only reproduce in staging, intercept the service and run it locally with `task test-infra:telepresence:intercept SERVICE=<name>`. See "Remote Debugging with Telepresence" section.
+
+### Agent Context Requirements
+
+Each agent only needs specific context. When spawning agents, pass minimal relevant info in prompts—don't repeat the entire CLAUDE.md:
+
+| Agent | Required Context | Skip (don't include in prompt) |
+|-------|-----------------|--------------------------------|
+| `frontend-dev` | UI commands, file paths in `ui/` | Go architecture, ClickHouse, NATS, data pipeline |
+| `api-dev` | Go patterns, API resource types, key directories | UI commands, dev environment setup, migrations |
+| `sre` | Config structure, build commands, deployment | Code architecture details, CEL patterns |
+| `test-engineer` | Test commands, package being tested | Full architecture, deployment, UI |
+| `Explore` | Key directories, architecture overview | Build commands, dev setup, deployment |
+| `code-reviewer` | Architecture, multi-tenancy model, conventions | Dev environment, build commands |
+| `tech-writer` | API resources, architecture overview | Implementation details, build commands |
+
+### Agent Output Guidelines
+
+Agents should return **concise summaries** to minimize context bloat in the parent conversation:
+
+| Agent | Return | Don't Return |
+|-------|--------|--------------|
+| `Explore` | File paths + 1-line descriptions | Full file contents, extensive code quotes |
+| `api-dev` | What was changed + file paths | Full diffs, unchanged code |
+| `frontend-dev` | Components modified + any build errors | Full file contents |
+| `code-reviewer` | Numbered findings list with file:line refs | Full code blocks for context |
+| `test-engineer` | Pass/fail summary + failure messages only | Full test output, passing test details |
+| `sre` | Changed manifests + deployment notes | Full YAML contents |
+
+### Multi-Step Task Decomposition
+
+For complex tasks, decompose to minimize per-agent context:
+
+1. **Explore first** (use `model: "haiku"`): Find relevant files → return only paths
+2. **Plan if needed**: Design approach → return bullet points only
+3. **Implement** (sonnet): Work on specific files identified in step 1
+4. **Review**: Check only the changed files
+
+**Critical**: Pass only what's needed between steps. Don't re-explore what's already known.
```
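Most rows of the routing table above are keyed on which paths a change touches. As an illustration only, those path-based rows could be expressed as a lookup. `agentFor` is a hypothetical helper that this commit does not add, and treating `.github/` as the CI/CD location is an assumption, since the table only says "CI/CD". Review, planning, and exploration are chosen by task type rather than by path, so the sketch leaves them out.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// agentFor is a hypothetical helper mapping a changed file to the subagent
// named in the CLAUDE.md routing table. Only the path-based rows are covered;
// review, planning, and exploration are chosen by task type, not by path.
func agentFor(file string) string {
	file = path.Clean(file)
	top := strings.SplitN(file, "/", 2)[0]
	base := path.Base(file)
	switch {
	case top == "ui":
		return "datum-platform:frontend-dev"
	case strings.HasSuffix(file, "_test.go"):
		// Checked before the Go directories so test files go to the test engineer.
		return "datum-platform:test-engineer"
	case top == "cmd" || top == "internal" || top == "pkg":
		return "datum-platform:api-dev"
	case top == "config" || top == ".infra" || top == ".github" || base == "Dockerfile":
		// .github is an assumed CI/CD location.
		return "datum-platform:sre"
	case top == "docs" || strings.HasPrefix(base, "README"):
		return "datum-platform:tech-writer"
	default:
		return "" // no path-based match; pick by task type instead
	}
}

func main() {
	for _, f := range []string{
		"cmd/search/manager/command.go",
		"config/base/resource-indexer/deployment.yaml",
		"ui/src/App.tsx",
		"go.mod",
	} {
		fmt.Printf("%-46s %q\n", f, agentFor(f))
	}
}
```

The ordering of the `switch` cases matters: the `_test.go` check has to come before the Go directory check, or every test file would be routed to `api-dev`.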

cmd/search/indexer/command.go

Lines changed: 8 additions & 1 deletion
```diff
@@ -46,6 +46,9 @@ type ResourceIndexerOptions struct {
 	BatchSize int
 	FlushInterval time.Duration
 	BatchMaxConcurrentUploads int
+
+	// Multi-tenancy settings.
+	EnableMultiTenancy bool
 }
 
 // NewResourceIndexerOptions creates a new ResourceIndexerOptions with default values.
@@ -65,6 +68,7 @@ func NewResourceIndexerOptions() *ResourceIndexerOptions {
 		MeilisearchMaxRetries: 3,
 		MeilisearchRetryDelay: 500 * time.Millisecond,
 		BatchMaxConcurrentUploads: 100,
+		EnableMultiTenancy: false,
 	}
 }
 
@@ -89,6 +93,9 @@ func (o *ResourceIndexerOptions) AddFlags(fs *pflag.FlagSet) {
 	fs.IntVar(&o.MeilisearchMaxRetries, "meilisearch-max-retries", o.MeilisearchMaxRetries, "The maximum number of retries for transient Meilisearch errors.")
 	fs.DurationVar(&o.MeilisearchRetryDelay, "meilisearch-retry-delay", o.MeilisearchRetryDelay, "The base delay between Meilisearch retries.")
 	fs.IntVar(&o.BatchMaxConcurrentUploads, "batch-max-concurrent-uploads", o.BatchMaxConcurrentUploads, "The maximum number of concurrent uploads to Meilisearch.")
+
+	// Multi-tenancy
+	fs.BoolVar(&o.EnableMultiTenancy, "enable-multi-tenancy", o.EnableMultiTenancy, "Enable multi-tenant mode to index resources from all project control planes.")
 }
 
 // Validate checks if the resource indexer options are valid.
@@ -295,7 +302,7 @@ func Run(o *ResourceIndexerOptions, ctx context.Context) error {
 	auditBatcher.Start(ctx)
 	reindexBatcher.Start(ctx)
 
-	auditIdx := indexer.NewIndexer(auditConsumer, indexPolicyCache, auditBatcher)
+	auditIdx := indexer.NewIndexer(auditConsumer, indexPolicyCache, auditBatcher, o.EnableMultiTenancy)
 	reindexIdx := indexer.NewReindexConsumer(reindexJSConsumer, reindexPolicyCache, reindexBatcher)
 
 	klog.Info("Starting audit indexer and re-index consumer...")
```
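The new `--enable-multi-tenancy` flag defaults to `false`, so an indexer started without the flag keeps its previous single-tenant behavior. The sketch below reproduces the same registration pattern (field default reused as the flag default) with the standard library's `flag` package as a stand-in for `pflag`, which the command actually uses; pflag's `BoolVar` takes the same pointer, name, default, and usage arguments. `indexerOptions` and `parse` are hypothetical, trimmed-down names, not code from this commit.

```go
package main

import (
	"flag"
	"fmt"
)

// indexerOptions is a hypothetical, trimmed-down stand-in for
// ResourceIndexerOptions that keeps only the new multi-tenancy field.
type indexerOptions struct {
	EnableMultiTenancy bool
}

// newIndexerOptions mirrors the default in the diff: multi-tenancy off.
func newIndexerOptions() *indexerOptions {
	return &indexerOptions{EnableMultiTenancy: false}
}

// addFlags registers the flag using the current field value as its default,
// in the same style as ResourceIndexerOptions.AddFlags.
func (o *indexerOptions) addFlags(fs *flag.FlagSet) {
	fs.BoolVar(&o.EnableMultiTenancy, "enable-multi-tenancy", o.EnableMultiTenancy,
		"Enable multi-tenant mode to index resources from all project control planes.")
}

// parse builds a fresh FlagSet, registers the flags, and parses args.
func parse(args []string) (*indexerOptions, error) {
	o := newIndexerOptions()
	fs := flag.NewFlagSet("resource-indexer", flag.ContinueOnError)
	o.addFlags(fs)
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return o, nil
}

func main() {
	for _, args := range [][]string{
		nil,
		{"--enable-multi-tenancy=true"},
		{"--enable-multi-tenancy=false"},
	} {
		o, err := parse(args)
		if err != nil {
			fmt.Println(args, "error:", err)
			continue
		}
		fmt.Println(args, o.EnableMultiTenancy)
	}
}
```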

cmd/search/manager/command.go

Lines changed: 49 additions & 0 deletions
```diff
@@ -3,6 +3,7 @@ package manager
 import (
 	"context"
 	"crypto/tls"
+	"errors"
 	"fmt"
 	"os"
 	"time"
@@ -12,6 +13,7 @@ import (
 	"github.com/spf13/cobra"
 	"github.com/spf13/pflag"
 	"go.miloapis.net/search/internal/indexer"
+	"go.miloapis.net/search/internal/tenant"
 	"go.miloapis.net/search/pkg/apis/search/install"
 	"k8s.io/apimachinery/pkg/runtime"
 	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
@@ -59,6 +61,10 @@ type ControllerManagerOptions struct {
 	NatsTLSCA string
 	NatsTLSCert string
 	NatsTLSKey string
+
+	// Multi-tenancy settings.
+	EnableMultiTenancy bool
+	ProjectLabelSelector string
 }
 
 // NewControllerManagerOptions creates a new ControllerManagerOptions with default values
@@ -77,6 +83,7 @@ func NewControllerManagerOptions() *ControllerManagerOptions {
 		MeilisearchDomain: "http://meilisearch.meilisearch-system.svc.cluster.local:7700",
 		NatsURL: "nats://nats.nats-system.svc.cluster.local:4222",
 		NatsReindexSubject: "reindex.all",
+		EnableMultiTenancy: false,
 	}
 }
 
@@ -107,6 +114,10 @@ func (o *ControllerManagerOptions) AddFlags(fs *pflag.FlagSet) {
 	fs.StringVar(&o.NatsTLSCA, "nats-tls-ca", o.NatsTLSCA, "The path to the NATS TLS CA file.")
 	fs.StringVar(&o.NatsTLSCert, "nats-tls-cert", o.NatsTLSCert, "The path to the NATS TLS certificate file.")
 	fs.StringVar(&o.NatsTLSKey, "nats-tls-key", o.NatsTLSKey, "The path to the NATS TLS key file.")
+
+	// Multi-tenancy
+	fs.BoolVar(&o.EnableMultiTenancy, "enable-multi-tenancy", o.EnableMultiTenancy, "Enable multi-tenant mode to index resources from all project control planes.")
+	fs.StringVar(&o.ProjectLabelSelector, "project-label-selector", o.ProjectLabelSelector, "Label selector to filter which projects are indexed (empty = all projects).")
 }
 
 // Validate validates the options
@@ -243,6 +254,43 @@ func Run(o *ControllerManagerOptions, ctx context.Context) error {
 
 	reindexPub := indexer.NewReindexPublisher(js, o.NatsReindexSubject)
 
+	// Build TenantRegistry based on deployment mode.
+	var registry tenant.TenantRegistry
+	if o.EnableMultiTenancy {
+		// Create a PolicyCache backed by the manager's shared informer cache.
+		// requireReadyCondition=true ensures only fully-initialized policies
+		// (index created, attributes synced) are included in the cache.
+		policyCache, err := indexer.NewPolicyCache(mgr.GetCache(), true)
+		if err != nil {
+			setupLog.Error(err, "unable to create policy cache")
+			os.Exit(1)
+		}
+		if err := policyCache.RegisterHandlers(ctx); err != nil {
+			setupLog.Error(err, "unable to register policy cache handlers")
+			os.Exit(1)
+		}
+
+		// ProjectWatcher handles tenant lifecycle: on disengagement it purges all
+		// tenant documents from each index.
+		projectWatcher := tenant.NewProjectWatcher(policyCache, searchSDK)
+
+		multiRegistry := tenant.NewMultiTenantRegistry(
+			cfg,
+			dynamicClient,
+			o.ProjectLabelSelector,
+			projectWatcher.OnTenantEngaged,
+			projectWatcher.OnTenantDisengaged,
+		)
+		go func() {
+			if err := multiRegistry.Run(ctx); err != nil && !errors.Is(err, context.Canceled) {
+				setupLog.Error(err, "MultiTenantRegistry stopped unexpectedly")
+			}
+		}()
+		registry = multiRegistry
+	} else {
+		registry = tenant.NewSingleTenantRegistry(dynamicClient)
+	}
+
 	if err = (&policycontroller.ResourceIndexPolicyReconciler{
 		Client: mgr.GetClient(),
 		Scheme: mgr.GetScheme(),
@@ -251,6 +299,7 @@ func Run(o *ControllerManagerOptions, ctx context.Context) error {
 		DynamicClient: dynamicClient,
 		RESTMapper: mgr.GetRESTMapper(),
 		ReindexPublisher: reindexPub,
+		TenantRegistry: registry,
 	}).SetupWithManager(mgr); err != nil {
 		setupLog.Error(err, "unable to create controller", "controller", "ResourceIndexPolicy")
 		os.Exit(1)
```

config/base/controller-manager/deployment.yaml

Lines changed: 6 additions & 0 deletions
```diff
@@ -48,6 +48,8 @@ spec:
             - --nats-tls-cert=$(NATS_TLS_CERT)
             - --nats-tls-key=$(NATS_TLS_KEY)
             - --leader-elect-resource-namespace=$(LEADER_ELECT_RESOURCE_NAMESPACE)
+            - --enable-multi-tenancy=$(ENABLE_MULTI_TENANCY)
+            - --project-label-selector=$(PROJECT_LABEL_SELECTOR)
           env:
             - name: POD_NAMESPACE
               valueFrom:
@@ -77,6 +79,10 @@ spec:
               value: ""
             - name: LEADER_ELECT_RESOURCE_NAMESPACE
               value: ""
+            - name: ENABLE_MULTI_TENANCY
+              value: "false"
+            - name: PROJECT_LABEL_SELECTOR
+              value: ""
             - name: MEILISEARCH_API_KEY
               valueFrom:
                 secretKeyRef:
```
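The base manifest passes the new settings as `--enable-multi-tenancy=$(ENABLE_MULTI_TENANCY)` and `--project-label-selector=$(PROJECT_LABEL_SELECTOR)`, and it also declares both variables with defaults. The defaults matter: Kubernetes expands `$(VAR)` references in container args from the container's environment and leaves a reference it cannot resolve unchanged, which a boolean flag would then reject. The sketch below is a deliberately simplified model of that expansion (it ignores the `$$` escape and restricts names to letters, digits, and underscores); `expandArgs` and `boolFlagValue` are hypothetical helpers, not Kubernetes APIs.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// varRef matches a $(NAME) reference. Simplified: no $$ escaping.
var varRef = regexp.MustCompile(`\$\(([A-Za-z_][A-Za-z0-9_]*)\)`)

// expandArgs replaces $(NAME) with env[NAME] when NAME is defined and leaves
// the reference untouched otherwise, approximating how Kubernetes treats args.
func expandArgs(args []string, env map[string]string) []string {
	out := make([]string, len(args))
	for i, a := range args {
		out[i] = varRef.ReplaceAllStringFunc(a, func(m string) string {
			name := varRef.FindStringSubmatch(m)[1]
			if v, ok := env[name]; ok {
				return v
			}
			return m
		})
	}
	return out
}

// boolFlagValue finds a --name=value argument and parses its value as a bool.
func boolFlagValue(args []string, name string) (bool, error) {
	prefix := "--" + name + "="
	for _, a := range args {
		if strings.HasPrefix(a, prefix) {
			return strconv.ParseBool(strings.TrimPrefix(a, prefix))
		}
	}
	return false, fmt.Errorf("flag --%s not set", name)
}

func main() {
	args := []string{
		"--enable-multi-tenancy=$(ENABLE_MULTI_TENANCY)",
		"--project-label-selector=$(PROJECT_LABEL_SELECTOR)",
	}
	withDefaults := map[string]string{"ENABLE_MULTI_TENANCY": "false", "PROJECT_LABEL_SELECTOR": ""}

	expanded := expandArgs(args, withDefaults)
	fmt.Println(expanded) // [--enable-multi-tenancy=false --project-label-selector=]
	fmt.Println(boolFlagValue(expanded, "enable-multi-tenancy"))

	// Without the env entries the literal reference survives and parsing fails.
	missing := expandArgs(args, map[string]string{})
	fmt.Println(missing)
	fmt.Println(boolFlagValue(missing, "enable-multi-tenancy"))
}
```

An empty `PROJECT_LABEL_SELECTOR` expands to `--project-label-selector=`, which per the flag's help text means all projects are indexed.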

config/base/resource-indexer/deployment.yaml

Lines changed: 3 additions & 0 deletions
```diff
@@ -28,6 +28,7 @@ spec:
             - --nats-tls-cert=$(NATS_TLS_CERT)
             - --nats-tls-key=$(NATS_TLS_KEY)
             - --meilisearch-domain=$(MEILISEARCH_DOMAIN)
+            - --enable-multi-tenancy=$(ENABLE_MULTI_TENANCY)
           env:
             - name: NATS_URL
               value: "nats://nats.nats-system.svc.cluster.local:4222"
@@ -47,6 +48,8 @@ spec:
               value: "AUDIT_EVENTS"
             - name: MEILISEARCH_DOMAIN
               value: "http://meilisearch.meilisearch-system.svc.cluster.local:7700"
+            - name: ENABLE_MULTI_TENANCY
+              value: "false"
             - name: MEILISEARCH_API_KEY
               valueFrom:
                 secretKeyRef:
```

config/overlays/controller-manager/core-control-plane/patches/deployment.yaml

Lines changed: 7 additions & 0 deletions
```diff
@@ -7,3 +7,10 @@ spec:
     spec:
       serviceAccountName: search-controller-manager
       automountServiceAccountToken: true
+      containers:
+        - name: manager
+          env:
+            - name: ENABLE_MULTI_TENANCY
+              value: "true"
+            - name: PROJECT_LABEL_SELECTOR
+              value: ""
```
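The core-control-plane overlay switches `ENABLE_MULTI_TENANCY` to `"true"` without restating the rest of the container. This works because, in a strategic merge patch, both `containers` and `env` are lists merged by their `name` key, so the patch updates only the matching entries. The following is a simplified model of that merge for the `env` list alone; real strategic merge also handles patch directives and merges every field of a matched entry, not just `value`. `envVar` and `mergeEnvByName` are hypothetical names.

```go
package main

import "fmt"

// envVar is a hypothetical minimal stand-in for a container env entry.
type envVar struct {
	Name  string
	Value string
}

// mergeEnvByName models the env-list merge: entries are matched by Name,
// matched entries take the patch's Value and keep their position, and
// unmatched patch entries are appended in this sketch. base is not modified.
func mergeEnvByName(base, patch []envVar) []envVar {
	out := append([]envVar(nil), base...)
	index := make(map[string]int, len(out))
	for i, e := range out {
		index[e.Name] = i
	}
	for _, p := range patch {
		if i, ok := index[p.Name]; ok {
			out[i].Value = p.Value
			continue
		}
		index[p.Name] = len(out)
		out = append(out, p)
	}
	return out
}

func main() {
	base := []envVar{
		{"LEADER_ELECT_RESOURCE_NAMESPACE", ""},
		{"ENABLE_MULTI_TENANCY", "false"},
		{"PROJECT_LABEL_SELECTOR", ""},
	}
	overlay := []envVar{
		{"ENABLE_MULTI_TENANCY", "true"},
		{"PROJECT_LABEL_SELECTOR", ""},
	}
	for _, e := range mergeEnvByName(base, overlay) {
		fmt.Printf("%s=%q\n", e.Name, e.Value)
	}
}
```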

config/overlays/controller-manager/core-control-plane/rbac/role.yaml

Lines changed: 8 additions & 0 deletions
```diff
@@ -12,6 +12,14 @@ rules:
   - get
   - list
   - watch
+- apiGroups:
+  - resourcemanager.miloapis.com
+  resources:
+  - projects
+  verbs:
+  - get
+  - list
+  - watch
 - apiGroups:
   - search.miloapis.com
   resources:
```
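The new rule grants `get`, `list`, and `watch` on `projects` in `resourcemanager.miloapis.com`, which is what a component that lists and watches projects needs, and nothing that would let it modify them. RBAC is purely additive (a request is allowed if any rule matches), and `*` acts as a wildcard. The sketch below models single-rule matching under those semantics; `policyRule` and `allows` are hypothetical names, and subresources, `resourceNames`, and non-resource URLs are ignored.

```go
package main

import "fmt"

// policyRule is a hypothetical, simplified form of an RBAC PolicyRule.
type policyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether list includes v or the "*" wildcard.
func contains(list []string, v string) bool {
	for _, item := range list {
		if item == "*" || item == v {
			return true
		}
	}
	return false
}

// allows reports whether any rule permits verb on group/resource.
// There are no deny rules: one matching rule is enough.
func allows(rules []policyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	rules := []policyRule{{
		APIGroups: []string{"resourcemanager.miloapis.com"},
		Resources: []string{"projects"},
		Verbs:     []string{"get", "list", "watch"},
	}}
	for _, verb := range []string{"list", "watch", "update", "delete"} {
		fmt.Printf("%-6s projects: %v\n", verb, allows(rules, "resourcemanager.miloapis.com", "projects", verb))
	}
}
```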

config/overlays/resource-indexer/core-control-plane/patches/deployment.yaml

Lines changed: 5 additions & 0 deletions
```diff
@@ -7,3 +7,8 @@ spec:
     spec:
       serviceAccountName: resource-indexer
       automountServiceAccountToken: true
+      containers:
+        - name: indexer
+          env:
+            - name: ENABLE_MULTI_TENANCY
+              value: "true"
```

Lines changed: 23 additions & 0 deletions
```diff
@@ -0,0 +1,23 @@
+apiVersion: search.miloapis.com/v1alpha1
+kind: ResourceIndexPolicy
+metadata:
+  name: dnszone-index-policy
+spec:
+  targetResource:
+    group: dns.networking.miloapis.com
+    version: v1alpha1
+    kind: DNSZone
+
+  conditions:
+    - name: has-name
+      expression: "metadata.name != ''"
+
+  fields:
+    - path: ".metadata.name"
+      searchable: true
+    - path: ".metadata.namespace"
+      searchable: true
+    - path: ".spec.domainName"
+      searchable: true
+    - path: ".spec.dnsZoneClassName"
+      searchable: true
```

Lines changed: 29 additions & 0 deletions
```diff
@@ -0,0 +1,29 @@
+apiVersion: search.miloapis.com/v1alpha1
+kind: ResourceIndexPolicy
+metadata:
+  name: domain-index-policy
+spec:
+  targetResource:
+    group: networking.datumapis.com
+    version: v1alpha
+    kind: Domain
+
+  conditions:
+    - name: has-name
+      expression: "metadata.name != ''"
+
+  fields:
+    - path: ".metadata.name"
+      searchable: true
+    - path: ".metadata.namespace"
+      searchable: true
+    - path: ".spec.domainName"
+      searchable: true
+    - path: ".status.apex"
+      searchable: true
+    - path: ".status.nameservers[0].hostname"
+      searchable: true
+    - path: ".status.registration.registrar.name"
+      searchable: true
+    - path: ".status.registration.registry.name"
+      searchable: true
```
