Skip to content

Commit 39314af

Browse files
Add first-principles audit suggestions with progress tracking
17 items (2 critical, 5 high, 5 medium, 5 low) from full codebase audit. Both critical items marked DONE: README NuGet update and TASK_BOARD status correction. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
1 parent e2c1fbb commit 39314af

File tree

1 file changed

+170
-0
lines changed

1 file changed

+170
-0
lines changed

SUGGESTIONS.md

Lines changed: 170 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,170 @@
1+
# mem.net — Suggestions
2+
3+
> Based on a first-principles code audit of every source file, test file, spec, and public contract. Organized by priority.
4+
5+
---
6+
7+
## Critical (fix before stable v1)
8+
9+
### 1. ~~Update README: Retrievo is now a NuGet package~~ ✅ DONE
10+
11+
**Where**: README.md line 238
12+
**Problem**: README still says *"Once Retrievo is published as a NuGet package, this sibling requirement will be removed."* Retrievo is now published. The sibling project reference instruction is outdated and will confuse users.
13+
**Fix**: Replace the sibling layout requirement with:
14+
```
15+
dotnet add package Retrievo --prerelease
16+
```
17+
**Fix**: ~~Replace the sibling layout requirement.~~ **Fixed**: README updated to state Retrievo is a NuGet dependency (0.2.0-preview.1) installed automatically via `dotnet restore`. Sibling directory layout requirement removed.
18+
19+
### 2. ~~Update TASK_BOARD: Retrievo NuGet task is done~~ ✅ DONE
20+
21+
**Where**: `docs/project/TASK_BOARD.md` line 13
22+
**Problem**: *"Publish Retrievo as NuGet package and replace relative project reference"* is still listed as `[ ]` not started. It's done.
23+
**Fix**: ~~Mark it `[x]` and move to Recently Completed.~~ **Fixed**: Marked `[x]` with version annotation (Retrievo 0.2.0-preview.1) in TASK_BOARD.md.
24+
25+
---
26+
27+
## High (strongly recommended)
28+
29+
### 3. Replace serialized lock in `RetrievoEventStore` with `ReaderWriterLockSlim`
30+
31+
**Where**: `RetrievoStores.cs``_writeLock` used in both `WriteAsync` and `QueryAsync`
32+
**Problem**: All queries are serialized behind the same `lock` as writes. Under concurrent load (multiple agents querying simultaneously), this becomes a bottleneck — every read waits for every other read and every write.
33+
**Fix**: Replace `object _writeLock` with `ReaderWriterLockSlim`:
34+
- `WriteAsync`: `_rwLock.EnterWriteLock()` / `ExitWriteLock()`
35+
- `QueryAsync`: `_rwLock.EnterReadLock()` / `ExitReadLock()`
36+
37+
This allows concurrent reads while still serializing writes. The `_digestCache` dictionary also needs to become a `ConcurrentDictionary` or be protected by the same lock scope.
38+
39+
### 4. Validate composite key components
40+
41+
**Where**: `RetrievoStores.cs``CompositeKey()` method (line 311–314)
42+
**Problem**: `CompositeKey` joins `tenantId/userId/eventId` with `/`. If any component contains `/`, keys will collide silently. Example: tenant `a/b` + user `c` = `a/b/c` = tenant `a` + user `b/c`.
43+
**Fix**: Either validate that components don't contain `/` (throw `ArgumentException`), or use a delimiter that's invalid in IDs (e.g., `\0` or a multi-char separator like `:::`).
44+
45+
### 5. Add OpenAPI schema export
46+
47+
**Where**: `Program.cs` (Minimal API)
48+
**Problem**: The REST API is well-designed but has no machine-readable schema. Agent framework authors who want to auto-generate clients for non-.NET languages have to read source code.
49+
**Fix**: Add `Microsoft.AspNetCore.OpenApi` (ships with .NET 8) and `builder.Services.AddOpenApi()`. Minimal API endpoints already have typed request/response — OpenAPI generation is nearly free. Consider also adding a Swagger UI endpoint for development.
50+
51+
### 6. Add Docker support
52+
53+
**Where**: Repository root
54+
**Problem**: No `Dockerfile` or `docker-compose.yml`. Running mem.net requires .NET 8 SDK installed locally. For agent developers who just want a memory service running, Docker is the expected deployment path.
55+
**Fix**: Add a minimal multi-stage Dockerfile:
56+
```dockerfile
57+
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
58+
WORKDIR /src
59+
COPY . .
60+
RUN dotnet publish src/MemNet.MemoryService -c Release -o /app
61+
62+
FROM mcr.microsoft.com/dotnet/aspnet:8.0
63+
WORKDIR /app
64+
COPY --from=build /app .
65+
ENTRYPOINT ["dotnet", "MemNet.MemoryService.dll"]
66+
```
67+
Add a `docker-compose.yml` for one-command startup with volume mount for `MEMNET_DATA_ROOT`.
68+
69+
### 7. Make `RebuildIndex()` async in `RetrievoEventStore`
70+
71+
**Where**: `RetrievoStores.cs``RebuildIndex()` called in constructor
72+
**Problem**: Reads all event JSON files from disk synchronously in the constructor. For a data root with thousands of events, this blocks startup for seconds. Constructor-time I/O is also an anti-pattern for DI containers.
73+
**Fix**: Convert to factory pattern or `IHostedService.StartAsync`:
74+
```csharp
75+
public static async Task<RetrievoEventStore> CreateAsync(StorageOptions options, CancellationToken ct)
76+
```
77+
Or defer index build to first query with lazy initialization.
78+
79+
---
80+
81+
## Medium (quality improvements)
82+
83+
### 8. Increase test coverage threshold from 40% to 60%
84+
85+
**Where**: `.github/workflows/ci.yml`
86+
**Problem**: 40% is a very low bar. The codebase already has 29 test files — actual coverage is likely higher than 40%. The threshold should reflect reality and prevent regression.
87+
**Fix**: Measure current coverage, then set threshold to `current - 5%` to prevent backsliding while leaving room for infrastructure code that's hard to unit test.
88+
89+
### 9. Add health check with backend connectivity validation
90+
91+
**Where**: `Program.cs` — health endpoint
92+
**Problem**: Current health check returns `{"status": "ok"}` unconditionally. It doesn't validate that the configured backend (filesystem directory exists, Azure connection works, Retrievo index is initialized) is actually functional.
93+
**Fix**: Add a `/health/ready` endpoint that verifies backend connectivity. Keep the existing `/` as a lightweight liveness probe.
94+
95+
### 10. Add request/response logging middleware
96+
97+
**Where**: `Program.cs`
98+
**Problem**: No request logging. When debugging agent integration issues, there's no visibility into what requests the service is receiving.
99+
**Fix**: Add `app.UseHttpLogging()` with configurable verbosity. For production, log method + path + status + latency. For development, log request/response bodies.
100+
101+
### 11. Add rate limiting / request size limits
102+
103+
**Where**: `Program.cs`
104+
**Problem**: No request size limits. A malicious or buggy agent could POST a 100MB event digest or a document with unlimited patch operations. No rate limiting on mutations.
105+
**Fix**: Add `app.UseRequestBodySizeLimit()` and consider `Microsoft.AspNetCore.RateLimiting` for mutation endpoints. Even permissive limits (10MB body, 100 req/s) prevent runaway scenarios.
106+
107+
### 12. Formalize error response contract in README
108+
109+
**Where**: README.md
110+
**Problem**: The SDK spec documents the error envelope format, but the README (which most users read first) doesn't show error response shapes. Users integrating from non-.NET languages need to know the error contract.
111+
**Fix**: Add an "Error Responses" section to README showing the standard envelope:
112+
```json
113+
{
114+
"error": {
115+
"code": "ETAG_MISMATCH",
116+
"message": "...",
117+
"request_id": "..."
118+
}
119+
}
120+
```
121+
122+
---
123+
124+
## Low (nice to have)
125+
126+
### 13. Add `IAsyncDisposable` to `RetrievoEventStore`
127+
128+
**Where**: `RetrievoStores.cs`
129+
**Problem**: `Dispose()` holds a lock and disposes the Retrievo index synchronously. If the index grows large, dispose could take non-trivial time. ASP.NET's DI container prefers `IAsyncDisposable` for graceful shutdown.
130+
131+
### 14. Add structured event metadata validation
132+
133+
**Where**: `MemoryCoordinator.cs` — event write path
134+
**Problem**: Event digests accept arbitrary `Keywords` and `ProjectIds` arrays with no size validation. An agent could write an event with 10,000 keywords, bloating the Retrievo index.
135+
**Fix**: Add configurable limits (e.g., max 50 keywords, max 20 project IDs, max 10KB digest text).
136+
137+
### 15. Add NuGet packages for Client and AgentMemory SDKs
138+
139+
**Where**: `MemNet.Client.csproj`, `MemNet.AgentMemory.csproj`
140+
**Problem**: Agent developers currently need to clone the repo and add project references. Publishing `MemNet.Client` and `MemNet.AgentMemory` to NuGet would dramatically lower the adoption barrier.
141+
**Fix**: Add NuGet packaging metadata to both `.csproj` files and publish. Follow the same pattern Retrievo used.
142+
143+
### 16. Add telemetry/metrics hooks
144+
145+
**Where**: `MemoryCoordinator.cs`
146+
**Problem**: No observability. In production, operators need to know request latency, error rates, cache hit rates, index size, etc.
147+
**Fix**: Add `System.Diagnostics.Metrics` counters and `ActivitySource` spans to `MemoryCoordinator`. These are zero-cost when no listener is attached (no external dependency needed).
148+
149+
### 17. Source Link for NuGet debugging
150+
151+
**Where**: `.csproj` files
152+
**Problem**: Same as Retrievo — when SDK packages are published, users debugging through mem.net code won't see source without Source Link.
153+
**Fix**: Add `Microsoft.SourceLink.GitHub` package and `<PublishRepositoryUrl>true</PublishRepositoryUrl>`.
154+
155+
---
156+
157+
## What's Already Great (don't change)
158+
159+
- **File-first mental model** — simple, debuggable, no hidden state
160+
- **ETag optimistic concurrency on every mutation** — correct distributed systems pattern
161+
- **SHA256-based ETags** — deterministic, content-addressable
162+
- **Filesystem as source of truth, search index as derived state** — survives crashes, enables backend swaps
163+
- **Clean backend abstraction** — Retrievo only swaps `IEventStore`, everything else stays filesystem
164+
- **Retrievo knows nothing about mem.net** — zero reverse coupling
165+
- **JSON Patch with correct RFC 6901 tokenization**`~1` before `~0` decode order is right
166+
- **Client retry with exponential backoff + jitter** — production-grade
167+
- **Agent SDK slot/policy system** — powerful without being mandatory
168+
- **Formal spec document** (`MEMORY_SERVICE_SPEC.md`) — rare and valuable for an early-stage project
169+
- **Azure dependencies are build-flag gated** — doesn't bloat the default build
170+
- **Honest coverage threshold** — 40% is low but real; no inflated metrics

0 commit comments

Comments
 (0)