|
| 1 | +# mem.net — Suggestions |
| 2 | + |
| 3 | +> Based on a first-principles code audit of every source file, test file, spec, and public contract. Organized by priority. |
| 4 | +
|
| 5 | +--- |
| 6 | + |
| 7 | +## Critical (fix before stable v1) |
| 8 | + |
| 9 | +### 1. ~~Update README: Retrievo is now a NuGet package~~ ✅ DONE |
| 10 | + |
| 11 | +**Where**: README.md line 238 |
| 12 | +**Problem**: README still says *"Once Retrievo is published as a NuGet package, this sibling requirement will be removed."* Retrievo is now published. The sibling project reference instruction is outdated and will confuse users. |
| 13 | +**Fix**: Replace the sibling layout requirement with: |
| 14 | +``` |
| 15 | +dotnet add package Retrievo --prerelease |
| 16 | +``` |
| 17 | +**Fix**: ~~Replace the sibling layout requirement.~~ **Fixed**: README updated to state Retrievo is a NuGet dependency (0.2.0-preview.1) installed automatically via `dotnet restore`. Sibling directory layout requirement removed. |
| 18 | + |
| 19 | +### 2. ~~Update TASK_BOARD: Retrievo NuGet task is done~~ ✅ DONE |
| 20 | + |
| 21 | +**Where**: `docs/project/TASK_BOARD.md` line 13 |
| 22 | +**Problem**: *"Publish Retrievo as NuGet package and replace relative project reference"* is still listed as `[ ]` not started. It's done. |
| 23 | +**Fix**: ~~Mark it `[x]` and move to Recently Completed.~~ **Fixed**: Marked `[x]` with version annotation (Retrievo 0.2.0-preview.1) in TASK_BOARD.md. |
| 24 | + |
| 25 | +--- |
| 26 | + |
| 27 | +## High (strongly recommended) |
| 28 | + |
| 29 | +### 3. Replace serialized lock in `RetrievoEventStore` with `ReaderWriterLockSlim` |
| 30 | + |
| 31 | +**Where**: `RetrievoStores.cs` — `_writeLock` used in both `WriteAsync` and `QueryAsync` |
| 32 | +**Problem**: All queries are serialized behind the same `lock` as writes. Under concurrent load (multiple agents querying simultaneously), this becomes a bottleneck — every read waits for every other read and every write. |
| 33 | +**Fix**: Replace `object _writeLock` with `ReaderWriterLockSlim`: |
| 34 | +- `WriteAsync`: `_rwLock.EnterWriteLock()` / `ExitWriteLock()` |
| 35 | +- `QueryAsync`: `_rwLock.EnterReadLock()` / `ExitReadLock()` |
| 36 | + |
| 37 | +This allows concurrent reads while still serializing writes. The `_digestCache` dictionary also needs to become a `ConcurrentDictionary` or be protected by the same lock scope. |
| 38 | + |
| 39 | +### 4. Validate composite key components |
| 40 | + |
| 41 | +**Where**: `RetrievoStores.cs` — `CompositeKey()` method (line 311–314) |
| 42 | +**Problem**: `CompositeKey` joins `tenantId/userId/eventId` with `/`. If any component contains `/`, keys will collide silently. Example: tenant `a/b` + user `c` = `a/b/c` = tenant `a` + user `b/c`. |
| 43 | +**Fix**: Either validate that components don't contain `/` (throw `ArgumentException`), or use a delimiter that's invalid in IDs (e.g., `\0` or a multi-char separator like `:::`). |
| 44 | + |
| 45 | +### 5. Add OpenAPI schema export |
| 46 | + |
| 47 | +**Where**: `Program.cs` (Minimal API) |
| 48 | +**Problem**: The REST API is well-designed but has no machine-readable schema. Agent framework authors who want to auto-generate clients for non-.NET languages have to read source code. |
| 49 | +**Fix**: Add `Microsoft.AspNetCore.OpenApi` (ships with .NET 8) and `builder.Services.AddOpenApi()`. Minimal API endpoints already have typed request/response — OpenAPI generation is nearly free. Consider also adding a Swagger UI endpoint for development. |
| 50 | + |
| 51 | +### 6. Add Docker support |
| 52 | + |
| 53 | +**Where**: Repository root |
| 54 | +**Problem**: No `Dockerfile` or `docker-compose.yml`. Running mem.net requires .NET 8 SDK installed locally. For agent developers who just want a memory service running, Docker is the expected deployment path. |
| 55 | +**Fix**: Add a minimal multi-stage Dockerfile: |
| 56 | +```dockerfile |
| 57 | +FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build |
| 58 | +WORKDIR /src |
| 59 | +COPY . . |
| 60 | +RUN dotnet publish src/MemNet.MemoryService -c Release -o /app |
| 61 | + |
| 62 | +FROM mcr.microsoft.com/dotnet/aspnet:8.0 |
| 63 | +WORKDIR /app |
| 64 | +COPY --from=build /app . |
| 65 | +ENTRYPOINT ["dotnet", "MemNet.MemoryService.dll"] |
| 66 | +``` |
| 67 | +Add a `docker-compose.yml` for one-command startup with volume mount for `MEMNET_DATA_ROOT`. |
| 68 | + |
| 69 | +### 7. Make `RebuildIndex()` async in `RetrievoEventStore` |
| 70 | + |
| 71 | +**Where**: `RetrievoStores.cs` — `RebuildIndex()` called in constructor |
| 72 | +**Problem**: Reads all event JSON files from disk synchronously in the constructor. For a data root with thousands of events, this blocks startup for seconds. Constructor-time I/O is also an anti-pattern for DI containers. |
| 73 | +**Fix**: Convert to factory pattern or `IHostedService.StartAsync`: |
| 74 | +```csharp |
| 75 | +public static async Task<RetrievoEventStore> CreateAsync(StorageOptions options, CancellationToken ct) |
| 76 | +``` |
| 77 | +Or defer index build to first query with lazy initialization. |
| 78 | + |
| 79 | +--- |
| 80 | + |
| 81 | +## Medium (quality improvements) |
| 82 | + |
| 83 | +### 8. Increase test coverage threshold from 40% to 60% |
| 84 | + |
| 85 | +**Where**: `.github/workflows/ci.yml` |
| 86 | +**Problem**: 40% is a very low bar. The codebase already has 29 test files — actual coverage is likely higher than 40%. The threshold should reflect reality and prevent regression. |
| 87 | +**Fix**: Measure current coverage, then set threshold to `current - 5%` to prevent backsliding while leaving room for infrastructure code that's hard to unit test. |
| 88 | + |
| 89 | +### 9. Add health check with backend connectivity validation |
| 90 | + |
| 91 | +**Where**: `Program.cs` — health endpoint |
| 92 | +**Problem**: Current health check returns `{"status": "ok"}` unconditionally. It doesn't validate that the configured backend (filesystem directory exists, Azure connection works, Retrievo index is initialized) is actually functional. |
| 93 | +**Fix**: Add a `/health/ready` endpoint that verifies backend connectivity. Keep the existing `/` as a lightweight liveness probe. |
| 94 | + |
| 95 | +### 10. Add request/response logging middleware |
| 96 | + |
| 97 | +**Where**: `Program.cs` |
| 98 | +**Problem**: No request logging. When debugging agent integration issues, there's no visibility into what requests the service is receiving. |
| 99 | +**Fix**: Add `app.UseHttpLogging()` with configurable verbosity. For production, log method + path + status + latency. For development, log request/response bodies. |
| 100 | + |
| 101 | +### 11. Add rate limiting / request size limits |
| 102 | + |
| 103 | +**Where**: `Program.cs` |
| 104 | +**Problem**: No request size limits. A malicious or buggy agent could POST a 100MB event digest or a document with unlimited patch operations. No rate limiting on mutations. |
| 105 | +**Fix**: Add `app.UseRequestBodySizeLimit()` and consider `Microsoft.AspNetCore.RateLimiting` for mutation endpoints. Even permissive limits (10MB body, 100 req/s) prevent runaway scenarios. |
| 106 | + |
| 107 | +### 12. Formalize error response contract in README |
| 108 | + |
| 109 | +**Where**: README.md |
| 110 | +**Problem**: The SDK spec documents the error envelope format, but the README (which most users read first) doesn't show error response shapes. Users integrating from non-.NET languages need to know the error contract. |
| 111 | +**Fix**: Add an "Error Responses" section to README showing the standard envelope: |
| 112 | +```json |
| 113 | +{ |
| 114 | + "error": { |
| 115 | + "code": "ETAG_MISMATCH", |
| 116 | + "message": "...", |
| 117 | + "request_id": "..." |
| 118 | + } |
| 119 | +} |
| 120 | +``` |
| 121 | + |
| 122 | +--- |
| 123 | + |
| 124 | +## Low (nice to have) |
| 125 | + |
| 126 | +### 13. Add `IAsyncDisposable` to `RetrievoEventStore` |
| 127 | + |
| 128 | +**Where**: `RetrievoStores.cs` |
| 129 | +**Problem**: `Dispose()` holds a lock and disposes the Retrievo index synchronously. If the index grows large, dispose could take non-trivial time. ASP.NET's DI container prefers `IAsyncDisposable` for graceful shutdown. |
| 130 | + |
| 131 | +### 14. Add structured event metadata validation |
| 132 | + |
| 133 | +**Where**: `MemoryCoordinator.cs` — event write path |
| 134 | +**Problem**: Event digests accept arbitrary `Keywords` and `ProjectIds` arrays with no size validation. An agent could write an event with 10,000 keywords, bloating the Retrievo index. |
| 135 | +**Fix**: Add configurable limits (e.g., max 50 keywords, max 20 project IDs, max 10KB digest text). |
| 136 | + |
| 137 | +### 15. Add NuGet packages for Client and AgentMemory SDKs |
| 138 | + |
| 139 | +**Where**: `MemNet.Client.csproj`, `MemNet.AgentMemory.csproj` |
| 140 | +**Problem**: Agent developers currently need to clone the repo and add project references. Publishing `MemNet.Client` and `MemNet.AgentMemory` to NuGet would dramatically lower the adoption barrier. |
| 141 | +**Fix**: Add NuGet packaging metadata to both `.csproj` files and publish. Follow the same pattern Retrievo used. |
| 142 | + |
| 143 | +### 16. Add telemetry/metrics hooks |
| 144 | + |
| 145 | +**Where**: `MemoryCoordinator.cs` |
| 146 | +**Problem**: No observability. In production, operators need to know request latency, error rates, cache hit rates, index size, etc. |
| 147 | +**Fix**: Add `System.Diagnostics.Metrics` counters and `ActivitySource` spans to `MemoryCoordinator`. These are zero-cost when no listener is attached (no external dependency needed). |
| 148 | + |
| 149 | +### 17. Source Link for NuGet debugging |
| 150 | + |
| 151 | +**Where**: `.csproj` files |
| 152 | +**Problem**: Same as Retrievo — when SDK packages are published, users debugging through mem.net code won't see source without Source Link. |
| 153 | +**Fix**: Add `Microsoft.SourceLink.GitHub` package and `<PublishRepositoryUrl>true</PublishRepositoryUrl>`. |
| 154 | + |
| 155 | +--- |
| 156 | + |
| 157 | +## What's Already Great (don't change) |
| 158 | + |
| 159 | +- **File-first mental model** — simple, debuggable, no hidden state |
| 160 | +- **ETag optimistic concurrency on every mutation** — correct distributed systems pattern |
| 161 | +- **SHA256-based ETags** — deterministic, content-addressable |
| 162 | +- **Filesystem as source of truth, search index as derived state** — survives crashes, enables backend swaps |
| 163 | +- **Clean backend abstraction** — Retrievo only swaps `IEventStore`, everything else stays filesystem |
| 164 | +- **Retrievo knows nothing about mem.net** — zero reverse coupling |
| 165 | +- **JSON Patch with correct RFC 6901 tokenization** — `~1` before `~0` decode order is right |
| 166 | +- **Client retry with exponential backoff + jitter** — production-grade |
| 167 | +- **Agent SDK slot/policy system** — powerful without being mandatory |
| 168 | +- **Formal spec document** (`MEMORY_SERVICE_SPEC.md`) — rare and valuable for an early-stage project |
| 169 | +- **Azure dependencies are build-flag gated** — doesn't bloat the default build |
| 170 | +- **Honest coverage threshold** — 40% is low but real; no inflated metrics |
0 commit comments