A distributed, multi-tenant document-processing platform built with .NET 8, Clean Architecture, and Domain-Driven Design. It provides a full authentication and authorisation system with JWT bearer tokens and refresh token rotation, a role-based admin panel, and an asynchronous two-stage document pipeline that scans .txt files for dangerous content and structured code patterns — all orchestrated via RabbitMQ and deployed through Docker Compose.
- Architecture Overview
- Bounded Contexts
- Components
- Processing Pipeline
- Authentication Flow
- Prerequisites
- Setup & Running
- API Reference
- Configuration
- Running Tests
- Design Decisions
- Clean Architecture & Bounded Context Separation
- Document Status Flow
- Idempotency
- Optimistic Concurrency
- Fault Tolerance & Restart Safety
- Scalability
- Why There Is No Pagination on Matches
- Bytes vs Characters
- Two-Stage Worker Pipeline
- Refresh Token Rotation
- Password Hashing Abstraction
- JWT ClockSkew = Zero
- Account Lockout & Security Events
- Rate Limiting
- Multi-Tenancy
- Distributed Cache (Read-Through)
- HttpClient Usage in the Console Client
- Auto-Migration on Startup
┌──────────────────────────────────────────────────────────────────────┐
│ Console Client (DocuFlow.Client) │
│ Scans ./docs for *.txt → POST /api/documents (parallel, up to 64) │
└───────────────────────────────┬──────────────────────────────────────┘
│ HTTP + JWT Bearer
▼
┌──────────────────────────────────────────────────────────────────────┐
│ REST API (DocuFlowApi) │
│ │
│ IdentityAccess Bounded Context Documents Bounded Context │
│ ───────────────────────────── ──────────────────────────── │
│ POST /api/authentication/register POST /api/documents │
│ POST /api/authentication/login GET /api/documents/{id}/ │
│ POST /api/authentication/refresh status | matches | │
│ POST /api/authentication/logout content │
│ POST /api/authentication/ │
│ change-password Admin Bounded Context │
│ POST /api/authentication/ ───────────────────────── │
│ forgot-password GET /api/admin/dashboard │
│ POST /api/authentication/ GET /api/admin/users │
│ reset-password PUT /api/admin/users/{id}/│
│ POST /api/authentication/ lock | unlock │
│ revoke-all GET /api/admin/tenants │
│ GET /api/admin/security- │
│ events │
│ │
│ Persists document → publishes DocumentSubmittedMessage via │
│ MassTransit → RabbitMQ │
└──────┬─────────────────────────────┬──────────────────┬─────────────┘
│ SQL Server (EF Core) │ Redis │ RabbitMQ
▼ ▼ ▼
┌────────────────┐ ┌───────────────────────┐ ┌──────────────────────────┐
│ SQL Server │ │ Redis Cache │ │ Worker: DocumentProcess │
│ │ │ │ │ • Finds "dangerous" │
│ IdentityAccess│ │ matches:{docId} │ │ (case-insensitive) │
│ Documents │ │ content:{tid}:{docId}│ │ • Publishes scanned evt │
│ Logs │ │ (TTL from config) │ └──────────────────────────┘
└────────────────┘ └───────────────────────┘ │ RabbitMQ
▲ ▼
│ ┌──────────────────────────┐
└──────────────────────────────────────│ Worker: DocumentExtract │
│ • Extracts [A-Z]{3,5} │
│ -?[0-9]{3} patterns │
│ • Marks doc Available │
└──────────────────────────┘
DocuFlow is structured around three bounded contexts, each with its own Domain / Application / Infrastructure project triad. They share nothing except Shared.Contracts (domain error codes, domain messages, exception types, message contracts, and JWT constants).
| Bounded Context | Responsibility |
|---|---|
| IdentityAccess | User registration, login, JWT token generation, refresh token rotation, account lockout, password reset, email verification, tenant management, security event auditing |
| Documents | Document submission, per-tenant SHA-256 deduplication, asynchronous two-stage scan pipeline (dangerous word detection → regex extraction), tenant-scoped status and match querying |
| Admin (part of IdentityAccess.Application) | Admin dashboard aggregates, cross-tenant user management, tenant CRUD, security event queries — restricted to the Admin role |
| Project | Type | Description |
|---|---|---|
| `DocuFlowApi` | ASP.NET Core 8 Web API | REST endpoints for authentication, document processing, and admin operations |
| `DocuFlow.Client` | .NET Console App | Batch-submits .txt files from a local directory to the API (requires a valid JWT) |
| `Worker.DocumentProcessing` | .NET Worker Service | Consumes `DocumentSubmittedMessage`, scans for the word `dangerous`, publishes `DocumentScannedMessage` |
| `Worker.DocumentExtractor` | .NET Worker Service | Consumes `DocumentScannedMessage`, extracts `[A-Z]{3,5}-?[0-9]{3}` patterns, marks the document `Available` |
| `IdentityAccess.Domain` | Class Library | User, Tenant, RefreshToken, PasswordResetToken, EmailVerificationToken, SecurityEvent entities with domain behaviour — zero external dependencies |
| `IdentityAccess.Application` | Class Library | All authentication and admin handlers, handler interfaces, DTOs, and the `IPasswordHasher` abstraction |
| `IdentityAccess.Infrastructure` | Class Library | EF Core persistence, `JwtTokenService`, `BcryptPasswordHasher`, repositories, unit of work |
| `Documents.Domain` | Class Library | Document, ScanMatch entities — zero external dependencies |
| `Documents.Application` | Class Library | Document-processing handlers, interfaces, DTOs |
| `Documents.Infrastructure` | Class Library | EF Core persistence, MassTransit/RabbitMQ setup, document repositories |
| `Shared.Contracts` | Class Library | `DomainException`, `DomainErrorCodes`, `DomainMessages`, `JwtClaimTypes`, `JwtConfigurationKeys`, message contracts |
| `DocuFlowTest` | xUnit Test Project | Unit tests for handlers, controllers, repositories, middleware, JWT service, and extensions |
- Client reads each `.txt` file (max 1 KB), obtains a JWT by calling `POST /api/authentication/login`, then sends `POST /api/documents` with the Bearer token attached.
- API deduplicates by `SHA-256(fileName + "|" + content)`. If the document is new it is saved (initial status `Unknown`) and a `DocumentSubmittedMessage` is published to RabbitMQ via MassTransit.
- Worker.DocumentProcessing consumes `DocumentSubmittedMessage`, marks the document `Processing`, searches the content for the word `dangerous` (case-insensitive), persists each occurrence as a `ScanMatch` with its character position, then publishes `DocumentScannedMessage`.
- Worker.DocumentExtractor consumes `DocumentScannedMessage`, applies `[A-Z]{3,5}-?[0-9]{3}`, persists each regex match, and transitions the document status to `Available`.
- Callers poll `GET /api/documents/{id}/status` until `available`, then retrieve results via `GET /api/documents/{id}/matches`.
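Under the stated rules (SHA-256 over `fileName + "|" + content`, a case-insensitive search for `dangerous`, and the `[A-Z]{3,5}-?[0-9]{3}` regex), the three transformations can be sketched in plain C#. Names here are illustrative, not the actual worker code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

// Stage 0 (API): per-tenant dedup key.
string ComputeContentHash(string fileName, string content) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(fileName + "|" + content)));

// Stage 1 (Scanner): every case-insensitive occurrence of "dangerous" with its character position.
List<(int Position, string Value)> ScanDangerous(string content)
{
    var found = new List<(int, string)>();
    int i = 0;
    while ((i = content.IndexOf("dangerous", i, StringComparison.OrdinalIgnoreCase)) >= 0)
    {
        found.Add((i, content.Substring(i, "dangerous".Length)));
        i += "dangerous".Length;
    }
    return found;
}

// Stage 2 (Extractor): structured code patterns.
List<(int Position, string Value)> ExtractPatterns(string content) =>
    Regex.Matches(content, "[A-Z]{3,5}-?[0-9]{3}").Select(m => (m.Index, m.Value)).ToList();

var sample = "The PSV-123 part is dangerous.";
Console.WriteLine(ComputeContentHash("report.txt", sample)); // 64-char hex digest
Console.WriteLine(ScanDangerous(sample).Single());           // (20, dangerous)
Console.WriteLine(ExtractPatterns(sample).Single());         // (4, PSV-123)
```

Positions are character offsets into the submitted content, which is why `ScanMatch` rows can be written independently by the two workers without coordinating.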
┌─────────┐ POST /register ┌──────────────────┐
│ Client │ ─────────────────────►│ Register │
│ │ │ BCrypt hash pwd │
│ │ POST /login │ Create user │
│ │ ─────────────────────►│ Validate creds │
│ │ ◄─────────────────────│ Issue JWT + │
│ │ { accessToken, │ refresh token │
│ │ refreshToken } └──────────────────┘
│ │
│ │ POST /refresh ┌──────────────────┐
│ │ ─────────────────────►│ Validate refresh │
│ │ ◄─────────────────────│ token (SHA-256 │
│ │ { new accessToken, │ hash lookup) │
│ │ new refreshToken } │ Rotate token │
│ │ └──────────────────┘
│ │
│ │ POST /logout ┌──────────────────┐
│ │ ─────────────────────►│ Revoke single │
│ │ │ refresh token │
│ │ POST /revoke-all └──────────────────┘
│ │ ─────────────────────►┌──────────────────┐
│ │ │ Revoke all user │
│ │ │ refresh tokens │
│ │ └──────────────────┘
│ │
│ │ POST /change-password ┌─────────────────┐
│ │ ──────────────────────►│ Verify current │
│ │ │ pwd, hash new, │
│ │ │ rotate stamp, │
│ │ │ optionally │
│ │ │ revoke all │
│ │ └─────────────────┘
│ │
│ │ POST /forgot-password ┌────────────────┐
│ │ ───────────────────────►│ Issue reset │
│ │ │ token (SHA-256 │
│ │ POST /reset-password │ stored hash) │
│ │ ───────────────────────►│ Validate token │
│ │ │ Set new pwd │
└─────────┘ └────────────────┘
Token lifecycle:

- Access token: short-lived JWT (15 minutes), validated with `ClockSkew = TimeSpan.Zero`
- Refresh token: long-lived (7 days), stored as a SHA-256 hash in the database — the plaintext is only ever sent over the wire and never persisted
- Refresh token rotation: every call to `/refresh` revokes the old token and issues a new pair
- Security stamp: every password change and token revocation rotates the user's security stamp — issued tokens that predate the stamp change become invalid on the next validation
- Docker Desktop
- .NET 8 SDK (to run the client and tests)
All server-side services (SQL Server, RabbitMQ, Redis, API, workers) run via Docker Compose. The client is run manually once the stack is healthy.
```shell
# From the repo root
cp Infra/.env.template Infra/.env
# Edit Infra/.env and set your passwords for SQL_SA_PASSWORD, RABBITMQ_PASSWORD, REDIS_PASSWORD
```

Each application project also needs its own appsettings.json (and optionally appsettings.Development.json) created from the committed templates:
```shell
# API
cp DocuFlow/DocuFlowApi/appsettings.template.json DocuFlow/DocuFlowApi/appsettings.json
cp DocuFlow/DocuFlowApi/appsettings.Development.template.json DocuFlow/DocuFlowApi/appsettings.Development.json

# Workers
cp DocuFlow/Worker.DocumentProcessing/appsettings.template.json DocuFlow/Worker.DocumentProcessing/appsettings.json
cp DocuFlow/Worker.DocumentProcessing/appsettings.Development.template.json DocuFlow/Worker.DocumentProcessing/appsettings.Development.json
cp DocuFlow/Worker.DocumentExtractor/appsettings.template.json DocuFlow/Worker.DocumentExtractor/appsettings.json
cp DocuFlow/Worker.DocumentExtractor/appsettings.Development.template.json DocuFlow/Worker.DocumentExtractor/appsettings.Development.json
```

Update each copied file with the same passwords you set in `Infra/.env`.
```shell
# From the repo root
docker compose -f Infra/docker-compose.yml up --build
```

The API is available at http://localhost:8080 (OpenAPI UI at http://localhost:8080/swagger).
The RabbitMQ management console is at http://localhost:15672.
EF Core migrations are applied automatically by the API on startup.
Once the stack is up, obtain a JWT first (via POST /api/authentication/login), then run:
```shell
cd DocuFlow/DocuFlow.Client
dotnet run
```

By default the client scans `./docs` relative to its working directory. The `docs/` folder at the root of the repository contains 50 pre-built test files covering a wide range of scenarios.
All endpoints return responses wrapped in a consistent envelope:
```json
{
  "success": true,
  "data": { },
  "requestId": "550e8400-e29b-41d4-a716-446655440000"
}
```

**POST /api/authentication/register**

Register a new user within a tenant.
Request body:
```json
{
  "email": "user@example.com",
  "password": "P@ssw0rd123!",
  "tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "role": "Reader"
}
```

Responses: `201 Created` | `400 Bad Request` | `409 Conflict` (email already exists) | `429 Too Many Requests`

**POST /api/authentication/login**
Log in and receive a JWT access token and a refresh token.
Request body:
```json
{
  "email": "user@example.com",
  "password": "P@ssw0rd123!",
  "tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
}
```

Response `200 OK`:
```json
{
  "accessToken": "eyJ...",
  "refreshToken": "base64-opaque-token",
  "expiresIn": 900,
  "userId": "...",
  "email": "user@example.com",
  "role": "Reader"
}
```

Responses: `200 OK` | `401 Unauthorized` (invalid credentials) | `403 Forbidden` (account locked)

**POST /api/authentication/refresh**
Exchange a valid refresh token for a new access + refresh token pair. The old refresh token is revoked.
Request body:
```json
{ "refreshToken": "base64-opaque-token" }
```

Responses: `200 OK` (new token pair) | `401 Unauthorized` (token invalid or expired)

**POST /api/authentication/logout**
Revoke a single refresh token.
Request body:
```json
{ "refreshToken": "base64-opaque-token" }
```

Responses: `200 OK` | `404 Not Found`

**POST /api/authentication/revoke-all**
Revoke all active refresh tokens for the authenticated user.
Responses: `200 OK` | `401 Unauthorized`

**POST /api/authentication/change-password**
Change the authenticated user's password. Optionally revokes all active tokens.
Request body:
```json
{
  "currentPassword": "OldP@ss!",
  "newPassword": "NewP@ss!",
  "revokeAllTokens": true
}
```

Responses: `200 OK` | `400 Bad Request` (invalid current password) | `401 Unauthorized`

**POST /api/authentication/forgot-password**
Request a password-reset token (token is returned in the response in this demo; a production system would email it).
Request body:
```json
{ "email": "user@example.com" }
```

Responses: `200 OK`

**POST /api/authentication/reset-password**
Reset a password using a reset token.
Request body:
```json
{
  "email": "user@example.com",
  "token": "opaque-reset-token",
  "newPassword": "NewP@ss!"
}
```

Responses: `200 OK` | `400 Bad Request` (expired/used token) | `404 Not Found`

**POST /api/documents**
Submit a new document for processing.
Request body:
```json
{
  "fileName": "report.txt",
  "content": "The PSV-123 part is dangerous."
}
```

Responses:

- `201 Created` — document accepted and queued. Body: `{ "id": "<guid>" }`
- `200 OK` — duplicate detected (same file name + content already exists). Body: `{ "id": "<guid>" }`

**GET /api/documents/{id}/status**
Returns the current processing status.
Response 200 OK:
```json
{ "status": "processing" }
```

Possible values: `unknown` | `processing` | `available`

**GET /api/documents/{id}/matches**
Returns all scan matches once the document is available.
Response 200 OK:
```json
[
  { "position": 20, "matchType": "dangerous", "matchValue": "dangerous" },
  { "position": 4, "matchType": "pattern", "matchValue": "PSV-123" }
]
```

- `202 Accepted` — processing not yet complete; poll `/status` and retry.
- `404 Not Found` — unknown document ID.

**GET /api/documents/{id}/content**
Returns the original document text.
Response 200 OK:
```json
{ "fileName": "report.txt", "content": "The PSV-123 part is dangerous." }
```

**GET /api/admin/dashboard** (requires Admin role)

Returns aggregate platform statistics: total users, active tenants, recent security events, etc.
**GET /api/admin/users** (requires Admin role)

List all users or retrieve a specific user's details including lock status, role, tenant, and last login.
**PUT /api/admin/users/{userId}/lock** / **PUT /api/admin/users/{userId}/unlock** (requires Admin role)
Administratively lock or unlock a user account.
**GET /api/admin/tenants** / **POST /api/admin/tenants** (requires Admin role)

List all tenants or create a new tenant.
**GET /api/admin/security-events** (requires Admin role)

Query the security event audit trail with optional filtering by tenant, user, or event type.
Created from Infra/.env.template. Controls infrastructure passwords shared by all Docker containers.
| Variable | Default | Description |
|---|---|---|
| `SQL_SA_PASSWORD` | `YourStrong!Password123` | SQL Server SA password |
| `RABBITMQ_USER` | `docuflow` | RabbitMQ username |
| `RABBITMQ_PASSWORD` | `YourRabbitMQPassword123` | RabbitMQ password |
| `REDIS_PASSWORD` | `YourRedisPassword123` | Redis password |
Created from DocuFlowApi/appsettings.template.json.
| Key | Description |
|---|---|
| `ConnectionStrings:DefaultConnection` | SQL Server connection string |
| `RabbitMQ:Host` / `Username` / `Password` | MassTransit broker connection |
| `Redis:ConnectionString` | StackExchange.Redis connection string |
| `Jwt:SecretKey` | HMAC-SHA256 signing key (minimum 32 characters) |
| `Jwt:Issuer` / `Audience` | JWT validation parameters |
| `Jwt:AccessTokenExpirationMinutes` | Access token lifetime (default: 15) |
| `Jwt:RefreshTokenExpirationDays` | Refresh token lifetime (default: 7) |
| `Authentication:MaxFailedLoginAttempts` | Failed attempts before lockout (default: 5) |
| `Authentication:AccountLockoutDurationMinutes` | Lockout duration (default: 15) |
| `Cache:DocumentMatchesTtlMinutes` | TTL for cached scan-match results (default: 5) |
| `Cache:DocumentContentTtlMinutes` | TTL for cached document-content results (default: 5) |
Created from the respective appsettings.template.json in each worker project.
| Key | Description |
|---|---|
| `ConnectionStrings:DefaultConnection` | SQL Server connection string |
| `RabbitMQ:Host` / `Port` / `Username` / `Password` | MassTransit broker connection |
```shell
cd DocuFlow
dotnet test DocuFlowTest/DocuFlowTest.csproj
```

Tests cover:
- Authentication controller — all 8 endpoints: register, login, refresh, logout, revoke-all, change-password, forgot-password, reset-password.
- Admin controller — dashboard, user management, tenant management, security events.
- JWT service — token generation, claim integrity, secret key validation, expiry enforcement.
- Repository tests — `UserRepository`, `RefreshTokenRepository`, `TenantRepository`, `PasswordResetTokenRepository`, `EmailVerificationTokenRepository`, `SecurityEventRepository` — all backed by EF Core InMemory with per-test database isolation.
- Middleware — `GlobalExceptionMiddleware` (exception-to-HTTP mapping), `RequestIdMiddleware` (correlation ID injection).
- Extensions — `HttpContextExtensions` (`GetClientIpAddress`, `GetUserId` from JWT claims).
Why structure the projects this way? A common failure mode in service projects is allowing business logic to become entangled with infrastructure (database queries, HTTP calls, broker configuration). When that happens, unit tests require real databases or real brokers, feedback loops slow down, and swapping infrastructure becomes a rewrite.
DocuFlow enforces a strict, provable dependency direction:
Domain ← Application ← Infrastructure ← API
Every .csproj is the proof: IdentityAccess.Application.csproj references only IdentityAccess.Domain and Shared.Contracts. It has no knowledge of EF Core, BCrypt, or JWT libraries. The IPasswordHasher interface lives in Application; BcryptPasswordHasher lives in Infrastructure. The IJwtTokenService interface lives in Application; the implementation that signs HMAC-SHA256 tokens lives in Infrastructure. All DI registrations follow the same rule: Application handlers are registered in the Application layer's DependencyInjection.cs; repositories and infrastructure services are registered in the Infrastructure layer's DependencyInjection.cs.
The two bounded contexts (IdentityAccess and Documents) communicate only through Shared.Contracts message types. There is no direct project reference from Documents.* to IdentityAccess.* or vice versa. A new bounded context can be added with no changes to existing code.
The consequences:
- Testability: Handler logic is tested against NSubstitute mock interfaces. No database, broker, or cryptographic library needs to be running for unit tests to pass.
- Infrastructure replaceability: Switching BCrypt to Argon2 is a change to `BcryptPasswordHasher` in Infrastructure. Switching SQL Server to PostgreSQL is a change to the Infrastructure `.csproj` and EF Core configuration. No business logic changes.
- Onboarding clarity: A new developer can read the Domain and Application layers to understand what the system does, without needing to understand how it persists or communicates.
A document moves through three states: Unknown → Processing → Available.
- `Unknown` — the document has been persisted by the API but the Scanner worker has not yet picked it up. This window is normally very short (milliseconds), but it exists because the status is intentionally not set to `Processing` at insert time. Setting it at the API layer would require the API to know about worker semantics; instead, the Scanner owns the `Unknown → Processing` transition when it begins work.
- `Processing` — the Scanner has started scanning. The document will stay in this state until the Extractor finishes and marks it `Available`. Both workers guard against regressing a status that has already advanced.
- `Available` — the Extractor has completed pattern extraction and the full set of matches is ready to be queried.
This forward-only, three-state model keeps the logic simple: any worker can check the current status and decide whether to proceed, skip, or treat the message as a duplicate — without needing distributed locks or version vectors.
Why it matters here: In a distributed system with retries at every layer — HTTP client retries, at-least-once message delivery, and operator-triggered restarts — the same operation will inevitably be executed more than once. Without idempotency guarantees, each retry risks creating duplicate documents, duplicate match rows, or double-processing a document that was already scanned.
Idempotency is enforced at multiple independent layers:
**Document submission (API)**
Before inserting, the API computes SHA-256(fileName + "|" + content). A unique index on ContentHash enforces uniqueness at the storage level. If a concurrent insert races and wins, the application catches the DbUpdateException, re-queries the existing document ID, and returns 200 OK with that ID.
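Sketched as a handler, the recovery path looks roughly like this. The shape is illustrative, not the actual source: it assumes EF Core, MassTransit's `IPublishEndpoint`, the `FindIdByContentHashAsync` repository method named later in this README, and a hypothetical `SubmitResult` return type:

```csharp
// Illustrative handler sketch; assumes a unique index on (TenantId, ContentHash).
public async Task<SubmitResult> HandleAsync(Guid tenantId, string fileName, string content)
{
    string hash = ComputeContentHash(fileName, content); // SHA-256(fileName + "|" + content)

    try
    {
        var doc = Document.Create(tenantId, fileName, content, hash);
        await _documents.AddAsync(doc);
        await _unitOfWork.SaveChangesAsync();             // may hit the unique index
        await _publishEndpoint.Publish(new DocumentSubmittedMessage(doc.Id, tenantId));
        return SubmitResult.Created(doc.Id);              // 201 Created
    }
    catch (DbUpdateException)
    {
        // A concurrent insert won the race; recover by returning the winner's id.
        var existingId = await _documents.FindIdByContentHashAsync(tenantId, hash);
        return SubmitResult.Duplicate(existingId!.Value); // 200 OK
    }
}
```

The database constraint, not application-level locking, is the source of truth for "have I seen this document before".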
**Message delivery (MassTransit)**
DocumentSubmittedMessage is published inside a MassTransit ISendEndpoint call after save. If a worker crashes mid-processing and the message is redelivered, the handler checks the document's current status before writing matches.
**Match persistence**
A unique composite index on (DocumentId, Position, MatchType) means a duplicate match row from a replayed message is silently rejected by the database.
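In EF Core such a guard is a one-liner in the model configuration. This is a sketch; the entity and property names are taken from the descriptions in this README, not from the actual source:

```csharp
// Documents.Infrastructure, DbContext.OnModelCreating (sketch).
modelBuilder.Entity<ScanMatch>()
    .HasIndex(m => new { m.DocumentId, m.Position, m.MatchType })
    .IsUnique();
// A replayed message re-inserting the same match violates this index;
// the handler treats the violation as "already written" and moves on.
```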
**Refresh token issuance**
Each refresh token has a unique TokenHash (SHA-256 of the plaintext). Every refresh call issues a new token and revokes the old one atomically within a database transaction; there is no gap where both the old and new tokens are simultaneously valid.
No traditional row-level locking is used anywhere. The system uses structural guards and database-level uniqueness constraints instead.
**Document creation**
AddAsync → SaveChangesAsync runs without a lock. When two clients submit the same document simultaneously, one insert succeeds and the other raises a DbUpdateException on the unique ContentHash constraint. The losing request recovers gracefully by falling back to FindIdByContentHashAsync.
**Status transitions**
UpdateStatusAsync uses EF Core's ExecuteUpdateAsync (a single UPDATE … WHERE id = @id) rather than load-then-save. The transition is always forward-only (Processing → Available) and unconditional. A guard in each handler also checks the document's current status before writing results:
- Scanner: if status is not `Processing` → skip (already done or superseded)
- Extractor: if status is not `Processing` → skip (already done or superseded)
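With EF Core 7+ the set-based transition looks roughly like this (a sketch, not the repository source):

```csharp
// Single "UPDATE Documents SET Status = ... WHERE Id = @id";
// no entity is loaded, so there is nothing to get stale.
await dbContext.Documents
    .Where(d => d.Id == documentId)
    .ExecuteUpdateAsync(s => s.SetProperty(d => d.Status, DocumentStatus.Available));
```

Because the statement is a single set-based UPDATE, a crashed worker that re-runs it simply rewrites the same terminal value.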
The design guarantee is: any service can be stopped and restarted at any point during processing and the document will still eventually reach Available with the correct matches.
| Failure scenario | Why recovery works |
|---|---|
| API crashes after DB insert, before RabbitMQ publish | MassTransit at-least-once delivery ensures the message is republished on next send |
| RabbitMQ is temporarily unavailable | Messages accumulate and are forwarded once the broker is reachable |
| Scanner worker crashes mid-processing | Broker redelivers DocumentSubmittedMessage; the match uniqueness index absorbs already-written rows silently |
| Extractor worker crashes mid-processing | Broker redelivers DocumentScannedMessage; same deduplication applies |
| Database restarts | EF Core reconnects on the next request; healthcheck in Compose prevents dependent services from starting prematurely |
| Complete stack restart | docker compose up re-applies migrations (idempotent IF NOT EXISTS) |
At-least-once delivery is guaranteed by MassTransit + RabbitMQ. Messages that repeatedly fail are requeued with exponential back-off.
The architecture was designed for horizontal scale from the start, but a single-instance Compose is used deliberately for the F5 experience. None of the scaling steps below require code changes.
| What to scale | Why | How |
|---|---|---|
| API | Fully stateless — JWT is self-contained, Redis cache is distributed, no in-memory session | Add replicas behind any HTTP load balancer |
| Scanner / Extractor workers | MassTransit competing consumers — replicas share one named queue, broker distributes evenly | Increase replica count: docker compose up --scale |
| Database | EF Core abstraction makes read/write replica routing a connection-string–level config | Add read replicas; tune MaxPoolSize |
| Redis | Session-less design means Redis is used only for distributed cache, not session state | Redis Cluster or Sentinel without application changes |
The document size cap (1 KB) makes the response size self-limiting. A 1 KB document can produce at most roughly 100–170 matches in a pathological, densely-packed case — a payload that fits in a single network frame. Adding pagination would introduce cursor or offset management, additional query parameters to validate, and extra test cases for a code path that can never be triggered under the stated constraints.
For purely ASCII content — which all realistic test documents use — 1,024 characters and 1,024 bytes are identical. Match positions are currently stored as character offsets (the value returned by string.IndexOf and Regex.Match.Index). A production-hardened solution would validate Encoding.UTF8.GetByteCount(content) <= 1024 and compute positions on the UTF-8 byte representation. This is noted so that any consumer performing byte-level operations on the content is aware of the distinction.
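A quick check of the distinction, using only the BCL (no project code):

```csharp
using System;
using System.Text;

// For pure ASCII, char count equals UTF-8 byte count; multi-byte characters diverge.
bool WithinLimit(string content) => Encoding.UTF8.GetByteCount(content) <= 1024;

Console.WriteLine(WithinLimit(new string('a', 1024))); // True: 1,024 chars, 1,024 bytes
Console.WriteLine(WithinLimit(new string('é', 1024))); // False: 1,024 chars, 2,048 bytes
```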
Combining scan and extraction in one worker would be simpler to deploy but would tightly couple two independent concerns. The chosen design connects them through a DocumentScannedMessage event:
- Independent scalability: If pattern extraction becomes a bottleneck, the Extractor can be scaled up without touching the Scanner.
- Independent failure domains: A crash in the Extractor does not prevent the Scanner from processing new documents.
- Open/closed for extension: A third worker (e.g., a PII detector) can subscribe to `DocumentScannedMessage` without modifying existing workers.
Why not just use a long-lived JWT? A long-lived JWT cannot be invalidated server-side. If a token is stolen, the attacker has access until it expires. Refresh token rotation addresses this in three ways.
First, the refresh token is never stored in plaintext. JwtTokenService.GenerateTokensAsync returns the plaintext token over the wire but immediately stores only SHA-256(token) in the database. An attacker who dumps the database cannot reconstruct active tokens.
Second, every call to /refresh atomically revokes the old token and issues a new pair within a database transaction. The old TokenHash is marked as revoked before the new one is written. There is no window where two valid refresh tokens for the same session exist simultaneously.
Third, user.UpdateSecurityStamp() is called on every password change and full token revocation. The security stamp is embedded in the JWT claims at issuance time — any token issued before the stamp was rotated carries an outdated stamp value. A guard in JwtTokenService.ValidateRefreshTokenAsync rejects tokens whose associated user has a different security stamp than the one embedded in the access token.
The access token has a 15-minute lifetime with ClockSkew = TimeSpan.Zero (see below), so even if a stolen access token were to leak, the window of exploitability is narrow and bounded.
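The hash-at-rest half of the scheme needs nothing beyond the BCL. A minimal sketch (illustrative, not `JwtTokenService` itself):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// 64 random bytes, base64-encoded: the opaque token the client receives.
string GeneratePlaintextToken() =>
    Convert.ToBase64String(RandomNumberGenerator.GetBytes(64));

// Only this digest is persisted, never the plaintext.
string HashToken(string plaintext) =>
    Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(plaintext)));

string issued = GeneratePlaintextToken();   // returned to the client over the wire
string storedHash = HashToken(issued);      // persisted as TokenHash

// On /refresh: hash the presented plaintext and look it up by TokenHash.
bool valid = HashToken(issued) == storedHash;
Console.WriteLine(valid); // True
```

Since SHA-256 is one-way, a database dump yields only `TokenHash` values, which cannot be replayed against `/refresh`.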
BCrypt.Net.BCrypt never appears in the Application layer. The IPasswordHasher interface in IdentityAccess.Application.Common.Interfaces exposes only HashPassword(string) and Verify(string, string). BcryptPasswordHasher in Infrastructure implements it using BCrypt with a work factor of 12.
Why this matters: If the hashing algorithm needs to change (e.g., migrating to Argon2), the Application layer handlers are untouched. Only BcryptPasswordHasher and its registration in Infrastructure DI need to change. More immediately: unit tests for LoginHandler, RegisterHandler, ChangePasswordHandler, and ResetPasswordHandler can inject a no-op IPasswordHasher mock and test the surrounding orchestration logic without BCrypt's deliberate CPU cost.
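The seam can be shown with the contract plus the kind of stand-in a unit test injects. The interface shape is inferred from the text; the fake is illustrative and deliberately not a real hasher:

```csharp
using System;

// Handler code depends only on the contract; unit tests inject a stand-in.
IPasswordHasher hasher = new FakePasswordHasher();
string hash = hasher.HashPassword("P@ssw0rd123!");
Console.WriteLine(hasher.Verify("P@ssw0rd123!", hash)); // True
Console.WriteLine(hasher.Verify("wrong", hash));        // False

// The Application-layer contract (shape inferred from this README).
public interface IPasswordHasher
{
    string HashPassword(string password);
    bool Verify(string password, string hash);
}

// Trivial test double; the production BcryptPasswordHasher (Infrastructure)
// wraps BCrypt with a work factor of 12.
public sealed class FakePasswordHasher : IPasswordHasher
{
    public string HashPassword(string password) => "fake:" + password;
    public bool Verify(string password, string hash) => hash == "fake:" + password;
}
```

Handlers receive `IPasswordHasher` via constructor injection, so swapping the fake for `BcryptPasswordHasher` is purely a DI registration change.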
By default ASP.NET Core's JWT bearer handler adds a 5-minute ClockSkew to all token lifetime validations — a token that should expire at 14:00 is actually accepted until 14:05. This is intended to absorb clock drift between servers. DocuFlow sets ClockSkew = TimeSpan.Zero deliberately.
Why: The access token lifetime is 15 minutes. A 5-minute skew allowance extends the effective lifetime to 20 minutes — a 33% increase in the window during which a stolen access token remains valid. Since all services run as containers on the same Docker network (synchronised system clock), there is no legitimate clock drift to absorb. The overhead of the extra 5 minutes is all risk and no benefit for this deployment topology.
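In `Program.cs` the registration looks roughly like this (a sketch; only `ClockSkew` is the point here, and the other values echo the `Jwt:*` configuration keys):

```csharp
using System;
using System.Text;
using Microsoft.AspNetCore.Authentication.JwtBearer;
using Microsoft.IdentityModel.Tokens;

builder.Services
    .AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddJwtBearer(options =>
    {
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidateIssuer = true,
            ValidateAudience = true,
            ValidateLifetime = true,
            ValidateIssuerSigningKey = true,
            ValidIssuer = builder.Configuration["Jwt:Issuer"],
            ValidAudience = builder.Configuration["Jwt:Audience"],
            IssuerSigningKey = new SymmetricSecurityKey(
                Encoding.UTF8.GetBytes(builder.Configuration["Jwt:SecretKey"]!)),
            ClockSkew = TimeSpan.Zero // framework default is TimeSpan.FromMinutes(5)
        };
    });
```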
Account lockout is enforced in LoginHandler. After MaxFailedLoginAttempts (configurable, default 5) consecutive failed passwords for a user, user.LockAccount(DateTimeOffset lockoutEnd) is called — a domain method on the User entity that sets LockoutEnd and IsLockedOut = true. The lockout duration is configurable (default 15 minutes). A subsequent login attempt on a locked account returns 403 Forbidden before BCrypt verification is attempted, preventing timing-based enumeration.
Security events are written to the SecurityEvents table for every significant authentication action: registration, login success, login failure, password change, password reset, account lockout, and token revocation. These are queryable by admins through GET /api/admin/security-events. Every event records the user ID, tenant ID, event type, IP address, user agent, timestamp, and a human-readable detail string — providing a complete audit trail without requiring a separate logging infrastructure.
ASP.NET Core's built-in rate limiter (AddRateLimiter in Program.cs) is configured with multiple named policies mapped to endpoint sensitivity:
| Policy | Endpoint | Limit |
|---|---|---|
| `auth-register` | `POST /register` | Strict — prevents automated account creation |
| `auth-password-reset` | `POST /reset-password` | Strict — prevents token enumeration |
| `auth-general` | `POST /change-password`, `POST /revoke-all` | Moderate |
| `admin-write` | `PUT /admin/users/{id}/lock` | Moderate — prevents automated lockout attacks |
Endpoints with no rate-limiting annotation (e.g., GET /documents/{id}/status) are unrestricted. Rate limit violations return 429 Too Many Requests.
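A sketch of the registration. The policy names come from the table; the numeric limits and window sizes are illustrative, not the project's actual values:

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // Strict policy for registration (illustrative numbers).
    options.AddFixedWindowLimiter("auth-register", o =>
    {
        o.PermitLimit = 5;
        o.Window = TimeSpan.FromMinutes(1);
    });

    // Moderate policy shared by change-password / revoke-all.
    options.AddFixedWindowLimiter("auth-general", o =>
    {
        o.PermitLimit = 20;
        o.Window = TimeSpan.FromMinutes(1);
    });
});

var app = builder.Build();
app.UseRateLimiter();

// Attached per endpoint with [EnableRateLimiting("auth-register")].
```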
Every User belongs to a Tenant. The TenantId is required on POST /register and POST /login.
Identity isolation is enforced at the repository level — queries against User, RefreshToken, and related entities always include a WHERE TenantId = @tenantId predicate derived from the authenticated user's JWT claim. A user in Tenant A cannot see, modify, or act on identity resources in Tenant B, regardless of their role.
Document isolation follows the same pattern. Every Document row carries a TenantId column (FK-free by design — the Documents bounded context does not reference IdentityAccess tables). All four document API handlers (Submit, GetStatus, GetMatches, GetContent) accept a tenantId parameter extracted from the JWT claim via HttpContext.GetTenantId(). Each repository method scopes its query with WHERE TenantId = @tenantId AND Id = @id, so a user in Tenant A will receive a 404 Not Found for a document that exists but belongs to Tenant B.
Deduplication is per-tenant. The unique index on the Documents table is (TenantId, ContentHash) rather than just ContentHash. The same file submitted by two different tenants creates two fully independent documents, each going through the normal pipeline and owned exclusively by its tenant.
The Admin role is a cross-tenant role: admin handlers in IdentityAccess.Application.Admin query across tenants by design, which is why they are restricted behind the AdminOnly policy.
Redis is already present in the stack as the distributed cache backing for ASP.NET Core's rate-limiter state. Two additional read endpoints now participate in a cache-aside (read-through) pattern using IDistributedCache:
| Endpoint | Cache key | TTL source |
|---|---|---|
| `GET /api/documents/{id}/matches` | `matches:{documentId}` | `Cache:DocumentMatchesTtlMinutes` |
| `GET /api/documents/{id}/content` | `content:{tenantId}:{documentId}` | `Cache:DocumentContentTtlMinutes` |
Why these two endpoints? Both return data that is immutable once it exists. Scan matches are written exactly once by the two workers; document content is never modified after submission. Every subsequent request for the same resource is a pure repeat read — the ideal case for caching.
Why cache-aside and not cache-on-write? The workers write directly to SQL Server and have no dependency on the Redis cache. Populating the cache on the write path would require the workers to reference IDistributedCache, coupling infrastructure concerns across bounded-context boundaries. Instead, the API handler populates the cache lazily on the first read and serves subsequent requests from Redis, keeping the workers free of any cache awareness.
Tenant isolation for content. The matches cache key uses only documentId because a documentId is already globally unique — a document belongs to exactly one tenant, so there is no cross-tenant leak risk. The content cache key includes tenantId as an explicit defence-in-depth measure, since the content payload includes the raw file data.
TTL from configuration. The expiry is read from DocumentCacheOptions (bound to the Cache section in appsettings.json) rather than being hard-coded, so it can be tuned per environment without a rebuild. Both values default to 5 minutes, which is short enough to bound stale-data risk and long enough to absorb polling bursts against the same document.
IDistributedCache.GetStringAsync returns null vs empty string. The standard GetStringAsync extension calls GetAsync internally and converts a null byte array to null and a non-null byte array (including an empty one) to a string. Some mock implementations (including NSubstitute's default) return an empty byte[] rather than null for unconfigured calls, which causes GetStringAsync to return "" — an empty string that passes a is not null guard and crashes JSON deserialization. Both handlers use !string.IsNullOrEmpty(cached) as the cache-hit guard to handle this correctly in both production and test environments.
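Putting the pieces together, the matches read path looks roughly like this. Names are illustrative, not the actual handler; it assumes `IDistributedCache`, `System.Text.Json`, and the options type described above:

```csharp
// Illustrative cache-aside read path (not the project source).
public async Task<IReadOnlyList<ScanMatchDto>?> GetMatchesAsync(Guid tenantId, Guid documentId)
{
    string key = $"matches:{documentId}";

    // Cache hit: the guard rejects "" as well as null, for the reason above.
    string? cached = await _cache.GetStringAsync(key);
    if (!string.IsNullOrEmpty(cached))
        return JsonSerializer.Deserialize<List<ScanMatchDto>>(cached);

    // Cache miss: read through to SQL Server, tenant-scoped.
    var matches = await _repository.GetMatchesAsync(tenantId, documentId);
    if (matches is null)
        return null; // surfaces as 404 (unknown id or wrong tenant)

    await _cache.SetStringAsync(key, JsonSerializer.Serialize(matches),
        new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow =
                TimeSpan.FromMinutes(_options.DocumentMatchesTtlMinutes)
        });
    return matches;
}
```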
The console client creates a single HttpClient instance in Program.cs, holds it for the lifetime of the process, and disposes it on exit. IHttpClientFactory was deliberately not used: it solves socket exhaustion and DNS staleness caused by creating and disposing many short-lived HttpClient instances, neither of which applies to a single-instance, single-run console application.
The API applies pending EF Core migrations synchronously at startup, and docker-compose.yml health checks enforce the correct startup order:
- SQL Server starts and passes its `SELECT 1` health check.
- The `db-init` one-shot container creates the empty database (`IF NOT EXISTS` — idempotent on restarts).
- The API starts, runs migrations, and reaches a healthy state.
- Workers start only after the API is healthy, so their first consumed message always finds a fully-provisioned schema.
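The startup hook is the standard EF Core pattern (a sketch; the DbContext type names are assumptions, not necessarily the project's):

```csharp
// Program.cs (sketch): apply pending migrations before serving traffic.
using (var scope = app.Services.CreateScope())
{
    scope.ServiceProvider.GetRequiredService<IdentityAccessDbContext>()
         .Database.Migrate(); // applies only pending migrations; safe on restart
    scope.ServiceProvider.GetRequiredService<DocumentsDbContext>()
         .Database.Migrate();
}

app.Run();
```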
The trade-off is that migration failures surface as a startup crash rather than a deployment-time error. For a production system, migrations would be applied by a dedicated migration job in the deployment pipeline.