ChristosKarathanasisac/DocuFlowPlatform

DocuFlow

A distributed, multi-tenant document-processing platform built with .NET 8, Clean Architecture, and Domain-Driven Design. It provides a full authentication and authorisation system with JWT bearer tokens and refresh token rotation, a role-based admin panel, and an asynchronous two-stage document pipeline that scans .txt files for dangerous content and structured code patterns — all orchestrated via RabbitMQ and deployed through Docker Compose.


Architecture Overview

┌──────────────────────────────────────────────────────────────────────┐
│  Console Client (DocuFlow.Client)                                    │
│  Scans ./docs for *.txt → POST /api/documents (parallel, up to 64)   │
└───────────────────────────────┬──────────────────────────────────────┘
                                │ HTTP + JWT Bearer
                                ▼
┌──────────────────────────────────────────────────────────────────────┐
│  REST API  (DocuFlowApi)                                             │
│                                                                      │
│  IdentityAccess Bounded Context        Documents Bounded Context     │
│  ─────────────────────────────         ────────────────────────────  │
│  POST   /api/authentication/register   POST   /api/documents         │
│  POST   /api/authentication/login      GET    /api/documents/{id}/   │
│  POST   /api/authentication/refresh           status | matches |     │
│  POST   /api/authentication/logout            content                │
│  POST   /api/authentication/                                         │
│         change-password                Admin Bounded Context         │
│  POST   /api/authentication/           ─────────────────────────     │
│         forgot-password                GET    /api/admin/dashboard   │
│  POST   /api/authentication/           GET    /api/admin/users       │
│         reset-password                 PUT    /api/admin/users/{id}/ │
│  POST   /api/authentication/                  lock | unlock          │
│         revoke-all                     GET    /api/admin/tenants     │
│                                        GET    /api/admin/security-   │
│                                               events                 │
│                                                                      │
│  Persists document → publishes DocumentSubmittedMessage via          │
│  MassTransit → RabbitMQ                                              │
└──────┬─────────────────────────────┬──────────────────┬─────────────┘
       │ SQL Server (EF Core)        │ Redis            │ RabbitMQ
       ▼                             ▼                  ▼
┌────────────────┐   ┌───────────────────────┐  ┌──────────────────────────┐
│  SQL Server    │   │  Redis Cache          │  │  Worker: DocumentProcess │
│                │   │                       │  │  • Finds "dangerous"     │
│  IdentityAccess│   │  matches:{docId}      │  │    (case-insensitive)    │
│  Documents     │   │  content:{tid}:{docId}│  │  • Publishes scanned evt │
│  Logs          │   │  (TTL from config)    │  └──────────────────────────┘
└────────────────┘   └───────────────────────┘            │ RabbitMQ
       ▲                                                   ▼
       │                                      ┌──────────────────────────┐
       └──────────────────────────────────────│  Worker: DocumentExtract │
                                              │  • Extracts [A-Z]{3,5}   │
                                              │    -?[0-9]{3} patterns   │
                                              │  • Marks doc Available   │
                                              └──────────────────────────┘

Bounded Contexts

DocuFlow is structured around three bounded contexts. IdentityAccess and Documents each have their own Domain / Application / Infrastructure project triad; Admin has no projects of its own and lives inside IdentityAccess.Application. The contexts share nothing except Shared.Contracts (domain error codes, domain messages, exception types, message contracts, and JWT constants).

| Bounded Context | Responsibility |
| --- | --- |
| IdentityAccess | User registration, login, JWT token generation, refresh token rotation, account lockout, password reset, email verification, tenant management, security event auditing |
| Documents | Document submission, per-tenant SHA-256 deduplication, asynchronous two-stage scan pipeline (dangerous word detection → regex extraction), tenant-scoped status and match querying |
| Admin (part of IdentityAccess.Application) | Admin dashboard aggregates, cross-tenant user management, tenant CRUD, security event queries; restricted to the Admin role |

Components

| Project | Type | Description |
| --- | --- | --- |
| DocuFlowApi | ASP.NET Core 8 Web API | REST endpoints for authentication, document processing, and admin operations |
| DocuFlow.Client | .NET Console App | Batch-submits .txt files from a local directory to the API (requires a valid JWT) |
| Worker.DocumentProcessing | .NET Worker Service | Consumes DocumentSubmittedMessage, scans for the word dangerous, publishes DocumentScannedMessage |
| Worker.DocumentExtractor | .NET Worker Service | Consumes DocumentScannedMessage, extracts [A-Z]{3,5}-?[0-9]{3} patterns, marks the document Available |
| IdentityAccess.Domain | Class Library | User, Tenant, RefreshToken, PasswordResetToken, EmailVerificationToken, SecurityEvent entities with domain behaviour; zero external dependencies |
| IdentityAccess.Application | Class Library | All authentication and admin handlers, handler interfaces, DTOs, and IPasswordHasher abstraction |
| IdentityAccess.Infrastructure | Class Library | EF Core persistence, JwtTokenService, BcryptPasswordHasher, repositories, unit of work |
| Documents.Domain | Class Library | Document, ScanMatch entities; zero external dependencies |
| Documents.Application | Class Library | Document-processing handlers, interfaces, DTOs |
| Documents.Infrastructure | Class Library | EF Core persistence, MassTransit/RabbitMQ setup, document repositories |
| Shared.Contracts | Class Library | DomainException, DomainErrorCodes, DomainMessages, JwtClaimTypes, JwtConfigurationKeys, message contracts |
| DocuFlowTest | xUnit Test Project | Unit tests for handlers, controllers, repositories, middleware, JWT service, and extensions |

Processing Pipeline

  1. Client reads each .txt file (max 1 KB), obtains a JWT by calling POST /api/authentication/login, then sends POST /api/documents with the Bearer token attached.
  2. API deduplicates by SHA-256(fileName + "|" + content) within the submitting tenant. If the document is new it is saved with status Processing and a DocumentSubmittedMessage is published to RabbitMQ via MassTransit.
  3. Worker.DocumentProcessing consumes DocumentSubmittedMessage, searches the content for the word dangerous (case-insensitive), persists each occurrence as a ScanMatch with its character position, then publishes DocumentScannedMessage.
  4. Worker.DocumentExtractor consumes DocumentScannedMessage, applies [A-Z]{3,5}-?[0-9]{3}, persists each regex match, and transitions the document status to Available.
  5. Callers poll GET /api/documents/{id}/status until available, then retrieve results via GET /api/documents/{id}/matches.
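The two scan stages in steps 3 and 4 can be sketched in a few lines of Python. This is an illustration, not the workers' C# code: `find_dangerous` and `extract_patterns` are hypothetical names, occurrences of dangerous are collected left to right without overlap (as repeated `IndexOf` calls would find them), and offsets are character positions in ASCII text.

```python
import re

# Stage 2 pattern: 3-5 uppercase letters, an optional hyphen, then 3 digits (case-sensitive).
PATTERN = re.compile(r"[A-Z]{3,5}-?[0-9]{3}")
WORD = "dangerous"


def find_dangerous(content: str) -> list[tuple[int, str]]:
    """Stage 1: every case-insensitive, non-overlapping occurrence of WORD with its offset."""
    lowered = content.lower()  # offsets are preserved for ASCII text
    matches = []
    start = lowered.find(WORD)
    while start != -1:
        # Report the text exactly as it appears in the original content.
        matches.append((start, content[start:start + len(WORD)]))
        start = lowered.find(WORD, start + len(WORD))
    return matches


def extract_patterns(content: str) -> list[tuple[int, str]]:
    """Stage 2: every regex match with its character offset."""
    return [(m.start(), m.group()) for m in PATTERN.finditer(content)]


if __name__ == "__main__":
    text = "The PSV-123 part is dangerous."
    print(find_dangerous(text))    # [(20, 'dangerous')]
    print(extract_patterns(text))  # [(4, 'PSV-123')]
```

On the sample document from the API reference, PSV-123 sits at offset 4 and dangerous at offset 20. The pattern has no word boundaries, so inside ABCDEF-123 it matches BCDEF-123.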

Authentication Flow

┌─────────┐  POST /register       ┌──────────────────┐
│  Client │ ─────────────────────►│  Register        │
│         │                       │  BCrypt hash pwd │
│         │  POST /login          │  Create user     │
│         │ ─────────────────────►│  Validate creds  │
│         │ ◄─────────────────────│  Issue JWT +     │
│         │  { accessToken,       │  refresh token   │
│         │    refreshToken }     └──────────────────┘
│         │
│         │  POST /refresh        ┌──────────────────┐
│         │ ─────────────────────►│  Validate refresh│
│         │ ◄─────────────────────│  token (SHA-256  │
│         │  { new accessToken,   │  hash lookup)    │
│         │    new refreshToken } │  Rotate token    │
│         │                       └──────────────────┘
│         │
│         │  POST /logout         ┌──────────────────┐
│         │ ─────────────────────►│  Revoke single   │
│         │                       │  refresh token   │
│         │  POST /revoke-all     └──────────────────┘
│         │ ─────────────────────►┌──────────────────┐
│         │                       │  Revoke all user │
│         │                       │  refresh tokens  │
│         │                       └──────────────────┘
│         │
│         │  POST /change-password ┌─────────────────┐
│         │ ──────────────────────►│ Verify current  │
│         │                        │ pwd, hash new,  │
│         │                        │ rotate stamp,   │
│         │                        │ optionally      │
│         │                        │ revoke all      │
│         │                        └─────────────────┘
│         │
│         │  POST /forgot-password  ┌────────────────┐
│         │ ───────────────────────►│ Issue reset    │
│         │                         │ token (SHA-256 │
│         │  POST /reset-password   │ stored hash)   │
│         │ ───────────────────────►│ Validate token │
│         │                         │ Set new pwd    │
└─────────┘                         └────────────────┘

Token lifecycle:

  • Access token: short-lived JWT (15 minutes), ClockSkew = TimeSpan.Zero
  • Refresh token: long-lived (7 days), stored as a SHA-256 hash in the database — the plaintext is only ever sent over the wire and never persisted
  • Refresh token rotation: every call to /refresh revokes the old token and issues a new pair
  • Security stamp: every password change and token revocation increments the user's security stamp — issued tokens that predate the stamp change become invalid on the next validation
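The security-stamp rule comes down to comparing the stamp copied into a token at issuance with the user's current stamp. A minimal Python illustration, assuming an integer stamp that is incremented as described above; `User`, `IssuedToken`, `issue_token`, and `is_token_current` are hypothetical names, not DocuFlow types.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str
    security_stamp: int = 0

    def update_security_stamp(self) -> None:
        # Called on every password change and token revocation.
        self.security_stamp += 1


@dataclass(frozen=True)
class IssuedToken:
    user_id: str
    security_stamp: int  # snapshot of the user's stamp at issuance time


def issue_token(user: User) -> IssuedToken:
    return IssuedToken(user.user_id, user.security_stamp)


def is_token_current(token: IssuedToken, user: User) -> bool:
    # Any token carrying an older stamp is rejected on its next validation.
    return token.user_id == user.user_id and token.security_stamp == user.security_stamp


if __name__ == "__main__":
    alice = User("alice")
    before = issue_token(alice)
    alice.update_security_stamp()  # e.g. password changed
    after = issue_token(alice)
    print(is_token_current(before, alice), is_token_current(after, alice))  # False True
```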

Prerequisites


Setup & Running

All server-side services (SQL Server, RabbitMQ, Redis, API, workers) run via Docker Compose. The client is run manually once the stack is healthy.

1. Configure environment

# From the repo root
cp Infra/.env.template Infra/.env
# Edit Infra/.env and set your passwords for SQL_SA_PASSWORD, RABBITMQ_PASSWORD, REDIS_PASSWORD

Each application project also needs its own appsettings.json (and optionally appsettings.Development.json) created from the committed templates:

# API
cp DocuFlow/DocuFlowApi/appsettings.template.json               DocuFlow/DocuFlowApi/appsettings.json
cp DocuFlow/DocuFlowApi/appsettings.Development.template.json   DocuFlow/DocuFlowApi/appsettings.Development.json

# Workers
cp DocuFlow/Worker.DocumentProcessing/appsettings.template.json              DocuFlow/Worker.DocumentProcessing/appsettings.json
cp DocuFlow/Worker.DocumentProcessing/appsettings.Development.template.json  DocuFlow/Worker.DocumentProcessing/appsettings.Development.json

cp DocuFlow/Worker.DocumentExtractor/appsettings.template.json               DocuFlow/Worker.DocumentExtractor/appsettings.json
cp DocuFlow/Worker.DocumentExtractor/appsettings.Development.template.json   DocuFlow/Worker.DocumentExtractor/appsettings.Development.json

Update each copied file with the same passwords you set in Infra/.env.

2. Start the stack

# From the repo root
docker compose -f Infra/docker-compose.yml up --build

The API is available at http://localhost:8080 (OpenAPI UI at http://localhost:8080/swagger).
The RabbitMQ management console is at http://localhost:15672.

EF Core migrations are applied automatically by the API on startup.

3. Run the client manually

Once the stack is up, obtain a JWT first (via POST /api/authentication/login), then run:

cd DocuFlow/DocuFlow.Client
dotnet run

By default the client scans ./docs relative to its working directory. The docs/ folder at the root of the repository contains 50 pre-built test files covering a wide range of scenarios.


API Reference

All endpoints return responses wrapped in a consistent envelope:

{
  "success": true,
  "data": { },
  "requestId": "550e8400-e29b-41d4-a716-446655440000"
}
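A Python sketch of a helper that produces this envelope. Only the three field names come from this README; the helper name `wrap` and falling back to a fresh UUID when no request ID is supplied are assumptions (in the API the ID is injected by RequestIdMiddleware).

```python
import json
import uuid
from typing import Any, Optional


def wrap(data: Any, request_id: Optional[str] = None, success: bool = True) -> dict:
    """Hypothetical helper: wrap a handler result in the standard response envelope."""
    return {
        "success": success,
        "data": data,
        # Assumption: generate a UUID when no correlation ID was provided upstream.
        "requestId": request_id or str(uuid.uuid4()),
    }


if __name__ == "__main__":
    body = wrap({"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6"},
                request_id="550e8400-e29b-41d4-a716-446655440000")
    print(json.dumps(body, indent=2))
```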

Authentication — POST /api/authentication/register

Register a new user within a tenant.

Request body:

{
  "email": "user@example.com",
  "password": "P@ssw0rd123!",
  "tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "role": "Reader"
}

Responses: 201 Created | 400 Bad Request | 409 Conflict (email already exists) | 429 Too Many Requests


Authentication — POST /api/authentication/login

Log in and receive a JWT access token and a refresh token.

Request body:

{
  "email": "user@example.com",
  "password": "P@ssw0rd123!",
  "tenantId": "3fa85f64-5717-4562-b3fc-2c963f66afa6"
}

Response 200 OK:

{
  "accessToken": "eyJ...",
  "refreshToken": "base64-opaque-token",
  "expiresIn": 900,
  "userId": "...",
  "email": "user@example.com",
  "role": "Reader"
}

Responses: 200 OK | 401 Unauthorized (invalid credentials) | 403 Forbidden (account locked)


Authentication — POST /api/authentication/refresh

Exchange a valid refresh token for a new access + refresh token pair. The old refresh token is revoked.

Request body:

{ "refreshToken": "base64-opaque-token" }

Responses: 200 OK (new token pair) | 401 Unauthorized (token invalid or expired)


Authentication — POST /api/authentication/logout

Revoke a single refresh token.

Request body:

{ "refreshToken": "base64-opaque-token" }

Responses: 200 OK | 404 Not Found


Authentication — POST /api/authentication/revoke-all (requires Bearer token)

Revoke all active refresh tokens for the authenticated user.

Responses: 200 OK | 401 Unauthorized


Authentication — POST /api/authentication/change-password (requires Bearer token)

Change the authenticated user's password. Optionally revokes all active tokens.

Request body:

{
  "currentPassword": "OldP@ss!",
  "newPassword": "NewP@ss!",
  "revokeAllTokens": true
}

Responses: 200 OK | 400 Bad Request (invalid current password) | 401 Unauthorized


Authentication — POST /api/authentication/forgot-password

Request a password-reset token (token is returned in the response in this demo; a production system would email it).

Request body:

{ "email": "user@example.com" }

Responses: 200 OK


Authentication — POST /api/authentication/reset-password

Reset a password using a reset token.

Request body:

{
  "email": "user@example.com",
  "token": "opaque-reset-token",
  "newPassword": "NewP@ss!"
}

Responses: 200 OK | 400 Bad Request (expired/used token) | 404 Not Found


Documents — POST /api/documents (requires Bearer token)

Submit a new document for processing.

Request body:

{
  "fileName": "report.txt",
  "content": "The PSV-123 part is dangerous."
}

Responses:

  • 201 Created — document accepted and queued. Body: { "id": "<guid>" }
  • 200 OK — duplicate detected (same file name + content already exists). Body: { "id": "<guid>" }

Documents — GET /api/documents/{id}/status (requires Bearer token)

Returns the current processing status.

Response 200 OK:

{ "status": "processing" }

Possible values: unknown | processing | available


Documents — GET /api/documents/{id}/matches (requires Bearer token)

Returns all scan matches once the document is available.

Response 200 OK:

[
  { "position": 20, "matchType": "dangerous", "matchValue": "dangerous" },
  { "position": 4,  "matchType": "pattern",   "matchValue": "PSV-123"   }
]

Other responses:

  • 202 Accepted — processing not yet complete; poll /status and retry.
  • 404 Not Found — unknown document ID.
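A caller following the poll-then-fetch flow might look like the sketch below. It is transport-agnostic Python: `get_status` and `get_matches` are hypothetical callables standing in for authenticated requests to the two endpoints, and the poll interval and timeout are arbitrary values, not client settings from this repository.

```python
import time
from typing import Callable, Optional


def wait_for_matches(
    get_status: Callable[[], str],
    get_matches: Callable[[], tuple[int, Optional[list]]],
    poll_interval: float = 0.5,
    timeout: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Poll /status until 'available', then fetch /matches, retrying on 202."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == "available":
            status_code, body = get_matches()
            if status_code == 200:
                return body
            if status_code == 404:
                raise LookupError("unknown document ID")
            # 202 Accepted: processing not yet complete, so poll again.
        sleep(poll_interval)
    raise TimeoutError("document did not become available in time")
```

Injecting `sleep` keeps the loop testable without real delays.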

Documents — GET /api/documents/{id}/content (requires Bearer token)

Returns the original document text.

Response 200 OK:

{ "fileName": "report.txt", "content": "The PSV-123 part is dangerous." }

Admin — GET /api/admin/dashboard (requires Admin role)

Returns aggregate platform statistics: total users, active tenants, recent security events, etc.


Admin — GET /api/admin/users / GET /api/admin/users/{userId} (requires Admin role)

List all users or retrieve a specific user's details including lock status, role, tenant, and last login.


Admin — PUT /api/admin/users/{userId}/lock / PUT /api/admin/users/{userId}/unlock (requires Admin role)

Administratively lock or unlock a user account.


Admin — GET /api/admin/tenants / POST /api/admin/tenants (requires Admin role)

List all tenants or create a new tenant.


Admin — GET /api/admin/security-events (requires Admin role)

Query the security event audit trail with optional filtering by tenant, user, or event type.


Configuration

Infra/.env

Created from Infra/.env.template. Controls the infrastructure credentials shared by all Docker containers.

| Variable | Default | Description |
| --- | --- | --- |
| SQL_SA_PASSWORD | YourStrong!Password123 | SQL Server SA password |
| RABBITMQ_USER | docuflow | RabbitMQ username |
| RABBITMQ_PASSWORD | YourRabbitMQPassword123 | RabbitMQ password |
| REDIS_PASSWORD | YourRedisPassword123 | Redis password |

DocuFlowApi/appsettings.json

Created from DocuFlowApi/appsettings.template.json.

| Key | Description |
| --- | --- |
| ConnectionStrings:DefaultConnection | SQL Server connection string |
| RabbitMQ:Host / Username / Password | MassTransit broker connection |
| Redis:ConnectionString | StackExchange.Redis connection string |
| Jwt:SecretKey | HMAC-SHA256 signing key (minimum 32 characters) |
| Jwt:Issuer / Audience | JWT validation parameters |
| Jwt:AccessTokenExpirationMinutes | Access token lifetime (default: 15) |
| Jwt:RefreshTokenExpirationDays | Refresh token lifetime (default: 7) |
| Authentication:MaxFailedLoginAttempts | Failed attempts before lockout (default: 5) |
| Authentication:AccountLockoutDurationMinutes | Lockout duration (default: 15) |
| Cache:DocumentMatchesTtlMinutes | TTL for cached scan-match results (default: 5) |
| Cache:DocumentContentTtlMinutes | TTL for cached document-content results (default: 5) |

Worker appsettings.json

Created from the respective appsettings.template.json in each worker project.

| Key | Description |
| --- | --- |
| ConnectionStrings:DefaultConnection | SQL Server connection string |
| RabbitMQ:Host / Port / Username / Password | MassTransit broker connection |

Running Tests

cd DocuFlow
dotnet test DocuFlowTest/DocuFlowTest.csproj

Tests cover:

  • Authentication controller — all 8 endpoints: register, login, refresh, logout, revoke-all, change-password, forgot-password, reset-password.
  • Admin controller — dashboard, user management, tenant management, security events.
  • JWT service — token generation, claim integrity, secret key validation, expiry enforcement.
  • Repository tests: UserRepository, RefreshTokenRepository, TenantRepository, PasswordResetTokenRepository, EmailVerificationTokenRepository, SecurityEventRepository, all backed by EF Core InMemory with per-test database isolation.
  • Middleware: GlobalExceptionMiddleware (exception-to-HTTP mapping), RequestIdMiddleware (correlation ID injection).
  • Extensions: HttpContextExtensions (GetClientIpAddress, GetUserId from JWT claims).

Design Decisions

Clean Architecture & Bounded Context Separation

Why structure the projects this way? A common failure mode in service projects is allowing business logic to become entangled with infrastructure (database queries, HTTP calls, broker configuration). When that happens, unit tests require real databases or real brokers, feedback loops slow down, and swapping infrastructure becomes a rewrite.

DocuFlow enforces a strict, provable dependency direction:

Domain ← Application ← Infrastructure ← API

Every .csproj is the proof: IdentityAccess.Application.csproj references only IdentityAccess.Domain and Shared.Contracts. It has no knowledge of EF Core, BCrypt, or JWT libraries. The IPasswordHasher interface lives in Application; BcryptPasswordHasher lives in Infrastructure. The IJwtTokenService interface lives in Application; the implementation that signs HMAC-SHA256 tokens lives in Infrastructure. All DI registrations follow the same rule: Application handlers are registered in the Application layer's DependencyInjection.cs; repositories and infrastructure services are registered in the Infrastructure layer's DependencyInjection.cs.

The two bounded contexts that have their own projects (IdentityAccess and Documents) communicate only through Shared.Contracts message types. There is no direct project reference from Documents.* to IdentityAccess.* or vice versa, so a new bounded context can be added without modifying the existing ones.

The consequences:

  • Testability: Handler logic is tested against NSubstitute mock interfaces. No database, broker, or cryptographic library needs to be running for unit tests to pass.
  • Infrastructure replaceability: Switching BCrypt to Argon2 is a change to BcryptPasswordHasher in Infrastructure. Switching SQL Server to PostgreSQL is a change to the Infrastructure .csproj and EF Core configuration. No business logic changes.
  • Onboarding clarity: A new developer can read the Domain and Application layers to understand what the system does, without needing to understand how it persists or communicates.

Document Status Flow

A document moves through three states: Unknown → Processing → Available.

  • Unknown — the document has been persisted by the API but the Scanner worker has not yet picked it up. This window is normally very short (milliseconds), but it exists because the status is intentionally not set to Processing at insert time. Setting it at the API layer would require the API to know about worker semantics; instead, the Scanner owns the Unknown → Processing transition when it begins work.
  • Processing — the Scanner has started scanning. The document will stay in this state until the Extractor finishes and marks it Available. Both workers guard against regressing a status that has already advanced.
  • Available — the Extractor has completed pattern extraction and the full set of matches is ready to be queried.

This forward-only, three-state model keeps the logic simple: any worker can check the current status and decide whether to proceed, skip, or treat the message as a duplicate — without needing distributed locks or version vectors.
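The forward-only rule fits in an ordered enum and a single comparison. A hypothetical Python sketch of the decision a worker makes before writing, not the project's code; it treats any transition that does not move strictly forward as a replay to skip.

```python
from enum import IntEnum


class DocumentStatus(IntEnum):
    # The numeric order encodes the only legal direction of travel.
    UNKNOWN = 0
    PROCESSING = 1
    AVAILABLE = 2


def can_transition(current: DocumentStatus, target: DocumentStatus) -> bool:
    """Allow only strictly forward moves; anything else is a duplicate or stale message."""
    return target > current


def apply(current: DocumentStatus, target: DocumentStatus) -> DocumentStatus:
    # Returning the current status unchanged is what makes a redelivered message harmless.
    return target if can_transition(current, target) else current
```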


Idempotency

Why it matters here: In a distributed system with retries at every layer — HTTP client retries, at-least-once message delivery, and operator-triggered restarts — the same operation will inevitably be executed more than once. Without idempotency guarantees, each retry risks creating duplicate documents, duplicate match rows, or double-processing a document that was already scanned.

Idempotency is enforced at multiple independent layers:

Document submission (API)
Before inserting, the API computes SHA-256(fileName + "|" + content). A unique index on (TenantId, ContentHash) enforces per-tenant uniqueness at the storage level. If a concurrent insert races and wins, the application catches the DbUpdateException, re-queries the existing document ID, and returns 200 OK with that ID.
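The hash itself is a one-liner. In this Python sketch the `fileName + "|" + content` formula is from this README, while UTF-8 encoding and lowercase hex output are assumptions about how the string becomes bytes and how the hash is stored; `content_hash` and `dedup_key` are hypothetical names.

```python
import hashlib


def content_hash(file_name: str, content: str) -> str:
    # SHA-256 over fileName + "|" + content (UTF-8 and hex output are assumptions).
    return hashlib.sha256(f"{file_name}|{content}".encode("utf-8")).hexdigest()


def dedup_key(tenant_id: str, file_name: str, content: str) -> tuple[str, str]:
    # Uniqueness is per tenant, mirroring the (TenantId, ContentHash) index.
    return (tenant_id, content_hash(file_name, content))
```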

Message delivery (MassTransit)
DocumentSubmittedMessage is published inside a MassTransit ISendEndpoint call after save. If a worker crashes mid-processing and the message is redelivered, the handler checks the document's current status before writing matches.

Match persistence
A unique composite index on (DocumentId, Position, MatchType) means a duplicate match row from a replayed message is silently rejected by the database.
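The effect of the composite index can be reproduced with SQLite from Python's standard library. This illustrates the technique rather than DocuFlow's SQL Server schema: the table layout is hypothetical, and `INSERT OR IGNORE` is SQLite's way of discarding a row that violates a unique index. On SQL Server the same outcome requires either the application handling the duplicate-key error or an index created with `IGNORE_DUP_KEY`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE ScanMatches (
               DocumentId TEXT NOT NULL,
               Position   INTEGER NOT NULL,
               MatchType  TEXT NOT NULL,
               MatchValue TEXT NOT NULL
           )"""
    )
    # The composite unique index is what makes replayed messages harmless.
    conn.execute(
        "CREATE UNIQUE INDEX UX_ScanMatches ON ScanMatches (DocumentId, Position, MatchType)"
    )
    return conn


def save_matches(conn: sqlite3.Connection, doc_id: str, matches: list[tuple[int, str, str]]) -> None:
    with conn:  # one transaction per message
        conn.executemany(
            "INSERT OR IGNORE INTO ScanMatches VALUES (?, ?, ?, ?)",
            [(doc_id, pos, kind, value) for pos, kind, value in matches],
        )


if __name__ == "__main__":
    db = make_db()
    batch = [(20, "dangerous", "dangerous"), (4, "pattern", "PSV-123")]
    save_matches(db, "doc-1", batch)
    save_matches(db, "doc-1", batch)  # redelivered message
    print(db.execute("SELECT COUNT(*) FROM ScanMatches").fetchone()[0])  # 2
```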

Refresh token issuance
Each refresh token has a unique TokenHash (SHA-256 of the plaintext). Every refresh call issues a new token and revokes the old one atomically within a database transaction; there is no gap where both the old and new tokens are simultaneously valid.


Optimistic Concurrency

No traditional row-level locking is used anywhere. The system uses structural guards and database-level uniqueness constraints instead.

Document creation
AddAsync → SaveChangesAsync runs without a lock. When two clients in the same tenant submit the same document simultaneously, one insert succeeds and the other raises a DbUpdateException on the unique (TenantId, ContentHash) index. The losing request recovers gracefully by falling back to FindIdByContentHashAsync.
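The insert-then-fall-back pattern, sketched with SQLite from Python's standard library instead of EF Core. `submit_document` and the table layout are hypothetical; the point is that the unique index, not a lock, picks the winner, and the loser reads the winner's ID.

```python
import sqlite3
import uuid


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Documents (Id TEXT PRIMARY KEY, TenantId TEXT NOT NULL, ContentHash TEXT NOT NULL)"
    )
    conn.execute("CREATE UNIQUE INDEX UX_Documents ON Documents (TenantId, ContentHash)")
    return conn


def submit_document(conn: sqlite3.Connection, tenant_id: str, content_hash: str) -> tuple[str, bool]:
    """Return (document id, created). created=True maps to 201 Created, False to 200 OK."""
    new_id = str(uuid.uuid4())
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute(
                "INSERT INTO Documents (Id, TenantId, ContentHash) VALUES (?, ?, ?)",
                (new_id, tenant_id, content_hash),
            )
        return new_id, True
    except sqlite3.IntegrityError:
        # Another request already stored this (TenantId, ContentHash): reuse its ID.
        row = conn.execute(
            "SELECT Id FROM Documents WHERE TenantId = ? AND ContentHash = ?",
            (tenant_id, content_hash),
        ).fetchone()
        return row[0], False
```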

Status transitions
UpdateStatusAsync uses EF Core's ExecuteUpdateAsync (a single UPDATE … WHERE id = @id) rather than load-then-save. The UPDATE itself is unconditional; forward-only movement (Processing → Available) is guaranteed by a guard in each handler that checks the document's current status before writing results:

Scanner:   if status is not Processing  → skip (already done or superseded)
Extractor: if status is not Processing  → skip (already done or superseded)

Fault Tolerance & Restart Safety

The design guarantee is: any service can be stopped and restarted at any point during processing and the document will still eventually reach Available with the correct matches.

| Failure scenario | Why recovery works |
| --- | --- |
| API crashes after DB insert, before RabbitMQ publish | MassTransit at-least-once delivery ensures the message is republished on next send |
| RabbitMQ is temporarily unavailable | Messages accumulate and are forwarded once the broker is reachable |
| Scanner worker crashes mid-processing | Broker redelivers DocumentSubmittedMessage; the match uniqueness index absorbs already-written rows silently |
| Extractor worker crashes mid-processing | Broker redelivers DocumentScannedMessage; same deduplication applies |
| Database restarts | EF Core reconnects on the next request; healthcheck in Compose prevents dependent services from starting prematurely |
| Complete stack restart | docker compose up re-applies migrations (idempotent IF NOT EXISTS) |

At-least-once delivery is guaranteed by MassTransit + RabbitMQ. Messages that repeatedly fail are requeued with exponential back-off.
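Exponential back-off multiplies the wait before each retry by a constant factor, usually with an upper bound. The parameters below (1-second initial delay, factor 2, 60-second cap, 5 attempts) are illustrative assumptions in Python, not DocuFlow's configured MassTransit retry policy.

```python
def backoff_delays(initial: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, attempts: int = 5) -> list[float]:
    """Delay before each retry: initial * factor**n, never exceeding cap."""
    return [min(cap, initial * factor ** n) for n in range(attempts)]


if __name__ == "__main__":
    print(backoff_delays())                        # [1.0, 2.0, 4.0, 8.0, 16.0]
    print(backoff_delays(initial=10, attempts=4))  # [10.0, 20.0, 40.0, 60.0]
```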


Scalability

The architecture was designed for horizontal scale from the start, but Compose deliberately runs a single instance of each service to keep local startup to one command. None of the scaling steps below require code changes.

| What to scale | Why | How |
| --- | --- | --- |
| API | Fully stateless: JWT is self-contained, Redis cache is distributed, no in-memory session | Add replicas behind any HTTP load balancer |
| Scanner / Extractor workers | MassTransit competing consumers: replicas share one named queue, broker distributes evenly | Increase replica count: docker compose up --scale SERVICE=NUM |
| Database | EF Core abstraction makes read/write replica routing a connection-string–level config | Add read replicas; tune MaxPoolSize |
| Redis | Session-less design means Redis is used only for distributed cache, not session state | Redis Cluster or Sentinel without application changes |

Why There Is No Pagination on Matches

The document size cap (1 KB) makes the response size self-limiting. In a pathological, densely packed case a 1 KB document produces on the order of 100 to 170 matches (1,024 / 9 ≈ 113 back-to-back occurrences of dangerous, or 1,024 / 6 ≈ 170 back-to-back six-character codes such as ABC123), a payload small enough to return in a single response. Adding pagination would introduce cursor or offset management, additional query parameters to validate, and extra test cases for a code path that can never be triggered under the stated constraints.
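The bound can be checked with a few lines of Python, assuming the 1 KB cap means 1,024 ASCII characters and ignoring edge cases where one stretch of text counts as both kinds of match. The densest case packs the shortest possible match back to back.

```python
import re

CAP = 1024
WORD = "dangerous"        # 9 characters
SHORTEST_CODE = "ABC123"  # 6 characters: 3 letters, no hyphen, 3 digits

max_word_hits = CAP // len(WORD)           # 113
max_code_hits = CAP // len(SHORTEST_CODE)  # 170

# Confirm the regex really finds back-to-back codes with no separator between them.
packed = SHORTEST_CODE * max_code_hits
assert len(packed) <= CAP
assert len(re.findall(r"[A-Z]{3,5}-?[0-9]{3}", packed)) == max_code_hits
print(max_word_hits, max_code_hits)  # 113 170
```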


Bytes vs Characters

For purely ASCII content — which all realistic test documents use — 1,024 characters and 1,024 bytes are identical. Match positions are currently stored as character offsets (the value returned by string.IndexOf and Regex.Match.Index). A production-hardened solution would validate Encoding.UTF8.GetByteCount(content) <= 1024 and compute positions on the UTF-8 byte representation. This is noted so that any consumer performing byte-level operations on the content is aware of the distinction.
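A short Python illustration of the gap. Python's `str.find` counts code points whereas .NET's `string.IndexOf` counts UTF-16 code units; the two agree for the accented text below. `within_cap` is a hypothetical name for the byte-based check described above.

```python
def within_cap(text: str, cap: int = 1024) -> bool:
    # Byte-based size check: measure the UTF-8 encoding, not the character count.
    return len(text.encode("utf-8")) <= cap


content = "café dangerous"

char_offset = content.find("dangerous")                   # 5
byte_offset = len(content[:char_offset].encode("utf-8"))  # 6, because "é" is 2 bytes in UTF-8

print(len(content), len(content.encode("utf-8")))  # 14 15
print(char_offset, byte_offset)                    # 5 6
```

A 600-character string of "é" passes a character-based 1,024 check but is 1,200 bytes, so `within_cap` rejects it.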


Two-Stage Worker Pipeline

Combining scan and extraction in one worker would be simpler to deploy but would tightly couple two independent concerns. The chosen design connects them through a DocumentScannedMessage event:

  • Independent scalability: If pattern extraction becomes a bottleneck, the Extractor can be scaled up without touching the Scanner.
  • Independent failure domains: A crash in the Extractor does not prevent the Scanner from processing new documents.
  • Open/closed for extension: A third worker (e.g., a PII detector) can subscribe to DocumentScannedMessage without modifying existing workers.
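The open/closed point rests on publish/subscribe fan-out: every subscriber to an event type receives its own copy, so adding one leaves the others untouched. An in-process Python sketch of the idea; `EventBus` and the PII subscriber are hypothetical, and in DocuFlow the fan-out is done by RabbitMQ via MassTransit rather than in memory.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Minimal in-process publish/subscribe: every subscriber of a type receives each event."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, event: dict) -> None:
        for handler in self._subscribers[event_type]:
            handler(event)


if __name__ == "__main__":
    bus = EventBus()
    seen: list[str] = []
    bus.subscribe("DocumentScannedMessage", lambda e: seen.append(f"extractor:{e['id']}"))
    # A new consumer is added without modifying the extractor or the publisher.
    bus.subscribe("DocumentScannedMessage", lambda e: seen.append(f"pii:{e['id']}"))
    bus.publish("DocumentScannedMessage", {"id": "doc-1"})
    print(seen)  # ['extractor:doc-1', 'pii:doc-1']
```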

Refresh Token Rotation

Why not just use a long-lived JWT? A long-lived JWT cannot be invalidated server-side. If a token is stolen, the attacker has access until it expires. Refresh token rotation addresses this in three ways.

First, the refresh token is never stored in plaintext. JwtTokenService.GenerateTokensAsync returns the plaintext token over the wire but immediately stores only SHA-256(token) in the database. An attacker who dumps the database cannot reconstruct active tokens.

Second, every call to /refresh atomically revokes the old token and issues a new pair within a database transaction. The old TokenHash is marked as revoked before the new one is written. There is no window where two valid refresh tokens for the same session exist simultaneously.
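The first two mechanisms can be sketched with an in-memory store. This Python illustration rests on assumptions: `RefreshTokenStore` is hypothetical, the plaintext comes from `secrets.token_urlsafe(32)` (the README only shows "base64-opaque-token"), and a lock stands in for the database transaction that makes revoke-and-issue atomic.

```python
import hashlib
import secrets
import threading
from typing import Optional


def sha256_hex(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class RefreshTokenStore:
    """Keeps only SHA-256(token); rotation revokes the old hash and adds the new one atomically."""

    def __init__(self) -> None:
        self._active: dict[str, str] = {}  # token hash -> user id (plaintext never stored)
        self._lock = threading.Lock()      # stands in for the database transaction

    def issue(self, user_id: str) -> str:
        plaintext = secrets.token_urlsafe(32)
        with self._lock:
            self._active[sha256_hex(plaintext)] = user_id
        return plaintext  # handed to the client once

    def rotate(self, presented: str) -> Optional[str]:
        """Exchange a valid token for a new one; None if it is unknown or already revoked."""
        with self._lock:
            user_id = self._active.pop(sha256_hex(presented), None)  # revoke first
            if user_id is None:
                return None
            new_plaintext = secrets.token_urlsafe(32)
            self._active[sha256_hex(new_plaintext)] = user_id
        return new_plaintext
```

Because `rotate` removes the old hash before adding the new one under the same lock, replaying an already-rotated token returns None.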

Third, user.UpdateSecurityStamp() is called on every password change and full token revocation. The security stamp is embedded in the JWT claims at issuance time — any token issued before the stamp was rotated carries an outdated stamp value. A guard in JwtTokenService.ValidateRefreshTokenAsync rejects tokens whose associated user has a different security stamp than the one embedded in the access token.

The access token has a 15-minute lifetime with ClockSkew = TimeSpan.Zero (see below), so even if an access token leaks, the window of exploitability is narrow and bounded.


Password Hashing Abstraction

BCrypt.Net.BCrypt never appears in the Application layer. The IPasswordHasher interface in IdentityAccess.Application.Common.Interfaces exposes only HashPassword(string) and Verify(string, string). BcryptPasswordHasher in Infrastructure implements it using BCrypt with a work factor of 12.

Why this matters: If the hashing algorithm needs to change (e.g., migrating to Argon2), the Application layer handlers are untouched. Only BcryptPasswordHasher and its registration in Infrastructure DI need to change. More immediately: unit tests for LoginHandler, RegisterHandler, ChangePasswordHandler, and ResetPasswordHandler can inject a no-op IPasswordHasher mock and test the surrounding orchestration logic without BCrypt's deliberate CPU cost.
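The same seam expressed in Python as a structural interface. `PasswordHasher` mirrors the two members named above (`HashPassword` and `Verify`); `login` and `FakeHasher` are hypothetical, and a real implementation would wrap a bcrypt library instead of the deliberately insecure fake used here for tests.

```python
from typing import Protocol


class PasswordHasher(Protocol):
    def hash_password(self, password: str) -> str: ...
    def verify(self, password: str, password_hash: str) -> bool: ...


def login(stored_hash: str, attempt: str, hasher: PasswordHasher) -> bool:
    # The handler depends only on the abstraction, never on the hashing algorithm.
    return hasher.verify(attempt, stored_hash)


class FakeHasher:
    """Test double: instant and deterministic, deliberately NOT secure."""

    def hash_password(self, password: str) -> str:
        return "fake$" + password

    def verify(self, password: str, password_hash: str) -> bool:
        return password_hash == self.hash_password(password)
```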


JWT ClockSkew = Zero

By default ASP.NET Core's JWT bearer handler adds a 5-minute ClockSkew to all token lifetime validations — a token that should expire at 14:00 is actually accepted until 14:05. This is intended to absorb clock drift between servers. DocuFlow sets ClockSkew = TimeSpan.Zero deliberately.

Why: The access token lifetime is 15 minutes. A 5-minute skew allowance extends the effective lifetime to 20 minutes, a 33% increase in the window during which a stolen access token remains valid. Since all services run as containers on the same Docker host, they share the host's system clock, so there is no legitimate clock drift to absorb. For this deployment topology the extra 5 minutes would add risk with no benefit.
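The arithmetic behind the 14:05 example and the 33% figure, as a Python check. `is_accepted` is a hypothetical simplification of a lifetime check that accepts a token while the current time is at or before expiry plus the allowed skew; the dates are arbitrary.

```python
from datetime import datetime, timedelta

LIFETIME = timedelta(minutes=15)
DEFAULT_SKEW = timedelta(minutes=5)


def is_accepted(now: datetime, expires: datetime, skew: timedelta) -> bool:
    # Simplified lifetime check: accepted until expiry plus the allowed skew.
    return now <= expires + skew


issued = datetime(2024, 1, 1, 13, 45)
expires = issued + LIFETIME           # 14:00
late = datetime(2024, 1, 1, 14, 4)    # four minutes after expiry

print(is_accepted(late, expires, DEFAULT_SKEW))  # True
print(is_accepted(late, expires, timedelta(0)))  # False (ClockSkew = TimeSpan.Zero)
print(f"{DEFAULT_SKEW / LIFETIME:.0%}")          # 33%
```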


Account Lockout & Security Events

Account lockout is enforced in LoginHandler. After MaxFailedLoginAttempts (configurable, default 5) consecutive failed passwords for a user, user.LockAccount(DateTimeOffset lockoutEnd) is called: a domain method on the User entity that sets LockoutEnd and IsLockedOut = true. The lockout duration is configurable (default 15 minutes). A subsequent login attempt on a locked account returns 403 Forbidden before BCrypt verification is attempted, so requests against a locked account never incur BCrypt's deliberate CPU cost.
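A Python sketch of the lockout rule, not `LoginHandler` itself. It uses the documented defaults (5 attempts, 15 minutes) and checks the lock before any password verification; resetting the counter on success (implied by "consecutive") and after a lockout are assumptions, and `Account` and `attempt_login` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

MAX_FAILED_LOGIN_ATTEMPTS = 5             # Authentication:MaxFailedLoginAttempts default
LOCKOUT_DURATION = timedelta(minutes=15)  # Authentication:AccountLockoutDurationMinutes default


@dataclass
class Account:
    failed_attempts: int = 0
    lockout_end: Optional[datetime] = None

    def is_locked(self, now: datetime) -> bool:
        return self.lockout_end is not None and now < self.lockout_end


def attempt_login(account: Account, password_ok: Callable[[], bool], now: datetime) -> int:
    """Return an HTTP-style status: 200 success, 401 bad password, 403 locked."""
    if account.is_locked(now):
        return 403  # decided before the expensive password check is ever called
    if password_ok():
        account.failed_attempts = 0  # assumption: success resets the consecutive count
        account.lockout_end = None
        return 200
    account.failed_attempts += 1
    if account.failed_attempts >= MAX_FAILED_LOGIN_ATTEMPTS:
        account.lockout_end = now + LOCKOUT_DURATION
        account.failed_attempts = 0  # assumption: counting restarts after a lockout
    return 401
```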

Security events are written to the SecurityEvents table for every significant authentication action: registration, login success, login failure, password change, password reset, account lockout, and token revocation. These are queryable by admins through GET /api/admin/security-events. Every event records the user ID, tenant ID, event type, IP address, user agent, timestamp, and a human-readable detail string — providing a complete audit trail without requiring a separate logging infrastructure.


Rate Limiting

ASP.NET Core's built-in rate limiter (AddRateLimiter in Program.cs) is configured with multiple named policies mapped to endpoint sensitivity:

| Policy | Endpoint | Limit |
| --- | --- | --- |
| auth-register | POST /register | Strict; prevents automated account creation |
| auth-password-reset | POST /reset-password | Strict; prevents token enumeration |
| auth-general | POST /change-password, POST /revoke-all | Moderate |
| admin-write | PUT /admin/users/{id}/lock | Moderate; prevents automated lockout attacks |

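Endpoints with no rate-limiting annotation (e.g., GET /documents/{id}/status) are unrestricted. Rate-limit violations return 429 Too Many Requests. Because the rate-limiting middleware rejects with 503 Service Unavailable by default, this relies on RateLimiterOptions.RejectionStatusCode being set to 429.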
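To make the 429 behaviour concrete, here is a dependency-free Python sketch of named fixed-window policies. The policy names come from the table; the fixed-window algorithm, the per-client partition key, and every numeric limit are assumptions, since the README only says "Strict" or "Moderate" and does not state which limiter type Program.cs configures.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional, Tuple

OK, TOO_MANY_REQUESTS = 200, 429


@dataclass(frozen=True)
class FixedWindowPolicy:
    permit_limit: int      # requests admitted per window
    window_seconds: float


# Hypothetical limits: the README does not publish the real numbers.
POLICIES: Dict[str, FixedWindowPolicy] = {
    "auth-register": FixedWindowPolicy(permit_limit=3, window_seconds=60),
    "auth-password-reset": FixedWindowPolicy(permit_limit=3, window_seconds=60),
    "auth-general": FixedWindowPolicy(permit_limit=10, window_seconds=60),
    "admin-write": FixedWindowPolicy(permit_limit=10, window_seconds=60),
}


class FixedWindowRateLimiter:
    def __init__(self, policies: Mapping[str, FixedWindowPolicy]):
        self._policies = policies
        # (policy name, partition key) -> (window start, requests admitted in that window)
        self._windows: Dict[Tuple[str, str], Tuple[float, int]] = {}

    def check(self, policy_name: Optional[str], partition_key: str, now: float) -> int:
        """Return 200 to admit the request or 429 to reject it."""
        if policy_name is None:
            return OK  # endpoint has no policy: unrestricted
        policy = self._policies[policy_name]
        key = (policy_name, partition_key)
        window_start, count = self._windows.get(key, (now, 0))
        if now - window_start >= policy.window_seconds:
            window_start, count = now, 0  # previous window elapsed: start a new one
        if count >= policy.permit_limit:
            return TOO_MANY_REQUESTS
        self._windows[key] = (window_start, count + 1)
        return OK


limiter = FixedWindowRateLimiter(POLICIES)
print([limiter.check("auth-register", "198.51.100.4", 0.0) for _ in range(4)])  # [200, 200, 200, 429]
```

Partitioning by client address means one noisy client exhausts only its own window; other partitions, and endpoints without a policy, are unaffected.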


Multi-Tenancy

Every User belongs to a Tenant. The TenantId is required on POST /register and POST /login.

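Identity isolation is enforced at the repository level: queries against User, RefreshToken, and related entities always include a WHERE TenantId = @tenantId predicate. For authenticated requests the tenant comes from the user's JWT claim; for register and login, where no token exists yet, it is the TenantId supplied in the request. A user in Tenant A cannot see, modify, or act on identity resources in Tenant B, regardless of their role (the cross-tenant Admin handlers described below are the deliberate exception).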

Document isolation follows the same pattern. Every Document row carries a TenantId column with no foreign key, by design: the Documents bounded context does not reference IdentityAccess tables. All four document API handlers (Submit, GetStatus, GetMatches, GetContent) accept a tenantId parameter extracted from the JWT claim via HttpContext.GetTenantId(). Each repository method scopes its query with WHERE TenantId = @tenantId AND Id = @id, so a user in Tenant A receives 404 Not Found for a document that exists but belongs to Tenant B.
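The effect of the tenant-scoped predicate can be shown with an in-memory SQLite table. This Python sketch is illustrative only: the column set, the 'Completed' status value, and the handler shape are assumptions, while the WHERE TenantId = ? AND Id = ? scoping and the 404 for another tenant's document follow this section.

```python
import sqlite3
from typing import Optional, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    # TenantId is a plain column: no foreign key into IdentityAccess tables.
    conn.execute(
        "CREATE TABLE Documents ("
        " Id TEXT PRIMARY KEY,"
        " TenantId TEXT NOT NULL,"
        " Status TEXT NOT NULL)"
    )


def get_status(conn: sqlite3.Connection, tenant_id: str, document_id: str) -> Optional[str]:
    # Every lookup is scoped by both tenant and id.
    row = conn.execute(
        "SELECT Status FROM Documents WHERE TenantId = ? AND Id = ?",
        (tenant_id, document_id),
    ).fetchone()
    return row[0] if row is not None else None


def handle_get_status(conn: sqlite3.Connection, tenant_id_from_jwt: str,
                      document_id: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, body); another tenant's document looks exactly like a missing one."""
    status = get_status(conn, tenant_id_from_jwt, document_id)
    return (404, None) if status is None else (200, status)


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO Documents VALUES ('doc-1', 'tenant-a', 'Completed')")

print(handle_get_status(conn, "tenant-a", "doc-1"))  # (200, 'Completed')
print(handle_get_status(conn, "tenant-b", "doc-1"))  # (404, None)
```

Returning 404 rather than 403 also avoids confirming to Tenant B that the id exists at all.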

Deduplication is per-tenant. The unique index on the Documents table is (TenantId, ContentHash) rather than just ContentHash. The same file submitted by two different tenants creates two fully independent documents, each going through the normal pipeline and owned exclusively by its tenant.
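Per-tenant deduplication can be sketched with a composite unique index in SQLite. Only the (TenantId, ContentHash) index comes from this README; SHA-256 as the content hash, the index name, and returning the existing document id on a duplicate are assumptions for illustration.

```python
import hashlib
import sqlite3
import uuid
from typing import Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE Documents ("
        " Id TEXT PRIMARY KEY, TenantId TEXT NOT NULL, ContentHash TEXT NOT NULL)"
    )
    # Uniqueness is per tenant, not global.
    conn.execute(
        "CREATE UNIQUE INDEX IX_Documents_TenantId_ContentHash"
        " ON Documents (TenantId, ContentHash)"
    )


def submit(conn: sqlite3.Connection, tenant_id: str, content: bytes) -> Tuple[str, bool]:
    """Return (document id, created). Identical content in the same tenant maps to the existing row."""
    content_hash = hashlib.sha256(content).hexdigest()
    try:
        with conn:  # commits on success, rolls back on error
            doc_id = str(uuid.uuid4())
            conn.execute(
                "INSERT INTO Documents (Id, TenantId, ContentHash) VALUES (?, ?, ?)",
                (doc_id, tenant_id, content_hash),
            )
        return doc_id, True
    except sqlite3.IntegrityError:
        # The (TenantId, ContentHash) index rejected a duplicate: reuse the existing document.
        row = conn.execute(
            "SELECT Id FROM Documents WHERE TenantId = ? AND ContentHash = ?",
            (tenant_id, content_hash),
        ).fetchone()
        return row[0], False
```

Letting the index reject the duplicate, instead of checking first and then inserting, keeps two concurrent submissions of the same file from both creating rows.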

The Admin role is a cross-tenant role: admin handlers in IdentityAccess.Application.Admin query across tenants by design, which is why they are restricted behind the AdminOnly policy.


Distributed Cache (Read-Through)

Redis is already present in the stack as the distributed cache backing ASP.NET Core's rate-limiter state. Two read endpoints also use it through IDistributedCache in a cache-aside pattern: the handler checks Redis first and, on a miss, reads SQL Server and writes the result back to the cache.

Endpoint                           Cache key                          TTL setting
GET /api/documents/{id}/matches    matches:{documentId}               Cache:DocumentMatchesTtlMinutes
GET /api/documents/{id}/content    content:{tenantId}:{documentId}    Cache:DocumentContentTtlMinutes

Why these two endpoints? Both return data that is immutable once it exists. Scan matches are written exactly once by the two workers; document content is never modified after submission. Every subsequent request for the same resource is a pure repeat read — the ideal case for caching.

Why cache-aside and not cache-on-write? The workers write directly to SQL Server and have no dependency on the Redis cache. Populating the cache on the write path would require the workers to reference IDistributedCache, coupling infrastructure concerns across bounded-context boundaries. Instead, the API handler populates the cache lazily on the first read and serves subsequent requests from Redis, keeping the workers free of any cache awareness.

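Tenant isolation for content. The matches cache key uses only documentId because a documentId is globally unique: a document belongs to exactly one tenant, so one cache entry can never mix two tenants' data. This relies on the handler confirming that the requested document belongs to the caller's tenant before it serves a cached entry, since a unique key alone does not stop a caller from requesting another tenant's documentId. The content cache key includes tenantId as an explicit defence-in-depth measure, because the content payload contains the raw file data.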

TTL from configuration. The expiry is read from DocumentCacheOptions (bound to the Cache section in appsettings.json) rather than being hard-coded, so it can be tuned per environment without a rebuild. Both values default to 5 minutes, which is short enough to bound stale-data risk and long enough to absorb polling bursts against the same document.
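The cache-aside read path for the content endpoint can be sketched in Python, with an in-memory class standing in for Redis behind IDistributedCache. The key format and the 5-minute default TTL come from this section; the class, the loader callback, and the decision not to cache "not found" results are assumptions.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryDistributedCache:
    """Stand-in for Redis behind IDistributedCache: string values with an absolute expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get_string(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired entries behave like misses
            return None
        return value

    def set_string(self, key: str, value: str, ttl_seconds: float) -> None:
        self._entries[key] = (value, self._clock() + ttl_seconds)


def content_cache_key(tenant_id: str, document_id: str) -> str:
    return f"content:{tenant_id}:{document_id}"


def get_content(cache: InMemoryDistributedCache,
                load_from_db: Callable[[str, str], Optional[Any]],
                tenant_id: str, document_id: str,
                ttl_minutes: float = 5.0) -> Optional[Any]:
    """Cache-aside read: try the cache, fall back to the tenant-scoped query, then populate."""
    key = content_cache_key(tenant_id, document_id)
    cached = cache.get_string(key)
    if cached:  # None and "" are both treated as a miss
        return json.loads(cached)
    content = load_from_db(tenant_id, document_id)
    if content is not None:  # assumption: "not found" results are not cached
        cache.set_string(key, json.dumps(content), ttl_minutes * 60)
    return content
```

Because the loader is only called on a miss, the workers that write to SQL Server never need to know the cache exists; the TTL alone bounds how long an entry lives.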

GetStringAsync: null versus empty string. GetStringAsync is an extension method on IDistributedCache that calls GetAsync internally: it returns null when GetAsync returns null, and decodes any non-null byte array, including an empty one, into a string. Some mock implementations (including NSubstitute's auto-values for unconfigured calls) return an empty byte[] rather than null, so GetStringAsync returns "", an empty string that passes an "is not null" guard and then makes JSON deserialization throw. Both handlers therefore use !string.IsNullOrEmpty(cached) as the cache-hit guard, which behaves correctly in both production and tests.
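The failure mode is easy to reproduce outside .NET. This Python sketch mirrors the conversion described above (None stays None, any bytes decode to a string) and compares the two guards; the function names are hypothetical, and Python's truthiness check on a string plays the role of !string.IsNullOrEmpty.

```python
import json
from typing import Any, Callable, Optional

GetBytes = Callable[[str], Optional[bytes]]


def get_string(get_bytes: GetBytes, key: str) -> Optional[str]:
    """Mirror of the extension's conversion: None stays None, any bytes (even b"") become a str."""
    data = get_bytes(key)
    if data is None:
        return None
    return data.decode("utf-8")


def read_with_null_guard(get_bytes: GetBytes, key: str, load: Callable[[], Any]) -> Any:
    cached = get_string(get_bytes, key)
    if cached is not None:  # fragile: "" is treated as a hit
        return json.loads(cached)
    return load()


def read_with_empty_guard(get_bytes: GetBytes, key: str, load: Callable[[], Any]) -> Any:
    cached = get_string(get_bytes, key)
    if cached:  # like !string.IsNullOrEmpty(cached): None and "" are misses
        return json.loads(cached)
    return load()


def real_cache_miss(key: str) -> Optional[bytes]:
    return None  # what a real cache returns for a missing key


def unconfigured_mock(key: str) -> Optional[bytes]:
    return b""  # what an auto-valued mock returns instead


def load_from_db() -> Any:
    return {"source": "database"}


print(read_with_empty_guard(unconfigured_mock, "k", load_from_db))  # {'source': 'database'}
try:
    read_with_null_guard(unconfigured_mock, "k", load_from_db)
except json.JSONDecodeError as exc:
    print("null guard failed:", exc.msg)  # null guard failed: Expecting value
```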


HttpClient Usage in the Console Client

The console client creates a single HttpClient instance in Program.cs, holds it for the lifetime of the process, and disposes it on exit. IHttpClientFactory was deliberately not used. It addresses two problems: socket exhaustion, caused by creating and disposing many short-lived HttpClient instances, and stale DNS, caused by a long-lived instance that keeps its pooled connections indefinitely. Neither matters for a console application that uses one instance for a single run and then exits.


Auto-Migration on Startup

The API applies pending EF Core migrations synchronously at startup, and docker-compose.yml health checks enforce the correct startup order:

  1. SQL Server starts and passes its SELECT 1 health check.
  2. The db-init one-shot container creates the empty database (IF NOT EXISTS — idempotent on restarts).
  3. The API starts, runs migrations, and reaches a healthy state.
  4. Workers start only after the API is healthy, so their first consumed message always finds a fully-provisioned schema.

The trade-off is that migration failures surface as a startup crash rather than a deployment-time error. For a production system, migrations would be applied by a dedicated migration job in the deployment pipeline.
