Add refresh token rotation with session management and replay attack detection #821

Merged
tjementum merged 2 commits into main from
pp-204-backend-for-refresh-token-rotation-management
Jan 6, 2026

@tjementum

Summary & Motivation

Implement secure refresh token rotation with replay attack detection by introducing a new Session aggregate that tracks refresh token chains in the database. This addresses a security gap where refresh tokens were previously stateless JWTs without server-side validation, making it impossible to detect token reuse or revoke sessions.

  • Add Session aggregate to track refresh token chains with version tracking and grace period support
  • Implement replay attack detection that revokes the entire session when an outdated token is used outside the grace period
  • Integrate session creation into login, signup, and tenant switch flows
  • Add session revocation on logout
  • Add API endpoint to retrieve active sessions for the current user
  • Fix IP address forwarding in YARP by adding AddXForwarded() transform, ensuring client IP addresses are correctly captured in sessions

Grace period explained: When a refresh token is rotated, the previous token version remains valid for a 30-second grace period. This handles parallel requests where multiple API calls may use the same refresh token simultaneously. If request A triggers a token refresh while request B is still in flight with the old token, request B can still complete successfully within the grace period. Outside this window, using an old token is treated as a replay attack and the entire session is revoked, forcing re-authentication.
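
The grace period rule above can be sketched as a small decision function. This is an illustrative TypeScript sketch, not the C# code from this pull request: the `SessionState` fields borrow column names from the `Sessions` table, while `evaluateRefresh`, `RefreshDecision`, the use of `modifiedAt` as the rotation timestamp, and the handling of unknown tokens are assumptions made for the example.

```typescript
// Illustrative sketch only; everything except the Sessions column names is hypothetical.
const GRACE_PERIOD_SECONDS = 30;

interface SessionState {
  refreshTokenJti: string;
  previousRefreshTokenJti: string | null;
  refreshTokenVersion: number;
  modifiedAt: Date | null; // assumed to hold the time of the last rotation
  revokedAt: Date | null;
}

type RefreshDecision =
  | "rotate"
  | "accept-within-grace-period"
  | "revoke-replay-detected"
  | "reject-revoked";

function evaluateRefresh(session: SessionState, presentedJti: string, now: Date): RefreshDecision {
  // A revoked session never issues new tokens.
  if (session.revokedAt !== null) return "reject-revoked";

  // The current token is the normal case: rotate to a new version.
  if (presentedJti === session.refreshTokenJti) return "rotate";

  // The immediately preceding token is tolerated for a short window after rotation.
  const isPrevious = presentedJti === session.previousRefreshTokenJti;
  const secondsSinceRotation =
    session.modifiedAt === null
      ? Infinity
      : (now.getTime() - session.modifiedAt.getTime()) / 1000;
  if (isPrevious && secondsSinceRotation <= GRACE_PERIOD_SECONDS) {
    return "accept-within-grace-period";
  }

  // Anything else (expired grace period, or an older/unknown token in this sketch)
  // is treated as a replay, and the caller would revoke the whole session.
  return "revoke-replay-detected";
}
```

In this model, request B arriving 10 seconds after request A rotated the token is accepted, while the same token presented 31 seconds after rotation causes the session to be revoked.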

Downstream projects

Deploying this change invalidates all existing refresh tokens, so all users will be logged out and must sign in again.

If the default HTTP request timeout (DEFAULT_TIMEOUT in shared-webapp/infrastructure/http/httpClient.ts) has been extended beyond 30 seconds, also increase GracePeriodSeconds in account-management/Core/Features/Authentication/Domain/Session.cs to match. These values must stay in sync to prevent legitimate parallel requests from being incorrectly flagged as replay attacks.
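
A check along these lines could guard against the two values drifting apart. It is a sketch under assumptions: the HTTP timeout is assumed to be expressed in milliseconds, and the `gracePeriodCoversTimeout` helper is hypothetical rather than part of the codebase.

```typescript
// Hypothetical helper; assumes the HTTP timeout is given in milliseconds
// and the grace period in seconds.
function gracePeriodCoversTimeout(httpTimeoutMs: number, gracePeriodSeconds: number): boolean {
  // A request still in flight with the old token may take up to the HTTP timeout,
  // so the grace period needs to be at least that long.
  return gracePeriodSeconds * 1000 >= httpTimeoutMs;
}
```

With a 30-second timeout and a 30-second grace period the check passes; raising the timeout to 60 seconds without touching the grace period makes it fail.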

Checklist

  • I have added tests, or done manual regression tests
  • I have updated the documentation, if necessary

@tjementum tjementum self-assigned this Jan 6, 2026
@tjementum tjementum added the Deploy to Staging label Jan 6, 2026
@linear
linear bot commented Jan 6, 2026

@tjementum tjementum moved this to 🏗 In Progress in Kanban board Jan 6, 2026
@tjementum tjementum linked an issue Jan 6, 2026 that may be closed by this pull request
@sonarqubecloud
sonarqubecloud bot commented Jan 6, 2026

@github-actions
github-actions bot commented Jan 6, 2026

Approve Database Migration account-management database on stage

The following pending migration(s) will be applied to the database when approved:

  • AddSessions (20260106041044_AddSessions)

Migration Script

BEGIN TRANSACTION;
IF NOT EXISTS (
    SELECT * FROM [__EFMigrationsHistory]
    WHERE [MigrationId] = N'20260106041044_AddSessions'
)
BEGIN
    CREATE TABLE [Sessions] (
        [TenantId] bigint NOT NULL,
        [Id] varchar(32) NOT NULL,
        [UserId] varchar(32) NOT NULL,
        [CreatedAt] datetimeoffset NOT NULL,
        [ModifiedAt] datetimeoffset NULL,
        [RefreshTokenJti] varchar(32) NOT NULL,
        [PreviousRefreshTokenJti] varchar(32) NULL,
        [RefreshTokenVersion] int NOT NULL,
        [DeviceType] varchar(20) NOT NULL,
        [UserAgent] nvarchar(500) NOT NULL,
        [IpAddress] varchar(45) NOT NULL,
        [RevokedAt] datetimeoffset NULL,
        [RevokedReason] varchar(20) NULL,
        CONSTRAINT [PK_Sessions] PRIMARY KEY ([Id]),
        CONSTRAINT [FK_Sessions_Tenants_TenantId] FOREIGN KEY ([TenantId]) REFERENCES [Tenants] ([Id]),
        CONSTRAINT [FK_Sessions_Users_UserId] FOREIGN KEY ([UserId]) REFERENCES [Users] ([Id]) ON DELETE CASCADE
    );
END;

IF NOT EXISTS (
    SELECT * FROM [__EFMigrationsHistory]
    WHERE [MigrationId] = N'20260106041044_AddSessions'
)
BEGIN
    CREATE INDEX [IX_Sessions_TenantId] ON [Sessions] ([TenantId]);
END;

IF NOT EXISTS (
    SELECT * FROM [__EFMigrationsHistory]
    WHERE [MigrationId] = N'20260106041044_AddSessions'
)
BEGIN
    CREATE INDEX [IX_Sessions_UserId] ON [Sessions] ([UserId]);
END;

IF NOT EXISTS (
    SELECT * FROM [__EFMigrationsHistory]
    WHERE [MigrationId] = N'20260106041044_AddSessions'
)
BEGIN
    INSERT INTO [__EFMigrationsHistory] ([MigrationId], [ProductVersion])
    VALUES (N'20260106041044_AddSessions', N'10.0.1');
END;

COMMIT;
GO
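
For readers who want to reason about the rows this migration creates, the table maps to roughly the following shape. This is a TypeScript sketch, not code from the pull request: the field names follow the migration's columns, `tenantId` is simplified to `number`, and the definition of "active" (a session whose `RevokedAt` is NULL) is an assumption, since the pull request does not spell out how the active-sessions endpoint filters rows.

```typescript
// Sketch of a Sessions row; field names follow the migration's column names.
interface SessionRow {
  tenantId: number; // bigint in SQL; number is a simplification for this sketch
  id: string;
  userId: string;
  createdAt: Date;
  modifiedAt: Date | null;
  refreshTokenJti: string;
  previousRefreshTokenJti: string | null;
  refreshTokenVersion: number;
  deviceType: string;
  userAgent: string;
  ipAddress: string; // varchar(45) is long enough for a textual IPv6 address
  revokedAt: Date | null;
  revokedReason: string | null;
}

// Assumption: a session counts as active while RevokedAt is NULL.
function activeSessionsForUser(rows: SessionRow[], userId: string): SessionRow[] {
  return rows.filter((row) => row.userId === userId && row.revokedAt === null);
}
```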

@tjementum tjementum added the Enhancement label and removed the Deploy to Staging label Jan 6, 2026
@tjementum tjementum merged commit a806cef into main Jan 6, 2026
28 of 29 checks passed
@github-project-automation github-project-automation bot moved this from 🏗 In Progress to ✅ Done in Kanban board Jan 6, 2026
@tjementum tjementum deleted the pp-204-backend-for-refresh-token-rotation-management branch January 6, 2026 21:33
tjementum added a commit that referenced this pull request Jan 11, 2026
…en refresh (#826)

### Summary & Motivation

Fix the token refresh race condition introduced in #821. When multiple
API requests are made concurrently and all attempt to refresh an expired
access token, only one succeeds while the others fail with
`DbUpdateConcurrencyException`, causing the user to be logged out. This
commonly occurs when returning to a browser tab after tokens have
expired (e.g., TanStack Query's `invalidateQueries()` triggers multiple
parallel requests).

- Implement atomic token refresh using an isolated database connection
that commits immediately, independent of EF Core's Unit of Work
transaction. This ensures only one concurrent request wins the refresh
race, while others fall back to the grace period mechanism using
`PreviousRefreshTokenJti`
- Update SQLite test connection strings to use shared cache mode
(`Cache=Shared`), enabling isolated connections to access the same
in-memory database. This allows the atomic refresh pattern using
`Activator.CreateInstance(existingConnection.GetType())` to work in
tests
- Simplify `RefreshTokenGenerator` API by consolidating `Generate` and
`Update` methods into a single `Generate` method with explicit version
and expiry parameters
- Reorder `UserInfoFactory` parameters to follow async conventions with
`cancellationToken` last
- Remove `SessionRefreshed` and `AuthenticationTokensRefreshed`
telemetry events as they add noise without providing meaningful business
value
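
The "one request wins, the others fall back to the grace period" behavior described above can be modeled as a compare-and-swap on the current JTI. The following TypeScript sketch uses an in-memory object in place of the database; the conditional-update shape shown in the comment, the `tryRotate` function, and the `RefreshOutcome` values are assumptions for illustration, since the commit message only states that the refresh is atomic and commits on an isolated connection.

```typescript
// Illustrative in-memory model; the actual change uses an isolated database connection.
interface SessionRecord {
  refreshTokenJti: string;
  previousRefreshTokenJti: string | null;
  refreshTokenVersion: number;
}

type RefreshOutcome = "won-rotation" | "lost-race-use-grace-period" | "unknown-token";

// Models a conditional update of the assumed form
//   UPDATE Sessions SET ... WHERE Id = @id AND RefreshTokenJti = @expectedJti
// which modifies the row only for the first caller presenting the current JTI.
function tryRotate(session: SessionRecord, expectedJti: string, newJti: string): RefreshOutcome {
  if (session.refreshTokenJti === expectedJti) {
    // Winner: the old JTI is kept as the previous one so that losers can be recognized.
    session.previousRefreshTokenJti = session.refreshTokenJti;
    session.refreshTokenJti = newJti;
    session.refreshTokenVersion += 1;
    return "won-rotation";
  }
  // Loser of the race: the token was current a moment ago, so the grace period applies.
  if (session.previousRefreshTokenJti === expectedJti) return "lost-race-use-grace-period";
  return "unknown-token";
}
```

Two concurrent requests presenting the same JTI then produce one rotation and one grace-period fallback, rather than one success and one concurrency exception.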

### Downstream projects

Update the SQLite connection string in
`your-self-contained-system/Tests/EndpointBaseTest.cs` to use shared
cache mode, which allows isolated database connections to access the
same in-memory database:

```diff
-        // Create connection and add DbContext to the service collection
-        Connection = new SqliteConnection("DataSource=:memory:");
+        // Create connection using shared cache mode so isolated connections can access the same in-memory database
+        Connection = new SqliteConnection($"Data Source=TestDb_{Guid.NewGuid():N};Mode=Memory;Cache=Shared");
```

### Checklist

- [x] I have added tests, or done manual regression tests
- [x] I have updated the documentation, if necessary

Labels

Enhancement New feature or request

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

Backend for refresh token rotation management

1 participant