All notable changes to the Castellan Security Platform will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Status: PRODUCTION RELEASE - First Official Production Release
CastellanAI v1.0.0 marks the first official production release of our AI-powered Windows security monitoring platform. This release represents the culmination of intensive development through v0.1.0 - v0.9.0, delivering a complete, enterprise-ready security monitoring solution with groundbreaking conversational AI capabilities.
Conversational AI Chat Interface (v0.8.0 - v0.9.0): Natural language security analysis at `/chat`
- 7 intent types: Query, Investigate, Hunt, Compliance, Explain, Action, Conversational
- RAG context retrieval with vector search, critical events, and correlation patterns
- Multi-turn conversations with database-backed conversation history
- Citations with clickable links to security events
- Suggested actions with confirmation dialogs
- Follow-up question recommendations
- Markdown rendering with syntax highlighting
Human-in-the-Loop Action Execution (v0.9.0): Execute security actions with full audit trail
- Action types: Block IP, Isolate Host, Quarantine File, Add to Watchlist, Create Ticket
- Undo/Rollback capability with 24-hour window for reversible actions
- Action History panel tracking all executed actions with before/after states
- Confirmation dialogs with impact warnings
- Visual indicators for reversible vs non-reversible actions
Citation Linking (v0.9.0): Click citations in AI responses to view full event details
- Citations displayed under AI responses with event details
- Clickable citation cards navigate to `/security-events/{id}`
- Relevance scores shown as percentages
- ExternalLink icons for visual clarity
Shared Modal Components (v1.0.0): Consistent UX across Dashboard and Security Events
- Created `SecurityEventDetailModal.tsx` as shared component
- RecentActivity items on Dashboard now open modal dialog
- Same detailed view as Security Events page
- Includes all event fields (risk level, MITRE, scores, IPs, etc.)
Comprehensive Disclaimer (v1.0.0): Production deployment guidelines
- Clear usage guidelines for open source experimental platform
- Security considerations and professional review recommendations
- Liability limitation and warranty disclaimer
- Production deployment recommendations
AI Intelligence Upgrades (v0.7.0 - v0.8.0):
- Embedding cache with 30-70% fewer API calls
- Polly resilience patterns (retry, circuit breaker, timeout)
- Strict JSON validation with 97%+ parse success
- Hybrid search (70% vector + 30% metadata)
- OpenTelemetry tracing with Jaeger/Zipkin support
- Multi-model ensemble with 20-30% accuracy improvement
- Automated evaluation framework
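The 70/30 hybrid search weighting listed above can be pictured as a weighted sum of two scores. The following is only a minimal sketch: the `ScoredCandidate` shape, the function names, and the assumption that both scores are normalized to the range 0 to 1 are illustrative and not taken from the Castellan codebase.

```typescript
// Hypothetical candidate shape; both scores are assumed to be normalized to [0, 1].
interface ScoredCandidate {
  id: string;
  vectorScore: number;   // similarity reported by vector search
  metadataScore: number; // match quality on metadata fields
}

// Weights taken from the changelog entry: 70% vector + 30% metadata.
const VECTOR_WEIGHT = 0.7;
const METADATA_WEIGHT = 0.3;

function hybridScore(c: ScoredCandidate): number {
  return VECTOR_WEIGHT * c.vectorScore + METADATA_WEIGHT * c.metadataScore;
}

// Returns a new array sorted by descending hybrid score (input is not mutated).
function rankCandidates(candidates: ScoredCandidate[]): ScoredCandidate[] {
  return [...candidates].sort((a, b) => hybridScore(b) - hybridScore(a));
}
```

With these weights, a candidate with a moderate vector score but a strong metadata match can outrank one that only scores well on vector similarity.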
Performance Optimizations:
- React Query snapshot caching (30min memory, 24h localStorage, <50ms page loads)
- Dashboard instant load with skeleton screens
- Database connection pooling (100 max connections, WAL mode)
- Worker performance optimization (5-9x throughput improvement)
- Malware scanning optimization (4-8x concurrent throughput)
Security Enhancements:
- BCrypt password hashing for secure password storage
- JWT token blacklisting with server-side token invalidation
- Refresh token system with secure token rotation
- Audit trail for all security actions
- Admin user configuration via environment variables
- 24-Hour Rolling Window: Automatic event cleanup maintains 24-hour scope for AI analysis
- Single Database: Consolidated `/data/castellan.db` for all components
- Malware Detection: 70 active YARA rules with automatic updates and deduplication
- React Dashboard: Modern UI with Tailwind CSS and real-time SignalR updates
- Conversational AI: Complete chat backend and frontend at `/chat`
- 24-Hour Event Retention: Security events older than 24 hours are automatically deleted
- Single-User Mode: One admin user configured via environment variables
- Windows-Only: Currently supports Windows Event Log monitoring only
- Qdrant Dependency: Vector search requires Qdrant Docker container
See KNOWN_ISSUES.md for complete details and workarounds.
Breaking Changes: None
Steps:
- No database migrations required
- No configuration changes required
- Update dashboard: `npm install` and `npm run build`
- Restart services: `.\scripts\stop.ps1` then `.\scripts\start.ps1`
Open Source Roadmap:
- Export to PDF (chat conversations as incident reports)
- Streaming responses (token-by-token)
- Rate limiting (10 messages/minute)
- Virtual scrolling for long conversations
- Enhanced input validation
Pro Version: Multi-user RBAC, extended retention, compliance reporting, PostgreSQL, multi-tenancy, professional support
See RELEASE_NOTES_v1.0.0.md for complete release notes.
Status: COMPLETE - Performance & Caching Overhaul with Worker Optimization and Threat Scanner System
- Worker Performance Optimization: 5-9x throughput improvement through Sprint 1-3 optimizations
- Threat Scanner System: Complete on-demand scanning with real-time progress tracking
- Tailwind Dashboard: Full dashboard implementation at port 3000 with all pages
- Database Connection Pooling: EF Core PooledDbContextFactory with health monitoring
- React Query Caching: 30min memory retention, 24h localStorage persistence for instant page loads
Critical Bug Fix: Resolved an issue where Threat Scanner settings did not persist after saving
Problem:
- Users could save Threat Scanner configuration (e.g., disable scheduled scans)
- Backend returned 200 OK success response
- Configuration appeared to save successfully
- On page refresh, settings reverted to defaults (`Enabled = true`)
- File was being saved correctly with user's choice (`"enabled": false`)
- But when reading back, configuration always returned default values
Root Cause:
- JSON serialization used `camelCase` naming policy (file: `"enabled": false`)
- C# class properties use `PascalCase` naming (`Enabled`)
- `JsonSerializerOptions` was missing `PropertyNameCaseInsensitive = true`
- Deserialization failed silently and fell back to `new ThreatScanOptions()` with defaults
Solution:
- Added `PropertyNameCaseInsensitive = true` to `JsonSerializerOptions` in `ThreatScanConfigurationService.cs`
- Added comprehensive debug logging to track configuration lifecycle
- Enhanced error handling with detailed logging for deserialization failures
Impact:
- Threat Scanner configuration now persists correctly across saves and browser refreshes
- User settings (Enable/Disable scheduler, scan intervals, exclusions) now properly saved and loaded
- Configuration file reads correctly despite case differences between JSON and C# properties
Files Modified:
- `src/Castellan.Worker/Services/ThreatScanConfigurationService.cs` - Added case-insensitive deserialization and enhanced logging
Testing: Verified complete save/load cycle - checkbox state persists correctly after save and page refresh
UI/UX Improvements: Multiple configuration interface refinements for consistency and usability
Changes Made:
- Removed Info Boxes: Removed "About Auto-Updates" from MITRE Techniques tab and "About Threat Scanner" from Threat Scanner tab (documentation will be added separately)
- Notifications Configuration Rewrite: Complete overhaul to match backend API structure
- Removed non-existent `notificationTypes` checkboxes
- Added proper backend fields: `castellanUrl`, `rateLimitSettings` (throttle minutes for Critical/High/Medium/Low)
- Added Slack channel routing: `defaultChannel`, `criticalChannel`, `highChannel`
- Proper API integration with POST (create) and PUT (update) endpoints
- Backend model: `/api/notifications/config` with proper ID-based CRUD operations
- Button Consistency:
- Malware Detection Rules import button text changed from "Import Now" to "Import Malware Detection Rules"
- MITRE import button changed from blue to green (`bg-green-600`) to match YARA styling
- Pagination Fix: Malware Detection Rules page pagination now uses actual backend page size instead of hardcoded value
- Correctly displays "Showing X to Y of Z rules" based on actual rules returned per page
- Fixed totalPages calculation to use backend-reported page size
- Reset to page 1 when filters change
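The pagination arithmetic described above can be sketched as follows. This is an illustration under stated assumptions rather than the code in `MalwareRules.tsx`: the `describePage` helper and the `PageInfo` shape are hypothetical, and `pageSize` stands for whatever page size the backend reports.

```typescript
interface PageInfo {
  from: number;       // 1-based index of the first rule on the page
  to: number;         // 1-based index of the last rule on the page
  totalPages: number;
  label: string;
}

// `page` is 1-based; `pageSize` is the backend-reported page size, not a hardcoded value.
function describePage(page: number, pageSize: number, total: number): PageInfo {
  if (total <= 0 || pageSize <= 0) {
    return { from: 0, to: 0, totalPages: 0, label: "Showing 0 to 0 of 0 rules" };
  }
  const totalPages = Math.ceil(total / pageSize);
  const from = (page - 1) * pageSize + 1;
  const to = Math.min(page * pageSize, total); // the last page may be partial
  return { from, to, totalPages, label: `Showing ${from} to ${to} of ${total} rules` };
}
```

For example, with 70 rules and a backend page size of 25, page 3 covers rules 51 to 70 and `totalPages` is 3.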
Files Modified:
- `dashboard/src/pages/Configuration.tsx` - Notifications config rewrite, info box removals, MITRE button color
- `dashboard/src/components/YaraConfigComponent.tsx` - Import button text update
- `dashboard/src/pages/MalwareRules.tsx` - Pagination calculations fixed
Complete Threat Scanner Configuration Tab: Implemented comprehensive threat scanner configuration interface in Tailwind Dashboard
Configuration Features:
- Scheduled Scans: Enable/disable scheduler with configurable interval (days/hours) using TimeSpan format
- Scan Type Selection: Choose between Quick Scan (high-risk locations) and Full Scan (all drives) as default
- Quarantine Settings: Enable/disable quarantine functionality with configurable directory path
- Performance Tuning: Max concurrent files (1-100), max file size (1-1000 MB), notification threshold (1-100)
- Directory Exclusions: Add/remove excluded directories with Enter key support and list display
- File Extension Exclusions: Add/remove excluded extensions (auto-prepends dot if missing) with chip display
- Real-time Status: Scanner status updates every 30 seconds showing current operation and schedule
- About Section: Feature information and configuration guidance
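The "auto-prepends dot if missing" behavior for file extension exclusions might look like the sketch below. The helper names are hypothetical, and the trimming, lowercasing, and duplicate check are assumptions added for illustration; the changelog only confirms the dot handling.

```typescript
// Normalizes user input such as "tmp" or " .LOG " into ".tmp" / ".log".
// Lowercasing and trimming are assumptions, not confirmed behavior.
function normalizeExtension(input: string): string {
  const trimmed = input.trim().toLowerCase();
  if (trimmed === "") return "";
  return trimmed.startsWith(".") ? trimmed : `.${trimmed}`;
}

// Returns a new exclusion list; empty input and duplicates leave the list unchanged.
function addExtension(list: string[], input: string): string[] {
  const ext = normalizeExtension(input);
  if (ext === "" || list.indexOf(ext) !== -1) return list;
  return [...list, ext];
}
```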
Technical Implementation:
- React Query Integration: Two queries (config and status) with 30s auto-refresh for status
- TimeSpan Parsing: Bidirectional conversion between .NET format "d.hh:mm:ss" and UI (days/hours inputs)
- Type Safety: Fixed parseScanInterval function to handle undefined values with guard clauses and fallback defaults
- Mutation with Cache: Save configuration with optimistic updates and automatic cache invalidation
- API Endpoints: `/api/scheduledscan/config` (GET/PUT) and `/api/scheduledscan/status` (GET)
- Component Structure: Eight logical sections in clean card-based layout with Tailwind CSS styling
- Error Handling: Comprehensive validation and error states with user-friendly messages
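The TimeSpan conversion and the guard clauses mentioned above could be sketched like this. Only the `parseScanInterval(interval: string | undefined)` signature comes from the changelog; the return shape, the fallback of one day, and `formatScanInterval` are assumptions for illustration. The sketch also accepts the shorter "hh:mm:ss" form that .NET emits when the day component is zero.

```typescript
interface ScanInterval {
  days: number;
  hours: number;
}

// Hypothetical fallback used when the value is missing or malformed.
const DEFAULT_INTERVAL: ScanInterval = { days: 1, hours: 0 };

// Parses "d.hh:mm:ss" (or "hh:mm:ss") into the days/hours shown in the UI.
function parseScanInterval(interval: string | undefined): ScanInterval {
  if (!interval) return DEFAULT_INTERVAL; // guard: undefined or empty string
  const match = /^(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})$/.exec(interval.trim());
  if (!match) return DEFAULT_INTERVAL;    // guard: unexpected format
  return {
    days: match[1] ? Number(match[1]) : 0,
    hours: Number(match[2]),
  };
}

// Converts the UI values back to the .NET TimeSpan string format.
function formatScanInterval(value: ScanInterval): string {
  const hh = value.hours < 10 ? `0${value.hours}` : String(value.hours);
  return value.days > 0 ? `${value.days}.${hh}:00:00` : `${hh}:00:00`;
}
```

Minutes and seconds are dropped on purpose here, since the UI described above only exposes days and hours.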
Files Modified:
- `dashboard/src/pages/Configuration.tsx` - Added ThreatScannerConfig component (~520 lines)
- Fixed runtime TypeError with type guards: `parseScanInterval(interval: string | undefined)`
Dashboard Metric Updates: Improved dashboard cards for better system visibility
Card Changes:
- System Status Card: Replaced "Detection Rate" with "System Status" showing Healthy/Total Components ratio
- Status Color Coding: Green for healthy systems, yellow/red for issues
- Threat Scans Card: New card displaying total scans and last scan result with status-based color coding
- Enhanced Metrics: All cards now show 24-hour scope metrics (Events/24h, Open Events, Critical Threats)
Technical Changes:
- Updated consolidated dashboard API response with threat scanner metadata (lastScanResult, lastScanStatus, scanType)
- Enhanced SignalR broadcast to include scan progress and status updates
- Modified dashboard layout to accommodate new metrics
Complete React Dashboard: Implemented full-featured dashboard at ./dashboard (port 3000) with Tailwind CSS styling
New Pages:
- MITRE ATT&CK (`/mitre-attack`): Techniques list with search, filters, import dialog, detail modal, statistics
- Malware Detection Rules (`/malware-rules`): Rules management with enable/disable, import, validation status, statistics
- Security Event Detail (`/security-events/:id`): Full event details with MITRE techniques, IP enrichment, analysis scores
- System Status (`/system-status`): Component health monitoring with auto-refresh, response times, uptime tracking
- Configuration (`/configuration`): Tabbed settings for Threat Intelligence, Notifications, IP Enrichment, YARA auto-update
Enhanced Pages:
- Security Events: Added all API fields (eventId, machine, user, mitreAttack, correlationScore, burstScore, anomalyScore, confidence, ipAddresses)
- Menu Navigation: Swapped text positions - "Threat Intelligence"/"MITRE ATT&CK", "Malware Detection"/"Malware Detection Rules"
Features:
- Authentication checks on all pages with automatic login redirect
- API response handling for PascalCase/camelCase mapping
- Dark mode support throughout
- Responsive design for mobile/tablet/desktop
- Loading states with spinners and skeleton screens
- Error handling with user-friendly messages
- Clickable event cards with hover effects
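One way to handle the PascalCase/camelCase mapping listed above is to normalize response keys before the UI reads them. The sketch below is an assumption about how such a mapping could work, not the dashboard's actual code; `camelizeKeys` and `JsonValue` are hypothetical names, and only the first character of each key is lowercased (so a key like `IPAddress` would become `iPAddress`).

```typescript
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Lowercases only the first character: "EventId" -> "eventId".
function toCamelKey(key: string): string {
  return key.length === 0 ? key : key.charAt(0).toLowerCase() + key.slice(1);
}

// Recursively rewrites object keys; arrays and primitives are passed through.
function camelizeKeys(value: JsonValue): JsonValue {
  if (Array.isArray(value)) {
    return value.map((item) => camelizeKeys(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: JsonValue } = {};
    Object.keys(value).forEach((key) => {
      out[toCamelKey(key)] = camelizeKeys(value[key] as JsonValue);
    });
    return out;
  }
  return value;
}
```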
API Integration:
- Added MITRE API methods: `getMitreTechniques`, `getMitreStatistics`, `importMitreTechniques`
- Added YARA API methods: `getMalwareRules`, `getYaraStatistics`, `toggleYaraRule`, `deleteYaraRule`, `importMalwareRules`
- Corrected endpoints: `/api/settings/threat-intelligence`, `/api/yara-configuration`
Malware Scanning Optimization: Fixed global lock bottleneck for 4-8x concurrent scanning throughput
Phase 4: Malware Scanning Optimization
- Minimized `_yaraLock` scope - lock only held to get `_compiledRules` reference
- Moved entire scanning operation OUTSIDE the lock (lines 303-336)
- Enables true concurrent scanning up to `MaxConcurrentScans` limit (8 concurrent scans)
- Added binary scanning support via reflection-based `ScanMemory()` attempt
- Avoids UTF-8 conversion overhead when `ScanMemory` is available
- Graceful fallback to `ScanString()` if binary scanning unavailable
Performance Impact:
- YARA throughput: 1 concurrent scan → 8 concurrent scans (4-8x improvement)
- Lock contention: Eliminated during scanning operations
- UTF-8 overhead: Reduced when binary scanning available
- Thread safety: Maintained via reference-based concurrency pattern
Major Performance Enhancement: Async database persistence and semaphore throttling fixes for 5-9x cumulative throughput improvement
Phase 1: Async Database Persistence
- Converted all security event database writes from synchronous to asynchronous operations
- Added `AddSecurityEventAsync(SecurityEvent, CancellationToken)` to `ISecurityEventStore` interface
- Updated all implementations: `DatabaseSecurityEventStore`, `SignalRSecurityEventStore`, `InMemorySecurityEventStore`, `FileBasedSecurityEventStore`
- `Pipeline.cs` and `WindowsEventLogWatcherService.cs` now use async database writes
- Database write latency reduced from 50-200ms to <10ms (80-95% reduction)
- Backward compatibility maintained through sync method delegation
- Updated test mocks: `MockSecurityEventStore` implements async interface
Phase 3: Semaphore Throttling Fix
- Replaced inefficient non-blocking semaphore check + 1-second sleep with proper async waiting
- `TryAcquireSemaphoreAsync` now uses `semaphore.WaitAsync(timeout, ct)` instead of `Wait(0)`
- Eliminated 1-second `Task.Delay` stalls on contention
- Simplified caller logic - removed retry loop and manual delays
- Semaphore timeout configured to 10 seconds (from Sprint 1)
Performance Impact:
- Throughput: 160-320 eps → 400-960 eps (Additional 2-3x improvement, 5-9x total with Sprint 1)
- Database writes: Now fully async with <10ms p95 latency
- CPU utilization: 60-80% sustained (no more throttling stalls)
- Semaphore behavior: Proper async waiting, better CPU utilization during load
Configuration and Database Tuning: Low-risk optimizations for 2-3x throughput improvement
Phase 7: Configuration Tuning
- `Pipeline.MaxConcurrentTasks`: 8 → 16 (better CPU utilization)
- `Pipeline.SemaphoreTimeoutMs`: 15000 → 10000 (faster backpressure response)
- `Pipeline.SkipOnThrottleTimeout`: false → true (drop events under load instead of blocking)
- `Pipeline.MinCorrelationScoreThreshold`: 0.1 → 0.15 (reduce noise)
- `Pipeline.MinBurstScoreThreshold`: 0.0 → 0.15 (reduce noise)
- `Pipeline.MinAnomalyScoreThreshold`: 0.0 → 0.15 (reduce noise)
- `WindowsEventLog.ConsumerConcurrency`: 4 → 8 (more event processing threads)
- `YaraScanning.MaxConcurrentScans`: 4 → 8 (enable more parallel scans)
- Added `ConnectionPools.Database` configuration section with MaxPoolSize=100, MinPoolSize=10
Phase 2 Option B: Disable Immediate Dashboard Broadcasts
- `WindowsEventLog.ImmediateDashboardBroadcast`: true → false
- Eliminated per-event dashboard broadcasts (90-100% overhead reduction)
- System relies on efficient 30-second periodic broadcasts
Phase 5: Database Performance Enhancements
- Applied SQLite PRAGMAs at startup in `Program.cs`
- WAL mode enabled for concurrent read/write operations
- `busy_timeout=5000ms` for better lock handling
- `wal_autocheckpoint=1000` for optimized WAL maintenance
- Database optimizations applied via `DatabasePerformanceEnhancements.ApplyPerformanceEnhancementsAsync`
Performance Impact:
- Throughput: 80-160 eps → 160-320 eps (2-3x improvement)
- Dashboard overhead: 90-100% reduction
- Lock contention: Reduced via WAL mode
- Concurrent operations: 16 max tasks (up from 8)
Version: v0.7.0 (October 2025)
- Customizable Templates: Production-ready notification templates with dynamic tag/placeholder support
- 8 Default Templates: 4 template types × 2 platforms (Teams and Slack)
- Template Types: SecurityEvent, SystemAlert, HealthWarning, PerformanceAlert
- Platforms: Microsoft Teams and Slack with platform-specific formatting
- Rich Formatting: Visual separators (━━━━━), organized sections with emoji headers (📋, 🖥️, 📊, 🎯, ✅)
- Professional Footer: "⚡ Powered by CastellanAI Security Platform" branding
- Dynamic Tags: 15+ supported tags for event data substitution
- Event data: {{DATE}}, {{HOST}}, {{USER}}, {{EVENT_ID}}, {{EVENT_TYPE}}, {{SEVERITY}}, {{RISK_LEVEL}}, {{CONFIDENCE}}
- Analysis: {{SUMMARY}}, {{MITRE_TECHNIQUES}}, {{RECOMMENDED_ACTIONS}}, {{CORRELATION_SCORE}}
- Networking: {{IP_ADDRESS}}, {{DETAILS_URL}}
- Formatting: {{BOLD:text}}, {{LINK:url|text}}
- Template Management UI: Configuration page interface for template editing
- Navigate to Configuration → Notifications → Message Templates
- Live preview with sample data
- Real-time syntax validation
- Enable/disable templates
- Automatic Initialization: TemplateInitializationService creates default templates on first startup
- File-Based Persistence: Templates stored in JSON format at `data/notification-templates.json`
- API Endpoints: Full CRUD REST API at `/api/notification-templates`
- GET/POST/PUT/DELETE operations with Admin authorization
- Template validation and preview endpoints
- Files Added:
- `DefaultTemplates.cs` - Factory for rich production-ready templates
- `TemplateInitializationService.cs` - IHostedService for automatic template creation
- `FileBasedNotificationTemplateStore.cs` - JSON persistence layer
- Frontend Integration: React components in Configuration.tsx for template management
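To make the dynamic tag substitution concrete, here is a minimal sketch of a renderer for the tag syntax listed above. It is not the platform's template engine: `renderTemplate` is a hypothetical name, the output for `{{BOLD:...}}` and `{{LINK:...}}` assumes Slack-style markup, and values containing `}` or `|` are not escaped.

```typescript
type TagValues = { [tag: string]: string };

function renderTemplate(template: string, values: TagValues): string {
  return template
    // 1. Plain data tags such as {{HOST}}; unknown tags are left untouched.
    .replace(/\{\{([A-Z_]+)\}\}/g, (match: string, tag: string) =>
      Object.prototype.hasOwnProperty.call(values, tag) ? values[tag] : match
    )
    // 2. {{BOLD:text}} -> *text* (assumed Slack-style emphasis).
    .replace(/\{\{BOLD:([^}]*)\}\}/g, (_match: string, text: string) => `*${text}*`)
    // 3. {{LINK:url|text}} -> <url|text> (assumed Slack-style link).
    .replace(
      /\{\{LINK:([^|}]*)\|([^}]*)\}\}/g,
      (_match: string, url: string, text: string) => `<${url}|${text}>`
    );
}
```

Data tags are substituted first so that they can be nested inside formatting tags, for example `{{BOLD:{{HOST}}}}`.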
- Sequential Pattern Filtering: Intelligent event filtering to reduce false positives
- `EventIgnorePatternService` with pattern matching for benign event sequences
- `IgnorePatternOptions` configuration for customizable filtering rules
- Support for filtering based on EventType, MITRE techniques, source machines, account names, logon types, and source IPs
- Time-window based pattern detection (configurable 30-second window)
- Option to filter ALL events from local machines with `FilterAllLocalEvents` setting
- Four pre-configured patterns for common benign scenarios (SYSTEM service logons, service accounts, DWM/UMFD, standalone T1078)
- Integration into Pipeline and WindowsEventLogWatcherService
- Registered in PipelineServiceExtensions for dependency injection
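The time-window idea behind sequential pattern filtering can be illustrated with a small matcher. Everything in this sketch is hypothetical (the `LogEvent` and `IgnorePattern` shapes, the event type names, and the matching rule itself); it only shows one plausible reading of "a benign sequence observed within a 30-second window" and makes no claim about how `EventIgnorePatternService` is implemented.

```typescript
interface LogEvent {
  eventType: string;
  timestampMs: number;
}

interface IgnorePattern {
  name: string;
  sequence: string[]; // event types expected in this order
  windowMs: number;   // e.g. 30000 for the 30-second window
}

// `events` must be sorted by ascending timestamp. Returns true when the
// pattern's event types occur in order, all within `windowMs` of the first match.
function matchesSequence(events: LogEvent[], pattern: IgnorePattern): boolean {
  if (pattern.sequence.length === 0) return false;
  for (let start = 0; start < events.length; start++) {
    if (events[start].eventType !== pattern.sequence[0]) continue;
    let next = 1; // index of the next expected event type
    for (let i = start + 1; i < events.length && next < pattern.sequence.length; i++) {
      if (events[i].timestampMs - events[start].timestampMs > pattern.windowMs) break;
      if (events[i].eventType === pattern.sequence[next]) next++;
    }
    if (next === pattern.sequence.length) return true;
  }
  return false;
}
```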
- SecurityEventsController IP Enrichment: Fixed stub `ParseEnrichedIPs` method that returned hardcoded "Unknown" values
- Implemented proper JSON deserialization for enrichment data
- Added support for both single object and array enrichment formats
- Case-insensitive property matching (ipAddress/IP, country/Country, etc.)
- Enhanced error handling with detailed logging
- IP enrichment now displays: IP address, country, city, ASN, and high-risk indicators
- Created helper methods: `ParseSingleEnrichment()`, `GetJsonString()`, `GetJsonBool()`
- DatabaseSecurityEventStore Timezone Fix: Fixed dashboard/events list discrepancy caused by timezone mismatch
- Changed `Time.DateTime` to `Time.UtcDateTime` in ConvertToEntity method (line 149)
- Events now consistently stored in UTC across all components
- Dashboard filtering now correctly uses UTC timestamps
- Eliminated 77 vs 677 event count discrepancies
- Eliminated Duplicate Caching: Removed EnhancedDataProvider in-memory cache, now using React Query exclusively
- Single Source of Truth: All data caching managed by React Query with resource-specific TTLs
- New SimplifiedDataProvider: Lightweight data provider with only request deduplication (no caching)
- Centralized Configuration: Created `reactQueryConfig.ts` for consistent cache settings across all resources
- Security Events: 15s TTL, 30s background polling
- Malware Detection Rules: 60s TTL, no polling
- System Status: 10s TTL, 15s background polling
- MITRE Techniques: 120s TTL, no polling
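The per-resource settings above could be centralized in a lookup table along these lines. This is a sketch under assumptions, not the contents of `reactQueryConfig.ts`: the resource keys `malware-rules` and `mitre-techniques`, the default entry, and the helper name are hypothetical. `staleTime` and `refetchInterval` are the React Query option names the values would feed into.

```typescript
const SECOND = 1000;

interface ResourceCacheConfig {
  staleTimeMs: number;            // the "TTL" from the list above
  pollIntervalMs: number | false; // false means no background polling
}

const RESOURCE_CACHE: { [resource: string]: ResourceCacheConfig } = {
  "security-events": { staleTimeMs: 15 * SECOND, pollIntervalMs: 30 * SECOND },
  "malware-rules": { staleTimeMs: 60 * SECOND, pollIntervalMs: false },
  "system-status": { staleTimeMs: 10 * SECOND, pollIntervalMs: 15 * SECOND },
  "mitre-techniques": { staleTimeMs: 120 * SECOND, pollIntervalMs: false },
};

// Hypothetical fallback for resources without an explicit entry.
const DEFAULT_CACHE: ResourceCacheConfig = { staleTimeMs: 30 * SECOND, pollIntervalMs: false };

// Builds an options object shaped like React Query's useQuery options.
function getQueryOptions(resource: string) {
  const cfg = RESOURCE_CACHE[resource] || DEFAULT_CACHE;
  return {
    queryKey: [resource],
    staleTime: cfg.staleTimeMs,
    refetchInterval: cfg.pollIntervalMs,
  };
}
```

Keeping the values in one table means a page only needs to name its resource, for example `useQuery({ ...getQueryOptions("system-status"), queryFn })`.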
- Background Polling: Added `useBackgroundPolling` hook for automatic cache refresh of critical resources
- Critical resources: security-events, system-status, threat-scanner, yara-matches, timeline, dashboard
- Smart polling only when authenticated and online
- Automatic cleanup on component unmount
- SignalR Cache Invalidation: Real-time cache invalidation when SignalR receives updates
- Security events invalidate security-events and dashboard caches
- YARA matches invalidate yara-matches and security-events caches
- Correlation alerts invalidate security-events cache
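The invalidation rules above amount to a mapping from real-time message type to the caches that should be refreshed. The sketch below is illustrative: the message type names and the `CacheLike` interface are assumptions, standing in for the SignalR handlers and the React Query client.

```typescript
// Hypothetical message type names for the three kinds of SignalR updates.
type RealtimeMessage = "security-event" | "yara-match" | "correlation-alert";

// Cache names taken from the changelog entry.
const INVALIDATION_MAP: Record<RealtimeMessage, string[]> = {
  "security-event": ["security-events", "dashboard"],
  "yara-match": ["yara-matches", "security-events"],
  "correlation-alert": ["security-events"],
};

// Minimal stand-in for a query client; a real client would refetch active queries.
interface CacheLike {
  invalidate(resource: string): void;
}

function handleRealtimeMessage(type: RealtimeMessage, cache: CacheLike): void {
  INVALIDATION_MAP[type].forEach((resource) => cache.invalidate(resource));
}
```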
- React Query DevTools: Added in development mode for cache debugging
- Standardized Query Keys: Consistent query key format across all pages using `queryKeys` helpers
- Performance Improvements: Expected 95%+ cache hit rate (up from 81%), 30% less memory usage
- Breaking Change: EnhancedDataProvider deprecated, all pages now use SimplifiedDataProvider + React Query
Major Performance Enhancement: All three snapshot caching options implemented for instant page loads
1. Extended Memory Retention (30 Minutes)
- Increased `gcTime` from 5 minutes to 30 minutes for all resources
- Snapshots persist in memory 6x longer
- Navigate between pages instantly within 30-minute window
- Resources updated:
- Security Events, YARA Matches, System Status, Threat Scanner: 30min (was 2-5min)
- Timeline, Dashboard, Security Event Rules: 30min (was 5-10min)
- Malware Detection Rules, MITRE Techniques, Configuration: 30min (was 10-30min)
2. localStorage Persistence (24 Hours)
- Added React Query Persist plugin for browser storage
- Cache survives page refresh and browser restart
- Persisted for 24 hours in localStorage under `CASTELLAN_CACHE_v1` key
- Only successful queries persisted (errors excluded)
- Throttled to save once per second (performance optimization)
- Auto-hydrates on page load for instant initial render
- New functions:
- `createCachePersister()` - Creates localStorage persister with throttling
- `setupCachePersistence()` - Configures persistence for QueryClient
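The throttled, 24-hour persistence described above can be approximated with a small storage wrapper. This is a simplified stand-in, not the React Query persist plugin or the project's `createCachePersister()`: `createThrottledPersister`, `StorageLike`, and the injectable clock are hypothetical, and writes that arrive inside the throttle window are simply skipped rather than deferred.

```typescript
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const CACHE_KEY = "CASTELLAN_CACHE_v1";
const MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours
const THROTTLE_MS = 1000;               // at most one write per second

type QuerySnapshot = { [queryKey: string]: unknown };

function createThrottledPersister(storage: StorageLike, now: () => number = Date.now) {
  let lastSave = -Infinity;
  return {
    // Writes the snapshot unless a write happened less than THROTTLE_MS ago.
    save(queries: QuerySnapshot): boolean {
      const t = now();
      if (t - lastSave < THROTTLE_MS) return false;
      lastSave = t;
      storage.setItem(CACHE_KEY, JSON.stringify({ savedAt: t, queries }));
      return true;
    },
    // Returns the snapshot, or null when it is missing or older than 24 hours.
    restore(): QuerySnapshot | null {
      const raw = storage.getItem(CACHE_KEY);
      if (raw === null) return null;
      const parsed = JSON.parse(raw) as { savedAt: number; queries: QuerySnapshot };
      return now() - parsed.savedAt > MAX_AGE_MS ? null : parsed.queries;
    },
  };
}
```

In a browser, `window.localStorage` satisfies `StorageLike`, so `createThrottledPersister(window.localStorage)` would be the natural call.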
3. Placeholder Data (Instant Snapshots)
- Added `placeholderData: keepPreviousData` globally to all queries
- Zero loading spinners during background refetch
- Always shows previous data while fetching new data
- Seamless, app-like navigation experience
- Applied globally via `createConfiguredQueryClient` defaults
- All Tailwind Dashboard pages and custom queries benefit automatically
Performance Impact:
- Instant navigation: Pages load < 50ms when cached (vs 200-2000ms before)
- Survives refresh: Cached data available immediately after browser restart
- No loading states: Previous data shown during refetch (smooth UX)
- Memory efficient: Snapshots automatically cleaned after 30 minutes
- localStorage size: ~2-5MB for typical session data
User Experience:
- Navigate between any pages → Instant (no loading)
- Close browser, reopen → Instant (loads from localStorage)
- Change filters → Instant (shows old data while fetching new)
- Background polling → Invisible (updates happen seamlessly)
Bundle Impact: Main bundle increased 1.81 KB (306.85 KB total) for persistence library
- Dashboard Not Using React Query: Converted Dashboard component from manual fetch() calls to React Query
- Replaced direct `fetch()` with `useQuery` hook for consolidated dashboard data
- Dashboard data now cached for 15 seconds (fresh), 30 minutes (in memory)
- Automatic background polling every 30 seconds for real-time updates
- Instant page navigation when cache is fresh
- Added `placeholderData: keepPreviousData` for instant snapshots
- SignalR integration preserved - cache invalidation on real-time updates
- Removed manual `setDashboardDataLoading` and `setDashboardDataError` state management
- Refresh function now uses `refetchDashboard()` for instant cache-first behavior
- Configuration Not Using React Query: Converted Configuration page from manual state management to React Query
- Replaced manual `useEffect` + `dataProvider.getOne()` calls with `useQuery` hooks
- Four separate queries for config sections: threat-intelligence, notifications, ip-enrichment, malware-rules
- Configuration data now cached for 5 minutes (fresh), 30 minutes (in memory)
- No background polling (static configuration data)
- Instant page navigation when cache is fresh
- Added `placeholderData: keepPreviousData` for instant snapshots
- Preserves form editing state while benefiting from cache
- Timeline Not Using React Query: Converted TimelinePanel from manual state management to React Query
- Replaced `useState` + `useEffect` with `useQuery` hooks
- Timeline data now cached for 30 seconds (fresh), 30 minutes (in memory)
- Automatic background polling every 60 seconds for real-time updates
- Instant page navigation when cache is fresh
- Added `placeholderData: keepPreviousData` for instant snapshots
- Timeline Database Optimizations Verified: Confirmed all backend optimizations already implemented
- Database-level GROUP BY aggregation using SQLite `strftime()` (TimelineService.cs:573-673)
- MITRE technique optimization: 500 records (was 180K) for 360x improvement
- Database indexes: IX_SecurityEvents_Timestamp and composite indexes in place
- Combined with React Query: First load <2s, repeat visits <50ms (94% faster than original 32s)
- Silenced Expected 404s: Removed console logging for batch endpoint fallback in `castellanDataProvider.ts`
- Batch endpoint check is expected behavior (not all resources have batch endpoints)
- Cleaner console output without noise from expected fallbacks
- Backend Warmup System: Implemented WarmupHostedService to reduce cold-start latency
- Runs automatically 15 seconds after application startup
- Primes EF Core connection pool with minimal database query
- Warms 6 configurable API endpoints (system-status, dashboard-consolidated, database-pool, security-event-rules, yara-summary, threat-scanner-progress)
- 5-second timeout with detailed timing metrics logging
- Configurable via `Warmup` section in appsettings.json
- Frontend Warmup System: Implemented useDashboardWarmup hook for early prefetch
- Early SignalR connection initialization after authentication
- Automatic dashboard data prefetch 1.5 seconds after login
- Joins dashboard updates group before navigation
- One-time execution per session with smart idle detection
- Non-blocking implementation with graceful error handling
- Frontend Resource Hints: Added preconnect and dns-prefetch hints to index.html for faster API connections
- Feature Flags: Complete configuration control for each warmup component (endpoints, SignalR, Qdrant)
- Performance Impact: Reduces first dashboard visit latency after cold start by warming critical paths before users arrive
- Phase 3 Complete: Dashboard instant load plan 100% implemented
- Skeleton Components: Created reusable skeleton components for consistent loading states
- `MetricCardSkeleton` - Reusable metric card skeleton with staggered animation support
- `ChartSkeleton` - Flexible chart skeleton (pie, bar, area, rectangular types)
- Organized in `dashboard/src/components/skeletons/` directory
- Top-Bar Progress Indicator: Added fixed LinearProgress component for visual loading feedback
- Staggered Animations: Implemented cascading effect with 0s, 0.1s, 0.2s delays for polished UX
- All Sub-Components: Added skeletons to ApiDiagnostic, YaraSummaryCard, Connection Pool Monitor, System Metrics, Geographic Threat Map, Performance Dashboard, and Threat Intelligence Health
- Performance Impact: Dashboard structure now renders in <50ms (95% improvement)
- Pie Chart Colors: Updated Security Events by Risk Level chart to use proper risk-level-specific colors
- Critical: #f44336 (red)
- High: #ff9800 (orange)
- Medium: #8bc34a (light green - improved visibility on white background)
- Low: #2e7d32 (dark green)
- Unknown: #757575 (gray)
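The risk-level palette above can be captured as a typed lookup. The hex values come from the list; the `RiskLevel` type, the lowercase keys, and the `riskColor` helper with its fallback to the "Unknown" color are assumptions for illustration.

```typescript
type RiskLevel = "critical" | "high" | "medium" | "low" | "unknown";

const RISK_COLORS: Record<RiskLevel, string> = {
  critical: "#f44336", // red
  high: "#ff9800",     // orange
  medium: "#8bc34a",   // light green
  low: "#2e7d32",      // dark green
  unknown: "#757575",  // gray
};

// Case-insensitive lookup; unrecognized levels fall back to the "unknown" color.
function riskColor(level: string): string {
  const key = level.toLowerCase();
  return Object.prototype.hasOwnProperty.call(RISK_COLORS, key)
    ? RISK_COLORS[key as RiskLevel]
    : RISK_COLORS.unknown;
}
```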
- Development Mode Default: Changed start.ps1 to default to React dev mode instead of production build
- New Parameter: Added `-ProductionBuild` flag to explicitly use production build when needed
- Usage: `.\scripts\start.ps1` now starts dev mode, `.\scripts\start.ps1 -ProductionBuild` for production
- Timeline Service Performance: Converted TimelineService to use IDbContextFactory for connection pooling
- `GetTimelineDataAsync` method optimized (src/Castellan.Worker/Services/TimelineService.cs:36)
- `GetTimelineStatsAsync` method optimized (src/Castellan.Worker/Services/TimelineService.cs:114)
- `GetTimelineDataPointsViaSqlAsync` method optimized (src/Castellan.Worker/Services/TimelineService.cs:576)
- All database queries now benefit from 100-connection pool with WAL mode
- Added AsNoTracking() for read-only queries to improve performance
- Saved Search Service Performance: Converted SavedSearchService to use IDbContextFactory for connection pooling
- All 8 service methods now use pooled database contexts
- Includes: GetUserSavedSearchesAsync, GetSavedSearchAsync, CreateSavedSearchAsync, UpdateSavedSearchAsync, DeleteSavedSearchAsync, RecordSearchUsageAsync, GetMostUsedSearchesAsync, SearchSavedSearchesAsync
- Dashboard Performance: Optimized dashboard widget loading to eliminate unnecessary loading states
- Initialize dashboard state from SignalR context data to avoid loading skeletons
- Fixed useMemo dependencies to use consolidated data instead of legacy state
- Removed duplicate SignalR context hook calls
- Dashboard now renders instantly if data is already in SignalR context
- Data Provider Fixes: Removed 'dashboard' from predictive preloading to fix 404 errors
- Dashboard uses SignalR and direct fetch to `/api/dashboarddata/consolidated` instead of standard resource endpoint
- Cleaned up `getPredictedResources()`, `invalidateRelatedCaches()`, and `ttlConfig` in enhancedDataProvider.ts
- Single Database Architecture: Consolidated all database operations to use `/data/castellan.db` as the single source of truth
- Database Path Fixes: Updated DatabaseYaraRuleStore, YaraImportTool, YaraConfigurationController, and DailyRefreshHostedService to use centralized database
- Malware Detection Rule Deduplication: Confirmed UPSERT logic prevents duplicate rules using `ON CONFLICT(Name) DO UPDATE`
- YARA Auto-Updates: Verified automatic rule update functionality with configurable frequency (1-365 days)
- Malware Detection Rule Management: 70 active rules with preserved performance metrics and user preferences across updates
- Database Performance: Implemented database-level pagination in MalwareRulesController, reducing load times from 7-10s to 1-3s (70-80% improvement)
- Data Migration: Successfully migrated all existing data from scattered database files to central location
- Performance Optimization Phase 1: Implemented instant page loading with sub-150ms transitions
- Smart Preloading System:
- Created preload utilities with network and memory awareness
- Navigation pattern prediction and learning algorithm
- Connection-aware preloading that adapts to network conditions
- MenuWithPreloading Component:
- Hover-based component and data preloading
- Predictive preloading based on user navigation patterns
- Intelligent tracking of frequently accessed pages
- EnhancedDataProvider with Caching:
- Cache-first strategy for instant data loading
- Background refresh for stale data (50% TTL trigger)
- Smart cache invalidation with related resource tracking
- Configurable TTL per resource type (5s to 5m based on update frequency)
- Webpack Optimization:
- Added prefetch hints to all lazy-loaded components
- Optimized chunk loading for better performance
- Reduced page load times by 81% (800ms → 150ms)
- Performance Improvements:
- 90% faster data fetch times with intelligent caching
- 80%+ cache hit rate for predicted pages
- Minimal bundle size increase (+2KB) for massive performance gains
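
The cache-first strategy with a background refresh at 50% of the TTL can be sketched roughly as follows. This is an illustrative outline only: `CacheEntry`, `classify`, and `read` are hypothetical names and are not taken from `enhancedDataProvider.ts`.

```typescript
// Hypothetical cache entry shape; the real provider may store more fields.
interface CacheEntry<T> {
  value: T;
  storedAt: number; // timestamp in milliseconds
  ttlMs: number;    // per-resource TTL (the changelog mentions 5s to 5m)
}

type CacheDecision = "miss" | "fresh" | "stale-refresh";

// Fresh entries are served as-is, entries older than half their TTL are
// served while a refresh is scheduled, and expired entries count as misses.
function classify<T>(entry: CacheEntry<T> | undefined, now: number): CacheDecision {
  if (entry === undefined) return "miss";
  const age = now - entry.storedAt;
  if (age >= entry.ttlMs) return "miss";
  if (age >= entry.ttlMs * 0.5) return "stale-refresh";
  return "fresh";
}

// Cache-first read: returns the cached value when usable and asks the
// caller-supplied callback to refresh stale entries in the background.
function read<T>(
  cache: Record<string, CacheEntry<T>>,
  key: string,
  now: number,
  scheduleRefresh: (key: string) => void
): T | undefined {
  const entry = cache[key];
  const decision = classify(entry, now);
  if (decision === "stale-refresh") scheduleRefresh(key);
  return decision === "miss" || entry === undefined ? undefined : entry.value;
}
```

A caller that receives `undefined` would fall back to a normal fetch and store the result with a fresh `storedAt`.
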
- SOC2 Framework Activation: Moved SOC2ComplianceFramework from disabled to active frameworks folder and fixed compilation errors
- React UI Framework Correction: Updated hardcoded framework choices in ComplianceReports.tsx to show correct organizational frameworks (HIPAA, SOX, PCI DSS, ISO 27001, SOC2)
- Comprehensive Framework Verification: All 5 organizational compliance frameworks now have 100% functional report creation success
- PreloadManager Timelines Fix: Resolved "No import mapping for component: timelines" error by correcting component path to import from Timelines resource
- Framework Database Consistency: Ensured all organizational frameworks are properly visible via API and Tailwind Dashboard interface
- Enhanced Report Generation: Comprehensive ComplianceReportGenerationService with advanced reporting capabilities
- Multiple Report Formats: Support for JSON, HTML, PDF, CSV, and Markdown export formats
- Audience-Specific Templates: Report templates tailored for Executive, Technical, Auditor, and Operations audiences
- Professional PDF Generation: iTextSharp integration with formatted tables, charts, and professional layouts
- Advanced Report Sections: Executive Summary, Compliance Overview, Control Assessment, Risk Analysis, Recommendations, and Trend Analysis
- New API Controller: ComplianceReportGenerationController with 6 comprehensive endpoints:
- `POST /api/compliance-report-generation/comprehensive/{framework}` - Generate full framework reports
- `POST /api/compliance-report-generation/executive-summary` - Multi-framework executive summaries
- `POST /api/compliance-report-generation/comparison` - Framework comparison reports
- `POST /api/compliance-report-generation/trend/{framework}` - Historical trend analysis reports
- `GET /api/compliance-report-generation/formats` - List supported export formats
- `GET /api/compliance-report-generation/audiences` - List supported report audiences
- Report Customization: Configurable report sections, audience targeting, and format selection
- Organizational Scope Enforcement: Strict filtering to prevent access to application-level frameworks
- Build System Enhancement: Zero compilation errors and warnings with optimized dependencies
- Compliance Posture API: New CompliancePostureController with 5 endpoints for organizational framework monitoring
- `GET /api/compliance-posture/summary` - Overall compliance health across all frameworks
- `GET /api/compliance-posture/framework/{framework}` - Detailed posture for specific framework
- `POST /api/compliance-posture/compare` - Compare multiple frameworks side-by-side
- `GET /api/compliance-posture/trends` - Historical compliance trend analysis (7/30/90 days)
- `GET /api/compliance-posture/actions` - Prioritized action recommendations with urgency scoring
- Risk Analysis: Advanced risk level calculations (Critical/High/Medium/Low/Minimal)
- Trend Analysis: Historical compliance tracking with configurable time ranges
- Action Prioritization: Smart recommendations based on implementation percentage and risk scores
- Framework Comparison: Multi-framework analysis for comprehensive compliance overview
- Build Optimization: Achieved zero build warnings through targeted fixes and suppressions
- Application-scope frameworks (CIS Controls v8, Windows Security Baselines) hidden from users
- SOC2 Organization-scope framework with 15 Trust Service Criteria controls
- ComplianceFrameworkService for visibility filtering between Application/Organization scopes
- ApplicationComplianceBackgroundService for automatic 6-hour assessment cycles
- Controller validation preventing user access to Application frameworks
- Total of 7 frameworks with 95 controls (70 Organization + 25 Application)
- Compliance Framework Implementation: Four operational compliance frameworks with real assessment engine
- SOX Framework: Complete Sarbanes-Oxley implementation with 11 controls for financial compliance
- PCI-DSS Framework: Payment Card Industry Data Security Standard with 12 controls for payment security
- ISO 27001 Framework: Information Security Management System with 15 controls for security governance
- Enhanced HIPAA Framework: Expanded from 17 controls with improved assessment logic
- Database Seeding: Automatic seeding of all 55 compliance controls across 4 frameworks on startup
- Framework Name Mapping: Added `NormalizeFrameworkName()` to handle UI/backend naming differences (e.g., "ISO27001" → "ISO 27001")
- Enhanced DI Registration: Fixed framework injection as `IEnumerable<IComplianceFramework>` for proper service resolution
- Real Assessment Engine: Security event-based compliance assessment replacing mock scoring system
- Files Updated: `PCIDSSComplianceFramework.cs`, `ISO27001ComplianceFramework.cs`, `ComplianceAssessmentService.cs`, `Program.cs`
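
As an illustration of the framework name mapping, a normalizer might look like the sketch below. The backend `NormalizeFrameworkName()` is C# and its actual rules are not shown in this changelog; only the "ISO27001" → "ISO 27001" pair comes from the entry above, and the `PCIDSS` alias and the TypeScript function name are assumptions.

```typescript
// Hypothetical alias table. Only the ISO27001 pair is documented in the
// changelog; the PCIDSS entry is an assumed example.
const FRAMEWORK_ALIASES: Record<string, string> = {
  ISO27001: "ISO 27001",
  PCIDSS: "PCI DSS",
};

// Look names up on a key with whitespace and hyphens removed, upper-cased,
// and fall back to the trimmed input when no alias is known.
function normalizeFrameworkName(name: string): string {
  const trimmed = name.trim();
  const key = trimmed.replace(/[\s-]+/g, "").toUpperCase();
  const alias = FRAMEWORK_ALIASES[key];
  return alias !== undefined ? alias : trimmed;
}
```
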
- Admin Menu Components: Fixed missing menu items issue where only 3 of 11 admin interface pages were visible
- Root Cause: Permission structure mismatch between Tailwind Dashboard and auth provider - `usePermissions()` expected an array but received an object
- Solution: Updated `authProvider.getPermissions()` to return permissions array directly and enhanced admin user permissions
- Enhanced Admin Permissions: Added `security.read`, `analytics.read`, `system.read`, `compliance.read`, `role:admin` to backend JWT tokens
- Files Updated: `AuthController.cs` (backend permissions), `authProvider.ts` (frontend structure), `MenuWithPreloading.tsx` (permission logic)
- Impact: All 11 admin interface pages now fully accessible (Dashboard, Security Events, MITRE Techniques, Malware Detection Rules, YARA Matches, Timeline, Trend Analysis, System Status, Threat Scanner, Compliance Reports, Configuration)
- Component Preloading: Enhanced MenuWithPreloading system now successfully preloads all menu components for instant navigation
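
The array-versus-object mismatch described above can be illustrated with a small normalizer. This is a sketch under assumptions: the wrapped `{ permissions: [...] }` shape, the helper names, and the rule that `role:admin` grants every page are hypothetical and not confirmed by `authProvider.ts`.

```typescript
// Hypothetical payload shapes: a bare array, or an object wrapping one.
type PermissionsPayload = string[] | { permissions?: string[] } | null | undefined;

// Always hand menu code a plain array, whatever shape was received.
function toPermissionArray(payload: PermissionsPayload): string[] {
  if (Array.isArray(payload)) return payload;
  if (payload && Array.isArray(payload.permissions)) return payload.permissions;
  return [];
}

// Assumed rule: admins see everything, others need the specific permission.
function canSee(required: string, permissions: string[]): boolean {
  return permissions.indexOf("role:admin") !== -1 || permissions.indexOf(required) !== -1;
}
```

With this in place, a menu item guarded by `compliance.read` stays visible whether the provider returns the array directly or wrapped in an object.
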
- EventLogWatcher Implementation: Real-time Windows Event Log monitoring system
- Sub-second Latency: Replaces 30-60 second polling delays with <1 second event capture
- 95%+ Performance Improvement: Alert latency reduced from 30-60 seconds to <1 second
- Zero Event Loss: Interrupt-driven event capture ensures no missed events
- 70-80% CPU Reduction: Consistent low CPU usage vs. periodic spikes
- 10x+ Throughput: Process 10,000+ events/second vs. polling-limited ~1000/poll
- Bookmark Persistence: Resume from last processed event across service restarts
- Multi-Channel Support: Security, Sysmon, PowerShell, and Windows Defender channels
- XPath Filtering: Configurable event filtering for relevant security events
- Real-time SignalR: Immediate event broadcasting and dashboard updates
- Bounded Queues: Backpressure handling with configurable queue sizes
- Auto-Recovery: Automatic reconnection and error handling
- Files Added: `WindowsEventLogWatcherService.cs`, `WindowsEventChannelWatcher.cs`, `EventNormalizationHandler.cs`, `DatabaseEventBookmarkStore.cs`, `EventLogBookmarkEntity.cs`
- Configuration: Complete `WindowsEventLog` configuration section with channel settings
- Database Migration: `20250101000000_AddEventLogBookmarks.cs` for bookmark persistence
- Documentation: Comprehensive setup guide, performance validation, and implementation summary
- Integration Tests: Complete test suite for service integration and event processing
- Dashboard Data Consolidation: High-performance dashboard optimization system
- Single SignalR Stream: Replaces 4+ separate REST API calls with consolidated real-time data delivery
- 80%+ Performance Improvement: Dashboard load times reduced from 2-5 seconds to <1 second
- Parallel Data Fetching: Consolidated service fetches security events, system status, compliance reports, and threat scanner data simultaneously
- Real-time Updates: Live event counts (1786 events) with automatic 30-second refresh intervals
- Caching Strategy: Memory caching with 30-second TTL for optimal performance
- Automatic Fallback: Graceful fallback to REST API when SignalR unavailable
- Background Service: `DashboardDataBroadcastService` provides continuous real-time updates
- REST API Endpoints: `/api/dashboarddata/consolidated` and `/api/dashboarddata/broadcast` for API access
- Files Added: `DashboardDataConsolidationService.cs`, `DashboardDataBroadcastService.cs`, `DashboardDataController.cs`, `DashboardData.cs`
- Frontend Integration: Enhanced `useSignalR.ts` hook and updated `Dashboard.tsx` for consolidated data consumption
- Security Events Real-Time Updates: Complete real-time security event broadcasting system (September 24, 2025)
- SignalRSecurityEventStore: Decorator pattern implementation for automatic event broadcasting
- Instant Threat Alerts: Security events now broadcast immediately (previously 30-second delay)
- Correlation Alert Broadcasting: Real-time correlation alerts via BroadcastCorrelationAlert()
- YARA Match Notifications: Immediate malware detection alerts via BroadcastYaraMatch()
- Frontend Integration: `useSecurityEventsSignalR.ts` React hook for real-time subscriptions
- Performance Optimization: Event log polling reduced from 5s to 30s, dashboard loads in 118ms
- Connection Status: Dual connection indicators for metrics and security events
- Risk-Based Notifications: Automatic notifications based on threat risk levels
- Files Added: `SignalRSecurityEventStore.cs`, enhanced `ScanProgressHub.cs`
- Frontend Files: `useSecurityEventsSignalR.ts`, updated `SecurityEvents.tsx`
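
The decorator approach mentioned for `SignalRSecurityEventStore` can be sketched in miniature as below. The real class is C# and wraps the backend event store; every type and name here is hypothetical and only illustrates the pattern of delegating storage and then broadcasting.

```typescript
// Hypothetical minimal event and store shapes for illustration.
interface CastellanEvent {
  id: string;
  riskLevel: string;
}

interface EventStore {
  add(event: CastellanEvent): void;
  all(): CastellanEvent[];
}

class InMemoryEventStore implements EventStore {
  private events: CastellanEvent[] = [];
  add(event: CastellanEvent): void {
    this.events.push(event);
  }
  all(): CastellanEvent[] {
    return this.events.slice();
  }
}

// Decorator: exposes the same interface, delegates storage to the wrapped
// store, then broadcasts the stored event to subscribers.
class BroadcastingEventStore implements EventStore {
  private inner: EventStore;
  private broadcast: (event: CastellanEvent) => void;

  constructor(inner: EventStore, broadcast: (event: CastellanEvent) => void) {
    this.inner = inner;
    this.broadcast = broadcast;
  }

  add(event: CastellanEvent): void {
    this.inner.add(event); // persist first
    this.broadcast(event); // then notify in real time
  }

  all(): CastellanEvent[] {
    return this.inner.all();
  }
}
```

Because callers depend only on the store interface, broadcasting can be added or removed at registration time without touching the code that writes events.
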
- Full Scan Progress Bar: Fixed progress tracking for Full Scan operations in threat scanner
- Root Cause: Service scoping issue - IThreatScanner was scoped, causing progress loss across HTTP requests
- Solution: Implemented shared singleton progress store (IThreatScanProgressStore) for persistent progress tracking
- Files Updated: Created `IThreatScanProgressStore.cs`, `ThreatScanProgressStore.cs`; Updated `ThreatScannerService.cs`, `ThreatScannerController.cs`, `Program.cs`
- Impact: Progress bars now correctly display during Full Scan operations
- Advanced Correlation Engine: Comprehensive threat pattern detection and analysis system
- Temporal Burst Detection: Identifies rapid event sequences from same source (5+ events in 5 minutes)
- Brute Force Attack Detection: Recognizes failed authentication patterns followed by success (3+ failures in 10 minutes)
- Lateral Movement Detection: Tracks similar activities across multiple machines (3+ hosts in 30 minutes)
- Privilege Escalation Detection: Monitors escalation attempts and suspicious privilege changes (2+ events in 15 minutes)
- Attack Chain Analysis: Sequential attack pattern recognition with MITRE ATT&CK mapping
- Real-time Correlation: Sub-second threat correlation with configurable confidence thresholds
- Machine Learning Integration: Model training with confirmed correlations for improved accuracy
- Rule Management: Customizable correlation rules with time windows, event counts, and confidence levels
- Statistics & Metrics: Comprehensive correlation analytics with pattern trending and risk assessment
- REST API: Complete `/api/correlation/` endpoints for statistics, rules, correlations, and analysis
- Comprehensive Testing: 100+ unit and integration tests covering all correlation scenarios
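
To make the temporal burst rule (5+ events from the same source within 5 minutes) concrete, here is a minimal sliding-window sketch. It is not the correlation engine's implementation; `SourceEvent` and `detectBursts` are hypothetical names, and only the two thresholds come from the entry above.

```typescript
interface SourceEvent {
  source: string;
  timestampMs: number;
}

// Returns the sources that produced at least `minEvents` events within any
// span of `windowMs` milliseconds.
function detectBursts(
  events: SourceEvent[],
  minEvents: number = 5,
  windowMs: number = 5 * 60 * 1000
): string[] {
  // Group timestamps by source.
  const bySource: Record<string, number[]> = {};
  for (const e of events) {
    if (bySource[e.source] === undefined) bySource[e.source] = [];
    bySource[e.source].push(e.timestampMs);
  }

  const bursting: string[] = [];
  for (const source of Object.keys(bySource)) {
    const times = bySource[source].slice().sort((a, b) => a - b);
    let start = 0;
    for (let end = 0; end < times.length; end++) {
      // Advance the left edge until the window spans at most windowMs.
      while (times[end] - times[start] > windowMs) start++;
      if (end - start + 1 >= minEvents) {
        bursting.push(source);
        break;
      }
    }
  }
  return bursting;
}
```

The other documented rules (for example 3+ failures in 10 minutes) could reuse the same function with different `minEvents` and `windowMs` values.
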
- MITRE Configuration Tab: New configuration interface for managing MITRE ATT&CK techniques
- Database status display with technique count
- Manual import functionality for MITRE techniques
- Information panel about MITRE framework features
- Real-time status updates after import operations
- Timeline Icon Update: Changed Timeline menu icon from `Timeline` to `Schedule` for better visual distinction from Trend Analysis icon
- MitreController Enhancement: Added Tailwind Dashboard compatible pagination and sorting support
- Database Corruption Issue: Fixed SQLite database corruption preventing Worker API from starting
- Resolved "malformed database schema" error
- Automatic fresh database creation on corruption detection
- Trend Analysis with ML.NET
- Historical trend visualization with predictive analytics
- Machine learning-based forecasting
- Time series analysis and predictions
- Integrated into main dashboard
- SignalR Connection Persistence: Resolved critical issue where real-time connection disconnected on page navigation
- Root Cause: SignalR connection was component-scoped in Dashboard, got destroyed on Tailwind Dashboard page changes
- Solution: Implemented global SignalR context provider at application level
- Impact: Real-time updates now persist seamlessly across all menu navigation
- Files Updated: `SignalRContext.tsx` (new), `App.tsx`, `Dashboard.tsx`, `RealtimeSystemMetrics.tsx`, `NotificationSystem.tsx`
- Global SignalR Context: New context provider for persistent real-time connections
- Navigation Stability: Users can now switch between Security Events, System Status, and other pages without losing live updates
- Updated `SIGNALR_REALTIME_INTEGRATION.md` with new context-based architecture
- Added troubleshooting section for navigation-related connection issues
- Troubleshooting Documentation: Added known issues for start.ps1 hanging and Worker API status false negatives
- MITRE Technique Fetch Errors: Better error handling for missing MITRE data
- Updated troubleshooting guide with startup script issues
- Frontend Caching Features: Cache Inspector Tool, localStorage persistence, and optimized cache TTLs were removed during development phase
- Advanced Search & Filtering Frontend: Complete UI implementation for enhanced security event search
- AdvancedSearchDrawer Component: Comprehensive search interface with accordion-style filter sections
- Multi-criteria filtering: date ranges, risk levels, event types, MITRE ATT&CK techniques
- Full-text search with exact match and fuzzy search options
- Numeric range sliders for confidence, correlation, burst, and anomaly scores
- MITRE technique filtering with 25+ common security techniques organized by tactic
- Real-time filter counting and active filter indicators
- Supporting Components: Complete component library for advanced filtering
- FullTextSearchInput with search mode toggles and help tooltips
- DateRangePicker with quick presets (24h, 7d, 30d, 90d) and manual input
- MultiSelectFilter with color coding and bulk operations
- RangeSliderFilter with dual-thumb sliders and manual input fields
- MitreTechniqueFilter with searchable technique database and tactic grouping
- State Management: Complete React hook and API service implementation
- useAdvancedSearch hook with URL synchronization for bookmarkable searches
- advancedSearchService API client with error handling and export functionality
- Debounced search with loading states and comprehensive error management
- Export functionality for CSV, JSON, XLSX formats
- SecurityEvents Integration: Seamless integration into existing SecurityEvents page
- Custom toolbar with Advanced Search, Share, and Export buttons
- Real-time search result summaries with performance metrics
- URL persistence for shareable search states
- Professional loading and error states with user-friendly feedback
- TypeScript Support: Complete type definitions for all API interactions
- Full type coverage for search requests, responses, and UI state
- Type-safe filter conversion between UI and API formats
- Database Optimization: Enhanced SQLite performance with FTS5 full-text search
- Composite indexes on security events for complex query optimization
- SQLite FTS5 virtual table for high-performance text search
- Database migration system with rollback support
- Optimized query patterns for sub-2-second response times
- ESLint Cleanup: Significant codebase maintenance and optimization
- Reduced ESLint warnings from 100+ to ~70 (major improvement)
- Fixed critical useEffect dependency issues to prevent infinite loops
- Removed unused imports and variables across component library
- Enhanced TypeScript compatibility and type safety
- Analytics & Reporting: Dashboard widgets and trend analysis
- Saved Searches: Bookmark and manage frequently used search configurations
- Advanced Correlation: Machine learning-based event correlation
- Configuration Backend API: Complete threat intelligence settings management
- Backend API: ThreatIntelligenceConfigController with RESTful endpoints
- `GET /api/settings/threat-intelligence` - Retrieve current configuration with defaults
- `PUT /api/settings/threat-intelligence` - Update configuration with validation
- Persistent Storage: File-based JSON storage in `data/threat-intelligence-config.json`
- Comprehensive Validation: Rate limits (1-1000/min), API key management
- Multi-Provider Support: VirusTotal, MalwareBazaar, AlienVault OTX configuration
- Tailwind Dashboard Integration: Enhanced dataProvider with configuration resource mapping
- Default Fallbacks: Sensible defaults when no configuration file exists
- Error Handling: Comprehensive validation with detailed error messages
- Security Event Timeline Visualization: Complete timeline interface for event analysis
- Frontend Components: TimelinePanel, TimelineChart, and TimelineToolbar React components
- Interactive granularity control (minute, hour, day, week, month)
- Date range filtering with datetime-local pickers
- Real-time data refresh with loading states and error handling
- Responsive two-column layout with timeline chart and summary statistics
- DataProvider Integration: Extended castellanDataProvider with Timeline API methods
- `getTimelineData()` - Aggregated timeline data with customizable granularity
- `getTimelineEvents()` - Detailed event listing with time range filtering
- `getTimelineHeatmap()` - Activity heatmap data for visualization
- `getTimelineStats()` - Summary statistics and risk level breakdown
- `getTimelineAnomalies()` - Anomaly detection and alert analysis
- Tailwind Dashboard Integration: Timeline resource with Material-UI Timeline icon
- Read-only timeline resource for visual security event analysis
- Consistent design with existing admin interface components
- TypeScript support with full type safety and error handling
- Export Service & API: Complete data export functionality for security events
- Backend Export Service: IExportService and ExportService implementation
- CSV export with configurable field selection and filtering
- JSON export with structured data formatting
- PDF export with formatted reports and security event summaries
- Background export processing with progress tracking
- REST API Endpoints: ExportController with comprehensive export capabilities
- `GET /api/export/formats` - Available export format discovery
- `POST /api/export/security-events` - Security event export with filtering
- `GET /api/export/stats` - Export usage statistics and metrics
- JWT authentication with proper authorization checks
- Service Integration: Registered in dependency injection container
- Clean service architecture with interface-based design
- Comprehensive error handling and validation
- Memory-efficient streaming for large data exports
- Frontend Configuration UI: Complete Tailwind Dashboard interface for threat intelligence settings
- Configuration Components: Comprehensive form-based configuration management
- Three-panel layout: VirusTotal, MalwareBazaar, AlienVault OTX providers
- Provider toggle switches with conditional field display
- Password-type API key fields with show/hide functionality
- Rate limit validation controls (1-1000/min)
- Real-time configuration validation with detailed error messages
- Security Features: Secure configuration management
- API keys stored as password fields in UI (no plaintext display)
- Configuration persisted to secure JSON file storage
- JWT authentication for all configuration endpoints
- Compliance with security rules (no plaintext passwords in repository)
- Integration: Seamless Tailwind Dashboard integration
- Custom dataProvider methods for configuration resource
- Optimistic UI updates with error rollback handling
- Consistent Material-UI design with existing interface components
- Malware Detection System: Complete signature-based malware detection platform
- Rule Management API: Full REST API for malware detection rule CRUD operations
- `GET/POST/PUT/DELETE /api/malware-rules` - Complete rule management
- Rule filtering by category, tag, MITRE technique, and enabled status
- Pagination support for large rule sets with performance optimization
- Rule testing and validation endpoints with syntax checking
- Bulk operations for importing/exporting rule collections
- Frontend Integration: Complete Tailwind Dashboard YARA management interface
- MalwareRules resource with full CRUD capabilities and rule editor
- YaraMatches resource for viewing detection results and analysis
- YARA analytics dashboard with rule performance metrics
- Health monitoring widgets for scanning service status
- Storage & Performance: Advanced rule storage and execution
- Thread-safe file-based JSON storage with versioning support
- Performance metrics tracking (execution time, hit count, false positives)
- MITRE ATT&CK technique mapping and categorization
- Rule metadata management (author, description, threat level, priority)
- Optimized rule compilation and caching for improved scan performance
- Security Integration: Production-ready malware scanning
- JWT-authenticated API with comprehensive validation
- Basic YARA syntax validation and rule testing capabilities
- Rule category management (Malware, Ransomware, Trojan, Backdoor, etc.)
- False positive reporting and tracking system
- Integration with security event pipeline for automated threat detection
- Dependencies: Added dnYara and dnYara.NativePack for .NET YARA integration
- Performance Monitoring Enhancement: Extended system monitoring capabilities
- Performance alert service with configurable thresholds
- Enhanced metrics collection for malware detection rule execution
- Additional performance indicators for malware detection workflows
- Real-time system resource monitoring with health dashboards
- Database Architecture Consolidation (v0.9 - Late October 2025): PostgreSQL migration (primary remaining work)
- Migrating from SQLite to PostgreSQL for enhanced performance and JSON querying
- Eliminating JSON file storage duplication (FileBasedSecurityEventStore)
- Implementing unified retention policies across PostgreSQL and Qdrant
- Adding time-series partitioning for security events optimization
- Maintaining Qdrant for vector embeddings and similarity search operations
- Status: Only major technical work remaining after September 2025 completion of all other phases
- Enhanced Performance Metrics Dashboard: Complete full-stack monitoring implementation
- Backend API: PerformanceController with 7 comprehensive API endpoints
- `/api/performance/dashboard-summary` - Overall system health and metrics summary
- `/api/performance/metrics` - Historical performance data with time range support (1h-7d)
- `/api/performance/alerts` - Performance alerts and alert history management
- `/api/performance/cache-stats` - Cache performance statistics and effectiveness
- `/api/performance/database` - Database and Qdrant performance metrics
- `/api/performance/system-resources` - System resource utilization (CPU, memory, disk, network)
- `/api/performance/alert-thresholds` - Configurable alert threshold management
- Frontend Dashboard: React component with Material-UI and Recharts integration
- Real-time monitoring with 30-second auto-refresh
- Interactive time range selection (1h, 6h, 24h, 7d)
- Performance summary cards with health scores and status indicators
- Multi-axis charts combining response time, CPU, memory, and request metrics
- Active alerts display with severity levels and threshold information
- System resource visualization with progress bars and trend indicators
- Service Layer: PerformanceMetricsService and PerformanceAlertService
- Windows performance counter integration with cross-platform fallbacks
- Memory caching with variable TTL (5-30 seconds) for performance optimization
- Comprehensive data models (30+ classes) for all performance aspects
- Threat Intelligence Health Monitoring Dashboard: Service status monitoring system
- Backend API: ThreatIntelligenceHealthController for comprehensive service health
- `/api/threat-intelligence-health` - Complete health status of all TI services
- Service monitoring for VirusTotal, MalwareBazaar, and AlienVault OTX
- API rate limit tracking with remaining quotas and utilization
- Cache efficiency metrics and error rate monitoring per service
- Automated alerting for service degradation and failures
- Frontend Dashboard: React component with service status visualization
- Service grid view with individual health cards for each TI service
- Rate limit visualization with progress bars and quota tracking
- Performance comparison charts (response times, requests per service)
- Usage distribution pie charts showing query patterns
- Service-specific alerts with automatic generation and display
- Uptime tracking with formatted duration display
- Health Monitoring: Real-time service health assessment
- 60-second auto-refresh for current service status
- Service availability simulation with 90% success rates
- Comprehensive service metrics including API key status validation
- Dashboard Integration: Seamless integration with main dashboard
- Updated main Dashboard.tsx to include both new dashboard components
- Proper service registration in Program.cs dependency injection container
- Material-UI design system consistency across all dashboard components
- Responsive grid layouts that work on all screen sizes
- Error handling with retry mechanisms and graceful degradation
- Dashboard Security Events Count: Fixed incorrect total events display in dashboard KPI cards
- Root cause: Dashboard used paginated data array length (`data.length` = 10) instead of API total field (`total` = 2168+)
- Impact: Dashboard now shows accurate total security events count matching Security Events page
- Files: `dashboard/src/components/Dashboard.tsx` (Lines 242-244, 313)
- Technical fix: Modified API response parsing to extract both `events` array and `total` count
- Result: Consistent event counts across dashboard and detail pages
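
The difference between the page length and the API total can be shown with a short sketch. The `events` and `total` field names come from the fix description above; the interface and function names are hypothetical and do not reproduce the code in `Dashboard.tsx`.

```typescript
// Assumed response shape: one page of events plus the overall count.
interface CastellanEventsPage {
  events: { id: string }[];
  total: number;
}

// Report how many events are on the current page and how many exist in
// total. KPI cards should read `total`, never the page length.
function summarizePage(page: CastellanEventsPage): { shown: number; total: number } {
  return { shown: page.events.length, total: page.total };
}
```

For a first page of 10 events out of 2168, `shown` is 10 while `total` is 2168, which is the value the KPI card should display.
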
- Tailwind Dashboard Interface: Fixed missing RealtimeSystemMetrics component compilation failure
- Root cause: Missing `RealtimeSystemMetrics.tsx` component referenced in Dashboard
- Impact: Tailwind Dashboard now compiles successfully and displays real-time system metrics
- Files: `dashboard/src/components/RealtimeSystemMetrics.tsx`
- Features: Real-time health overview, component metrics, auto-refresh every 10 seconds
- Result: Full dashboard functionality restored with Material UI integration
- System Status Dashboard: Enhanced real-time monitoring capabilities
- Added comprehensive system metrics visualization
- Integrated response time, uptime, and error rate monitoring
- System resource tracking (CPU, memory usage) when available
- Material UI components for consistent dashboard experience
- Error handling with retry functionality for failed metric requests
- Compiler Warning Cleanup: Eliminated all CS1998 and CS0649 warnings for clean builds
- Fixed async methods without await operators in multiple services
- Added pragma directives for planned infrastructure
- Ensures professional, warning-free development experience
- Enhanced Logging Integration: Improved OllamaEmbedder logging
- Production: Automatic logger injection via dependency injection with Serilog output
- Tests: Clean test output by passing null logger instances to suppress logging
- Maintains backward compatibility with optional logger parameter
- 🔧 Worker API Authentication: Fixed Worker API auth by rebuilding service after BCrypt hash update
- Root cause: Authentication service needed rebuild after security enhancements
- Impact: Worker API now properly authenticates with updated security system
- Result: All services working correctly with enhanced security
- Worker API Stability: Fixed critical `SemaphoreFullException` causing immediate crashes
- Root cause: Mismatched semaphore acquisition/release logic in Pipeline.cs
- Impact: Worker API now runs stable in background without crashes
- Files: `src/Castellan.Worker/Pipeline.cs` (Lines 98-122, 389-469)
- Result: Services can run for extended periods without interruption
- MITRE ATT&CK DataProvider: Resolved "dataProvider error" in Tailwind Dashboard interface
- Root cause: MITRE endpoints return `{ techniques: [...] }` format, dataProvider expected arrays
- Impact: MITRE ATT&CK Techniques page now displays 50+ techniques properly
- Files: `dashboard/src/dataProvider/castellanDataProvider.ts` (Lines 86-109)
- Result: Full MITRE integration functional in web interface
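
A response normalizer along these lines would let the data provider accept either shape. This is a sketch only: the `{ data, total }` result shape and all type and function names are assumptions, and the actual handling in `castellanDataProvider.ts` is not shown in this changelog.

```typescript
interface MitreTechnique {
  id: string;
  name: string;
}

// The endpoint may return a bare array or wrap it in `{ techniques: [...] }`.
type MitreResponse = MitreTechnique[] | { techniques: MitreTechnique[] };

// Normalize both shapes into a list result with a total count.
function toListResult(body: MitreResponse): { data: MitreTechnique[]; total: number } {
  const data = Array.isArray(body) ? body : body.techniques;
  return { data: data, total: data.length };
}
```
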
- Authentication Error Handling: Enhanced login experience and error messaging
- Root cause: Confusing "No tokens found" errors on initial page load
- Impact: Cleaner login flow with better backend unavailability messages
- Files: Tailwind Dashboard auth provider components
- Result: Improved user experience during authentication
- Background Service Management: Reliable PowerShell job-based service startup
- MITRE Data Import: Successfully imported 823 MITRE ATT&CK techniques
- Service Monitoring: Enhanced status verification and health checking
- Added comprehensive fix documentation in `SEPTEMBER_2025_FIXES.md`
- Updated troubleshooting guide with resolved issue sections
- Enhanced README.md with recent fixes summary
- Added verification steps and service management improvements
- Qdrant Connection Pool: Enterprise-grade connection pool architecture for 15-25% I/O optimization
- New `QdrantConnectionPool` service with intelligent connection reuse and management
- Support for multiple Qdrant instances with automatic load balancing
- Configurable pool sizes: `MaxConnectionsPerInstance` (default: 10 per instance)
- Connection timeout management: `ConnectionTimeout` (default: 10s), `RequestTimeout` (default: 1m)
- Thread-safe connection acquisition and release with proper resource disposal
- Complete metrics collection for connection usage, performance, and health
- Health Monitoring: Automatic instance health monitoring with failover capabilities
- Background health checks with configurable `CheckInterval` (default: 30s)
- Consecutive failure/success thresholds for intelligent health state management
- Automatic instance marking as Healthy/Unhealthy based on consecutive check results
- Health status tracking with detailed reporting and trend analysis
- Configurable `MinHealthyInstances` requirement for service availability
- Load Balancing: Advanced load balancing algorithms for optimal performance
- Round Robin: Equal distribution across all healthy instances
- Weighted Round Robin: Performance-based distribution with dynamic weight adjustment
- Instance performance tracking with response time and error rate metrics
- Automatic weight adjustment based on instance performance characteristics
- Sticky session support for connection affinity (configurable)
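
As an illustration of health-aware weighted round robin, the sketch below uses the "smooth" variant of the algorithm. The changelog does not state which variant the platform uses, so treat this as one possible approach; `Instance` and `pick` are hypothetical names, and the example is in Python for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str
    weight: int
    healthy: bool = True
    current: int = 0  # running score used by the smooth algorithm


def pick(instances: List[Instance]) -> Instance:
    """Select the next healthy instance using smooth weighted round robin."""
    candidates = [i for i in instances if i.healthy]
    if not candidates:
        raise RuntimeError("no healthy instances available")
    total = sum(i.weight for i in candidates)
    # Every candidate gains its weight; the leader is chosen and pays the total.
    for inst in candidates:
        inst.current += inst.weight
    chosen = max(candidates, key=lambda i: i.current)
    chosen.current -= total
    return chosen


pool = [Instance("a", 3), Instance("b", 1)]
order = [pick(pool).name for _ in range(4)]
print(order)  # "a" is picked three times and "b" once per four requests
```

Dynamic weight adjustment would then amount to updating `weight` from measured response times and error rates between calls; that part is not shown here.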
- Batch Processing Integration: Seamless integration with existing vector batch processing
  - New `QdrantPooledVectorStore` that wraps existing `QdrantVectorStore` with pooling
  - Maintains full compatibility with existing `BatchUpsertAsync` and vector operations
  - Automatic failover during batch operations if instances become unhealthy
  - Performance metrics integration for pooled operations
- Configuration Options: Comprehensive connection pool configuration

      {
        "ConnectionPools": {
          "Qdrant": {
            "Instances": [
              { "Host": "localhost", "Port": 6333, "Weight": 100, "UseHttps": false }
            ],
            "MaxConnectionsPerInstance": 10,
            "HealthCheckInterval": "00:00:30",
            "ConnectionTimeout": "00:00:10",
            "RequestTimeout": "00:01:00",
            "EnableFailover": true,
            "MinHealthyInstances": 1
          },
          "HealthMonitoring": {
            "Enabled": true,
            "CheckTimeout": "00:00:05",
            "ConsecutiveFailureThreshold": 3,
            "ConsecutiveSuccessThreshold": 2,
            "EnableAutoRecovery": true
          },
          "LoadBalancing": {
            "Algorithm": "WeightedRoundRobin",
            "EnableHealthAwareRouting": true
          }
        }
      }

- Vector Store Interface: Enhanced `IVectorStore` with pooled implementation support
- Service Registration: Automatic connection pool registration in dependency injection
- Performance Monitoring: Integration with existing performance monitoring services
- Error Handling: Enhanced error handling with connection pool health awareness
- Resource Management: Proper disposal of connection pool resources and connections
- Thread Safety: Concurrent connection access with proper locking mechanisms
- Connection Lifecycle: Complete connection lifecycle management from creation to disposal
- Health State Machine: Sophisticated health state transitions with hysteresis
- Metrics Collection: Comprehensive metrics for pool utilization, performance, and health
- Test Coverage: 393 tests passing including comprehensive connection pool validation
- Vector Batch Processing: High-performance batch operations for 3-5x improvement
  - New `BatchUpsertAsync` method in IVectorStore interface for batch vector operations
  - Smart buffering system with size-based (100 vectors) and time-based (5s) flushing
  - Thread-safe concurrent buffer management with proper locking
  - Automatic fallback to individual operations on batch failures
  - New configuration options: `EnableVectorBatching`, `VectorBatchSize`, `VectorBatchTimeoutMs`
  - Performance metrics tracking for batch operations and efficiency
  - Complete QdrantVectorStore and MockVectorStore implementations
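
The buffering behavior described above (flush at 100 vectors or after 5 seconds, with fallback to individual operations) can be sketched as follows. This is an illustrative Python sketch, not the project's code: `VectorBatchBuffer` and its callbacks are hypothetical, and for simplicity the age limit is checked only when an item is added, whereas a real service would presumably also flush from a background timer.

```python
import threading
import time
from typing import Callable, List


class VectorBatchBuffer:
    """Hypothetical buffer that flushes on a size limit or an age limit."""

    def __init__(self, flush_batch: Callable[[List[dict]], None],
                 flush_one: Callable[[dict], None],
                 max_size: int = 100, max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._flush_batch = flush_batch
        self._flush_one = flush_one
        self._max_size = max_size
        self._max_age_s = max_age_s
        self._clock = clock
        self._lock = threading.Lock()
        self._items: List[dict] = []
        self._oldest = 0.0  # arrival time of the first buffered item

    def add(self, item: dict) -> None:
        with self._lock:
            if not self._items:
                self._oldest = self._clock()
            self._items.append(item)
            due = (len(self._items) >= self._max_size
                   or self._clock() - self._oldest >= self._max_age_s)
            batch = self._take() if due else None
        if batch:
            self._send(batch)  # send outside the lock so writers are not blocked

    def flush(self) -> None:
        with self._lock:
            batch = self._take()
        if batch:
            self._send(batch)

    def _take(self) -> List[dict]:
        batch, self._items = self._items, []
        return batch

    def _send(self, batch: List[dict]) -> None:
        try:
            self._flush_batch(batch)
        except Exception:
            # Batch write failed: fall back to one operation per item.
            for item in batch:
                self._flush_one(item)


sent = []
buf = VectorBatchBuffer(sent.append, lambda item: sent.append([item]), max_size=3)
for i in range(7):
    buf.add({"id": i})
buf.flush()
print([len(b) for b in sent])  # [3, 3, 1]
```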
- Semaphore-Based Throttling: Configurable concurrency limits with graceful degradation
  - New `EnableSemaphoreThrottling` option to enable/disable semaphore-based throttling
  - `MaxConcurrentTasks` setting to control maximum concurrent pipeline tasks
  - `SemaphoreTimeoutMs` for configurable timeout on semaphore acquisition
  - `SkipOnThrottleTimeout` option for handling timeout scenarios
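
A minimal sketch of how these four options might interact, written in Python for illustration. The `Throttle` class is hypothetical; the defaults of 8 tasks and 15000 ms come from the sample `Pipeline` configuration in this changelog, while the default for `skip_on_throttle_timeout` and the choice to raise an error when skipping is disabled are assumptions.

```python
import threading
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class Throttle:
    """Hypothetical wrapper limiting how many tasks run at once."""

    def __init__(self, max_concurrent_tasks: int = 8,
                 semaphore_timeout_ms: int = 15000,
                 skip_on_throttle_timeout: bool = True):
        self._sem = threading.BoundedSemaphore(max_concurrent_tasks)
        self._timeout_s = semaphore_timeout_ms / 1000
        self._skip = skip_on_throttle_timeout

    def run(self, task: Callable[[], T]) -> Optional[T]:
        """Run task once a slot is free; skip or fail if none frees up in time."""
        acquired = self._sem.acquire(timeout=self._timeout_s)
        if not acquired:
            if self._skip:
                return None  # graceful degradation: drop this task
            raise TimeoutError("timed out waiting for a pipeline slot")
        try:
            return task()
        finally:
            self._sem.release()  # always free the slot, even if the task fails


throttle = Throttle(max_concurrent_tasks=1, semaphore_timeout_ms=10)
print(throttle.run(lambda: "processed"))  # processed
```

Releasing in `finally` is the important detail: a task that throws must not leak its slot, or the pipeline would gradually lose capacity.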
- Enhanced Pipeline Configuration: 20+ new configuration options
  - Memory management settings: `MemoryHighWaterMarkMB`, `EventHistoryRetentionMinutes`
  - Queue management: `MaxQueueDepth`, `EnableQueueBackPressure`, `DropOldestOnQueueFull`
  - Adaptive throttling: `EnableAdaptiveThrottling`, `CpuThrottleThreshold`
  - Performance monitoring: `EnableDetailedMetrics`, `MetricsIntervalMs`
- Comprehensive Performance Monitoring: Advanced metrics tracking
  - Pipeline throttling metrics with queue depth and wait times
  - Detailed pipeline metrics with throughput improvement calculations
  - Memory pressure monitoring with automatic cleanup triggers
  - Baseline performance tracking for improvement measurements
- Configuration Validation Enhancements:
  - DataAnnotations validation for all new pipeline options
  - Comprehensive business logic validation with warnings
  - Startup validation prevents invalid configurations
  - Clear error messages for configuration issues
- Documentation:
  - Complete performance tuning guide (`docs/performance_tuning.md`)
  - Performance baseline documentation (`BASELINE.md`)
  - Updated configuration templates with Phase 3 options
- Pipeline Processing: Updated to use `IOptionsMonitor<PipelineOptions>` for dynamic configuration
- Performance Monitor Service: Extended with new metrics for throttling and memory pressure
- Pipeline Throttling: Integrated semaphore-based concurrency control throughout pipeline
- Error Handling: Enhanced with correlation ID tracking and structured logging
- Implemented proper resource disposal for semaphore objects
- Added graceful degradation when throttling limits are exceeded
- Enhanced logging with structured data and performance indicators
- Dynamic configuration updates without service restart
- OllamaEmbedder Logging Integration: Replaced Console.WriteLine with proper ILogger integration
  - Production: Automatic logger injection via dependency injection with Serilog output
  - Tests: Clean test output by passing null logger instances to suppress logging
  - Maintains backward compatibility with optional logger parameter
- Compiler Warning Cleanup: Eliminated all CS1998 and CS0649 warnings for clean builds
  - Fixed async methods without await operators in multiple services
  - Added pragma directives for planned semaphore throttling infrastructure
  - Ensures professional, warning-free development experience
- Service Lifetime Audit: Comprehensive review and documentation of all DI registrations
- Configuration Validation: Startup validation for Authentication, Qdrant, and Pipeline options
- Global Exception Handling: Consistent error responses with correlation ID tracking
- Structured Logging: Enhanced Serilog configuration with correlation IDs and performance metrics
- Service Lifetimes: Fixed SystemHealthService lifetime from Scoped to Singleton
- Error Responses: Standardized error response format across all endpoints
- Request Logging: Added request/response logging with performance metrics
- BCrypt Password Hashing: Secure password storage with salt generation
- Refresh Token System: Proper token rotation and revocation with audit trail
- JWT Token Blacklisting: Server-side token invalidation with memory-based cache
- Security Services: Complete authentication overhaul with proper interfaces
- Authentication Security: Eliminated plaintext password comparison
- Token Management: Comprehensive token validation and lifecycle management
- Security Middleware: JWT validation middleware with blacklist checking
- Teams/Slack Integration: Complete notification system with webhook management
  - Rate limiting and proper error handling
  - React admin interface for webhook configuration
  - Support for both Microsoft Teams and Slack platforms
- Documentation: Updated to reflect Teams/Slack as open source features
- Event Processing Pipeline: Enhanced cloud security support
- JSON Compatibility: Fixed property mapping and unit test issues
- Core Security Pipeline: Initial security event processing pipeline
- Windows Event Log Integration: Collection and analysis of Windows security events
- LLM-Powered Analysis: Ollama integration for intelligent security analysis
- Vector Search: Qdrant integration for similarity search capabilities
- MITRE ATT&CK Integration: Automatic MITRE ATT&CK data integration and tooltips
- Web Interface: React-based administrative interface
- Database: SQLite database for application metadata
- Real-time Alerts: Security alert and notification system
- Cross-platform Scripts: PowerShell script compatibility improvements
- Configuration Management: Moved configuration files to Worker directory
The following new configuration options are available for performance tuning:

    {
      "Pipeline": {
        // New batch processing settings
        "EnableVectorBatching": true,
        "VectorBatchSize": 100,
        "VectorBatchTimeoutMs": 5000,
        "VectorBatchProcessingTimeoutMs": 30000,
        // New throttling settings
        "EnableSemaphoreThrottling": true,
        "MaxConcurrentTasks": 8,
        "SemaphoreTimeoutMs": 15000,
        // New memory management
        "MemoryHighWaterMarkMB": 1024,
        "EventHistoryRetentionMinutes": 60,
        // New performance monitoring
        "EnableDetailedMetrics": true,
        "MetricsIntervalMs": 30000
      }
    }

Breaking Changes: None. All new settings have sensible defaults and are backward compatible.

Performance Impact:
- Parallel Processing: 20% improvement achieved (12,000+ EPS)
- Intelligent Caching: 30-50% improvement
- Horizontal Scaling: Architecture with fault tolerance
- Connection Pooling: 15-25% I/O optimization
- Vector Batch Processing: Expected 3-5x improvement for vector operations
- Target: 50,000+ events per second with advanced optimizations
Note: This changelog follows semantic versioning and Keep a Changelog format.