Testing: Streamlit Chat UI Test Suite for Issues #7, #8, #9 (#12)

@ma3u

🧪 Testing: Comprehensive Test Suite for Streamlit Chat UI

Problem Statement

The Streamlit Chat UI implementation (issues #7, #8, #9) requires comprehensive testing to ensure all features work as designed and match the mockup specifications. We need automated and manual test cases covering functionality, integration, and user experience.

Objectives

Create a complete test suite that validates:

Test Categories

1. UI/UX Testing (Mockup Validation)

Verify Streamlit implementation matches the mockup:

  • Header displays "Neo4j RAG + BitNet Chat (local developer mode)"
  • Service health cards show correct status (Neo4j, RAG, BitNet)
  • Compact stats display below chat (5 metrics)
  • Sidebar sections in correct order: RAG Config → LLM Config → Upload → Actions → Display
  • Dark theme colors match mockup (#0E1117, #262730, #FF4B4B)
  • Chat messages styled correctly (user: blue, assistant: gray)
  • Sources expander functionality works
  • Performance badges display correctly
  • Full statistics modal opens and closes properly
  • All interactive elements (sliders, toggles, buttons) function

2. Feature Testing: Chat Interface (Issue #7)

Test Suite: Chat Functionality

  • TC-7.1: User can send message via chat input
  • TC-7.2: Message appears in chat history immediately
  • TC-7.3: RAG service returns response within 5 seconds
  • TC-7.4: Assistant response displays in chat
  • TC-7.5: Sources expand/collapse correctly
  • TC-7.6: Performance metrics shown per query (when enabled)
  • TC-7.7: Message history persists during session
  • TC-7.8: Enter key sends message
  • TC-7.9: Empty messages are rejected
  • TC-7.10: Long messages display correctly

Test Suite: Settings Configuration

  • TC-7.11: Max results slider (1-10) affects query
  • TC-7.12: Similarity threshold slider (0.0-1.0) affects results
  • TC-7.13: BitNet toggle switches LLM on/off
  • TC-7.14: Temperature slider affects response style
  • TC-7.15: Show Sources toggle works
  • TC-7.16: Show Performance toggle works
  • TC-7.17: Show Timestamps toggle works
  • TC-7.18: Settings persist during session
  • TC-7.19: Clear chat button empties history
  • TC-7.20: Export chat button (placeholder)
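
For TC-7.11, TC-7.12 and TC-7.14, testing every slider position is unnecessary; boundary-value analysis (min, just above min, midpoint, just below max, max) covers the usual off-by-one and rounding bugs. A sketch; the 1 to 10 range is from the issue, while the 0.05 step for the similarity threshold is an assumption.

```python
def boundary_values(lo: float, hi: float, step: float) -> list[float]:
    """Boundary-value set for a slider: min, min+step, midpoint, max-step, max."""
    n_steps = round((hi - lo) / step)
    # Work in integer step indices so every value lands exactly on the slider grid
    candidates = (0, 1, n_steps // 2, n_steps - 1, n_steps)
    indices = {i for i in candidates if 0 <= i <= n_steps}
    # round(..., 10) strips float noise such as 0.9500000000000001
    return [round(lo + i * step, 10) for i in sorted(indices)]

SLIDERS = {
    "max_results": (1, 10, 1),                 # TC-7.11, range from the issue
    "similarity_threshold": (0.0, 1.0, 0.05),  # TC-7.12, step is an assumption
}
for name, spec in SLIDERS.items():
    print(name, boundary_values(*spec))
# max_results [1, 2, 5, 9, 10]
# similarity_threshold [0.0, 0.05, 0.5, 0.95, 1.0]
```

Each value can feed `pytest.mark.parametrize`, and with Streamlit's `AppTest` a slider can be driven via `at.sidebar.slider[i].set_value(v).run()` before asserting on the query that was sent.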

3. Feature Testing: Document Upload (Issue #8)

Test Suite: Upload Functionality

  • TC-8.1: File uploader accepts PDF files
  • TC-8.2: File uploader accepts TXT files
  • TC-8.3: File uploader accepts MD files
  • TC-8.4: File uploader accepts DOCX files
  • TC-8.5: File uploader rejects unsupported types (e.g., .exe, .zip)
  • TC-8.6: Files over 10MB are rejected with error message
  • TC-8.7: Multiple files can be selected simultaneously
  • TC-8.8: Upload button appears when files selected
  • TC-8.9: Upload progress shown with spinner
  • TC-8.10: Success message displays for successful uploads

Test Suite: Upload Integration

  • TC-8.11: Uploaded documents appear in recent uploads
  • TC-8.12: Document count increases after upload
  • TC-8.13: Uploaded content is searchable via chat
  • TC-8.14: RAG retrieves chunks from uploaded documents
  • TC-8.15: Failed uploads show error messages
  • TC-8.16: Upload history shows timestamps
  • TC-8.17: Multiple uploads processed in sequence
  • TC-8.18: Large files (near 10MB) upload successfully
  • TC-8.19: Duplicate filenames handled gracefully
  • TC-8.20: Upload works with special characters in filename

4. Feature Testing: Monitoring Dashboard (Issue #9)

Test Suite: Service Health Monitoring

  • TC-9.1: Neo4j health card displays correct status
  • TC-9.2: RAG service health card displays correct status
  • TC-9.3: BitNet LLM health card displays correct status
  • TC-9.4: Health cards update with accurate response times
  • TC-9.5: Service offline shows red status
  • TC-9.6: Service slow shows yellow warning
  • TC-9.7: Port numbers display correctly (7687, 8000, 8001)
  • TC-9.8: Health checks don't block UI
  • TC-9.9: Failed health check shows error gracefully
  • TC-9.10: Multiple service failures handled

Test Suite: Performance Metrics

  • TC-9.11: Document count accurate
  • TC-9.12: Chunk count accurate
  • TC-9.13: Response time reflects actual queries
  • TC-9.14: Memory usage from stats API
  • TC-9.15: Cache hit rate calculates correctly
  • TC-9.16: Delta indicators show improvements
  • TC-9.17: Metrics update after queries
  • TC-9.18: Metrics update after uploads
  • TC-9.19: Zero-state metrics display correctly
  • TC-9.20: Large numbers formatted properly

Test Suite: Full Statistics Modal

  • TC-9.21: "View Full Statistics" button opens modal
  • TC-9.22: Modal displays 12 metric cards
  • TC-9.23: Performance trend chart visible
  • TC-9.24: Query analytics shows recent queries
  • TC-9.25: Close button closes modal
  • TC-9.26: ESC key closes modal (if implemented)
  • TC-9.27: All statistics accurate from API
  • TC-9.28: Uptime displays correctly
  • TC-9.29: Database size shows actual size
  • TC-9.30: Back to chat navigation works
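
For TC-9.28 and TC-9.29, a fixed display format makes "displays correctly" checkable. A sketch assuming the stats API reports uptime in seconds and database size in bytes (neither is stated in the issue); the function names are hypothetical.

```python
def format_uptime(seconds: float) -> str:
    """'1d 02:03:04' style uptime; the day part is omitted under 24 h."""
    days, rem = divmod(int(seconds), 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, secs = divmod(rem, 60)
    clock = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{days}d {clock}" if days else clock

def format_bytes(n: int) -> str:
    """Human-readable size in binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(n)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{n} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"
```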

5. Integration Testing

Test Suite: Service Integration

  • TC-INT.1: Streamlit connects to Neo4j successfully
  • TC-INT.2: Streamlit connects to RAG service successfully
  • TC-INT.3: RAG service connects to BitNet successfully
  • TC-INT.4: End-to-end query flow works (Streamlit → RAG → BitNet → Neo4j)
  • TC-INT.5: Document upload flow works (Streamlit → RAG → Neo4j)
  • TC-INT.6: Health checks work for all services
  • TC-INT.7: Stats endpoint returns complete data
  • TC-INT.8: Network connectivity between containers
  • TC-INT.9: Service restart recovery
  • TC-INT.10: Concurrent users supported
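
TC-INT.8 and the port check in TC-9.7 can start with a plain TCP probe. Run from the host, it checks the published ports; to test container-to-container networking, run it inside the Streamlit container against the compose service names (not given in the issue; use whatever `scripts/docker-compose.optimized.yml` defines). A minimal sketch; `SERVICE_PORTS` labels are illustrative.

```python
import socket

SERVICE_PORTS = {"neo4j-bolt": 7687, "rag": 8000, "bitnet": 8001}  # ports from TC-9.7

def port_open(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable or timed out (socket timeouts subclass OSError)
        return False

if __name__ == "__main__":
    for name, port in SERVICE_PORTS.items():
        state = "open" if port_open("localhost", port) else "CLOSED"
        print(f"{name:<11} :{port}  {state}")
```

An open port only proves something is listening; TC-INT.1 to TC-INT.3 still need an application-level request.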

Test Suite: Error Handling

  • TC-ERR.1: RAG service offline shows error message
  • TC-ERR.2: Neo4j offline shows error in health card
  • TC-ERR.3: BitNet timeout handled gracefully
  • TC-ERR.4: Invalid API response handled
  • TC-ERR.5: Network errors don't crash app
  • TC-ERR.6: Malformed query handled
  • TC-ERR.7: Upload failure shows user-friendly error
  • TC-ERR.8: Stats API timeout handled
  • TC-ERR.9: Session state corruption recovery
  • TC-ERR.10: Container restart doesn't lose data

6. Performance Testing

Test Suite: Response Times

  • TC-PERF.1: Chat input responds within 100ms
  • TC-PERF.2: Query response < 5s for simple queries
  • TC-PERF.3: Health checks complete < 2s
  • TC-PERF.4: Stats loading < 1s
  • TC-PERF.5: File upload < 30s for 5MB file
  • TC-PERF.6: Modal opens instantly
  • TC-PERF.7: Sidebar interactions < 100ms
  • TC-PERF.8: 10 concurrent queries handled
  • TC-PERF.9: Memory usage < 512MB
  • TC-PERF.10: CPU usage < 50% average
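
Latency targets such as TC-PERF.2 are more stable when checked against a percentile over many runs than against a single request, and TC-PERF.8 needs the calls to overlap. A sketch of such a harness; `measure` is hypothetical, and in practice `fn` would send one query to the RAG service.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def measure(fn: Callable[[], object], runs: int = 20, concurrency: int = 1) -> dict[str, float]:
    """Call fn `runs` times (runs >= 2) with up to `concurrency` in flight; stats in seconds."""
    def timed(_: int) -> float:
        start = time.perf_counter()
        fn()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, range(runs)))
    wall = time.perf_counter() - wall_start
    # 20-quantiles are cut points at 5 % steps, so index 18 is the 95th percentile;
    # method="inclusive" keeps the estimate within the observed min/max
    p95 = statistics.quantiles(latencies, n=20, method="inclusive")[18]
    return {"p50": statistics.median(latencies), "p95": p95, "max": latencies[-1], "wall": wall}
```

For TC-PERF.8, something like `measure(send_query, runs=10, concurrency=10)` followed by `assert stats["p95"] < 5.0` would do, where `send_query` is a placeholder for a real call. If `wall` is close to the sum of latencies, the backend is serializing requests.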

Test Suite: Scalability

  • TC-SCALE.1: 100+ messages in chat history
  • TC-SCALE.2: 50+ uploaded documents
  • TC-SCALE.3: Multiple concurrent users (5+)
  • TC-SCALE.4: Long-running session (1+ hour)
  • TC-SCALE.5: Large file upload (10MB)

7. Browser/Device Compatibility

Test Suite: Responsive Design

  • TC-RESP.1: Desktop view (1920x1080) displays correctly
  • TC-RESP.2: Laptop view (1366x768) displays correctly
  • TC-RESP.3: Tablet view (768px width) responsive
  • TC-RESP.4: Mobile view (375px width) functional
  • TC-RESP.5: Sidebar collapses on mobile
  • TC-RESP.6: Chat scrollable on all devices
  • TC-RESP.7: Touch interactions work on mobile

Test Suite: Browser Support

  • TC-BROW.1: Chrome/Edge latest version
  • TC-BROW.2: Firefox latest version
  • TC-BROW.3: Safari latest version
  • TC-BROW.4: Mobile Safari (iOS)
  • TC-BROW.5: Chrome Mobile (Android)
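
The responsive and browser suites multiply into a matrix. A sketch that enumerates it; the widths come from TC-RESP.1 to TC-RESP.4, while the tablet and mobile heights are assumptions. With pytest-playwright, engines are selected via the repeatable `--browser` option and each viewport applied with `page.set_viewport_size(size)`; Chromium stands in for Chrome/Edge and WebKit only approximates Safari, so TC-BROW.4 and TC-BROW.5 still need real devices (or at least Playwright's `devices` descriptors).

```python
from itertools import product

# Widths from TC-RESP.1 to TC-RESP.4; tablet and mobile heights are assumptions
VIEWPORTS = {
    "desktop": {"width": 1920, "height": 1080},
    "laptop": {"width": 1366, "height": 768},
    "tablet": {"width": 768, "height": 1024},
    "mobile": {"width": 375, "height": 667},
}
BROWSERS = ("chromium", "firefox", "webkit")  # Playwright's three engines

# Every (engine, viewport) pair to cover, e.g. as pytest parametrize ids
MATRIX = [(engine, label, size) for engine, (label, size) in product(BROWSERS, VIEWPORTS.items())]

for engine, label, size in MATRIX:
    print(f"{engine:<9} {label:<8} {size['width']}x{size['height']}")
```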

Testing Implementation

Manual Testing Checklist

Setup:

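```bash
# 1. Start all services
docker-compose -f scripts/docker-compose.optimized.yml up -d

# 2. Verify services are healthy (pass the same -f file, or ps won't find the project)
docker-compose -f scripts/docker-compose.optimized.yml ps

# 3. Access Streamlit UI (macOS; use xdg-open on Linux)
open http://localhost:8501
```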
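
Containers reporting "Up" does not mean Streamlit is serving yet, so scripted runs should poll before testing. A sketch that waits for HTTP 200; `/_stcore/health` is the health route in recent Streamlit releases (older ones used `/healthz`), so check it against the installed version. On Compose v2, `docker compose ... up -d --wait` does something similar for services that define healthchecks.

```python
import time
import urllib.request

def wait_until_ready(url: str = "http://localhost:8501/_stcore/health",
                     timeout_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll url until it answers HTTP 200; False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:  # refused, timed out or HTTP error: not ready yet
            pass
        time.sleep(interval_s)
    return False
```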

Test Execution:

  1. Go through each test case systematically
  2. Document results (Pass/Fail)
  3. Capture screenshots for UI validation
  4. Note any deviations from mockup
  5. Report bugs as separate issues

Automated Testing (Future)

Playwright Tests (Recommended):

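```python
# tests/streamlit_test.py
# Requires pytest-playwright (pip install pytest-playwright && playwright install)
# and the stack running with Streamlit on port 8501.
from playwright.sync_api import Page, expect

BASE_URL = "http://localhost:8501"

def test_chat_interface(page: Page):
    page.goto(BASE_URL)
    chat_input = page.get_by_placeholder("Ask a question")
    chat_input.fill("What is BitNet?")
    # Submit with Enter (TC-7.8) instead of relying on the icon-only send button's name
    chat_input.press("Enter")
    # .first avoids a strict-mode error if the answer also repeats the question
    expect(page.get_by_text("What is BitNet?").first).to_be_visible()

def test_document_upload(page: Page):
    page.goto(BASE_URL)
    # st.file_uploader renders a hidden <input type="file">; target it directly
    page.locator("input[type='file']").set_input_files("test.pdf")
    # exact=True so "Upload" doesn't also match other buttons containing that word
    page.get_by_role("button", name="Upload", exact=True).click()
    # Ingestion can be slow (TC-PERF.5 allows 30 s), so raise the default 5 s timeout
    expect(page.get_by_text("uploaded successfully")).to_be_visible(timeout=30_000)
```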

Streamlit Testing:

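```python
# tests/test_app.py
from streamlit.testing.v1 import AppTest

def test_initial_state():
    # Generous timeout: the first run may call the health and stats APIs
    at = AppTest.from_file("app.py", default_timeout=10)
    at.run()
    assert not at.exception
    # Max results, similarity threshold and temperature (TC-7.11, TC-7.12, TC-7.14)
    assert len(at.sidebar.slider) == 3
```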

Acceptance Criteria

For Test Suite Completion:

  • All 100+ test cases documented
  • Manual testing completed with results
  • Automated tests written for critical paths
  • Bug reports filed for failures
  • Screenshots captured for UI validation
  • Performance benchmarks documented
  • Integration tests passing
  • Browser compatibility verified

For Feature Sign-off:

Test Data Requirements

Sample Documents:

  • 5x PDF files (various sizes: 100KB, 1MB, 5MB, 10MB)
  • 3x TXT files with different encodings
  • 2x MD files with markdown formatting
  • 1x DOCX file with complex formatting
  • 1x Invalid file type (.exe) for negative testing

Sample Queries:

  • Simple: "What is BitNet?"
  • Complex: "How does Neo4j vector search compare to traditional databases?"
  • Edge cases: Empty query, very long query (500+ chars)
  • Non-existent topics: "quantum mechanics in ancient Rome"

Related Issues

Implementation Timeline

Estimated Effort: 2-3 days

  • Day 1: Manual testing and documentation
  • Day 2: Automated test setup (Playwright/Streamlit tests)
  • Day 3: Bug fixes and re-testing

Priority: High - Required before production deployment

Resources

Success Metrics

  • ✅ All critical test cases passing (TC-7.*, TC-8.*, TC-9.*)
  • ✅ 95%+ overall test pass rate
  • ✅ Zero critical bugs remaining
  • ✅ Performance within acceptable ranges
  • ✅ UI matches mockup specifications
  • ✅ Integration tests confirm service connectivity
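
The first two metrics can be computed directly from the Pass/Fail results recorded during manual testing. A sketch assuming results are kept as CSV with `id` and `status` columns (a hypothetical layout); failing critical test cases serve only as a proxy for "critical bugs".

```python
import csv
import io
from fnmatch import fnmatchcase

CRITICAL_PATTERNS = ("TC-7.*", "TC-8.*", "TC-9.*")  # critical suites from the metrics above

def summarize(results_csv: str) -> dict:
    """Check a results sheet (columns: id,status) against the pass-rate and critical targets."""
    rows = list(csv.DictReader(io.StringIO(results_csv)))
    passed = [r["id"] for r in rows if r["status"].strip().lower() == "pass"]
    failed = [r["id"] for r in rows if r["status"].strip().lower() != "pass"]
    critical_failed = [tc for tc in failed if any(fnmatchcase(tc, p) for p in CRITICAL_PATTERNS)]
    pass_rate = 100.0 * len(passed) / len(rows) if rows else 0.0
    return {
        "pass_rate": pass_rate,
        "critical_failed": critical_failed,
        "meets_targets": pass_rate >= 95.0 and not critical_failed,
    }
```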
