# Testing Guide - Lobster AI Testing Framework

## 🎯 Overview

This guide provides comprehensive documentation for the Lobster AI testing framework, targeting 95%+ code coverage with scientifically accurate testing scenarios. The testing infrastructure covers unit tests, integration tests, system tests, and performance benchmarks across all bioinformatics workflows.

## 🏗️ Testing Architecture

### Test Categories

#### 1. **Unit Tests** (`tests/unit/`)
- **Purpose**: Test individual functions, classes, and methods in isolation
- **Duration**: ~2 minutes for full suite
- **Coverage**: Individual components (services, agents, utilities)
- **Execution**: `pytest tests/unit/`

#### 2. **Integration Tests** (`tests/integration/`)
- **Purpose**: Test component interactions and workflows
- **Duration**: ~15 minutes for full suite
- **Coverage**: Agent-service integration, data flow validation
- **Execution**: `pytest tests/integration/`

#### 3. **System Tests** (`tests/system/`)
- **Purpose**: Test complete end-to-end workflows
- **Duration**: ~30 minutes for full suite
- **Coverage**: Full analysis pipelines, CLI interactions
- **Execution**: `pytest tests/system/`

#### 4. **Performance Tests** (`tests/performance/`)
- **Purpose**: Benchmark performance and memory usage
- **Duration**: ~45 minutes for full suite
- **Coverage**: Large dataset handling, algorithmic efficiency
- **Execution**: `pytest tests/performance/`

### Directory Structure

```
tests/
├── 📁 unit/                    # Unit tests (20+ files)
│   ├── core/                   # Core system components
│   │   ├── test_data_manager_v2.py
│   │   ├── test_client.py
│   │   └── test_adapters.py
│   ├── agents/                 # AI agent functionality
│   │   ├── test_data_expert.py
│   │   └── test_transcriptomics_expert.py
│   ├── services/               # Analysis services (7+ files)
│   │   ├── test_quality_service.py
│   │   ├── test_clustering_service.py
│   │   └── test_differential_service.py
│   └── tools/                  # Analysis tools (12+ files)
├── 📁 integration/             # Integration tests (5 files)
│   ├── test_agent_workflows.py
│   ├── test_data_pipelines.py
│   └── test_service_chains.py
├── 📁 system/                  # System tests (3 files)
│   ├── test_end_to_end.py
│   ├── test_cli_commands.py
│   └── test_multi_modal.py
├── 📁 performance/             # Performance tests (3 files)
├── 📁 mock_data/               # Synthetic data generation
│   ├── generators.py           # High-level data generators
│   ├── factories.py            # Data factory classes
│   └── base.py                 # Base configurations
├── conftest.py                 # Global fixtures and configuration
└── README.md                   # Testing documentation
```

## 🔧 Test Configuration

### Pytest Configuration (`pytest.ini`)

```ini
[pytest]
minversion = 6.0
addopts =
    --strict-markers
    --disable-warnings
    --verbose
    --tb=short
    --cov=lobster
    --cov-report=html:htmlcov
    --cov-report=xml:coverage.xml
    --cov-report=term-missing
    --cov-fail-under=80
    --durations=10
testpaths = tests
markers =
    unit: mark test as a unit test
    integration: mark test as an integration test
    system: mark test as a system test
    performance: mark test as a performance benchmark
    slow: mark test as slow running
    requires_gpu: mark test as requiring GPU
    requires_network: mark test as requiring network access
filterwarnings =
    ignore::DeprecationWarning
    ignore::PendingDeprecationWarning
    ignore::FutureWarning:scanpy
```
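Because `--strict-markers` is enabled, only the markers registered above may be used, and the `-m` filters shown later select nothing unless markers are actually applied in test code. A brief illustration with hypothetical test names:

```python
import pytest

# Apply a marker to every test in this module
pytestmark = pytest.mark.unit


@pytest.mark.slow
def test_full_normalization_pipeline():
    """Selected only when slow tests run; deselect with -m "not slow"."""
    ...


@pytest.mark.requires_network
def test_geo_metadata_fetch():
    """Skipped in offline CI runs via -m "not requires_network"."""
    ...
```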
### Global Test Fixtures (`conftest.py`)

```python
# Core fixtures available to all tests
import pytest
import tempfile
from pathlib import Path
from unittest.mock import Mock

from lobster.core.data_manager_v2 import DataManagerV2
from lobster.core.client import AgentClient
from tests.mock_data.generators import (
    generate_synthetic_single_cell,
    generate_synthetic_bulk_rnaseq,
    generate_synthetic_proteomics
)


@pytest.fixture
def temp_workspace():
    """Create temporary workspace for testing."""
    with tempfile.TemporaryDirectory(prefix="lobster_test_") as tmpdir:
        workspace_path = Path(tmpdir)
        yield workspace_path


@pytest.fixture
def mock_data_manager(temp_workspace):
    """Create DataManagerV2 instance with temporary workspace."""
    return DataManagerV2(workspace_path=temp_workspace)


@pytest.fixture
def sample_single_cell_data():
    """Generate synthetic single-cell data for testing."""
    return generate_synthetic_single_cell(
        n_cells=100,
        n_genes=50,
        n_cell_types=3
    )


@pytest.fixture
def sample_bulk_data():
    """Generate synthetic bulk RNA-seq data."""
    return generate_synthetic_bulk_rnaseq(
        n_samples=12,
        n_genes=100
    )


@pytest.fixture
def sample_proteomics_data():
    """Generate synthetic proteomics data."""
    return generate_synthetic_proteomics(
        n_samples=20,
        n_proteins=80,
        missing_rate=0.3
    )


@pytest.fixture
def mock_agent_client(mock_data_manager):
    """Create mock AgentClient for testing."""
    client = Mock(spec=AgentClient)
    client.data_manager = mock_data_manager
    return client


@pytest.fixture(scope="session")
def test_config():
    """Test configuration parameters."""
    return {
        'timeout': 300,
        'max_memory': '2GB',
        'test_data_size': 'medium',
        'enable_gpu': False
    }
```
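Any test in the suite can request these fixtures by parameter name, with no imports needed. A minimal sketch, assuming the `DataManagerV2` modality accessors (`modalities`, `list_modalities`, `get_modality`) used elsewhere in this guide:

```python
def test_load_modality_roundtrip(mock_data_manager, sample_single_cell_data):
    """Store a synthetic modality and read it back through the manager."""
    # Fixture injection: pytest builds temp_workspace -> mock_data_manager,
    # and generates the AnnData, before this test body runs.
    mock_data_manager.modalities['sc_demo'] = sample_single_cell_data

    assert 'sc_demo' in mock_data_manager.list_modalities()
    adata = mock_data_manager.get_modality('sc_demo')
    assert adata.n_obs == 100 and adata.n_vars == 50
```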
## 🧪 Writing Unit Tests

### Service Unit Test Template

```python
# tests/unit/tools/test_your_service.py
import pytest
import numpy as np
import pandas as pd
from unittest.mock import Mock, patch, MagicMock

from lobster.tools.your_service import YourService, YourServiceError
from tests.mock_data.generators import generate_synthetic_single_cell


class TestYourService:
    """Comprehensive unit tests for YourService."""

    @pytest.fixture
    def service(self):
        """Create service instance for testing."""
        return YourService()

    @pytest.fixture
    def mock_adata(self):
        """Create mock AnnData for testing."""
        return generate_synthetic_single_cell(n_cells=50, n_genes=30)

    def test_service_initialization(self, service):
        """Test service initializes correctly."""
        assert service is not None
        assert service.progress_callback is None
        assert service.current_progress == 0
        assert hasattr(service, 'total_steps')

    def test_progress_callback_setting(self, service):
        """Test progress callback functionality."""
        callback_calls = []

        def mock_callback(progress, message):
            callback_calls.append((progress, message))

        service.set_progress_callback(mock_callback)
        assert service.progress_callback is not None

        # Test progress update
        service.total_steps = 2
        service._update_progress("Test step")

        assert len(callback_calls) == 1
        assert callback_calls[0][0] == 50  # 50% progress
        assert "Test step" in callback_calls[0][1]

    def test_main_analysis_success(self, service, mock_adata):
        """Test successful analysis execution."""
        # Test with default parameters
        result_adata, statistics = service.main_analysis_method(
            mock_adata,
            parameter1=1.0,
            parameter2="default"
        )

        # Validate results structure
        assert result_adata is not None
        assert isinstance(statistics, dict)

        # Validate AnnData structure preservation
        assert result_adata.n_obs == mock_adata.n_obs
        assert result_adata.n_vars == mock_adata.n_vars

        # Validate statistics content
        required_stats = ['n_observations', 'n_features', 'analysis_timestamp']
        for stat in required_stats:
            assert stat in statistics

        # Validate analysis metadata stored
        assert 'your_analysis' in result_adata.uns
        assert result_adata.uns['your_analysis']['method'] == 'default'

    def test_parameter_validation(self, service, mock_adata):
        """Test comprehensive parameter validation."""
        # Test invalid parameter1 (should be positive)
        with pytest.raises(ValueError, match="Parameter1 must be positive"):
            service.main_analysis_method(mock_adata, parameter1=-1.0)

        with pytest.raises(ValueError, match="Parameter1 must be positive"):
            service.main_analysis_method(mock_adata, parameter1=0.0)

        # Test invalid parameter2 (should be from allowed options)
        with pytest.raises(ValueError, match="Invalid parameter2"):
            service.main_analysis_method(mock_adata, parameter2="invalid_option")

        # Test empty parameter3 list
        with pytest.raises(ValueError, match="Parameter3 cannot be empty list"):
            service.main_analysis_method(mock_adata, parameter3=[])

    def test_empty_data_handling(self, service):
        """Test handling of edge cases in data."""
        # Empty observations
        empty_obs_adata = generate_synthetic_single_cell(n_cells=0, n_genes=10)
        with pytest.raises(ValueError, match="Input data is empty"):
            service.main_analysis_method(empty_obs_adata)

        # Empty features
        empty_vars_adata = generate_synthetic_single_cell(n_cells=10, n_genes=0)
        with pytest.raises(ValueError, match="Input data has no features"):
            service.main_analysis_method(empty_vars_adata)

    def test_statistical_accuracy(self, service, mock_adata):
        """Test statistical calculations are mathematically correct."""
        result_adata, statistics = service.main_analysis_method(mock_adata)

        # Verify basic statistics match input data
        assert statistics['n_observations'] == mock_adata.n_obs
        assert statistics['n_features'] == mock_adata.n_vars

        # Test statistical calculations if applicable
        if 'mean_expression' in statistics:
            expected_mean = np.mean(mock_adata.X)
            np.testing.assert_almost_equal(
                statistics['mean_expression'],
                expected_mean,
                decimal=5
            )

    def test_error_propagation(self, service, mock_adata, monkeypatch):
        """Test error handling and propagation."""
        # Mock internal method to raise exception
        def mock_preprocess_error(*args, **kwargs):
            raise RuntimeError("Preprocessing failed")

        monkeypatch.setattr(service, '_preprocess_data', mock_preprocess_error)

        # Should wrap in service-specific error
        with pytest.raises(YourServiceError, match="Unexpected error"):
            service.main_analysis_method(mock_adata)

    def test_reproducibility(self, service, mock_adata):
        """Test that analyses are reproducible."""
        # Run the same analysis twice with identical parameters
        result1_adata, stats1 = service.main_analysis_method(
            mock_adata, parameter1=1.5, parameter2="option1"
        )
        result2_adata, stats2 = service.main_analysis_method(
            mock_adata.copy(), parameter1=1.5, parameter2="option1"
        )

        # Results should be deterministic for identical inputs
        np.testing.assert_array_almost_equal(result1_adata.X, result2_adata.X)

    @pytest.mark.parametrize("param1,param2,expected_error", [
        (-1.0, "default", ValueError),
        (1.0, "invalid", ValueError),
        (1.0, "default", None),  # Should succeed
    ])
    def test_parameter_combinations(self, service, mock_adata, param1, param2, expected_error):
        """Test various parameter combinations."""
        if expected_error:
            with pytest.raises(expected_error):
                service.main_analysis_method(mock_adata, parameter1=param1, parameter2=param2)
        else:
            result_adata, statistics = service.main_analysis_method(
                mock_adata, parameter1=param1, parameter2=param2
            )
            assert result_adata is not None
            assert isinstance(statistics, dict)

    def test_memory_efficiency(self, service):
        """Test memory usage doesn't grow excessively."""
        import psutil
        import os

        process = psutil.Process(os.getpid())
        initial_memory = process.memory_info().rss

        # Run analysis on moderately sized data
        large_adata = generate_synthetic_single_cell(n_cells=1000, n_genes=500)
        result_adata, statistics = service.main_analysis_method(large_adata)

        final_memory = process.memory_info().rss
        memory_growth = final_memory - initial_memory

        # Memory growth should be reasonable (less than 500MB for this test)
        assert memory_growth < 500 * 1024 * 1024  # 500MB threshold

    def test_concurrent_usage(self, service, mock_adata):
        """Test service can be used concurrently (stateless requirement)."""
        import threading

        results = []
        errors = []

        def run_analysis(data):
            try:
                result = service.main_analysis_method(data.copy())
                results.append(result)
            except Exception as e:
                errors.append(e)

        # Run multiple analyses concurrently
        threads = []
        for i in range(3):
            thread = threading.Thread(target=run_analysis, args=(mock_adata,))
            threads.append(thread)
            thread.start()

        # Wait for all to complete
        for thread in threads:
            thread.join()

        # Should have no errors and expected number of results
        assert len(errors) == 0
        assert len(results) == 3
```
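The progress assertions above (50% after one update with `total_steps = 2`) assume a step-counting contract on the service. A minimal sketch of that assumed contract, not the actual implementation:

```python
class ProgressReportingService:
    """Minimal sketch of the progress contract the template above assumes."""

    def __init__(self):
        self.progress_callback = None  # set via set_progress_callback()
        self.current_progress = 0      # completed steps so far
        self.total_steps = 1           # set by each analysis method

    def set_progress_callback(self, callback):
        self.progress_callback = callback

    def _update_progress(self, message: str):
        # One call == one completed step; percent = steps done / total * 100,
        # which is why total_steps = 2 yields 50 after a single update.
        self.current_progress += 1
        percent = int(self.current_progress / self.total_steps * 100)
        if self.progress_callback:
            self.progress_callback(percent, message)
```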
### Agent Unit Test Template

```python
# tests/unit/agents/test_your_agent.py
import pytest
from unittest.mock import Mock, patch, MagicMock

from lobster.agents.your_agent import your_agent_factory
from lobster.core.data_manager_v2 import DataManagerV2


class TestYourAgent:
    """Unit tests for YourAgent."""

    @pytest.fixture
    def mock_data_manager(self):
        """Create mock DataManagerV2."""
        mock_dm = Mock(spec=DataManagerV2)
        mock_dm.list_modalities.return_value = []
        mock_dm.get_modality.return_value = None
        mock_dm.log_tool_usage = Mock()
        return mock_dm

    @pytest.fixture
    def agent(self, mock_data_manager):
        """Create agent instance for testing."""
        with patch('lobster.agents.your_agent.get_settings') as mock_settings:
            mock_settings.return_value.get_agent_llm_params.return_value = {
                'model': 'test-model',
                'temperature': 0.1
            }
            return your_agent_factory(mock_data_manager)

    def test_agent_creation(self, mock_data_manager):
        """Test agent factory creates agent successfully."""
        with patch('lobster.agents.your_agent.get_settings') as mock_settings:
            mock_settings.return_value.get_agent_llm_params.return_value = {}
            agent = your_agent_factory(mock_data_manager)
            assert agent is not None

    def test_agent_tools_available(self, agent):
        """Test that agent has expected tools."""
        # Agent tools are typically accessible via agent.get_graph().nodes;
        # the implementation depends on the LangGraph structure. This is a
        # conceptual test - the actual implementation may vary.
        tools = getattr(agent, 'tools', [])
        tool_names = [tool.name for tool in tools if hasattr(tool, 'name')]

        expected_tools = ['check_available_modalities', 'perform_domain_analysis']
        for expected_tool in expected_tools:
            # Check if tool exists (test implementation may need adjustment)
            pass

    @patch('lobster.agents.your_agent.YourService')
    def test_tool_service_integration(self, mock_service_class, agent, mock_data_manager):
        """Test that agent tools properly integrate with services."""
        # Setup mock service
        mock_service = Mock()
        mock_service.perform_analysis.return_value = (Mock(), {'metric': 1.0})
        mock_service_class.return_value = mock_service

        # Setup data manager
        mock_adata = Mock()
        mock_data_manager.list_modalities.return_value = ['test_data']
        mock_data_manager.get_modality.return_value = mock_adata

        # This test would need to be implemented based on how tools are
        # exposed in the actual LangGraph agent structure
```
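One possible way to finish that last test, assuming tools are exposed as LangChain-style objects with a `.name` and a callable `.func` (a hypothetical helper; adjust the lookup to the actual LangGraph structure):

```python
def invoke_tool_by_name(agent, name, **kwargs):
    """Locate a tool on the agent and call its underlying function directly.

    Assumes tools expose `.name` and `.func`; adjust if the agent stores
    tools elsewhere (e.g. graph nodes).
    """
    for tool in getattr(agent, 'tools', []):
        if getattr(tool, 'name', None) == name:
            return tool.func(**kwargs)
    raise AssertionError(f"Tool {name!r} not found on agent")


# Inside test_tool_service_integration, after the mocks are configured:
#     result = invoke_tool_by_name(agent, 'perform_domain_analysis',
#                                  modality_name='test_data')
#     mock_service.perform_analysis.assert_called_once()
```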
## 🔗 Writing Integration Tests

### Agent-Service Integration Template

```python
# tests/integration/test_agent_service_integration.py
import pytest

from lobster.core.client import AgentClient
from lobster.core.data_manager_v2 import DataManagerV2
from tests.mock_data.generators import generate_synthetic_single_cell


class TestAgentServiceIntegration:
    """Integration tests for agent-service workflows."""

    @pytest.fixture
    def client_with_data(self, temp_workspace, sample_single_cell_data):
        """Create client with loaded data."""
        data_manager = DataManagerV2(workspace_path=temp_workspace)
        data_manager.modalities['test_data'] = sample_single_cell_data
        client = AgentClient(data_manager=data_manager)
        return client

    def test_quality_assessment_workflow(self, client_with_data):
        """Test complete quality assessment workflow."""
        # Request quality assessment
        response = client_with_data.query(
            "Assess the quality of test_data modality"
        )

        # Validate response structure
        assert response['success'] is True
        assert 'quality_assessed' in response['response']

        # Check that new modality was created
        modalities = client_with_data.data_manager.list_modalities()
        assert any('quality_assessed' in mod for mod in modalities)

    def test_clustering_analysis_workflow(self, client_with_data):
        """Test complete clustering workflow."""
        # First, assess quality
        quality_response = client_with_data.query(
            "Assess quality of test_data"
        )
        assert quality_response['success']

        # Then perform clustering
        cluster_response = client_with_data.query(
            "Perform clustering analysis on the quality-assessed data"
        )
        assert cluster_response['success']
        assert 'clustered' in cluster_response['response']

    def test_multi_agent_handoff(self, client_with_data):
        """Test handoffs between different agents."""
        # Start with data expert
        data_response = client_with_data.query(
            "Load and prepare the test data for transcriptomics analysis"
        )
        assert data_response['success']

        # Should hand off to transcriptomics expert
        analysis_response = client_with_data.query(
            "Now perform clustering and find marker genes"
        )
        assert analysis_response['success']

    def test_error_handling_integration(self, client_with_data):
        """Test error handling across agent-service boundaries."""
        # Request analysis on non-existent data
        error_response = client_with_data.query(
            "Analyze the modality called 'nonexistent_data'"
        )

        # Should handle error gracefully
        assert error_response['success'] is False
        assert 'not found' in error_response['response'].lower()
```
## 🌐 Writing System Tests

### End-to-End Test Template

```python
# tests/system/test_end_to_end.py
import pytest
from pathlib import Path
import tempfile

from lobster.core.client import AgentClient
from lobster.core.data_manager_v2 import DataManagerV2


class TestEndToEndWorkflows:
    """System tests for complete analysis workflows."""

    @pytest.fixture
    def full_client_setup(self):
        """Set up complete client with real configuration."""
        with tempfile.TemporaryDirectory() as tmpdir:
            workspace = Path(tmpdir)
            data_manager = DataManagerV2(workspace_path=workspace)
            client = AgentClient(data_manager=data_manager)
            yield client

    def test_complete_single_cell_pipeline(self, full_client_setup):
        """Test complete single-cell analysis pipeline."""
        client = full_client_setup

        # Step 1: Load synthetic data
        response1 = client.query(
            "Generate synthetic single-cell data with 500 cells and 100 genes"
        )
        assert response1['success']

        # Step 2: Quality assessment
        response2 = client.query(
            "Assess the quality of the generated data"
        )
        assert response2['success']

        # Step 3: Preprocessing
        response3 = client.query(
            "Filter and normalize the data based on quality metrics"
        )
        assert response3['success']

        # Step 4: Clustering
        response4 = client.query(
            "Perform clustering analysis and generate UMAP visualization"
        )
        assert response4['success']

        # Step 5: Marker gene analysis
        response5 = client.query(
            "Find marker genes for each cluster"
        )
        assert response5['success']

        # Validate final state
        modalities = client.data_manager.list_modalities()
        expected_stages = ['generated', 'quality_assessed', 'normalized', 'clustered']
        for stage in expected_stages:
            assert any(stage in mod for mod in modalities)

    def test_bulk_rnaseq_differential_expression(self, full_client_setup):
        """Test bulk RNA-seq differential expression workflow."""
        client = full_client_setup

        # Generate bulk RNA-seq data with conditions
        response1 = client.query(
            "Generate bulk RNA-seq data with treatment and control conditions"
        )
        assert response1['success']

        # Perform differential expression
        response2 = client.query(
            "Perform differential expression analysis between conditions"
        )
        assert response2['success']

        # Validate results contain expected elements
        assert 'differential' in response2['response']
        assert any('differential' in mod
                   for mod in client.data_manager.list_modalities())

    @pytest.mark.slow
    def test_large_dataset_handling(self, full_client_setup):
        """Test system performance with larger datasets."""
        client = full_client_setup

        # Generate larger dataset
        response = client.query(
            "Generate single-cell data with 10000 cells and 2000 genes"
        )
        assert response['success']

        # Perform computationally intensive analysis
        cluster_response = client.query(
            "Perform clustering with high resolution and generate comprehensive visualizations"
        )
        assert cluster_response['success']
```
## 📊 Performance Testing

### Performance Test Template

```python
# tests/performance/test_service_performance.py
import pytest
import time
import psutil
import os
from pathlib import Path

from lobster.tools.clustering_service import ClusteringService
from tests.mock_data.generators import generate_synthetic_single_cell


class TestServicePerformance:
    """Performance benchmarks for services."""

    @pytest.mark.performance
    def test_clustering_performance_small(self, benchmark):
        """Benchmark clustering on small dataset."""
        service = ClusteringService()
        adata = generate_synthetic_single_cell(n_cells=1000, n_genes=500)

        def run_clustering():
            return service.cluster_and_visualize(adata)

        result = benchmark(run_clustering)

        # Validate performance
        assert benchmark.stats.stats.mean < 30.0  # Should complete in <30 seconds

    @pytest.mark.performance
    @pytest.mark.slow
    def test_clustering_performance_large(self, benchmark):
        """Benchmark clustering on larger dataset."""
        service = ClusteringService()
        adata = generate_synthetic_single_cell(n_cells=10000, n_genes=2000)

        def run_clustering():
            return service.cluster_and_visualize(adata, demo_mode=True)

        result = benchmark(run_clustering)

        # Should still complete within reasonable time in demo mode
        assert benchmark.stats.stats.mean < 120.0  # 2 minutes max

    def test_memory_usage_monitoring(self):
        """Monitor memory usage during analysis."""
        service = ClusteringService()
        process = psutil.Process(os.getpid())

        initial_memory = process.memory_info().rss

        # Run analysis
        adata = generate_synthetic_single_cell(n_cells=5000, n_genes=1000)
        result_adata, stats = service.cluster_and_visualize(adata)

        peak_memory = process.memory_info().rss
        memory_growth = peak_memory - initial_memory

        # Memory growth should be reasonable (less than 2GB)
        assert memory_growth < 2 * 1024**3

        # Clean up and check memory returns close to initial
        del adata, result_adata
        import gc
        gc.collect()

        final_memory = process.memory_info().rss
        memory_leak = final_memory - initial_memory

        # Should not have significant memory leaks (less than 100MB)
        assert memory_leak < 100 * 1024**2
```
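The psutil bookkeeping here repeats the pattern from `test_memory_efficiency` in the unit template. If that duplication grows, a small helper can centralize the threshold check; a minimal sketch using only `psutil` and the standard library (hypothetical helper, not part of the framework):

```python
import gc
import os
from contextlib import contextmanager

import psutil


@contextmanager
def assert_memory_growth_below(max_bytes):
    """Fail the enclosing test if RSS grows by more than max_bytes."""
    process = psutil.Process(os.getpid())
    gc.collect()
    initial = process.memory_info().rss
    yield
    gc.collect()
    growth = process.memory_info().rss - initial
    assert growth < max_bytes, f"Memory grew by {growth / 1024**2:.1f} MB"


# Usage inside a test:
#     with assert_memory_growth_below(500 * 1024**2):
#         service.main_analysis_method(large_adata)
```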
## 🛠️ Mock Data Generation

### Synthetic Data Generators

```python
# tests/mock_data/custom_generators.py
"""Custom generators for specific test scenarios."""
import numpy as np
import pandas as pd
import anndata as ad
from typing import Optional, Dict, Any


def generate_realistic_single_cell(
    n_cells: int = 1000,
    n_genes: int = 500,
    cell_types: Optional[list] = None,
    batch_effects: bool = False,
    doublet_rate: float = 0.05
) -> ad.AnnData:
    """
    Generate realistic single-cell data with biological features.

    Args:
        n_cells: Number of cells
        n_genes: Number of genes
        cell_types: List of cell type names
        batch_effects: Whether to include batch effects
        doublet_rate: Proportion of doublet cells

    Returns:
        AnnData with realistic single-cell features
    """
    if cell_types is None:
        cell_types = ['T_cells', 'B_cells', 'NK_cells', 'Monocytes']

    # Generate base expression matrix with biological structure
    np.random.seed(42)

    # Create cell type-specific expression patterns
    n_cell_types = len(cell_types)
    cells_per_type = n_cells // n_cell_types

    X = np.zeros((n_cells, n_genes))
    cell_type_labels = []

    for i, cell_type in enumerate(cell_types):
        start_idx = i * cells_per_type
        end_idx = start_idx + cells_per_type if i < n_cell_types - 1 else n_cells

        # Base expression for this cell type
        # (cast to float so the in-place marker scaling below is valid)
        base_expression = np.random.negative_binomial(
            10, 0.3, size=(end_idx - start_idx, n_genes)
        ).astype(np.float64)

        # Add cell type-specific marker genes
        marker_genes = slice(i * 50, (i + 1) * 50)  # 50 markers per type
        base_expression[:, marker_genes] *= np.random.uniform(
            2, 5, size=(end_idx - start_idx, 50)
        )

        X[start_idx:end_idx, :] = base_expression
        cell_type_labels.extend([cell_type] * (end_idx - start_idx))

    # Create AnnData object
    adata = ad.AnnData(X=X.astype(np.float32))

    # Add cell metadata
    adata.obs['cell_type'] = cell_type_labels
    adata.obs['n_genes'] = (adata.X > 0).sum(axis=1)
    adata.obs['total_counts'] = adata.X.sum(axis=1)

    # Add mitochondrial genes
    mt_genes = np.random.choice(n_genes, size=int(0.05 * n_genes), replace=False)
    adata.var['mt'] = False
    adata.var.iloc[mt_genes, adata.var.columns.get_loc('mt')] = True

    # Calculate mitochondrial percentage
    adata.obs['pct_counts_mt'] = (
        adata[:, adata.var['mt']].X.sum(axis=1) / adata.obs['total_counts'] * 100
    )

    # Add batch effects if requested
    if batch_effects:
        n_batches = 3
        batch_assignments = np.random.choice(n_batches, size=n_cells)
        adata.obs['batch'] = [f'batch_{i}' for i in batch_assignments]

        # Apply batch-specific scaling
        for batch_id in range(n_batches):
            batch_mask = batch_assignments == batch_id
            batch_effect = np.random.uniform(0.8, 1.2)
            adata.X[batch_mask, :] *= batch_effect

    # Add doublets if requested
    if doublet_rate > 0:
        n_doublets = int(n_cells * doublet_rate)
        doublet_indices = np.random.choice(n_cells, size=n_doublets, replace=False)

        adata.obs['is_doublet'] = False
        adata.obs.iloc[doublet_indices, adata.obs.columns.get_loc('is_doublet')] = True

        # Doublets have higher total counts
        adata.X[doublet_indices, :] *= np.random.uniform(1.5, 2.0, size=(n_doublets, 1))

    # Add gene metadata
    gene_names = [f'Gene_{i:04d}' for i in range(n_genes)]
    adata.var.index = gene_names
    adata.var['highly_variable'] = False

    # Mark some genes as highly variable
    hv_genes = np.random.choice(n_genes, size=int(0.2 * n_genes), replace=False)
    adata.var.iloc[hv_genes, adata.var.columns.get_loc('highly_variable')] = True

    return adata


def generate_differential_expression_data(
    n_samples_per_group: int = 6,
    n_genes: int = 1000,
    n_de_genes: int = 100,
    effect_size: float = 2.0
) -> ad.AnnData:
    """Generate bulk RNA-seq data with known differential expression."""
    n_samples = n_samples_per_group * 2

    # Generate base counts
    # (cast to float so the in-place effect-size scaling below is valid)
    X = np.random.negative_binomial(
        20, 0.3, size=(n_samples, n_genes)
    ).astype(np.float32)

    # Add differential expression
    de_gene_indices = np.random.choice(n_genes, size=n_de_genes, replace=False)

    # Treatment group gets higher expression for DE genes
    treatment_samples = slice(n_samples_per_group, n_samples)
    X[treatment_samples, :][:, de_gene_indices] *= effect_size

    # Create AnnData
    adata = ad.AnnData(X=X)

    # Add sample metadata
    conditions = ['control'] * n_samples_per_group + ['treatment'] * n_samples_per_group
    adata.obs['condition'] = conditions
    adata.obs['sample_id'] = [f'sample_{i:02d}' for i in range(n_samples)]

    # Add gene metadata
    gene_names = [f'Gene_{i:04d}' for i in range(n_genes)]
    adata.var.index = gene_names
    adata.var['is_de'] = False
    adata.var.iloc[de_gene_indices, adata.var.columns.get_loc('is_de')] = True

    return adata
```
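Because the generator embeds known ground truth (`var['is_de']`), a quick sanity test can confirm the spiked signal is actually present. A brief sketch, assuming the module path in the header above:

```python
import numpy as np
from tests.mock_data.custom_generators import generate_differential_expression_data


def test_de_generator_ground_truth():
    """DE genes should be measurably higher in the treatment group."""
    adata = generate_differential_expression_data(effect_size=2.0)

    treatment = adata[adata.obs['condition'] == 'treatment']
    control = adata[adata.obs['condition'] == 'control']
    de_mask = adata.var['is_de'].values

    # Mean expression of spiked genes should roughly reflect effect_size
    ratio = treatment.X[:, de_mask].mean() / control.X[:, de_mask].mean()
    assert ratio > 1.5  # loose bound; negative binomial counts are noisy
```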
## 🚀 Running Tests

### Basic Test Execution

```bash
# Run all tests
make test

# Run specific test categories
pytest tests/unit/           # Unit tests only
pytest tests/integration/    # Integration tests only
pytest tests/system/         # System tests only
pytest tests/performance/    # Performance tests only

# Run tests with specific markers
pytest -m "unit and not slow"    # Fast unit tests only
pytest -m "integration"          # Integration tests
pytest -m "performance"          # Performance benchmarks

# Run specific test files
pytest tests/unit/test_clustering_service.py
pytest tests/integration/test_agent_workflows.py

# Run with coverage reporting
pytest --cov=lobster --cov-report=html

# Run with performance benchmarks
pytest --benchmark-only      # Only benchmark tests
pytest --benchmark-compare   # Compare with previous runs
```

### Advanced Test Options

```bash
# Run tests in parallel (requires pytest-xdist)
pytest -n auto    # Auto-detect CPU cores
pytest -n 4       # Use 4 processes

# Run with specific output formats
pytest --tb=long     # Long traceback format
pytest --tb=short    # Short traceback format
pytest -v            # Verbose output
pytest -s            # Don't capture stdout

# Run tests matching pattern
pytest -k "test_clustering"    # Tests with 'clustering' in name
pytest -k "not slow"           # Skip slow tests

# Run with timeout (requires pytest-timeout)
pytest --timeout=300    # 5 minute timeout per test

# Generate test reports
pytest --html=reports/report.html      # HTML report
pytest --junitxml=reports/junit.xml    # JUnit XML report
```

## 🎯 End-to-End Acceptance Testing System

Lobster AI features a **sophisticated 3-tier testing architecture** that includes a comprehensive end-to-end acceptance testing system for validating complete user workflows.

### 🌟 Testing Architecture Overview

1. **Traditional pytest Framework** - Unit/Integration/System/Performance tests
2. **End-to-End Acceptance Testing** - Natural language workflow validation
3. **Hybrid Execution** - Combined reporting and analytics

### 🚀 E2E Testing Components

#### **Core Files**
- **`tests/test_cases.json`** - 30+ realistic user scenarios with validation criteria
- **`tests/run_integration_tests.py`** - Advanced test runner with performance monitoring
- **`tests/run_tests.sh`** - User-friendly bash wrapper for easy execution

#### **Key Features**
- **Natural Language Testing**: Validates the actual conversational interface that users experience
- **Performance Monitoring**: Real-time CPU, memory, and disk I/O tracking during execution
- **Scalable Architecture**: Tag-based filtering, priorities, parallel execution
- **Response Validation**: Keyword matching, length checks, error detection
- **Workspace Management**: Isolated test environments with automatic cleanup

### 🎮 Quick Start Commands

```bash
# User-friendly bash wrapper (recommended)
./tests/run_tests.sh                  # Run all scenarios sequentially
./tests/run_tests.sh --parallel       # Run in parallel
./tests/run_tests.sh --parallel -w 8  # 8 parallel workers

# Advanced Python runner with full control
python tests/run_integration_tests.py --categories basic,advanced --parallel
python tests/run_integration_tests.py --performance-monitoring --workers 4
python tests/run_integration_tests.py --run-pytest-integration --output results.json
```

### 📊 Test Categorization & Filtering

```bash
# Filter by categories
python tests/run_integration_tests.py --categories basic,advanced,performance,error_handling

# Filter by biological domains
python tests/run_integration_tests.py --tags geo,proteomics,multiomics,spatial,qc

# Filter by priority levels (1-5)
python tests/run_integration_tests.py --priorities 1,2,3

# Combine filters for targeted testing
python tests/run_integration_tests.py --categories advanced --tags geo,qc --parallel
```

### 🧪 Test Scenarios (30+ Realistic Workflows)

#### **Categories:**
- **`basic`** - Simple workflows (GEO download, basic QC)
- **`advanced`** - Complex analysis (multi-omics, trajectory analysis)
- **`performance`** - Large dataset processing
- **`error_handling`** - Edge cases and error recovery

#### **Biological Domain Tags:**
- **`geo`** - GEO dataset workflows
- **`qc`** - Quality control processes
- **`visualization`** - Plotting and visual analysis
- **`multiomics`** - Cross-platform integration
- **`spatial`** - Spatial transcriptomics
- **`proteomics`** - Mass spec and affinity proteomics
- **`clustering`** - Cell/sample grouping analysis

#### **Example Test Scenarios:**

```json
{
  "test_geo_download_with_qc_umap": {
    "inputs": [
      "Download GEO dataset GSE291670 and do the quality control",
      "Generate the UMAP with resolution 0.7"
    ],
    "category": "basic",
    "description": "Test complete workflow from download to UMAP visualization",
    "expected_duration": 120.0,
    "timeout": 400.0,
    "tags": ["geo", "qc", "umap", "visualization"],
    "priority": 3,
    "validation_criteria": {
      "input_0": {
        "required_keywords": ["quality control", "downloaded"],
        "no_errors": true
      },
      "input_1": {
        "required_keywords": ["UMAP", "resolution"],
        "no_errors": true
      }
    }
  }
}
```

### ⚡ Performance Monitoring

The E2E system includes comprehensive performance monitoring:

```bash
# Enable performance monitoring
python tests/run_integration_tests.py --performance-monitoring

# Features monitored:
# - CPU usage percentage (average and peak)
# - Memory consumption (RSS, peak usage)
# - Disk I/O operations (read/write MB)
# - Network activity (sent/received MB)
# - Test execution duration vs expected
# - Resource usage trends across test categories
```

### 🔄 Hybrid pytest Integration

Combine traditional pytest tests with E2E scenarios for comprehensive validation:

```bash
# Run both pytest and E2E tests together
python tests/run_integration_tests.py --run-pytest-integration

# Features:
# - Unified success/failure reporting
# - Combined coverage analytics
# - Category-wise performance breakdowns
# - Comprehensive JSON output with both test types
```

### 🎯 Advanced E2E Features

#### **Dependency Resolution**

Tests can specify dependencies for automatic ordering:

```json
{
  "dependencies": ["test_geo_download", "test_basic_qc"],
  "priority": 4
}
```

#### **Retry Logic**

Configurable retry attempts for flaky tests:

```json
{
  "retry_count": 2,
  "timeout": 300.0
}
```

#### **Response Validation**

Sophisticated validation of AI responses:

```json
{
  "validation_criteria": {
    "input_0": {
      "required_keywords": ["downloaded", "GSE109564"],
      "forbidden_keywords": ["error", "failed"],
      "min_length": 50,
      "no_errors": true
    }
  }
}
```
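Conceptually, each `validation_criteria` block reduces to a handful of string checks against the agent's response. An illustrative sketch of that logic (the real checks live in `tests/run_integration_tests.py` and may differ, notably in what `no_errors` scans for):

```python
def validate_response(response: str, criteria: dict) -> list:
    """Return a list of human-readable validation failures (empty == pass)."""
    failures = []
    lowered = response.lower()

    for kw in criteria.get("required_keywords", []):
        if kw.lower() not in lowered:
            failures.append(f"missing required keyword: {kw!r}")
    for kw in criteria.get("forbidden_keywords", []):
        if kw.lower() in lowered:
            failures.append(f"contains forbidden keyword: {kw!r}")
    if len(response) < criteria.get("min_length", 0):
        failures.append(f"response shorter than {criteria['min_length']} chars")
    if criteria.get("no_errors") and "traceback" in lowered:
        failures.append("response contains an error traceback")

    return failures
```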
### 📋 Adding New E2E Test Scenarios

Add realistic user scenarios to **`tests/test_cases.json`**:

```json
{
  "test_my_custom_workflow": {
    "inputs": [
      "Download GSE123456 and perform quality control",
      "Apply batch correction using Harmony",
      "Create publication-ready UMAP plot"
    ],
    "category": "advanced",
    "description": "Test batch correction workflow",
    "tags": ["geo", "batch_correction", "visualization"],
    "priority": 3,
    "timeout": 600.0,
    "expected_duration": 240.0,
    "validation_criteria": {
      "input_0": {
        "required_keywords": ["downloaded", "quality control"],
        "no_errors": true
      },
      "input_1": {
        "required_keywords": ["batch correction", "Harmony"],
        "no_errors": true
      },
      "input_2": {
        "required_keywords": ["UMAP", "publication"],
        "no_errors": true
      }
    }
  }
}
```

### 📊 E2E Test Results & Analytics

The E2E system generates comprehensive reports:

```json
{
  "summary": {
    "test_execution_summary": {
      "total_tests": 25,
      "passed_tests": 23,
      "failed_tests": 2,
      "success_rate": 0.92,
      "total_duration": 1800.0,
      "average_duration": 72.0
    },
    "category_breakdown": {
      "basic": {"passed": 8, "failed": 0, "total": 8},
      "advanced": {"passed": 12, "failed": 2, "total": 14},
      "performance": {"passed": 3, "failed": 0, "total": 3}
    },
    "performance_summary": {
      "avg_cpu_percent": 15.2,
      "avg_memory_mb": 1024.5,
      "max_memory_mb": 2048.0
    }
  }
}
```

## 📈 Coverage and Quality Metrics

### Coverage Requirements

- **Minimum Coverage**: 80% (enforced by CI)
- **Target Coverage**: 95%
- **Critical Components**: 100% coverage required for core services and agents

### Coverage Analysis

```bash
# Generate coverage report
pytest --cov=lobster --cov-report=html --cov-report=term

# View coverage in browser
open htmlcov/index.html

# Check coverage for specific modules
pytest --cov=lobster.tools --cov-report=term-missing

# Fail build if coverage below threshold
pytest --cov=lobster --cov-fail-under=80
```

### Quality Metrics

- **Test Execution Time**: Unit tests <2 min, integration <15 min, system <30 min
- **Memory Usage**: No test should use >2GB RAM
- **Test Reliability**: <1% flaky test rate
- **Scientific Accuracy**: All biological algorithms must be validated

## 🔍 Debugging Tests

### Common Debugging Techniques

```python
# Add debug logging in tests
import logging
logging.basicConfig(level=logging.DEBUG)


# Add debug prints (use capsys to capture)
def test_with_debug(capsys):
    print("Debug information here")
    # ... test code ...
    captured = capsys.readouterr()
    print(f"Captured output: {captured.out}")
```

```bash
# Use pytest debugging
pytest --pdb                                           # Drop to debugger on failures
pytest --pdbcls=IPython.terminal.debugger:TerminalPdb  # Use IPython debugger

# Temporary test isolation
pytest -x     # Stop on first failure
pytest --lf   # Run only last failed tests
pytest --ff   # Run failures first
```
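When the code under test logs rather than prints, pytest's built-in `caplog` fixture is the counterpart to `capsys`. A brief illustration with a hypothetical helper under test:

```python
import logging


def test_warns_on_low_quality(caplog):
    """Assert on log records emitted during an analysis step."""
    with caplog.at_level(logging.WARNING):
        run_quality_checks()  # hypothetical helper under test
    assert any("low quality" in message for message in caplog.messages)
```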
### Mock Data Debugging

```python
# Inspect generated data
def test_inspect_mock_data():
    adata = generate_synthetic_single_cell()

    print(f"Shape: {adata.shape}")
    print(f"Obs columns: {adata.obs.columns.tolist()}")
    print(f"Var columns: {adata.var.columns.tolist()}")
    print(f"Uns keys: {list(adata.uns.keys())}")

    # Save for manual inspection
    adata.write_h5ad('/tmp/debug_data.h5ad')
```

## 🎯 Best Practices Summary

### Test Design Principles

1. **Test Isolation**: Each test should be independent
2. **Realistic Data**: Use biologically plausible synthetic data
3. **Scientific Validation**: Verify biological correctness, not just code correctness
4. **Performance Awareness**: Monitor memory and time usage
5. **Error Coverage**: Test both success and failure paths

### Naming Conventions

```python
class TestServiceName:

    def test_method_success_case(self):
        # Happy path
        pass

    def test_method_edge_case_empty_data(self):
        # Edge cases
        pass

    def test_method_error_invalid_params(self):
        # Error conditions
        pass

    def test_method_performance_large_data(self):
        # Performance
        pass
```

### Test Organization

- **One test class per component** being tested
- **Group related tests** in the same class
- **Use descriptive test names** that explain the scenario
- **Keep tests focused** - one concept per test
- **Use fixtures** for common setup/teardown

This comprehensive testing guide ensures that the Lobster AI platform maintains high quality, reliability, and scientific accuracy across all bioinformatics workflows.