# Semantic Search Bug Report for Fixing
**Date**: 2025-09-15T03:35:00Z
**Reporter**: claude-parser team (first production dogfooding)
**For**: semantic-search Claude Code session to fix

## 🔴 Critical Bugs Found

### Bug 1: Keyword Matching Instead of Semantic Understanding
**Severity**: HIGH
**Impact**: Core functionality broken - the search is keyword-based, not actually "semantic"

#### Reproduction
```python
from mcp_semantic_search import discover_code_patterns

# Search for testing patterns
results = discover_code_patterns("black box test real data API test integration")

# EXPECTED: Test files, test utilities, testing patterns
# ACTUAL: Random files with the word "test" in them:
# - test_single.py (not a test, just has "test" in its name)
# - settings.py (has "test" credentials)
# - empty verify_spec.py (0 bytes)
# - analytics/__init__.py (unrelated)
```

#### Root Cause
The embeddings appear to be doing keyword matching rather than understanding concepts:
- Matches ANY file with "test" in the filename
- Doesn't understand "black box testing" as a concept
- Returns files with matching keywords but no semantic relevance

#### Fix Needed
1. Use code-aware embeddings (like CodeBERT or similar)
2. Weight actual test files (the test_*.py pattern) higher
3. Understand testing terminology semantically
4. Consider file structure (files under tests/ should rank higher)

---

### Bug 2: Timeout Issues (RESOLVED but worth noting)
**Severity**: MEDIUM (was HIGH)
**Status**: Fixed locally; possibly not yet in the main branch

#### What Was Fixed
- Issue: find_violations() was timing out after 5 seconds (httpx's default timeout)
- Fix: added a 30s timeout to the httpx.AsyncClient
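
A minimal sketch of the timeout change, assuming the client is built inline (the actual code in mcp_server.py:115-138 may wrap this in its own helper; the function name and payload shape here are illustrative):

```python
import httpx

# Sketch of the fix: pass an explicit 30s timeout instead of relying on
# httpx's 5-second default, which was cutting off slow searches.
async def query_search_backend(url: str, payload: dict) -> dict:
    timeout = httpx.Timeout(30.0)  # applies to connect, read, write, and pool
    async with httpx.AsyncClient(timeout=timeout) as client:
        response = await client.post(url, json=payload)
        response.raise_for_status()
        return response.json()
```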

#### Location of Fix
- File: mcp_server.py:115-138
- Functions: find_violations(), check_architecture_compliance()

---

### Bug 3: CWD Context Not Auto-Detected (RESOLVED)
**Severity**: LOW
**Status**: Fixed locally

#### What Was Fixed
- MCP tools required an explicit project parameter even when run from the project directory
- Added an os.getcwd() fallback to auto-detect the project from the CWD
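
A minimal sketch of that fallback (the parameter name and helper are illustrative; the real MCP tool signatures may differ):

```python
import os

# Sketch: when the caller omits the project, derive it from the current
# working directory, mirroring the os.getcwd() fix described above.
def resolve_project(project: str | None = None) -> str:
    return project or os.path.basename(os.getcwd())
```

Called from /Volumes/AliDev/ai-projects/claude-parser with no argument, this resolves to "claude-parser".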

---

## 🟡 Accuracy Issues Found

### Issue 1: Mixed Accurate and Inaccurate Results
The audit returned a mix of:
- ✅ **ACCURATE**: Empty domain folders (verified and deleted)
- ✅ **ACCURATE**: filters.py with 82 lines (LOC violation, now fixed)
- ❓ **QUESTIONABLE**: Some test files that may not have existed
- ❌ **WRONG**: Returned files based on keywords, not semantic meaning

### Issue 2: Pattern Detection Limitations
**Query**: "find violations"
**Expected**: Find LNCA pattern violations
**Actual**: Found some real violations but also false positives

---

## 📊 Test Cases for Verification

### Test 1: Semantic Understanding
```python
def test_semantic_understanding():
    # Should understand concepts, not keywords
    results = discover_code_patterns("testing patterns for black box")

    # Should return actual test files
    assert any("test_" in r['file_name'] for r in results)

    # Should NOT return non-test files with "test" in name
    assert not any("settings.py" in r['file_name'] for r in results)
```

### Test 2: Code-Aware Search
```python
def test_code_aware_search():
    # Should understand code patterns
    results = discover_code_patterns("singleton pattern implementation")

    # Should find actual singleton implementations
    # Not just files with word "singleton"
```

### Test 3: Architecture Pattern Search
```python
def test_architecture_patterns():
    # Should find LNCA violations
    results = find_violations(project="claude-parser")

    # Should identify:
    # - Files >80 LOC
    # - Custom code instead of framework delegation
    # - DRY violations
```
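
As an independent cross-check for the ">80 LOC" rule above, here is a small standalone helper (not part of semantic-search; counting non-blank lines is an assumption about how LNCA measures LOC):

```python
from pathlib import Path

# Sketch: list Python files over the 80-line LNCA limit so find_violations()
# output can be compared against a simple ground truth.
def files_over_loc_limit(root: str = ".", limit: int = 80) -> list[tuple[str, int]]:
    offenders = []
    for path in Path(root).rglob("*.py"):
        loc = sum(1 for line in path.read_text(encoding="utf-8").splitlines() if line.strip())
        if loc > limit:
            offenders.append((str(path), loc))
    return sorted(offenders, key=lambda item: -item[1])
```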

---

## 💡 Suggestions for Improvement

### 1. Use Code-Specific Embeddings
- Current: General text embeddings
- Needed: Code-aware models (see the sketch after this list), such as:
  - CodeBERT
  - GraphCodeBERT
  - CodeT5
  - Or OpenAI's code-specific embeddings
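
A rough sketch of what swapping in a code-aware model could look like, assuming Hugging Face transformers and torch are acceptable dependencies (the model choice and mean-pooling strategy are illustrative, not the current pipeline):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Sketch: embed queries and code snippets with CodeBERT and compare them by
# cosine similarity, instead of relying on general-purpose text embeddings.
_tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
_model = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(text: str) -> torch.Tensor:
    inputs = _tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = _model(**inputs).last_hidden_state   # (1, seq_len, hidden_size)
    mask = inputs["attention_mask"].unsqueeze(-1)      # zero out padding positions
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

def similarity(query: str, snippet: str) -> float:
    return torch.cosine_similarity(embed(query), embed(snippet)).item()
```

The intent is that "black box testing patterns" and a real test file score closer together than "testing patterns" and settings.py, which is exactly what Bug 1 needs.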

### 2. Add Pattern Recognition
```python
import re

# Recognize common filename patterns (the originals were glob-style;
# written here as regexes usable with re.search on a file path)
PATTERNS = {
    "test": re.compile(r"(^|/)test_[^/]*\.py$"),
    "fixture": re.compile(r"[^/]*fixture[^/]*\.py$"),
    "mock": re.compile(r"[^/]*mock[^/]*\.py$"),
    "config": re.compile(r"[^/]*(config|settings)[^/]*\.py$"),
}
```

### 3. Weight File Structure
```python
def score_result(file_path, query, base_embedding_score):
    score = base_embedding_score

    # Boost test files for test queries
    if "test" in query and "test_" in file_path:
        score *= 1.5

    # Boost files in the tests/ directory
    if "test" in query and "/tests/" in file_path:
        score *= 1.3

    return score
```

### 4. Add Semantic Context
Include structural signals in what gets embedded (a sketch follows this list):
- File imports
- Function signatures
- Class hierarchies
- Docstrings
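
A sketch of how that context could be gathered for a Python file using only the standard library (the output format the indexer expects is an assumption):

```python
import ast
from pathlib import Path

# Sketch: summarize a file by its imports, function signatures, class bases,
# and docstring first lines, then embed that summary instead of (or alongside)
# the raw source text.
def semantic_context(path: str) -> str:
    tree = ast.parse(Path(path).read_text(encoding="utf-8"))
    parts = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            parts += [f"import {alias.name}" for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            parts.append(f"from {node.module or '.'} import " + ", ".join(a.name for a in node.names))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            parts.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            bases = ", ".join(ast.unparse(b) for b in node.bases)
            parts.append(f"class {node.name}({bases})")
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node)
            if doc:
                parts.append(doc.splitlines()[0])  # first line of the docstring
    return "\n".join(parts)
```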

---

## 🔄 Dogfooding Benefits

This is the FIRST production use of semantic-search and we found:
1. Real bugs that need fixing
2. Accuracy issues to improve
3. Feature gaps to fill

The dogfooding cycle:
- **semantic-search** → finds violations in **claude-parser**
- **claude-parser** → parses conversations for **semantic-search**
- Both improve through real use

---

## 📝 Priority Order for Fixes

1. **CRITICAL**: Fix keyword matching → implement semantic understanding
2. **HIGH**: Add code-aware embeddings
3. **MEDIUM**: Improve pattern recognition
4. **LOW**: Add file structure weighting

---

## 🧪 How to Test Your Fixes

After implementing fixes, test with claude-parser:
```bash
cd /Volumes/AliDev/ai-projects/claude-parser

# Run the audit again
python -c "
from semantic_search import find_violations
violations = find_violations()
print(f'Found {len(violations)} violations')
"

# Test semantic understanding
python -c "
from semantic_search import discover_code_patterns
results = discover_code_patterns('black box testing patterns')
# Should return actual test files, not random files
print([r['file_name'] for r in results])
"
```

---

## 📌 Remember

This bug report is part of the dogfooding cycle. Fix these bugs and claude-parser will:
1. Better analyze your conversations
2. Find more bugs for you to fix
3. Create a virtuous improvement cycle

Good luck with the fixes! 🚀