| 1 | +# Search Result Validation |
| 2 | + |
| 3 | +The Gemini Search Plugin includes comprehensive validation to ensure search results are relevant, accurate, and accessible. |
| 4 | + |
| 5 | +## Validation Layers |
| 6 | + |
| 7 | +### 1. False Positive Detection |
| 8 | + |
| 9 | +**Purpose**: Filter out irrelevant search results |
| 10 | + |
| 11 | +**How it works**: |
| 12 | +- Calculates a relevance score as the percentage of query terms found in the result title, snippet, or URL |
| 13 | +- Minimum relevance threshold: 50% |
| 14 | +- Returns each result together with its relevance score |
| 15 | + |
| 16 | +**Example**: |
| 17 | +``` |
| 18 | +Query: "Claude Code plugins" |
| 19 | +Result: "Plugin Development Guide - Claude Code Documentation" |
| 20 | +Relevance: 100% (all terms matched) |
| 21 | +Status: VALID |
| 22 | +``` |
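The scoring described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function name `sketch_relevance_score` is hypothetical, and it assumes case-insensitive substring matching with equal weight for every query term, which the real `calculate_relevance_score` may not do.

```bash
# Hypothetical helper (not part of the plugin): prints the percentage of
# query terms that appear anywhere in the title, snippet, or URL.
sketch_relevance_score() {
    local query="$1" title="$2" snippet="$3" url="$4"
    local haystack term total=0 matched=0
    local -a terms

    # Lowercase both sides so matching is case-insensitive.
    haystack=$(printf '%s %s %s' "$title" "$snippet" "$url" | tr '[:upper:]' '[:lower:]')
    read -r -a terms <<< "$(printf '%s' "$query" | tr '[:upper:]' '[:lower:]')"

    for term in "${terms[@]}"; do
        total=$((total + 1))
        # Literal substring match: "plugins" does not match "plugin".
        if [[ "$haystack" == *"$term"* ]]; then
            matched=$((matched + 1))
        fi
    done

    # An empty query scores 0 rather than dividing by zero.
    if (( total == 0 )); then
        echo 0
    else
        echo $(( matched * 100 / total ))
    fi
}
```

Under these assumptions, a result that contains only one of two query terms scores exactly 50 and would sit right at the threshold.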
| 23 | + |
| 24 | +### 2. Static Link Validation |
| 25 | + |
| 26 | +**Purpose**: Verify URLs exist and are accessible |
| 27 | + |
| 28 | +**How it works**: |
| 29 | +- Sends an HTTP HEAD request to check that the URL is reachable |
| 30 | +- Treats HTTP status codes 200-399 as valid |
| 31 | +- Follows redirects (up to 5 by default) |
| 32 | +- Times out after 10 seconds |
| 33 | +- Skips the check if no HTTP client is available |
| 34 | + |
| 35 | +**Tools used** (in order of preference): |
| 36 | +1. `curl` (primary) |
| 37 | +2. `wget` (fallback) |
| 38 | +3. Skip validation (if neither available) |
| 39 | + |
| 40 | +**Example**: |
| 41 | +``` |
| 42 | +URL: https://docs.claude.com/plugins |
| 43 | +Status: HTTP 200 |
| 44 | +Result: accessible ✓ |
| 45 | +``` |
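As a rough sketch of the check described above, the two helpers below issue a HEAD request with `curl` and classify the resulting status code. Both function names are hypothetical, and the exact options used by the plugin's `check_url_exists` are not shown in this document. The `curl` flags themselves (`--head`, `--location`, `--max-redirs`, `--max-time`, `--write-out`) are standard.

```bash
# Hypothetical helper: succeeds (exit 0) for status codes in the 200-399 range.
sketch_status_is_valid() {
    local code="$1"
    # Reject anything that is not purely numeric, then compare in base 10.
    [[ "$code" =~ ^[0-9]+$ ]] && (( 10#$code >= 200 && 10#$code <= 399 ))
}

# Hypothetical helper: sends a HEAD request and prints only the final
# HTTP status code. curl prints 000 when no response was received.
sketch_head_status() {
    local url="$1"
    curl --head --silent --location \
        --max-redirs "${MAX_REDIRECTS:-5}" \
        --max-time "${TIMEOUT_SECONDS:-10}" \
        --output /dev/null \
        --write-out '%{http_code}' \
        "$url"
}

# Example use (requires network access):
#   code=$(sketch_head_status "https://docs.claude.com/plugins")
#   if sketch_status_is_valid "$code"; then echo accessible; else echo inaccessible; fi
```

Some servers reject HEAD requests even when the page exists, so a production check may need a GET fallback; that is outside the scope of this sketch.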
| 46 | + |
| 47 | +### 3. URL Format Validation |
| 48 | + |
| 49 | +**Purpose**: Ensure URLs have valid structure |
| 50 | + |
| 51 | +**Checks**: |
| 52 | +- Protocol: Must be `http://` or `https://` |
| 53 | +- Domain: Valid domain name format |
| 54 | +- Path: Optional, any valid path |
| 55 | + |
| 56 | +**Examples**: |
| 57 | +``` |
| 58 | +✓ https://docs.claude.com/plugins |
| 59 | +✓ http://example.org/page |
| 60 | +✗ not-a-url |
| 61 | +✗ ftp://example.com |
| 62 | +``` |
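One way to express these checks is a single regular expression, sketched below. The function name `sketch_url_format_ok` and the pattern are assumptions made for illustration; the plugin's `validate_url_format` may accept or reject different edge cases (for example, this sketch does not handle IPv6 literals or query strings without a path).

```bash
# Hypothetical helper: exit 0 if the URL has an http/https scheme, a host made
# of dot-separated labels, an optional port, and an optional path.
sketch_url_format_ok() {
    local url="$1"
    local label='[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?'
    local pattern="^https?://${label}(\.${label})*(:[0-9]+)?(/.*)?\$"
    [[ "$url" =~ $pattern ]]
}
```

The exit code follows the same convention as the documented functions: 0 means valid, 1 means invalid.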
| 63 | + |
| 64 | +### 4. Domain Blacklist |
| 65 | + |
| 66 | +**Purpose**: Filter out test and invalid domains |
| 67 | + |
| 68 | +**Blacklisted domains**: |
| 69 | +- `example.com` |
| 70 | +- `test.com` |
| 71 | +- `invalid.com` |
| 72 | +- `localhost` |
| 73 | +- `127.0.0.1` |
| 74 | +- `0.0.0.0` |
| 75 | +- `::1` |
| 76 | +- `*.local` |
| 77 | + |
| 78 | +**Example**: |
| 79 | +``` |
| 80 | +URL: https://example.com/test |
| 81 | +Status: INVALID (blacklisted domain) |
| 82 | +``` |
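A blacklist lookup of this kind could look like the sketch below. The function name `sketch_url_not_blacklisted` is hypothetical, and the sketch assumes exact host matches plus the `*.local` wildcard; whether the plugin also blocks subdomains such as `www.example.com` is not stated here. The return convention mirrors `check_url_blacklist`: 0 means not blacklisted, 1 means blacklisted.

```bash
# Hypothetical helper: extracts the host from a URL and compares it
# against the documented blacklist entries.
sketch_url_not_blacklisted() {
    local url="$1"
    local host="${url#*://}"    # drop the scheme
    host="${host%%/*}"          # drop the path

    if [[ "$host" == \[* ]]; then
        # Bracketed IPv6 literal such as [::1]:8080
        host="${host#\[}"
        host="${host%%\]*}"
    else
        host="${host%%:*}"      # drop the port
    fi
    host=$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]')

    case "$host" in
        example.com|test.com|invalid.com|localhost|127.0.0.1|0.0.0.0|::1|*.local)
            return 1 ;;
    esac
    return 0
}
```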
| 83 | + |
| 84 | +## Configuration |
| 85 | + |
| 86 | +### Enable/Disable Validation |
| 87 | + |
| 88 | +```bash |
| 89 | +# Enable static link validation (default) |
| 90 | +export ENABLE_LINK_VALIDATION=true |
| 91 | + |
| 92 | +# Disable static link validation (faster, less accurate) |
| 93 | +export ENABLE_LINK_VALIDATION=false |
| 94 | +``` |
| 95 | + |
| 96 | +### Timeout Configuration |
| 97 | + |
| 98 | +```bash |
| 99 | +# HTTP request timeout in seconds (default: 10) |
| 100 | +export TIMEOUT_SECONDS=10 |
| 101 | + |
| 102 | +# Maximum HTTP redirects to follow (default: 5) |
| 103 | +export MAX_REDIRECTS=5 |
| 104 | +``` |
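For illustration, a script could read these settings with the documented defaults as sketched below. The function name `sketch_load_validation_config` is hypothetical, and the fallback to the default on non-numeric input is an assumption of this sketch, not documented plugin behavior.

```bash
# Hypothetical helper: applies the documented defaults when a variable is
# unset or empty, and falls back to the default for non-numeric values.
sketch_load_validation_config() {
    ENABLE_LINK_VALIDATION="${ENABLE_LINK_VALIDATION:-true}"
    TIMEOUT_SECONDS="${TIMEOUT_SECONDS:-10}"
    MAX_REDIRECTS="${MAX_REDIRECTS:-5}"

    # Avoid passing garbage to the HTTP client.
    [[ "$TIMEOUT_SECONDS" =~ ^[0-9]+$ ]] || TIMEOUT_SECONDS=10
    [[ "$MAX_REDIRECTS" =~ ^[0-9]+$ ]] || MAX_REDIRECTS=5
}
```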
| 105 | + |
| 106 | +### Relevance Threshold |
| 107 | + |
| 108 | +The threshold is currently hardcoded to 50%. To change it, edit the comparison in `scripts/search-wrapper.sh`: |
| 109 | + |
| 110 | +```bash |
| 111 | +# Line 175 |
| 112 | +if [[ $relevance_percentage -ge 50 ]] && [[ "$is_valid" == "true" ]]; then |
| 113 | +``` |
| 114 | +
|
| 115 | +## Validation Output Format |
| 116 | +
|
| 117 | +Results include validation metadata: |
| 118 | +
|
| 119 | +``` |
| 120 | +VALID|85|accessible |
| 121 | +``` |
| 122 | +
|
| 123 | +Format: `STATUS|RELEVANCE_SCORE|URL_STATUS` |
| 124 | +
|
| 125 | +- **STATUS**: `VALID` or `INVALID` |
| 126 | +- **RELEVANCE_SCORE**: 0-100 percentage |
| 127 | +- **URL_STATUS**: `accessible`, `inaccessible`, or `unknown` |
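
A consumer of this format can split a metadata line on the `|` separator. The sketch below uses a hypothetical function name, `sketch_parse_validation_line`, and assumes the three fields never contain a literal `|`.

```bash
# Hypothetical helper: splits STATUS|RELEVANCE_SCORE|URL_STATUS into fields
# and prints them as key=value pairs.
sketch_parse_validation_line() {
    local line="$1" status score url_status
    IFS='|' read -r status score url_status <<< "$line"
    printf 'status=%s score=%s url_status=%s\n' "$status" "$score" "$url_status"
}

sketch_parse_validation_line "VALID|85|accessible"
# prints: status=VALID score=85 url_status=accessible
```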
| 128 | +
|
| 129 | +## Performance Considerations |
| 130 | +
|
| 131 | +### With Link Validation Enabled |
| 132 | +
|
| 133 | +**Pros**: |
| 134 | +- ✅ Filters out broken links |
| 135 | +- ✅ Higher quality results |
| 136 | +- ✅ Better user experience |
| 137 | +
|
| 138 | +**Cons**: |
| 139 | +- ⏱️ Slower (adds ~1-2s per result) |
| 140 | +- 🌐 Requires network access |
| 141 | +- 💾 Not cached |
| 142 | +
|
| 143 | +**Best for**: Production use, critical searches |
| 144 | +
|
| 145 | +### With Link Validation Disabled |
| 146 | +
|
| 147 | +**Pros**: |
| 148 | +- ⚡ Faster results |
| 149 | +- 📡 Works offline |
| 150 | +- 💨 Lower latency |
| 151 | +
|
| 152 | +**Cons**: |
| 153 | +- ❌ May return broken links |
| 154 | +- ⚠️ Lower quality assurance |
| 155 | +
|
| 156 | +**Best for**: Development, testing, offline use |
| 157 | +
|
| 158 | +## Testing Validation |
| 159 | +
|
| 160 | +### Unit Tests |
| 161 | +
|
| 162 | +Run validation tests: |
| 163 | +
|
| 164 | +```bash |
| 165 | +bash tests/test-link-validation.sh |
| 166 | +``` |
| 167 | +
|
| 168 | +### Manual Testing |
| 169 | +
|
| 170 | +Test individual validation functions: |
| 171 | +
|
| 172 | +```bash |
| 173 | +# Source the validation script |
| 174 | +source scripts/validate-links.sh |
| 175 | + |
| 176 | +# Test URL format |
| 177 | +validate_url_format "https://docs.claude.com/plugins" |
| 178 | +echo $? # 0 = valid, 1 = invalid |
| 179 | + |
| 180 | +# Test URL exists |
| 181 | +check_url_exists "https://docs.claude.com/plugins" |
| 182 | +echo $? # 0 = exists, 1 = doesn't exist |
| 183 | + |
| 184 | +# Test blacklist |
| 185 | +check_url_blacklist "https://example.com/test" |
| 186 | +echo $? # 0 = not blacklisted, 1 = blacklisted |
| 187 | + |
| 188 | +# Calculate relevance |
| 189 | +calculate_relevance_score "claude plugins" "Claude Plugin Guide" "Guide to plugins" "https://claude.com/plugins" |
| 190 | +# Returns: 100 |
| 191 | +``` |
| 192 | +
|
| 193 | +### Full Validation Test |
| 194 | +
|
| 195 | +```bash |
| 196 | +bash scripts/validate-links.sh \ |
| 197 | + "claude code plugins" \ |
| 198 | + "Plugin Development Guide" \ |
| 199 | + "https://docs.claude.com/plugins" \ |
| 200 | + "Comprehensive guide to developing plugins for Claude Code" |
| 201 | +``` |
| 202 | +
|
| 203 | +Output: |
| 204 | +```json |
| 205 | +{ |
| 206 | + "valid": true, |
| 207 | + "url": "https://docs.claude.com/plugins", |
| 208 | + "url_status": "accessible", |
| 209 | + "relevance_score": 100, |
| 210 | + "relevance_threshold": 50, |
| 211 | + "failure_reasons": [] |
| 212 | +} |
| 213 | +``` |
| 214 | +
|
| 215 | +## Debugging Validation Issues |
| 216 | +
|
| 217 | +### Enable Debug Logging |
| 218 | +
|
| 219 | +```bash |
| 220 | +export LOG_FILE="/tmp/gemini-search-debug.log" |
| 221 | + |
| 222 | +# Run search |
| 223 | +/search "your query" |
| 224 | + |
| 225 | +# View logs |
| 226 | +tail -f /tmp/gemini-search-debug.log | grep -E "Validating|accessible" |
| 227 | +``` |
| 228 | +
|
| 229 | +### Common Issues |
| 230 | +
|
| 231 | +#### Issue: All results marked INVALID |
| 232 | +
|
| 233 | +**Cause**: Link validation timing out |
| 234 | +
|
| 235 | +**Solution**: |
| 236 | +```bash |
| 237 | +# Increase timeout |
| 238 | +export TIMEOUT_SECONDS=30 |
| 239 | + |
| 240 | +# Or disable link validation |
| 241 | +export ENABLE_LINK_VALIDATION=false |
| 242 | +``` |
| 243 | +
|
| 244 | +#### Issue: Validation too slow |
| 245 | +
|
| 246 | +**Cause**: HTTP requests taking too long |
| 247 | +
|
| 248 | +**Solution**: |
| 249 | +```bash |
| 250 | +# Reduce timeout |
| 251 | +export TIMEOUT_SECONDS=5 |
| 252 | + |
| 253 | +# Reduce max redirects |
| 254 | +export MAX_REDIRECTS=2 |
| 255 | +``` |
| 256 | +
|
| 257 | +#### Issue: "No HTTP client available" |
| 258 | +
|
| 259 | +**Cause**: Neither `curl` nor `wget` is installed |
| 260 | +
|
| 261 | +**Solution**: |
| 262 | +```bash |
| 263 | +# Install curl (Ubuntu/Debian) |
| 264 | +sudo apt-get install curl |
| 265 | + |
| 266 | +# Install curl (macOS) |
| 267 | +brew install curl |
| 268 | + |
| 269 | +# Install curl (Windows/Chocolatey) |
| 270 | +choco install curl |
| 271 | +``` |
| 272 | +
|
| 273 | +## Validation Statistics |
| 274 | +
|
| 275 | +View validation performance: |
| 276 | +
|
| 277 | +```bash |
| 278 | +/search-stats |
| 279 | +``` |
| 280 | +
|
| 281 | +Shows: |
| 282 | +- Total searches |
| 283 | +- Cache hit rate |
| 284 | +- Average relevance scores (future feature) |
| 285 | +- URL accessibility rate (future feature) |
| 286 | +
|
| 287 | +## Future Enhancements |
| 288 | +
|
| 289 | +Planned validation improvements: |
| 290 | +
|
| 291 | +- [ ] SSL certificate validation |
| 292 | +- [ ] Content-type checking (HTML only) |
| 293 | +- [ ] Duplicate URL detection |
| 294 | +- [ ] Custom blacklist configuration |
| 295 | +- [ ] Whitelist support |
| 296 | +- [ ] Validation result caching |
| 297 | +- [ ] Async validation (parallel checks) |
| 298 | +- [ ] Configurable relevance thresholds |
| 299 | +- [ ] Machine learning relevance scoring |
| 300 | +
|
| 301 | +## Best Practices |
| 302 | +
|
| 303 | +### For Users |
| 304 | +
|
| 305 | +1. **Enable link validation in production** |
| 306 | + - Ensures high-quality results |
| 307 | + - Prevents dead links |
| 308 | +
|
| 309 | +2. **Disable link validation for development** |
| 310 | + - Faster iteration |
| 311 | + - Works offline |
| 312 | +
|
| 313 | +3. **Monitor validation logs** |
| 314 | + - Identify patterns |
| 315 | + - Tune thresholds |
| 316 | +
|
| 317 | +### For Developers |
| 318 | +
|
| 319 | +1. **Test with validation enabled and disabled** |
| 320 | + - Ensure both modes work |
| 321 | + - Handle graceful degradation |
| 322 | +
|
| 323 | +2. **Add validation tests** |
| 324 | + - Test new validation rules |
| 325 | + - Prevent regressions |
| 326 | +
|
| 327 | +3. **Document validation behavior** |
| 328 | + - Update VALIDATION.md |
| 329 | + - Add examples |
| 330 | +
|
| 331 | +## Related Documentation |
| 332 | +
|
| 333 | +- [README.md](../README.md) - Overview and features |
| 334 | +- [TESTING.md](../TESTING.md) - Testing guide |
| 335 | +- [DEPLOYMENT.md](../DEPLOYMENT.md) - Deployment procedures |
| 336 | +- [scripts/validate-links.sh](../scripts/validate-links.sh) - Validation implementation |