|
| 1 | +# Language Coverage Status |
| 2 | + |
| 3 | +## Currently Supported Languages (AST-based detection) |
| 4 | + |
| 5 | +✅ **C** - 9 fixture files, OpenSSL library detection |
| 6 | +✅ **C++** - 12 fixture files, OpenSSL/Botan/CryptoPP library detection |
| 7 | +✅ **Rust** - 13 fixture files, Ring/RustCrypto library detection |
| 8 | +✅ **Python** - 16 fixture files, Cryptography library detection |
| 9 | +✅ **Java** - 13 fixture files, JCA/BouncyCastle library detection |
| 10 | +✅ **Go** - 12 fixture files, std-crypto/x-crypto library detection |
| 11 | + |
| 12 | +**Total**: 37 ground truth files generated for supported languages |
| 13 | + |
| 14 | +## Languages with Fixtures but No AST Support Yet |
| 15 | + |
| 16 | +⏳ **PHP** - 9 fixture files (OpenSSL/Sodium libraries) |
| 17 | +⏳ **Swift** - 9 fixture files (CryptoKit/CommonCrypto libraries) |
| 18 | +⏳ **Kotlin** - 5 fixture files (JCA library) |
| 19 | +⏳ **Objective-C** - 9 fixture files (CommonCrypto/OpenSSL libraries) |
| 20 | +⏳ **Erlang** - 5 fixture files (OTP-crypto library) |
| 21 | + |
| 22 | +**Total**: 37 fixture files without AST support |
| 23 | + |
| 24 | +## Why Some Languages Aren't Supported Yet |
| 25 | + |
| 26 | +The missing languages either have tree-sitter parser crates whose APIs differ from the ones already integrated, or have no stable parser at all: |
| 27 | + |
| 28 | +- **tree-sitter-php**: Uses different function naming convention |
| 29 | +- **tree-sitter-swift**: Uses `LANGUAGE` constant instead of `language()` function |
| 30 | +- **tree-sitter-kotlin**: Different API structure |
| 31 | +- **Objective-C**: No stable tree-sitter parser available |
| 32 | +- **Erlang**: No stable tree-sitter parser available |
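
One way to absorb these differences is a thin adapter that gives every grammar the same entry point. The sketch below is only an illustration under stated assumptions: `Grammar`, `php_style`, `swift_style`, and `load_grammar` are hypothetical stand-ins written with the standard library alone, and they merely mimic the two export styles listed above (a differently named function versus a `LANGUAGE` constant). They are not the real tree-sitter crates.

```rust
/// Stand-in for a parser grammar handle (hypothetical; the real type
/// would come from the tree-sitter crate).
#[derive(Debug, Clone, PartialEq)]
struct Grammar {
    name: &'static str,
}

/// Mimics a grammar crate that exports a function with a non-standard name.
mod php_style {
    pub fn language_php() -> super::Grammar {
        super::Grammar { name: "php" }
    }
}

/// Mimics a grammar crate that exports a constant instead of a function.
mod swift_style {
    pub const LANGUAGE: super::Grammar = super::Grammar { name: "swift" };
}

/// Single entry point for the detector: each arm hides one export style.
/// Languages without a usable grammar fall through to `None`.
fn load_grammar(lang: &str) -> Option<Grammar> {
    match lang {
        "php" => Some(php_style::language_php()),
        "swift" => Some(swift_style::LANGUAGE),
        _ => None,
    }
}
```

With this shape, the rest of the detector would only ever call `load_grammar`, so a grammar crate changing its export style touches one match arm.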
| 33 | + |
| 34 | +## Detection Quality by Language |
| 35 | + |
| 36 | +### High Quality Detection |
| 37 | +- **C/C++**: Detects include statements and function calls accurately |
| 38 | +- **Python**: Detects import statements and algorithm usage |
| 39 | +- **Java**: Detects import statements for crypto APIs |
| 40 | +- **Go**: Detects import statements for crypto packages |
| 41 | + |
| 42 | +### Needs Refinement |
| 43 | +- **Rust**: Currently generates many false positives (51 findings for a simple file) |
| 44 | + - AST patterns are too broad and match every identifier |
| 45 | + - Need more specific patterns for crypto-specific usage |
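
To make the difference between a broad and a narrow pattern concrete, here is a minimal sketch that works on plain path strings rather than a real AST. It assumes identifiers have already been collected as `::`-separated paths. `CRYPTO_ROOTS`, `broad_findings`, and `narrow_findings` are hypothetical names, and the crate list is illustrative, not the tool's actual pattern set.

```rust
/// Crate roots treated as crypto libraries (illustrative list only).
const CRYPTO_ROOTS: [&str; 4] = ["ring", "aes_gcm", "sha2", "chacha20poly1305"];

/// Broad rule: every identifier counts as a finding.
/// This mirrors the noisy behaviour described above.
fn broad_findings<'a>(paths: &[&'a str]) -> Vec<&'a str> {
    paths.to_vec()
}

/// Narrow rule: keep only paths whose first segment is a known crypto crate.
fn narrow_findings<'a>(paths: &[&'a str]) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| {
            // First `::`-separated segment, e.g. "ring" in "ring::digest::digest".
            let root = p.split("::").next().unwrap_or("");
            CRYPTO_ROOTS.iter().any(|r| *r == root)
        })
        .collect()
}
```

Given the paths `ring::digest::digest`, `std::fmt::Display`, `sha2::Sha256`, and `serde::Serialize`, the broad rule reports all four while the narrow rule keeps only the two rooted in a crypto crate.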
| 46 | + |
| 47 | +## How to Add New Language Support |
| 48 | + |
| 49 | +1. **Add tree-sitter parser dependency** to Cargo.toml |
| 50 | +2. **Add parser initialization** in `AstDetector::new()` |
| 51 | +3. **Add language matching** in `find_matches()` method |
| 52 | +4. **Define AST patterns** for the language in `default_patterns()` or patterns.toml |
| 53 | +5. **Add detector** in CLI main.rs |
| 54 | +6. **Test and validate** with existing fixtures |
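
The steps above can be sketched in miniature as follows. This is not the project's code: the internals of `AstDetector::new()` and `find_matches()` are not shown in this document, so the `Language` enum, `from_path`, and `import_markers` below are hypothetical, and the marker strings are examples rather than the contents of patterns.toml.

```rust
use std::path::Path;

/// The six languages listed as supported above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    C,
    Cpp,
    Rust,
    Python,
    Java,
    Go,
}

impl Language {
    /// Language matching: map a file to a language by extension.
    /// `None` means the file is skipped.
    fn from_path(path: &Path) -> Option<Language> {
        match path.extension()?.to_str()? {
            "c" | "h" => Some(Language::C),
            "cc" | "cpp" | "hpp" => Some(Language::Cpp),
            "rs" => Some(Language::Rust),
            "py" => Some(Language::Python),
            "java" => Some(Language::Java),
            "go" => Some(Language::Go),
            _ => None,
        }
    }

    /// Pattern definition: import markers per language (illustrative values).
    fn import_markers(self) -> &'static [&'static str] {
        match self {
            Language::C | Language::Cpp => &["openssl/"],
            Language::Rust => &["ring::"],
            Language::Python => &["cryptography"],
            Language::Java => &["javax.crypto", "org.bouncycastle"],
            Language::Go => &["crypto/", "golang.org/x/crypto"],
        }
    }
}
```

A design note on this shape: because `import_markers` matches exhaustively, adding a new enum variant without patterns fails to compile, which keeps the matching and pattern steps from drifting apart.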
| 55 | + |
| 56 | +## Future Improvements |
| 57 | + |
| 58 | +1. **Refine Rust patterns** to reduce false positives |
| 59 | +2. **Add support for missing languages** with stable tree-sitter parsers |
| 60 | +3. **Enhance algorithm detection** with parameter extraction |
| 61 | +4. **Add more library patterns** for comprehensive coverage |
| 62 | +5. **Consider fallback regex detection** for languages without AST support |
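
As a rough idea of what the fallback in item 5 could look like, the sketch below scans source text line by line for known API names. It uses plain substring matching from the standard library as a stand-in for regular expressions, so it is deliberately cruder than a regex-based detector. `FALLBACK_NEEDLES`, `Hit`, and `fallback_scan` are hypothetical names, and the needle lists are small examples, not a proposed pattern set.

```rust
/// Hypothetical fallback table: (file extension, strings to look for).
const FALLBACK_NEEDLES: [(&str, &[&str]); 2] = [
    ("php", &["openssl_encrypt", "sodium_crypto_secretbox"]),
    ("swift", &["import CryptoKit", "import CommonCrypto"]),
];

/// A line-level hit: 1-based line number plus the string that matched.
#[derive(Debug, PartialEq)]
struct Hit {
    line: usize,
    needle: &'static str,
}

/// Scan `source` for the needles registered for `extension`.
/// Extensions without an entry yield no hits.
fn fallback_scan(extension: &str, source: &str) -> Vec<Hit> {
    let needles: &'static [&'static str] = FALLBACK_NEEDLES
        .iter()
        .find(|(ext, _)| *ext == extension)
        .map(|(_, n)| *n)
        .unwrap_or(&[]);

    let mut hits = Vec::new();
    for (idx, text) in source.lines().enumerate() {
        for &needle in needles {
            if text.contains(needle) {
                hits.push(Hit { line: idx + 1, needle });
            }
        }
    }
    hits
}
```

Substring matching will also fire inside comments and string literals, which is one reason a fallback like this would likely need to be reported at lower confidence than AST-based findings.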
| 63 | + |
| 64 | +## Current Tool Usage |
| 65 | + |
| 66 | +The tool currently works for the 6 supported languages, although Rust output is still noisy (see "Needs Refinement"): |
| 67 | + |
| 68 | +```bash |
| 69 | +./target/release/cipherscope fixtures/python/cryptography/ --patterns patterns.toml |
| 70 | +./target/release/cipherscope fixtures/c/openssl/ --patterns patterns.toml |
| 71 | +./target/release/cipherscope fixtures/java/jca/ --patterns patterns.toml |
| 72 | +``` |
| 73 | + |
| 74 | +Files in unsupported languages are skipped silently: the tool reports no errors and produces no output for them. |