A comprehensive tool to index and analyze codebases using CodeBERT embeddings.
The codebase has been refactored into a modular structure:
- models/ - Data models
  - code_models.py - Contains the CodeFile and CodeEmbedding dataclasses
- parsers/ - Code parsing utilities
  - code_parser.py - Contains the CodeParser class for language detection and code analysis
- indexers/ - Embedding and indexing functionality
  - codebert_indexer.py - Contains the CodeBERTIndexer class for generating embeddings
- cli.py - Command-line interface
- codebert_indexer.py - Main entry point
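
The data models listed above could look roughly like the following. This is a minimal sketch, not the project's actual code: only the class names CodeFile and CodeEmbedding come from this README, and every field name (path, language, content, file_path, vector) and the line_count helper are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CodeFile:
    """A source file picked up during a scan (field names are hypothetical)."""
    path: str
    language: str
    content: str

    @property
    def line_count(self) -> int:
        # Number of lines in the file content.
        return len(self.content.splitlines())


@dataclass
class CodeEmbedding:
    """An embedding vector tied back to its source file (hypothetical fields)."""
    file_path: str
    vector: List[float] = field(default_factory=list)


# Example: describe one file and attach a made-up embedding to it.
source = CodeFile(
    path="geometry.py",
    language="python",
    content="def f():\n    return 1\n",
)
embedding = CodeEmbedding(file_path=source.path, vector=[0.1, 0.2, 0.3])
```

Keeping the file metadata and the embedding in separate dataclasses means the vectors can be stored or reloaded without holding every file's content in memory, which may be one reason for the split.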

Index a codebase:

    python cli.py --scan /path/to/codebase --index-dir ./code_index --stats

Search the index:

    python cli.py --search "def calculate_distance(point1, point2):" --index-dir ./code_index --top-k 5

You can also use the codebert_indexer.py entry point:

    python codebert_indexer.py --scan /path/to/codebase

Requirements:
- Python 3.7+
- PyTorch
- Transformers
- NumPy
- scikit-learn
- pandas
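
The --search and --top-k options suggest that a query is embedded and then compared against the stored vectors. How the indexer actually ranks results is not stated in this README; the sketch below assumes cosine similarity and uses only the standard library so that it runs without the dependencies listed above. The function names cosine_similarity and top_k and the toy three-dimensional vectors are hypothetical; real CodeBERT vectors are much longer.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b)


def top_k(
    query: List[float],
    index: Dict[str, List[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k index entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy index: file name -> made-up embedding (assumption, not real CodeBERT output).
toy_index = {
    "distance.py": [1.0, 0.0, 0.0],
    "parser.py": [0.0, 1.0, 0.0],
    "geometry.py": [0.9, 0.1, 0.0],
}
results = top_k([1.0, 0.0, 0.0], toy_index, k=2)
```

With an index of realistic size, the same ranking would normally be done with vectorized NumPy or scikit-learn operations rather than a Python loop.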
[Include license information here]