CodeBERT Code Indexing System

A tool that indexes and analyzes codebases using CodeBERT embeddings and supports similarity search over the indexed code.

Project Structure

The codebase has been refactored into a modular structure:

  • models/ - Data models
    • code_models.py - Contains the CodeFile and CodeEmbedding dataclasses
  • parsers/ - Code parsing utilities
    • code_parser.py - Contains the CodeParser class for language detection and code analysis
  • indexers/ - Embedding and indexing functionality
    • codebert_indexer.py - Contains the CodeBERTIndexer class for generating embeddings
  • cli.py - Command-line interface
  • codebert_indexer.py - Main entry point (top level; distinct from indexers/codebert_indexer.py)
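The models/ package holds plain data containers. This README does not list their fields, so the following is only a minimal sketch of how CodeFile and CodeEmbedding dataclasses might be shaped. Every field and property name here (path, language, content, file_path, vector, line_count, dim) is a hypothetical assumption, not the project's actual definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CodeFile:
    # Hypothetical fields: the project's real CodeFile may differ.
    path: str
    language: str
    content: str

    @property
    def line_count(self) -> int:
        # Number of lines in the file's source text.
        return len(self.content.splitlines())


@dataclass
class CodeEmbedding:
    # Hypothetical fields: links an embedding vector back to its source file.
    file_path: str
    vector: List[float] = field(default_factory=list)

    @property
    def dim(self) -> int:
        # Dimensionality of the stored embedding.
        return len(self.vector)
```

Using dataclasses keeps these containers free of logic, so the parser and indexer can pass them around without depending on each other.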

Usage

Scanning and Indexing a Codebase

python cli.py --scan /path/to/codebase --index-dir ./code_index --stats
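Scanning walks the target directory and keeps the files whose language can be detected. The sketch below assumes simple extension-based detection; EXTENSION_LANGUAGES, SKIP_DIRS, detect_language and scan_codebase are hypothetical names, and the project's CodeParser may detect languages differently. The mapping lists the six languages in CodeBERT's pre-training data (Python, Java, JavaScript, PHP, Ruby, Go).

```python
import os
from typing import Dict, List, Tuple

# Hypothetical extension-to-language map (CodeBERT's six pre-training languages).
EXTENSION_LANGUAGES: Dict[str, str] = {
    ".py": "python",
    ".js": "javascript",
    ".java": "java",
    ".go": "go",
    ".rb": "ruby",
    ".php": "php",
}

# Hypothetical set of directories that rarely contain source worth indexing.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def detect_language(path: str) -> str:
    """Return the language for a file path based on its extension, or '' if unknown."""
    ext = os.path.splitext(path)[1].lower()
    return EXTENSION_LANGUAGES.get(ext, "")


def scan_codebase(root: str) -> List[Tuple[str, str]]:
    """Collect sorted (relative_path, language) pairs for recognized source files."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            full = os.path.join(dirpath, name)
            language = detect_language(full)
            if language:
                results.append((os.path.relpath(full, root), language))
    return sorted(results)
```

Pruning dirnames in place is the documented way to stop os.walk from descending into a directory, which keeps version-control and dependency folders out of the index.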

Searching for Similar Code

python cli.py --search "def calculate_distance(point1, point2):" --index-dir ./code_index --top-k 5
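Search embeds the query snippet and ranks the indexed snippets by similarity. This README does not state which metric is used; assuming cosine similarity over the stored vectors, ranking reduces to normalizing the rows and taking the k largest dot products. top_k_similar is a hypothetical helper, not part of the project's documented API.

```python
import numpy as np


def top_k_similar(query: np.ndarray, index: np.ndarray, k: int = 5):
    """Return (indices, scores) of the k rows of `index` most cosine-similar to `query`.

    query has shape (d,), index has shape (n, d) with n >= 1; rows are assumed non-zero.
    """
    # L2-normalize so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = m @ q
    k = min(k, len(scores))
    # argpartition selects the k best candidates in O(n); then sort only those k, best first.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top, scores[top]
```

For a large index, argpartition avoids a full sort of all n scores; only the k selected candidates are sorted.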

Alternative Entry Point

You can also use the codebert_indexer.py entry point:

python codebert_indexer.py --scan /path/to/codebase
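The commands above use the flags --scan, --search, --index-dir, --top-k and --stats. The following is a minimal sketch of how a parser for those flags could be declared with argparse. The defaults (./code_index, 5), the help strings, and making --scan and --search mutually exclusive are all assumptions; the project's cli.py may define them differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Index a codebase with CodeBERT and search it for similar code."
    )
    # Assumption: scanning and searching are separate modes, so exactly one is required.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--scan", metavar="PATH", help="directory to scan and index")
    mode.add_argument("--search", metavar="CODE", help="code snippet to search for")
    # Assumed defaults; the real values may differ.
    parser.add_argument("--index-dir", default="./code_index",
                        help="directory where the index is stored")
    parser.add_argument("--top-k", type=int, default=5,
                        help="number of search results to return")
    parser.add_argument("--stats", action="store_true",
                        help="print statistics after scanning")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:].
    return build_parser().parse_args(argv)
```

argparse converts --index-dir and --top-k to the attributes index_dir and top_k on the returned namespace.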

Requirements

  • Python 3.7+
  • PyTorch
  • Transformers
  • NumPy
  • scikit-learn
  • pandas

License

[Include license information here]