This repository was archived by the owner on Sep 23, 2025. It is now read-only.
Extension: Intent-driven crate search with embeddings #3
Open
Description
Overview
Implement semantic/intent-driven crate discovery using embeddings for natural language queries like 'I need something like Express.js but for Rust' or 'async web server with middleware support'.
Vision: Intent-Driven Search
Instead of keyword matching, understand intent and context:
// Natural language queries
let crates = Eg::find_similar("I need something like Express.js but for Rust").await?;
// → finds axum, warp, actix-web
let crates = Eg::find_similar("HTTP client with automatic retries").await?;
// → finds reqwest, ureq, surf
let crates = Eg::find_similar("like numpy but for Rust").await?;
// → finds ndarray, nalgebra
// Combined with example search
let similar = Eg::find_similar("async task scheduling").await?;
for crate_info in similar {
    let examples = Eg::rust_crate(&crate_info.name)
        .pattern(r"schedule|timer|interval")?
        .search().await?;
}
Technical Approach
1. Embedding Generation
- Periodic sync of crate metadata from crates.io
- Generate embeddings for: descriptions, READMEs, keywords, categories
- Use a local embedding model (e.g., a sentence-transformers model such as all-MiniLM-L6-v2)
- Store embeddings with metadata in vector database
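A minimal sketch of the embedding-generation step, assuming nothing about the final design: `CrateMetadata`, `embedding_input`, the `Embedder` trait, and `HashingEmbedder` are all hypothetical names introduced here for illustration. The hashing embedder is a toy stand-in (bag-of-words feature hashing) so the sketch runs without a model; a real implementation would call the local sentence-embedding model behind the same trait.

```rust
/// Hypothetical subset of crates.io metadata used to build the embedding input.
#[derive(Debug, Clone)]
pub struct CrateMetadata {
    pub name: String,
    pub description: String,
    pub readme: String,
    pub keywords: Vec<String>,
    pub categories: Vec<String>,
}

/// Concatenate the fields into one document to embed. The README is cut to
/// `max_readme_chars` characters so a long README does not drown out the
/// description, keywords, and categories.
pub fn embedding_input(meta: &CrateMetadata, max_readme_chars: usize) -> String {
    let readme: String = meta.readme.chars().take(max_readme_chars).collect();
    format!(
        "{}\n{}\nkeywords: {}\ncategories: {}\n{}",
        meta.name,
        meta.description,
        meta.keywords.join(", "),
        meta.categories.join(", "),
        readme
    )
}

/// Anything that maps text to a fixed-length vector (assumed interface).
pub trait Embedder {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Toy stand-in for a real model: hashes each lowercase token into one of
/// `dims` buckets and L2-normalises the counts. Requires `dims > 0`.
pub struct HashingEmbedder {
    pub dims: usize,
}

impl Embedder for HashingEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        assert!(self.dims > 0, "dims must be positive");
        let mut v = vec![0.0f32; self.dims];
        let tokens = text
            .split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty());
        for token in tokens {
            // FNV-1a hash of the lowercased token picks the bucket.
            let mut h: u64 = 0xcbf2_9ce4_8422_2325;
            for b in token.to_lowercase().bytes() {
                h ^= b as u64;
                h = h.wrapping_mul(0x0000_0100_0000_01b3);
            }
            v[(h % self.dims as u64) as usize] += 1.0;
        }
        // Normalise to unit length so dot product equals cosine similarity.
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 {
            for x in &mut v {
                *x /= norm;
            }
        }
        v
    }
}
```

Keeping the model behind a trait means the periodic sync job only depends on `embed`, so the toy embedder can be swapped for a real model without touching the sync code.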
2. Vector Database
- Options: Qdrant, Chroma, or a simple in-memory index built on FAISS
- Index: crate embeddings with metadata (name, description, downloads, etc.)
- Support similarity search with configurable thresholds
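To make the "simple in-memory" option concrete, here is a rough sketch of a brute-force index with a configurable similarity threshold. It uses only the standard library and does not use Qdrant, Chroma, or FAISS; `IndexedCrate`, `InMemoryIndex`, and `cosine_similarity` are hypothetical names, and a linear scan like this is only an assumption about what might be adequate for an early prototype.

```rust
/// One indexed crate: the embedding plus the metadata returned with a hit.
#[derive(Debug, Clone)]
pub struct IndexedCrate {
    pub name: String,
    pub description: String,
    pub downloads: u64,
    pub embedding: Vec<f32>,
}

/// Cosine similarity of two vectors; returns 0.0 for mismatched lengths
/// or zero-length vectors instead of panicking or producing NaN.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Brute-force index: every search scans all entries.
#[derive(Default)]
pub struct InMemoryIndex {
    entries: Vec<IndexedCrate>,
}

impl InMemoryIndex {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn insert(&mut self, entry: IndexedCrate) {
        self.entries.push(entry);
    }

    /// Return at most `top_k` entries whose similarity to `query` is at
    /// least `min_similarity`, ordered from most to least similar.
    pub fn search(
        &self,
        query: &[f32],
        top_k: usize,
        min_similarity: f32,
    ) -> Vec<(&IndexedCrate, f32)> {
        let mut hits: Vec<(&IndexedCrate, f32)> = self
            .entries
            .iter()
            .map(|e| (e, cosine_similarity(query, &e.embedding)))
            .filter(|(_, score)| *score >= min_similarity)
            .collect();
        hits.sort_by(|a, b| b.1.total_cmp(&a.1));
        hits.truncate(top_k);
        hits
    }
}
```

A dedicated vector database would replace the linear scan with an approximate nearest-neighbour index, but the `search(query, top_k, min_similarity)` shape could stay the same.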
3. Query Processing
- Embed user query using same model
- Vector similarity search to find relevant crates
- Rank by similarity + popularity (downloads, recent updates)
- Return structured results with similarity scores
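The ranking step could be sketched as a weighted blend of similarity, popularity, and freshness. Everything below is an assumption for illustration: the `Candidate`, `RankWeights`, and `RankedCrate` types, the log-scaled download score, and the one-year freshness decay are not specified anywhere in this proposal.

```rust
/// A search hit before ranking (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Candidate {
    pub name: String,
    pub similarity: f32,
    pub downloads: u64,
    pub days_since_update: u32,
}

/// Structured result that keeps the raw similarity next to the final score.
#[derive(Debug, Clone, PartialEq)]
pub struct RankedCrate {
    pub name: String,
    pub similarity: f32,
    pub score: f32,
}

/// Relative weight of each signal; the values are tuning knobs, not fixed.
pub struct RankWeights {
    pub similarity: f32,
    pub downloads: f32,
    pub recency: f32,
}

/// Log-scaled downloads in [0, 1], relative to the most downloaded candidate,
/// so one hugely popular crate does not flatten every other score.
fn popularity(downloads: u64, max_downloads: u64) -> f32 {
    if max_downloads == 0 {
        return 0.0;
    }
    ((downloads as f64 + 1.0).ln() / (max_downloads as f64 + 1.0).ln()) as f32
}

/// 1.0 for a crate updated today, 0.5 after one year, decaying towards 0.
fn freshness(days_since_update: u32) -> f32 {
    1.0 / (1.0 + days_since_update as f32 / 365.0)
}

/// Score every candidate and return them ordered from best to worst.
pub fn rank(candidates: &[Candidate], weights: &RankWeights) -> Vec<RankedCrate> {
    let max_downloads = candidates.iter().map(|c| c.downloads).max().unwrap_or(0);
    let mut ranked: Vec<RankedCrate> = candidates
        .iter()
        .map(|c| RankedCrate {
            name: c.name.clone(),
            similarity: c.similarity,
            score: weights.similarity * c.similarity
                + weights.downloads * popularity(c.downloads, max_downloads)
                + weights.recency * freshness(c.days_since_update),
        })
        .collect();
    ranked.sort_by(|a, b| b.score.total_cmp(&a.score));
    ranked
}
```

Returning both `similarity` and `score` lets a caller see whether a crate ranked highly because it matched the query or mainly because it is popular.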
Benefits
- Natural interaction: LLMs can describe what they mean instead of guessing exact keywords
- Discovery: Find relevant crates even with imperfect terminology
- Learning: Understand ecosystem relationships and alternatives
- Workflow: Seamless discovery → example exploration pipeline
Implementation Phases
Phase 1: Basic Semantic Search
- Crate metadata sync and embedding generation
- Vector database integration
- Simple similarity search API
- Integration with existing example search
Phase 2: Enhanced Intelligence
- Usage-based similarity
- Multi-modal ranking
- Query understanding improvements
- Performance optimization
This is the long-term vision for intent-driven crate discovery.