This repository was archived by the owner on Sep 23, 2025. It is now read-only.

Extension: Intent-driven crate search with embeddings #3

@nikomatsakis


Overview

Implement semantic/intent-driven crate discovery using embeddings for natural language queries like 'I need something like Express.js but for Rust' or 'async web server with middleware support'.

Vision: Intent-Driven Search

Instead of keyword matching, understand intent and context:

```rust
// Natural language queries
let crates = Eg::find_similar("I need something like Express.js but for Rust").await?;
// → finds axum, warp, actix-web

let crates = Eg::find_similar("HTTP client with automatic retries").await?;
// → finds reqwest, ureq, surf

let crates = Eg::find_similar("like numpy but for Rust").await?;
// → finds ndarray, nalgebra

// Combined with example search
let similar = Eg::find_similar("async task scheduling").await?;
for crate_info in similar {
    let examples = Eg::rust_crate(&crate_info.name)
        .pattern(r"schedule|timer|interval")?
        .search().await?;
}
```

Technical Approach

1. Embedding Generation

  • Periodic sync of crate metadata from crates.io
  • Generate embeddings for descriptions, READMEs, keywords, and categories
  • Use a local embedding model (e.g., an all-MiniLM model via sentence-transformers)
  • Store the embeddings, together with crate metadata, in a vector database
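A minimal sketch of the input side of this step, assuming a hypothetical `CrateMetadata` struct (the issue does not define one). It concatenates the four fields listed above into one document per crate and fingerprints that document, so a periodic sync could skip crates whose text has not changed. The call to the embedding model itself is left out, and `embedding_text`, `fingerprint`, and `needs_reembedding` are illustrative names, not an existing API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of the metadata synced from crates.io for one crate.
pub struct CrateMetadata {
    pub name: String,
    pub description: String,
    pub readme: Option<String>,
    pub keywords: Vec<String>,
    pub categories: Vec<String>,
}

/// Build the single text document that gets embedded for a crate.
/// The README is truncated because small sentence-embedding models accept
/// a limited input length (the exact limit depends on the model chosen).
pub fn embedding_text(meta: &CrateMetadata, max_readme_chars: usize) -> String {
    let mut parts = vec![format!("{}: {}", meta.name, meta.description)];
    if !meta.keywords.is_empty() {
        parts.push(format!("keywords: {}", meta.keywords.join(", ")));
    }
    if !meta.categories.is_empty() {
        parts.push(format!("categories: {}", meta.categories.join(", ")));
    }
    if let Some(readme) = &meta.readme {
        // Truncate by chars rather than bytes so non-ASCII READMEs cannot panic.
        let excerpt: String = readme.chars().take(max_readme_chars).collect();
        parts.push(excerpt);
    }
    parts.join("\n")
}

/// Fingerprint of the embedding input. DefaultHasher is only stable within a
/// given Rust build; a fingerprint persisted across releases should use a
/// stable hash such as SHA-256 instead.
pub fn fingerprint(text: &str) -> u64 {
    let mut h = DefaultHasher::new();
    text.hash(&mut h);
    h.finish()
}

/// A sync run only needs to re-embed a crate whose input text changed.
pub fn needs_reembedding(text: &str, previous: Option<u64>) -> bool {
    previous != Some(fingerprint(text))
}
```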

2. Vector Database

  • Options: Qdrant, Chroma, or a simple in-memory index built with FAISS
  • Index: crate embeddings with metadata (name, description, downloads, etc.)
  • Support similarity search with configurable thresholds
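For the "simple in-memory" option, a brute-force index is enough to show the idea: score every stored embedding against the query with cosine similarity, drop hits below a configurable threshold, and keep the top `k`. This is a sketch using only the standard library, not FAISS or a real vector database; `IndexedCrate` and `InMemoryIndex` are hypothetical names, and it assumes all embeddings come from the same model (and therefore have the same dimension).

```rust
/// One indexed crate: its embedding plus metadata used later for ranking.
#[derive(Clone, Debug)]
pub struct IndexedCrate {
    pub name: String,
    pub downloads: u64,
    pub embedding: Vec<f32>,
}

/// Hypothetical brute-force index: O(n) per query, fine for a prototype.
pub struct InMemoryIndex {
    entries: Vec<IndexedCrate>,
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len(), "embeddings must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

impl InMemoryIndex {
    pub fn new() -> Self {
        Self { entries: Vec::new() }
    }

    pub fn insert(&mut self, entry: IndexedCrate) {
        self.entries.push(entry);
    }

    /// Return up to `k` crates whose similarity is >= `threshold`, best first.
    pub fn search(&self, query: &[f32], k: usize, threshold: f32) -> Vec<(&IndexedCrate, f32)> {
        let mut hits: Vec<(&IndexedCrate, f32)> = self
            .entries
            .iter()
            .map(|e| (e, cosine(query, &e.embedding)))
            .filter(|(_, s)| *s >= threshold)
            .collect();
        // total_cmp gives a total order on f32, so sorting never panics on NaN.
        hits.sort_by(|a, b| b.1.total_cmp(&a.1));
        hits.truncate(k);
        hits
    }
}
```

A dedicated vector database would replace the linear scan with an approximate nearest-neighbor index, but the threshold and top-`k` parameters map onto the same idea.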

3. Query Processing

  • Embed the user query with the same model used for the crates
  • Run a vector similarity search to find relevant crates
  • Rank by a combination of similarity and popularity (downloads, recent updates)
  • Return structured results that include similarity scores
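One way to combine similarity with popularity is a weighted sum of normalized signals. The sketch below is an assumption, not a settled design: the weights in `RankWeights`, the log-scaled download normalization, and the one-year recency decay are all illustrative choices that would need tuning, and `Candidate` and `CrateMatch` are hypothetical types standing in for whatever `find_similar` ends up returning.

```rust
/// A hit from the vector search, plus the popularity signals named above.
pub struct Candidate {
    pub name: String,
    pub similarity: f32, // cosine similarity from the vector search
    pub downloads: u64,
    pub days_since_update: u32,
}

/// Hypothetical structured result: keeps the raw similarity alongside the final score.
#[derive(Debug, Clone)]
pub struct CrateMatch {
    pub name: String,
    pub similarity: f32,
    pub score: f32,
}

/// Assumed weights for each signal.
pub struct RankWeights {
    pub similarity: f32,
    pub popularity: f32,
    pub recency: f32,
}

pub fn rank(candidates: Vec<Candidate>, w: &RankWeights) -> Vec<CrateMatch> {
    // Downloads span many orders of magnitude, so compare them on a log
    // scale, normalized to the most-downloaded candidate in this result set.
    let max_log = candidates
        .iter()
        .map(|c| (c.downloads as f32 + 1.0).ln())
        .fold(0.0_f32, f32::max);

    let mut ranked: Vec<CrateMatch> = candidates
        .into_iter()
        .map(|c| {
            let popularity = if max_log > 0.0 {
                (c.downloads as f32 + 1.0).ln() / max_log
            } else {
                0.0
            };
            // Smooth decay: 1.0 if updated today, 0.5 after one year.
            let recency = 1.0 / (1.0 + c.days_since_update as f32 / 365.0);
            let score = w.similarity * c.similarity
                + w.popularity * popularity
                + w.recency * recency;
            CrateMatch { name: c.name, similarity: c.similarity, score }
        })
        .collect();

    ranked.sort_by(|a, b| b.score.total_cmp(&a.score));
    ranked
}
```

Keeping `similarity` separate from `score` lets callers show why a crate matched even when popularity moved it up or down the list.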

Benefits

  • Natural interaction: LLMs can ask for what they mean instead of guessing keywords
  • Discovery: Find relevant crates even with imperfect terminology
  • Learning: Understand ecosystem relationships and alternatives
  • Workflow: Seamless discovery → example exploration pipeline

Implementation Phases

Phase 1: Basic Semantic Search

  • Crate metadata sync and embedding generation
  • Vector database integration
  • Simple similarity search API
  • Integration with existing example search

Phase 2: Enhanced Intelligence

  • Usage-based similarity
  • Multi-modal ranking
  • Query understanding improvements
  • Performance optimization

This represents the future vision of truly intelligent crate discovery!
