
Evaluate Model2Vec for 500x faster inference with acceptable accuracy trade-off #7

@nikomatsakis

Description


Evaluate Model2Vec for 500x faster inference

Related to: #6 (Rust port tracking issue)

Overview

Model2Vec distills transformer models into static embeddings, trading some accuracy for roughly 500x faster inference. This could provide even better startup and throughput than FastEmbed-rs for CLI usage.

Key Characteristics

  • Speed: 500x faster inference than traditional transformers
  • Size: 8-30MB model vs 100MB+ original all-MiniLM-L6-v2
  • Startup: Under 50ms cold starts (negligible)
  • Accuracy: Retains 85-95% accuracy for semantic similarity tasks
  • Implementation: Simple Rust API available
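The speed advantage comes from replacing the transformer forward pass with a table lookup: each token maps to a precomputed static vector, and a sentence embedding is just the mean of its token vectors. A minimal sketch of that inference style, using only the standard library (the whitespace tokenizer, three-dimensional vectors, and tiny made-up vocabulary stand in for Model2Vec's real tokenizer and weights):

```rust
use std::collections::HashMap;

/// Mean-pool the static vectors of known tokens (toy 3-dim vectors).
fn embed(table: &HashMap<&str, [f32; 3]>, text: &str) -> [f32; 3] {
    let mut sum = [0.0f32; 3];
    let mut count = 0.0f32;
    for tok in text.split_whitespace() {
        if let Some(v) = table.get(tok) {
            for i in 0..3 {
                sum[i] += v[i];
            }
            count += 1.0;
        }
    }
    if count > 0.0 {
        for s in &mut sum {
            *s /= count;
        }
    }
    sum
}

/// Cosine similarity between two embeddings.
fn cosine(a: &[f32; 3], b: &[f32; 3]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn main() {
    // Made-up vocabulary for illustration only.
    let table = HashMap::from([
        ("fast", [0.9, 0.1, 0.0]),
        ("quick", [0.8, 0.2, 0.1]),
        ("slow", [0.0, 0.9, 0.3]),
    ]);
    let a = embed(&table, "fast");
    let b = embed(&table, "quick");
    println!("cosine(fast, quick) = {:.3}", cosine(&a, &b));
}
```

No matrix multiplications or attention layers are involved, which is why cold starts and per-query latency can be so low.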

Research Findings

From the Claude AI research report:

Model2Vec achieves 500x faster inference by distilling transformer models into static embeddings. The technique requires just 30 seconds to convert all-MiniLM-L6-v2 into an 8-30MB model while retaining 85-95% accuracy.

Evaluation Tasks

  • Convert all-MiniLM-L6-v2 to Model2Vec format
  • Benchmark inference speed vs FastEmbed-rs and Python sentence-transformers
  • Measure semantic similarity accuracy on representative test cases
  • Evaluate startup time and memory usage
  • Test with Hippo's actual search workload
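For the inference-speed task, a simple wall-clock harness is sufficient to compare backends. In this sketch the `embed` function is a made-up placeholder; the real Model2Vec or FastEmbed-rs call would be swapped in when benchmarking:

```rust
use std::hint::black_box;
use std::time::Instant;

// Placeholder embedding; replace with the actual backend under test.
fn embed(text: &str) -> Vec<f32> {
    text.bytes().map(|b| b as f32 / 255.0).collect()
}

/// Time `iters` passes over `texts` and report total and per-call latency.
fn bench(label: &str, texts: &[&str], iters: u32) {
    let start = Instant::now();
    for _ in 0..iters {
        for t in texts {
            black_box(embed(t)); // keep the call from being optimized away
        }
    }
    let total = start.elapsed();
    let per_call = total / (iters * texts.len() as u32);
    println!("{label}: {total:?} total, {per_call:?} per embedding");
}

fn main() {
    let texts = ["water the plants tomorrow", "hippo search query"];
    bench("stub-embed", &texts, 1_000);
}
```

Running the same harness against each backend with identical inputs keeps the comparison apples-to-apples; startup time would be measured separately, around process launch plus model load.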

Success Criteria

  • Startup time < 50ms
  • Inference speed significantly faster than FastEmbed-rs
  • Semantic similarity accuracy acceptable for Hippo's use cases (>90% correlation with original)
  • Integration complexity reasonable
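The >90% correlation criterion can be checked with a plain Pearson correlation over pairwise similarity scores produced by the original and distilled models on the same text pairs. A standard-library sketch (the score arrays are made-up placeholders for real cosine similarities):

```rust
/// Pearson correlation coefficient between two equal-length score slices.
fn pearson(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len() as f32;
    let ma = a.iter().sum::<f32>() / n;
    let mb = b.iter().sum::<f32>() / n;
    let cov: f32 = a.iter().zip(b).map(|(x, y)| (x - ma) * (y - mb)).sum();
    let va: f32 = a.iter().map(|x| (x - ma).powi(2)).sum();
    let vb: f32 = b.iter().map(|y| (y - mb).powi(2)).sum();
    cov / (va.sqrt() * vb.sqrt())
}

fn main() {
    // Placeholder similarity scores: original model vs distilled model.
    let original = [0.91, 0.12, 0.75, 0.33];
    let distilled = [0.88, 0.15, 0.70, 0.35];
    let r = pearson(&original, &distilled);
    println!("correlation: {r:.3}");
    assert!(r > 0.90, "distilled model diverges too much from original");
}
```

A held-out set of Hippo search queries and candidate memories would make a representative test-pair corpus for this check.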

Timeline

This is a follow-up evaluation after the primary FastEmbed-rs implementation is complete and validated.

Metadata


Labels

feature (New feature development)
