
[Feature]: Add CLI to launch FastEmbed as an Embedding API service (OpenAI-compatible) #571

@magicbrighter

Description


What feature would you like to request?

Feature Request: Add CLI to launch FastEmbed as an Embedding API service (OpenAI-compatible)

Title

CLI to start FastEmbed as a local HTTP service providing a /v1/embeddings endpoint


Background / Motivation

Currently, FastEmbed is a Python library: users need to write Python code to create embeddings.
In many production setups:

  • Teams want to run FastEmbed as a self-contained service (like OpenAI embeddings API), locally or on an internal server.
  • Many existing apps/tools (LangChain, LlamaIndex, etc.) expect an OpenAI-style /v1/embeddings REST API, so they can work without code changes.
  • This would also make FastEmbed easier to integrate into non-Python environments.

Proposed Solution

Add a CLI command (e.g. fastembed serve) to start an HTTP server.
The server would host a REST API compatible with OpenAI's embeddings format:

CLI Example

fastembed serve \
  --model-name BAAI/bge-small-zh-v1.5 \
  --device cuda \
  --model-path /path/to/local/model \
  --port 8080 \
  --host 0.0.0.0

Optional flags:

  • --model-path : Specify a local ONNX/quantized model file path (avoids re-downloading)
  • --model-name : HuggingFace model name (falls back to remote download if no path is given)
  • --device : cpu / cuda
  • --port : Listening port
  • --host : Bind address
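
As a sketch, the serve command could parse these flags and hand off to uvicorn. Everything below is hypothetical (none of it exists in FastEmbed today); --model-path and --device handling are omitted for brevity, and the request handler mirrors the fuller sketch under Possible Implementation below:

import argparse

import uvicorn
from fastapi import FastAPI
from fastembed import TextEmbedding

def main() -> None:
    # Hypothetical entrypoint for "fastembed serve"
    parser = argparse.ArgumentParser(prog="fastembed serve")
    parser.add_argument("--model-name", default="BAAI/bge-small-zh-v1.5")
    parser.add_argument("--host", default="0.0.0.0")
    parser.add_argument("--port", type=int, default=8080)
    args = parser.parse_args()

    app = FastAPI()
    embedder = TextEmbedding(model_name=args.model_name)

    @app.post("/v1/embeddings")
    async def create_embeddings(request: dict) -> dict:
        # Minimal OpenAI-shaped handler; see "Possible Implementation" below
        texts = request.get("input", [])
        return {
            "object": "list",
            "data": [
                {"object": "embedding", "index": i, "embedding": v.tolist()}
                for i, v in enumerate(embedder.embed(texts))
            ],
            "model": args.model_name,
            "usage": {"prompt_tokens": 0, "total_tokens": 0},
        }

    uvicorn.run(app, host=args.host, port=args.port)

if __name__ == "__main__":
    main()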

API Specification (POST /v1/embeddings)

Request:

{
  "input": [
    "Artificial intelligence is the future",
    "FastEmbed makes embeddings blazing fast"
  ]
}

Response:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [-0.0246, -0.0536, -0.0010, ...]
    },
    {
      "object": "embedding",
      "index": 1,
      "embedding": [0.0123, -0.0456, 0.0231, ...]
    }
  ],
  "model": "BAAI/bge-small-zh-v1.5",
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
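
Once the service is running, any HTTP client could call it. A sketch, assuming the service is on localhost:8080 as in the CLI example above (note that OpenAI clients also send a "model" field in the request body, which the server could ignore or validate):

import requests

# Call the hypothetical local service with the two inputs from the spec above
resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={"input": ["Artificial intelligence is the future",
                    "FastEmbed makes embeddings blazing fast"]},
)
resp.raise_for_status()
print(len(resp.json()["data"]))  # 2 embeddings back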

Benefits

  • Drop-in replacement for OpenAI embeddings: developers can point existing code at a local FastEmbed service by changing only OPENAI_API_BASE (see the snippet after this list).
  • Works in multi-language environments (Java, Go, JS, etc.) via HTTP.
  • No cloud dependency: the model runs locally, which respects data privacy.
  • Easy deployment:
    • Local dev via CLI
    • Docker container for production use (docker run ... fastembed serve)
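
To illustrate the drop-in point, here is a sketch using the official openai Python client aimed at the local service (the base_url value and the dummy key are assumptions; note that openai SDK v1+ uses base_url / OPENAI_BASE_URL rather than the legacy OPENAI_API_BASE variable):

from openai import OpenAI

# Assumes the hypothetical FastEmbed service is running on localhost:8080;
# the key is unused by the local service but required by the client.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.embeddings.create(
    model="BAAI/bge-small-zh-v1.5",
    input=["FastEmbed makes embeddings blazing fast"],
)
print(len(resp.data[0].embedding))  # embedding dimension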

Alternatives

  • Users manually wrap FastEmbed in Flask/FastAPI (but this requires extra coding).
  • Run embeddings in Python only, which limits cross-language integration.

Additional Context

  • Similar approach in Ollama (ollama serve) and HuggingFace Inference Server.
  • Many users (including me) have internal projects that use /v1/embeddings; with this feature, switching from OpenAI to FastEmbed would require zero code changes.
  • Could later extend to /v1/rerank for ColBERT / BGE reranking models.

Possible Implementation
Use FastAPI / Uvicorn internally:

from fastapi import FastAPI
from pydantic import BaseModel
from fastembed import TextEmbedding

MODEL_NAME = "BAAI/bge-small-zh-v1.5"

app = FastAPI()
embedder = TextEmbedding(model_name=MODEL_NAME)

class EmbeddingRequest(BaseModel):
    # OpenAI's API accepts either a single string or a list of strings
    input: str | list[str]

@app.post("/v1/embeddings")
async def create_embeddings(request: EmbeddingRequest) -> dict:
    texts = [request.input] if isinstance(request.input, str) else request.input
    # embed() returns a generator of numpy arrays
    vectors = list(embedder.embed(texts))
    return {
        "object": "list",
        "data": [
            {"object": "embedding", "index": i, "embedding": vec.tolist()}
            for i, vec in enumerate(vectors)
        ],
        "model": MODEL_NAME,
        "usage": {"prompt_tokens": 0, "total_tokens": 0},
    }
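
To try this sketch locally, one could save it as e.g. server.py (the filename is arbitrary) and run it with the uvicorn CLI:

uvicorn server:app --host 0.0.0.0 --port 8080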

Is there any additional information you would like to provide?

No response
