@xenova commented Jul 1, 2025

NeoBERT is a next-generation encoder model for English text representation, pre-trained from scratch on the RefinedWeb dataset. NeoBERT integrates state-of-the-art advancements in architecture, modern data, and optimized pre-training methodologies. It is designed for seamless adoption: it serves as a plug-and-play replacement for existing base models, relies on an optimal depth-to-width ratio, and leverages an extended context length of 4,096 tokens. Despite its compact 250M parameter footprint, it is the most efficient model of its kind and achieves state-of-the-art results on the massive MTEB benchmark, outperforming BERT large, RoBERTa large, NomicBERT, and ModernBERT under identical fine-tuning conditions.

You can compute embeddings using the pipeline API:

import { pipeline } from "@huggingface/transformers";

// Create feature extraction pipeline
const extractor = await pipeline("feature-extraction", "onnx-community/NeoBERT-ONNX");

// Compute embeddings
const text = "NeoBERT is the most efficient model of its kind!";
const embedding = await extractor(text, { pooling: "cls" });
console.log(embedding.dims); // [1, 768]

Or manually with the model and tokenizer classes:

import { AutoModel, AutoTokenizer } from "@huggingface/transformers";

// Load model and tokenizer
const model_id = "onnx-community/NeoBERT-ONNX";
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModel.from_pretrained(model_id);

// Tokenize input text
const text = "NeoBERT is the most efficient model of its kind!";
const inputs = tokenizer(text);

// Generate embeddings
const outputs = await model(inputs);
// CLS pooling: keep the batch dimension, take the first token's hidden state
const embedding = outputs.last_hidden_state.slice(null, 0);
console.log(embedding.dims); // [1, 768]
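
For readers wondering why both snippets end up with dims of [1, 768], here is a minimal sketch of what CLS pooling does to a [batch, seqLen, hidden] tensor stored as a flat row-major buffer. The `clsPool` helper and the toy sizes are assumptions made for illustration; the helper is not part of @huggingface/transformers and is not how the library implements pooling internally.

```javascript
// Hypothetical helper (not a library function): CLS pooling over a
// row-major buffer shaped [batch, seqLen, hidden].
function clsPool(data, dims) {
  const [batch, seqLen, hidden] = dims;
  if (data.length !== batch * seqLen * hidden) {
    throw new Error("data length does not match dims");
  }
  const pooled = new Float32Array(batch * hidden);
  for (let b = 0; b < batch; ++b) {
    // The first token of sequence b starts at offset b * seqLen * hidden.
    const start = b * seqLen * hidden;
    pooled.set(data.subarray(start, start + hidden), b * hidden);
  }
  // The sequence dimension is dropped, leaving [batch, hidden].
  return { data: pooled, dims: [batch, hidden] };
}

// Toy input: 1 sequence, 3 tokens, hidden size 4 (the model above uses 768).
const hiddenStates = Float32Array.from({ length: 1 * 3 * 4 }, (_, i) => i);
const pooled = clsPool(hiddenStates, [1, 3, 4]);
console.log(pooled.dims); // [1, 4]
console.log(Array.from(pooled.data)); // [0, 1, 2, 3]
```

With a single input sentence and a hidden size of 768, the same indexing yields one 768-dimensional vector, matching the [1, 768] shape logged above.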

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@xenova merged commit 9b9bd21 into main Jul 2, 2025
4 checks passed
@xenova deleted the add-neobert branch July 2, 2025 03:38