
Misaligned tokenizer on embeddings? #391

@eugeniosegala

Issue description

The nomic-embed-text-v1.5-GGUF model does not seem to compute embeddings correctly.

Expected Behavior

No errors or warnings, and embeddings are computed correctly. As of now, I'm getting irrelevant results from similarity searches.

It works okay with Ollama and the Python bindings of llama.cpp.

Actual Behavior

These errors/warnings are produced when loading the model:

[node-llama-cpp] llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
[node-llama-cpp] Using this model ("./nomic-embed-text-v1.5.f16.gguf") to tokenize text and then detokenize it resulted in a different text. There might be an issue with the model or the tokenizer implementation. Using this model may not work as intended

As you will notice, the embeddings are generated, but I suspect the way they are calculated is wrong, as similarity searches on the vectors do not return relevant content.
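The detokenization warning above can be observed directly with a round-trip check. The sketch below assumes the loaded model exposes `tokenize(text)` and `detokenize(tokens)` methods, as node-llama-cpp models do; the helper name `checkTokenizerRoundTrip` is mine, not part of the library:

```javascript
// Round-trip check: tokenize a string, detokenize the tokens, and compare.
// `model` is expected to expose tokenize(text) -> tokens and
// detokenize(tokens) -> text (as a node-llama-cpp model does).
function checkTokenizerRoundTrip(model, text) {
  const tokens = model.tokenize(text);
  const roundTripped = model.detokenize(tokens);
  return {
    ok: roundTripped === text,
    original: text,
    roundTripped,
  };
}
```

If `ok` is `false` for ordinary input text, the warning is accurate and the tokenizer mapping really is lossy for this model file.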

Steps to reproduce

Download the model https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF

And then run it using node-llama-cpp:

import { getLlama } from "node-llama-cpp";
import path from "path";
import { fileURLToPath } from "url";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// Embeds each document using the module-level embedding `context` defined below
async function embedDocuments(documents) {
  const embeddings = new Map();

  await Promise.all(
    documents.map(async (document) => {
      const embedding = await context.getEmbeddingFor(document);
      embeddings.set(document, embedding);

      console.debug(
        `${embeddings.size}/${documents.length} documents embedded`
      );
    })
  );

  return embeddings;
}

// Returns document texts sorted by cosine similarity to `embedding`, most similar first
function findSimilarDocuments(embedding, documentEmbeddings) {
  const similarities = new Map();
  for (const [otherDocument, otherDocumentEmbedding] of documentEmbeddings)
    similarities.set(
      otherDocument,
      embedding.calculateCosineSimilarity(otherDocumentEmbedding)
    );

  return Array.from(similarities.keys())
    .sort((a, b) => similarities.get(b) - similarities.get(a));
}

const llama = await getLlama();

// https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF
const model = await llama.loadModel({
  modelPath: path.join(__dirname, "nomic-embed-text-v1.5.f16.gguf"),
});

const context = await model.createEmbeddingContext();

const response = await context.getEmbeddingFor("crime");
const documentEmbeddings = await embedDocuments([
  "The sky is clear and blue today",
  "I love eating pizza with extra cheese",
  "Dogs love to play fetch with their owners",
  "The capital of France is Paris",
  "Drinking water is important for staying hydrated",
  "Mount Everest is the tallest mountain in the world",
  "A warm cup of tea is perfect for a cold winter day",
  "Painting is a form of creative expression",
  "Not all the things that shine are made of gold",
  "Cleaning the house is a good way to keep it tidy"
]);

const query = "Do you like pizza?";
const queryEmbedding = await context.getEmbeddingFor(query);

const similarDocuments = findSimilarDocuments(
  queryEmbedding,
  documentEmbeddings
);
const topSimilarDocument = similarDocuments[0];

console.log("query:", query);
console.log("Document:", topSimilarDocument); // Drinking water is important for staying hydrated

The top-ranked document will be:

"Drinking water is important for staying hydrated"

But it should be:

"I love eating pizza with extra cheese"
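To rule out the ranking code itself, the similarity values can be cross-checked with a plain cosine computation over the raw vectors (a minimal sketch; it assumes the embedding objects expose their raw numbers, e.g. via node-llama-cpp's `embedding.vector`):

```javascript
// Plain cosine similarity over two numeric arrays, for cross-checking
// the values returned by calculateCosineSimilarity().
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

If the hand-computed similarities also rank "Drinking water..." above the pizza sentence, the problem is in the embedding vectors themselves (consistent with the tokenizer warning), not in the similarity or sorting logic.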

My Environment

OS: macOS 24.1.0 (arm64)
Node: 20.18.1 (arm64)
TypeScript: 5.6.3
node-llama-cpp: 3.3.0

Metal: available

Metal device: Apple M3 Pro
Metal used VRAM: 0% (80KB/27GB)
Metal free VRAM: 99.99% (27GB/27GB)
Metal unified memory: 27GB (100%)

CPU model: Apple M3 Pro
Math cores: 6
Used RAM: 96.66% (34.8GB/36GB)
Free RAM: 3.33% (1.2GB/36GB)
Used swap: 0% (0B/0B)
Max swap size: dynamic

Labels: bug (Something isn't working), released