gpt2 tokenizer returning undefined tokens everywhere #1313

@chonknick

System Info

"@huggingface/transformers": "^3.5.1",
"typescript": "^5.8.3"

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

The gpt2 tokenizer is leaving undefined entries scattered through the encoded output. Since it's a byte-level BPE tokenizer, it should be able to encode any input, yet it isn't.

Here's what I am seeing from a unit test:

Tokens: [
  15496, 11,    995,
  0,     770,   318,
  257,   1332,  286,
  262,   11241, undefined,
  13,    632,   815,
  307,   1498,  284,
  16058, 428,   2420,
  656,   4833,  5207,
  13
]
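
The undefined entry shows up at index 11 of the output above. To narrow down which part of the input it corresponds to, a check along these lines can help. This is just a sketch, assuming encode() yields a plain number array and that decode() accepts a single-id array:

import { AutoTokenizer } from "@huggingface/transformers";

// Decode each id individually to see which input fragment maps to the
// undefined entry (assumes encode() returns number[] as in the repro below).
async function locateUndefined(text: string) {
    const tokenizer = await AutoTokenizer.from_pretrained("gpt2");
    const ids = await tokenizer.encode(text);
    ids.forEach((id, i) => {
        const piece = id === undefined ? "<undefined>" : tokenizer.decode([id]);
        console.log(i, id, JSON.stringify(piece));
    });
}

locateUndefined("Hello, world! This is a test of the token chunker. It should be able to chunk this text into smaller pieces.");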

Reproduction

import { AutoTokenizer } from "@huggingface/transformers";

async function testGPT2() {
    try {
        const tokenizer = await AutoTokenizer.from_pretrained("gpt2");
        const text = "Hello, world! This is a test of the token chunker. It should be able to chunk this text into smaller pieces.";

        // encode() should return a plain array of token ids
        const encoding = await tokenizer.encode(text);
        console.log("Input IDs:", encoding);

        // Check for undefined entries in the ids
        const hasUndefined = encoding.some(id => id === undefined);
        console.log("Input IDs contain undefined:", hasUndefined);

        const decodedText = tokenizer.decode(encoding, { skip_special_tokens: true });
        console.log("Decoded text:", decodedText);

    } catch (error) {
        console.error("Error during direct tokenizer test:", error);
    }
}

testGPT2();
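
For comparison, calling the tokenizer directly (the call form returns a BatchEncoding whose input_ids is a Tensor rather than a plain array) may show whether the undefined entries are specific to encode(). A rough sketch of that check, assuming Tensor.tolist() is available:

import { AutoTokenizer } from "@huggingface/transformers";

// Compare encode() with the tokenizer's call form to see whether the
// undefined ids also appear in the input_ids Tensor.
async function compareForms() {
    const tokenizer = await AutoTokenizer.from_pretrained("gpt2");
    const text = "Hello, world! This is a test of the token chunker. It should be able to chunk this text into smaller pieces.";

    const fromEncode = await tokenizer.encode(text);
    const { input_ids } = await tokenizer(text);

    console.log("encode():", fromEncode);
    console.log("call form:", input_ids.tolist());
}

compareForms();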
