[BUG] LayoutLMv2Tokenizer crashes on NER inputs and batched padding/truncation

### System Info

* `transformers` version: `5.0.0.dev0`
* Platform: `Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39`
* Python version: `3.12.3`
* `huggingface_hub` version: `1.3.2`
* `safetensors` version: `0.7.0`
* `accelerate` version: `1.12.0`
* Accelerate config: `not installed`
* DeepSpeed version: `not installed`
* PyTorch version (accelerator?): `2.9.1+cu128 (CUDA)`
* GPU type: `NVIDIA L4`
* NVIDIA driver version: `550.90.07`
* CUDA version: `12.4`

### Who can help?

@zucchini-nlp (multimodal model)
@ArthurZucker (tokenizer)

### Information

- [x] The official example scripts
- [ ] My own modified scripts

### Tasks

- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

**NER use case:**

```python
from transformers import LayoutLMv2Tokenizer

tokenizer = LayoutLMv2Tokenizer.from_pretrained("microsoft/layoutlmv2-base-uncased")
words = ["Total", "Amount", ":", "$1,234.56"]
boxes = [[100, 200, 300, 250], [310, 200, 450, 250], [460, 200, 480, 250], [490, 200, 650, 250]]
word_labels = [0, 0, 0, 1]

try:
    encoding = tokenizer(words, boxes=boxes, word_labels=word_labels)
    print(encoding["labels"])
except Exception as e:
    print(e)
```

**Batched training data prep with truncation/padding:**

```python
from transformers import LayoutLMv2Processor
from datasets import load_dataset
import textwrap

try:
    processor = LayoutLMv2Processor.from_pretrained(
        "microsoft/layoutlmv2-base-uncased",
        apply_ocr=False
    )
    dataset = load_dataset("nielsr/funsd", split="train")
    images = [img.convert("RGB") for img in dataset["image"]]
    words = list(dataset["words"])
    boxes = list(dataset["bboxes"])
    word_labels = list(dataset["ner_tags"])
    encoding = processor(
        images,
        words,
        boxes=boxes,
        word_labels=word_labels,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    print(encoding["input_ids"].shape)
except Exception as e:
    print("\n".join(textwrap.wrap(str(e), width=160)))
```

[`LayoutLMv2Tokenizer`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/layoutlmv2/tokenization_layoutlmv2.py#L112) crashes with an `AttributeError` when `word_labels` is passed for NER token classification. Separately, calling the processor with `padding="max_length"` and `truncation=True` raises a downstream `ValueError` asking the caller to set those same flags, even though both are already set. More details are in the PR; its screenshots show the output after the `AttributeError` is fixed but before the second fix is applied.

**Current Repro Output:**

<img width="500" height="700" alt="Image" src="https://github.com/user-attachments/assets/4311018a-3fc5-4e5a-89c0-46a4b25d0387" />

### Expected behavior

- `encoding["labels"]` should return a list in which subword tokens are masked with `-100`, the [default `ignore_index`](https://docs.pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) of `nn.CrossEntropyLoss`.
- `encoding["input_ids"].shape` should print a `torch.Size` of `(batch_size, max_length)` instead of raising.
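
To make the expected label layout concrete, here is a minimal pure-Python sketch of the masking I have in mind. It is not the library's implementation: `toy_subword_split`, `align_labels`, and `pad_or_truncate` are hypothetical helpers, the fixed-size splitting is only a stand-in for WordPiece, and special tokens such as `[CLS]`/`[SEP]` are left out. It assumes the behavior in which only the first subword of each word keeps the word's label.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def toy_subword_split(word, piece_len=4):
    """Hypothetical stand-in for WordPiece: cut a word into fixed-size pieces."""
    return [word[i:i + piece_len] for i in range(0, len(word), piece_len)]


def align_labels(words, word_labels):
    """Keep the label on the first piece of each word; mask the other pieces."""
    tokens, labels = [], []
    for word, label in zip(words, word_labels):
        pieces = toy_subword_split(word)
        tokens.extend(pieces)
        labels.extend([label] + [IGNORE_INDEX] * (len(pieces) - 1))
    return tokens, labels


def pad_or_truncate(labels, max_length):
    """Cut to max_length, then right-pad with IGNORE_INDEX to a fixed length."""
    labels = labels[:max_length]
    return labels + [IGNORE_INDEX] * (max_length - len(labels))


words = ["Total", "Amount", ":", "$1,234.56"]
word_labels = [0, 0, 0, 1]

tokens, labels = align_labels(words, word_labels)
print(tokens)                       # one entry per (toy) subword piece
print(labels)                       # same length as tokens
print(pad_or_truncate(labels, 12))  # fixed length, as with padding="max_length"
```

Because every sequence ends up with exactly `max_length` entries, a batch of them can be stacked into a rectangular tensor, which is what the second reproduction expects from `return_tensors="pt"`.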
