Unsupported features in WebVTT parser

### Bug

The current backend parser for WebVTT files lacks some features. In addition, the transformation into DoclingDocument could be improved to help users separate the cue text from the cue metadata.

- Cue blocks with text like `P&L` fail to parse, since a bare `&` is forbidden according to the spec (https://www.w3.org/TR/webvtt1/#webvtt-cue-text-span)
  - We could relax this constraint
- The voice annotation (the speaker) is parsed in Docling by adding it to the cue span text as a prefix. E.g. `<v Narrator>Welcome</v>` becomes `Narrator: Welcome`
  - We could store it as a label of the text item instead, to avoid mixing cue text and cue metadata.
- Cue text spans with mixed formatted text are parsed into Docling inline groups, but spaces are ignored, so it is not possible to reproduce the correct spacing of the text once the formatting is removed.
- The language annotation "language:en-US" is not parsed (a warning message is sent).
- REGION and STYLE blocks are not addressed and trigger warnings.
- The WebVTT cue class span is not addressed.
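
To make the expected result concrete, here is a minimal sketch of how a cue payload could be split into the speaker and the plain cue text. It is not Docling's implementation and does not use any Docling API: `split_cue` is a hypothetical helper that relies only on the Python standard library, and it assumes a lenient reading of the spec in which a bare `&` is kept as-is.

```python
import html
import re
from typing import Optional, Tuple

# Any cue span tag: <v Narrator>, </v>, <b>, <c.loud>, <lang en-US>, <00:00:01.000>, ...
_TAG = re.compile(r"</?[^<>]*>")
# Voice start tag, with optional classes, capturing the annotation (the speaker).
_VOICE = re.compile(r"<v(?:\.[^\s>]*)?[ \t]+([^>]*)>")


def split_cue(payload: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (speaker, plain_text) for one cue payload."""
    match = _VOICE.search(payload)
    speaker = html.unescape(match.group(1).strip()) if match else None
    # Remove the tags only; whitespace between spans is left untouched.
    text = _TAG.sub("", payload)
    # Decode references such as &amp; while a bare '&' (as in P&L) stays as-is.
    text = html.unescape(text)
    return speaker, text


speaker, text = split_cue("<v Narrator>Welcome to <b>P&L</b> <i>review</i></v>")
print(speaker)  # Narrator
print(text)     # Welcome to P&L review
```

With this separation, the speaker is available as metadata and the text keeps its original spacing. A real fix would also need to handle multiple voice spans in one cue, which this sketch ignores.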

### Steps to reproduce

Check the following script that illustrates the gaps and shows the expected parsed text without metadata and formatting.

[test_process_docling_vtt.py](https://github.com/user-attachments/files/23148104/test_process_docling_vtt.py)

### Docling version

Docling version: 2.58.0
Docling Core version: 2.48.4
Docling IBM Models version: 3.9.1
Docling Parse version: 4.7.0
Python: cpython-312 (3.12.10)
Platform: macOS-14.7.1-arm64-arm-64bit

### Python version

Python 3.12.10

