TL;DR: Enable "editing" for markdown text - allow annotators to edit content (detoxify, refine, curate). Critical for LLM training data curation workflows.
❌ Is your feature request related to a problem? Please describe.
I'm working on LLM fine-tuning data curation, where domain experts need to edit and label markdown-formatted documents at the same time. Label Studio currently supports only view-only annotation of text, but our workflow also requires "editing".
Pain points:
- 🚫 No content editing capability (can only annotate existing text)
- 🔄 Users must copy text into an external editor (VSCode), edit it, and paste it back, which breaks the workflow
- 📝 No modification tracking
This is critical for data curation tasks where experts need to:
- ✏️ Edit: Remove toxic/outdated content, adapt for legal compliance
- 🏷️ Label: Classify content (license types, political sensitivity, domain categories)
✅ Describe the solution you'd like
Integrate a markdown editor (Monaco Editor/CodeMirror) with a split-pane live preview to enable "editing" alongside labeling.
Key features:
- 📱 Split-pane interface: Raw markdown editor (left) + live rendered preview (right) with synchronized scrolling
- 🔍 Change tracking: Track modifications as annotations, view diff, export both original and edited versions
- 📤 Export: Both edited markdown content and annotations/labels
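As a rough illustration of the change-tracking and export features, the sketch below uses Python's standard `difflib` to compare the original and edited markdown and bundle both versions with a unified diff. The `change_record` helper and its record layout (`original`, `edited`, `diff`, `changed`) are hypothetical, not an existing Label Studio export format.

```python
import difflib


def change_record(original: str, edited: str, doc_id: str) -> dict:
    """Bundle both versions of a document with a unified diff (hypothetical format)."""
    # keepends=True keeps the newlines, so the diff lines can be joined directly
    diff_lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        edited.splitlines(keepends=True),
        fromfile=f"{doc_id}/original.md",
        tofile=f"{doc_id}/edited.md",
    )
    return {
        "document_id": doc_id,
        "original": original,
        "edited": edited,
        "diff": "".join(diff_lines),  # empty string when nothing changed
        "changed": original != edited,
    }


original = "# Heading\n\nParagraph with **bold** text.\n\nOutdated claim.\n"
edited = "# Heading\n\nParagraph with **bold** text.\n"
record = change_record(original, edited, "doc-123")
print(record["diff"])
```

This diff is line-based; tracking word-level edits inside long paragraphs would need a finer-grained comparison.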
Typical workflow:
1. 📥 Import JSON documents (markdown text field)
2. 👀 Annotator reviews in split-pane view
3. ✏️ Edit content (detoxify, remove outdated info, refine)
4. 🏷️ Add labels/tags (e.g., "license-type", "political-content")
5. 📤 Export as JSON dataset with refined content + annotations
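The five steps above could be prototyped outside Label Studio with a small merge step like the one below. It is a sketch under assumptions: edits and labels are plain dictionaries keyed by `document_id` (a stand-in for real annotation results), and the output fields (`original_text`, `refined_text`, `edited`, `labels`) form a hypothetical export schema.

```python
import json
from typing import Any


def build_export(tasks: list[dict[str, Any]],
                 edits: dict[str, str],
                 labels: dict[str, list[str]]) -> list[dict[str, Any]]:
    """Merge annotator edits and labels into one record per imported task."""
    records = []
    for task in tasks:
        data = task["data"]
        meta = data.get("metadata", {})
        doc_id = meta["document_id"]
        original = data["text"]
        # Step 3: fall back to the original text if the annotator made no edit
        refined = edits.get(doc_id, original)
        records.append({
            "document_id": doc_id,
            "source": meta.get("source"),
            "original_text": original,
            "refined_text": refined,
            "edited": refined != original,
            # Step 4: labels/tags attached by the annotator (sorted for stable output)
            "labels": sorted(labels.get(doc_id, [])),
        })
    return records


tasks = [{"data": {"text": "# Heading\n\nToxic line.\n",
                   "metadata": {"source": "book-v1", "document_id": "doc-123"}}}]
edits = {"doc-123": "# Heading\n"}
labels = {"doc-123": ["Political", "Legal-Review"]}

# Step 5: write the curated dataset as JSON
print(json.dumps(build_export(tasks, edits, labels), indent=2))
```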
🤔 Describe alternatives you've considered
- External editors (current workaround): Copy to VSCode → edit → paste back
  ❌ Problem: Breaks workflow, no tracking, error-prone
- Pre-render markdown to HTML: Import pre-rendered HTML
  ❌ Problem: No editing, only annotation
⚠️ Neither option supports the integrated "editing" workflow inside Label Studio that data curation needs.
📋 Additional context
Use cases:
- 🧹 Content detoxification (remove toxic/outdated content)
- ⚖️ License classification per paragraph
- 🗳️ Political content identification
- 🎯 Domain-specific refinement for LLM training data
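Per-paragraph license classification implies splitting each document into paragraph-level units whose labels can be mapped back to the source text. A minimal Python sketch, assuming a "paragraph" is any block separated by blank lines; this is a simplification of CommonMark block parsing (for example, a fenced code block containing blank lines would be split). The `Paragraph` class and `split_paragraphs` function are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Paragraph:
    index: int  # position of the block within the document
    start: int  # offset of the block's first character in the source
    end: int    # exclusive end offset in the source
    text: str


def split_paragraphs(markdown: str) -> list[Paragraph]:
    """Split markdown into blank-line-separated blocks, keeping source offsets."""
    # Collect the spans between separators made of one or more blank lines
    bounds = []
    pos = 0
    for sep in re.finditer(r"\n(?:[ \t]*\n)+", markdown):
        bounds.append((pos, sep.start()))
        pos = sep.end()
    bounds.append((pos, len(markdown)))

    paragraphs = []
    for start, end in bounds:
        chunk = markdown[start:end]
        if not chunk.strip():
            continue  # skip whitespace-only segments at the edges
        # Trim surrounding whitespace while keeping offsets into the original
        s = start + (len(chunk) - len(chunk.lstrip()))
        e = end - (len(chunk) - len(chunk.rstrip()))
        paragraphs.append(Paragraph(len(paragraphs), s, e, markdown[s:e]))
    return paragraphs


doc = "# Heading\n\nParagraph with **bold** text.\n\n- Item 1\n- Item 2\n"
for p in split_paragraphs(doc):
    print(p.index, p.start, p.end, repr(p.text))
```

Keeping character offsets rather than only the text lets a per-paragraph label point back into the original document.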
Technical details:
- 💻 Monaco Editor (VSCode's editor) recommended - mature, excellent markdown support
- 📏 Document size: ~1000 chars per block (browser-friendly)
- 🔧 Should integrate with Label Studio's existing labeling XML configuration
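To illustrate how the editor might fit into the existing labeling XML configuration, the Python snippet below generates a config with the standard library's `xml.etree.ElementTree`. `<View>`, `<Choices>` and `<Choice>` (with `name`, `toName`, `choice`, `value`) are existing Label Studio config tags; the `<MarkdownEditor>` tag and its `preview` attribute are hypothetical and only sketch what this request proposes.

```python
import xml.etree.ElementTree as ET


def build_config(licenses: list[str], flags: list[str]) -> str:
    """Build a labeling config pairing a (hypothetical) editor with label sets."""
    view = ET.Element("View")
    # HYPOTHETICAL tag: the split-pane markdown editor proposed in this request
    ET.SubElement(view, "MarkdownEditor", name="text", value="$text", preview="split")
    # Existing Label Studio tags: one license per document...
    lic = ET.SubElement(view, "Choices", name="license", toName="text", choice="single")
    for value in licenses:
        ET.SubElement(lic, "Choice", value=value)
    # ...and any number of review flags
    flag_set = ET.SubElement(view, "Choices", name="flags", toName="text", choice="multiple")
    for value in flags:
        ET.SubElement(flag_set, "Choice", value=value)
    ET.indent(view)  # pretty-print (Python 3.9+)
    return ET.tostring(view, encoding="unicode")


# Example label values, taken from the mockup
print(build_config(["CC-BY", "Proprietary"], ["Political", "Legal-Review"]))
```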
Sample data format:
{
  "data": {
    "text": "# Heading\n\nParagraph with **bold** text...",
    "metadata": {"source": "book-v1", "document_id": "doc-123"}
  }
}
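Because the markdown lives inside a JSON string, its line breaks are escaped as `\n` and only become real newlines after parsing. The hypothetical `load_tasks` helper below parses an import file and checks the fields used in the sample; treating `metadata.document_id` as required is an assumption of this sketch, not a Label Studio rule.

```python
import json
from typing import Any


def load_tasks(raw: str) -> list[dict[str, Any]]:
    """Parse one task or a list of tasks and check the fields used in the sample."""
    parsed = json.loads(raw)
    tasks = parsed if isinstance(parsed, list) else [parsed]
    for i, task in enumerate(tasks):
        data = task.get("data") if isinstance(task, dict) else None
        if not isinstance(data, dict) or not isinstance(data.get("text"), str):
            raise ValueError(f"task {i}: 'data.text' must be a string")
        # Assumption of this sketch: every document carries an ID for tracking
        meta = data.get("metadata")
        if not isinstance(meta, dict) or "document_id" not in meta:
            raise ValueError(f"task {i}: 'data.metadata.document_id' is missing")
    return tasks


raw = r'''{"data": {"text": "# Heading\n\nParagraph with **bold** text...",
           "metadata": {"source": "book-v1", "document_id": "doc-123"}}}'''
tasks = load_tasks(raw)
print(tasks[0]["data"]["text"].splitlines())
```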
Why this matters:
- 🚀 Extends Label Studio's paradigm: From "annotate existing content" to "curate and annotate"
- 📝 Markdown is a widely used format for LLM training data
- ⚡ Data curation is a critical bottleneck in LLM fine-tuning
- 🔗 Unifies editing and annotation workflows in one tool
Visual mockup:
┌──────────────────────────┬────────────────────────────┐
│ Editor (Raw Markdown)    │ Preview (Rendered)         │
├──────────────────────────┼────────────────────────────┤
│ # Title                  │ Title                      │
│ ## Section 1             │ ═════                      │
│ This is **important**    │ Section 1                  │
│ - Item 1                 │ This is important          │
│                          │ • Item 1                   │
├──────────────────────────┴────────────────────────────┤
│ Labels: [Political] [Legal-Review] [License: CC-BY]   │
└───────────────────────────────────────────────────────┘
💬 I'm happy to contribute or provide more details about this workflow!