[Clippy] feat(html): render DOCX text boxes as inline-block divs in WmlToHtmlConverter #166

Draft
github-actions[bot] wants to merge 1 commit into master from clippy/improve-html-textbox-support-3b0bb3dc024366df

🤖 This PR was created by Clippy, an automated AI assistant.

Closes #65

## Summary

Text boxes in DOCX (`w:drawing` containing `wps:wsp`/`wps:txbx`) were silently dropped by `WmlToHtmlConverter`. The converter's `ProcessDrawing` method returned `null` for any drawing that wasn't an image (no `a:blipFill`), so all text box content was lost in HTML output.

## Root Cause

`ProcessDrawing` only handled the image path (`Pic.blipFill`). There was no code path for `wps:wsp` / `wps:txbx` shapes that contain text content.

## Fix

Added `ProcessTextBoxDrawing`, called from `ConvertToHtmlTransform` before the existing `ProcessImage` call:

  1. Detection: checks for `a:graphicData/wps:wsp/wps:txbx` in both `wp:anchor` (floating) and `wp:inline` containers.
  2. Content extraction: passes the `w:txbxContent` paragraphs through the existing recursive HTML transform pipeline, so paragraph formatting, runs, hyperlinks, and nested tables are rendered by the same code that handles body content.
  3. Sizing: converts `wp:extent` `cx`/`cy` (in EMUs) into the `width` and `min-height` CSS properties.
  4. Float: `wp:anchor` (floating) text boxes get `float: left` so surrounding body text flows around them.
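
The detection, sizing, and float steps above can be sketched outside the converter. The following Python sketch is illustrative only and is not the converter's C# code: the function name `text_box_style`, the `SAMPLE` fragment, and the alphabetical ordering of CSS properties are assumptions chosen to reproduce the style string shown in this description. The one fixed fact it relies on is that an inch is 914400 EMUs.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by WordprocessingML drawings.
NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "wp": "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "wps": "http://schemas.microsoft.com/office/word/2010/wordprocessingShape",
}
EMUS_PER_INCH = 914400


def text_box_style(drawing):
    """Return a CSS style string for a text-box drawing, or None if it is not one.

    Hypothetical helper: mirrors the detection/sizing/float rules described
    above, not the actual ProcessTextBoxDrawing implementation.
    """
    for kind in ("anchor", "inline"):
        container = drawing.find(f"wp:{kind}", NS)
        if container is None:
            continue
        # Detection: only shapes that carry a wps:txbx count as text boxes.
        if container.find("a:graphic/a:graphicData/wps:wsp/wps:txbx", NS) is None:
            return None
        extent = container.find("wp:extent", NS)
        if extent is None:
            return None
        # Sizing: convert EMUs to inches with two decimals.
        props = {
            "display": "inline-block",
            "overflow": "hidden",
            "padding": "2pt",
            "width": f"{int(extent.get('cx')) / EMUS_PER_INCH:.2f}in",
            "min-height": f"{int(extent.get('cy')) / EMUS_PER_INCH:.2f}in",
        }
        # Float: only anchored (floating) text boxes float left.
        if kind == "anchor":
            props["float"] = "left"
        # Assumption: properties are emitted in alphabetical order.
        return " ".join(f"{name}: {value};" for name, value in sorted(props.items()))
    return None


SAMPLE = f"""
<w:drawing xmlns:w="{NS['w']}" xmlns:wp="{NS['wp']}"
           xmlns:a="{NS['a']}" xmlns:wps="{NS['wps']}">
  <wp:anchor>
    <wp:extent cx="1828800" cy="914400"/>
    <a:graphic><a:graphicData><wps:wsp><wps:txbx>
      <w:txbxContent><w:p/></w:txbxContent>
    </wps:txbx></wps:wsp></a:graphicData></a:graphic>
  </wp:anchor>
</w:drawing>
"""

print(text_box_style(ET.fromstring(SAMPLE.strip())))
```

For the sample extent (`cx="1828800"`, `cy="914400"`) this yields a 2.00in wide, 1.00in tall floating box; swapping `wp:anchor` for `wp:inline` drops the `float` property.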

## Emitted HTML

```html
<div style="display: inline-block; float: left; min-height: 1.00in; overflow: hidden; padding: 2pt; width: 2.00in;">
  <p>...text box content rendered through normal paragraph pipeline...</p>
</div>
```

## Trade-offs / Limitations

- **Absolute positioning** is not implemented — anchored text boxes use `float: left` rather than `position: absolute` with computed coordinates. This is intentional: computing exact pixel offsets requires knowledge of the container's layout dimensions, which the converter doesn't track. The float approach preserves text flow without data loss.
- **Inline text boxes** (`wp:inline`) get `display: inline-block` without float, which is typically correct for inline shapes.
- **No border/fill styling** from `wps:spPr` shape properties — the div has no border by default (shape fill and outline are ignored). This keeps the implementation minimal.

## Test Added

`HC063_TextBoxRenderedAsDiv` — creates a minimal DOCX with a floating text box programmatically and verifies:
- Text box content appears in the HTML output (not dropped)
- A `div` with `display: inline-block` is present
- The div has `width:` and `min-height:` from the drawing extent, plus `float: left` because the text box is anchored
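
The three checks above can be expressed as plain string assertions on the converter's output. The sketch below is a hypothetical Python illustration, not the actual `HC063_TextBoxRenderedAsDiv` test; the helper name `check_text_box_div` and the sample HTML are assumptions.

```python
import re


def check_text_box_div(html, expected_text):
    """Return the list of failed checks; an empty list means every check holds."""
    failures = []
    # Check 1: the text box content must survive conversion.
    if expected_text not in html:
        failures.append("text box content missing")
    # Check 2: a div whose style declares display: inline-block must exist.
    match = re.search(r'<div[^>]*style="([^"]*display:\s*inline-block[^"]*)"', html)
    if match is None:
        failures.append("no inline-block div")
        return failures
    # Check 3: that div carries the sizing and float properties.
    style = match.group(1)
    for needle in ("width:", "min-height:", "float: left"):
        if needle not in style:
            failures.append(f"style lacks {needle}")
    return failures


html = (
    '<div style="display: inline-block; float: left; min-height: 1.00in; '
    'overflow: hidden; padding: 2pt; width: 2.00in;"><p>Hello</p></div>'
)
print(check_text_box_div(html, "Hello"))
```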

## Test Status

```
Test run summary: Passed!
  total: 1208
  failed: 0
  succeeded: 1207
  skipped: 1 (pre-existing)
  duration: 1m 22s
```

✅ All 1207 executed tests pass; the 1 skipped test was already skipped before this change. `dotnet csharpier check .` passes.

Generated by Clippy ·

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@346204513ecfa08b81566450d7d599556807389f

…onverter

Previously, w:drawing elements containing a wps:wsp text box (wps:txbx)
were silently dropped by the HTML converter because ProcessDrawing only
handled blipFill (image) drawings.

This adds ProcessTextBoxDrawing which:
- Detects wps:wsp/wps:txbx in both wp:anchor (floating) and wp:inline drawings
- Extracts w:txbxContent paragraphs and recursively transforms them via the
  existing HTML conversion pipeline
- Emits a <div> with display:inline-block, width/min-height from wp:extent,
  and float:left for anchored (floating) text boxes

Fixes: #65

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Adding textBox and table wrapped around text when converting to HTML
