Add Word format support for .docx and .doc files with Pandoc and hidden text extraction #68
base: main
Co-authored-by: hawkeyexl <[email protected]>
@copilot Did you evaluate the parent issue before starting? How can I specify inline comments in the Word file via this method of conversion?
I evaluated the issue description, which mentioned "Support converting Word files into Doc Detective tests" but did not include details about inline comment requirements. To address your question: you can now specify inline test specifications in Word documents by typing HTML comment syntax as plain text. For example:
The conversion process unescapes the HTML characters so that these comments are parsed correctly. I've added:
All supported inline specification types (test start, step, test end, ignore blocks) work in Word documents when typed as plain text.
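As a rough illustration of the unescaping step, the JavaScript sketch below assumes that comments typed as visible text arrive in the converted output with their markers encoded as HTML entities (`&lt;!--` and `--&gt;`). The function name `unescapeInlineComments` is hypothetical and is not taken from the PR; other escaping styles, such as Markdown backslash escapes, are deliberately not handled.

```javascript
// Hypothetical helper (not the PR's code): restore HTML comment markers that
// were entity-escaped because they were typed as visible text in Word.
// Assumption: only entity escaping (&lt; &gt; &quot; ...) needs undoing.
function unescapeInlineComments(markdown) {
  // Non-greedy match so adjacent escaped comments are restored separately.
  return markdown.replace(/&lt;!--([\s\S]*?)--&gt;/g, (_match, body) => {
    // Decode entities inside the comment body so JSON quotes survive.
    // &amp; is decoded last to avoid double-decoding (e.g. "&amp;lt;").
    const decoded = body
      .replace(/&quot;/g, '"')
      .replace(/&#39;/g, "'")
      .replace(/&lt;/g, "<")
      .replace(/&gt;/g, ">")
      .replace(/&amp;/g, "&");
    return `<!--${decoded}-->`;
  });
}

// Usage: text outside the comment is left untouched.
const escaped =
  "&lt;!-- test { &quot;id&quot;: &quot;my-test&quot; } --&gt;\nClick **Submit**.";
console.log(unescapeInlineComments(escaped));
// <!-- test { "id": "my-test" } -->
// Click **Submit**.
```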
@copilot Instead of mammoth.js, use Pandoc to convert DOCX to Markdown, and use a custom Lua filter to extract hidden text from the Word file as HTML-style comments in the Markdown.
Implemented in commit 4f316bb. Changes made:
The approach provides:
All 37 tests pass. Documentation has been updated with instructions for using hidden text in Word.
Warning: Firewall rules blocked me from connecting to one or more addresses.
Overview
This PR adds support for Microsoft Word documents (.docx and .doc files) as input for Doc Detective test detection and resolution. Word documents are automatically converted to Markdown using Pandoc with a custom Lua filter and processed using the existing Markdown parsing rules.
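The detect-then-convert flow described above can be sketched in Node.js as follows. The helper names (`isWordFile`, `buildPandocArgs`, `convertWordToMarkdownSketch`) are hypothetical stand-ins, not the PR's `convertWordToMarkdown()`. The options `--from`, `--to`, and `--lua-filter` are standard Pandoc command-line options; the sketch assumes `pandoc` is on `PATH` and that Pandoc writes its Markdown to stdout when no output file is given.

```javascript
const path = require("node:path");
const { execFileSync } = require("node:child_process");

// Extensions routed to the `word` file type, mirroring the PR description.
const WORD_EXTENSIONS = [".docx", ".doc"];

// Return true when a file should go through Word-to-Markdown conversion.
function isWordFile(filePath) {
  return WORD_EXTENSIONS.includes(path.extname(filePath).toLowerCase());
}

// Build Pandoc arguments: read DOCX, write Markdown, apply the Lua filter.
function buildPandocArgs(inputPath, filterPath) {
  return ["--from=docx", "--to=markdown", `--lua-filter=${filterPath}`, inputPath];
}

// Hypothetical stand-in for convertWordToMarkdown(): runs Pandoc and returns
// the Markdown it prints to stdout. Throws if Pandoc is missing or fails.
function convertWordToMarkdownSketch(inputPath, filterPath) {
  return execFileSync("pandoc", buildPandocArgs(inputPath, filterPath), {
    encoding: "utf8",
  });
}

// Usage (requires Pandoc installed):
// const md = convertWordToMarkdownSketch("guide.docx", "src/word-hidden-text-filter.lua");
```

Note that Pandoc's documented input formats include `docx` but not the legacy binary `.doc` format, so in this sketch a `.doc` file would be detected as a Word file but Pandoc could not read it directly.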
Changes
Core Implementation
- Custom Lua filter (`src/word-hidden-text-filter.lua`) to extract hidden text from Word documents and convert it to HTML comments
- `convertWordToMarkdown()` function in `src/utils.js` that:
- `parseTests()` function to detect Word documents by extension and automatically convert them before processing
- `word` file type definition in `src/config.js` with support for .docx and .doc extensions
Features Supported
All Markdown-based test detection features now work with Word documents:
Inline Test Specifications with Hidden Text
The preferred method for adding inline test specifications is to use Word's hidden text feature. This provides a cleaner user experience compared to typing HTML comments as visible text.
How to use hidden text in Word:
`<!-- test { "id": "my-test" } -->`
The custom Lua filter extracts hidden text during conversion and wraps it in HTML comment syntax for Doc Detective to parse.
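The PR's Lua filter is not reproduced here. To illustrate only the wrapping rule, this JavaScript sketch takes the text of one hidden run and returns the HTML comment that would be emitted into the Markdown. The function name `hiddenTextToComment` is hypothetical, and passing through text that is already a complete `<!-- ... -->` comment is an assumption, made so that a specification typed with its own comment markers (as in the example above) is not wrapped twice.

```javascript
// Hypothetical illustration of the filter's wrapping rule (not the PR's Lua).
// Assumption: hidden text that is already a full HTML comment is kept as-is.
function hiddenTextToComment(hiddenText) {
  const text = hiddenText.trim();
  // Empty or whitespace-only hidden runs produce no comment.
  if (text === "") return "";
  // Already wrapped: pass through unchanged to avoid "<!-- <!-- ... --> -->".
  if (text.startsWith("<!--") && text.endsWith("-->")) return text;
  // Otherwise wrap the raw specification in comment markers.
  return `<!-- ${text} -->`;
}

// Usage:
console.log(hiddenTextToComment('test { "id": "my-test" }'));
// <!-- test { "id": "my-test" } -->
```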
Example:
In your Word document, create hidden text containing:
Then write your visible documentation:
Add another hidden text section:
Alternative: You can still type HTML comments as plain text (visible in the document) for backward compatibility.
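However the comments are entered, after conversion they are ordinary HTML comments in the Markdown, which is why the existing Markdown rules apply. The sketch below classifies them into the statement types mentioned earlier (test start, step, test end, ignore blocks). The patterns are assumptions modeled loosely on Doc Detective's Markdown inline syntax rather than copied from its configuration, and `extractInlineStatements` is a hypothetical name.

```javascript
// Assumed comment patterns, loosely modeled on Doc Detective's inline syntax.
// Fixed-text forms carry no payload; test start and step carry a JSON object.
const STATEMENT_PATTERNS = [
  ["testEnd", /^test\s+end$/],
  ["ignoreStart", /^test\s+ignore\s+start$/],
  ["ignoreEnd", /^test\s+ignore\s+end$/],
  ["testStart", /^test\s+(\{[\s\S]*\})$/],
  ["step", /^step\s+(\{[\s\S]*\})$/],
];

// Extract inline statements from converted Markdown, in document order.
// Comments matching no pattern are skipped; invalid JSON makes JSON.parse throw.
function extractInlineStatements(markdown) {
  const statements = [];
  for (const match of markdown.matchAll(/<!--([\s\S]*?)-->/g)) {
    const body = match[1].trim();
    for (const [type, pattern] of STATEMENT_PATTERNS) {
      const m = body.match(pattern);
      if (m) {
        statements.push({ type, spec: m[1] ? JSON.parse(m[1]) : null });
        break;
      }
    }
  }
  return statements;
}

// Usage with a hypothetical converted document:
const sample = [
  '<!-- test { "id": "my-test" } -->',
  "Click **Submit**.",
  '<!-- step { "find": "Welcome" } -->',
  "<!-- test end -->",
].join("\n");
console.log(extractInlineStatements(sample).map((s) => s.type));
// [ 'testStart', 'step', 'testEnd' ]
```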
Testing
- `src/word.test.js`
- Sample Word documents (`test/artifacts/sample-test.docx` and `test/artifacts/sample-with-inline-specs.docx`) for testing
Documentation
- `docs/word-format-support.md` with usage examples, features, and limitations
- `src/word-hidden-text-filter.md` documenting the Lua filter
Usage Example
Given a Word document with content like:
Doc Detective will automatically detect click and find actions from the bold text.
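To show how bold text in the converted Markdown can become actions, here is a simplified sketch. The two regular expressions are assumptions that approximate the kind of rules Doc Detective's Markdown file type applies (a verb such as "Click" before bold text for a click, any bold text for a find); they are not the project's actual defaults, and `detectActions` is a hypothetical name.

```javascript
// Assumed, simplified rules: a verb before bold text yields a click action;
// any bold text yields a find action. matchAll copies these global regexes,
// so their lastIndex state is not shared between calls.
const CLICK_RULE = /\b(?:[Cc]lick|[Tt]ap|[Ss]elect|[Cc]hoose)\b\s+\*\*((?:(?!\*\*).)+)\*\*/g;
const FIND_RULE = /\*\*((?:(?!\*\*).)+)\*\*/g;

function detectActions(markdown) {
  const actions = [];
  for (const m of markdown.matchAll(CLICK_RULE)) {
    actions.push({ index: m.index, action: "click", text: m[1] });
  }
  for (const m of markdown.matchAll(FIND_RULE)) {
    actions.push({ index: m.index, action: "find", text: m[1] });
  }
  // Sort by match position to restore document order.
  return actions.sort((a, b) => a.index - b.index);
}

// Usage:
console.log(
  detectActions("Click **Submit** to see **Welcome**.").map((a) => `${a.action}:${a.text}`)
);
// [ 'click:Submit', 'find:Submit', 'find:Welcome' ]
```

In this sketch a bold term preceded by "Click" produces both a click and a find entry.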
For inline test specifications, use Word's hidden text feature (press Ctrl+D to open the Font dialog, then check "Hidden"):
Requirements
Pandoc must be installed on your system:
`apt-get install pandoc` or `brew install pandoc`
Known Limitations
Related Issue
Fixes #67