DocSharp is a pure C# library to convert between document formats without Office interop or native dependencies (except for some special packages, see requirements).
The following packages are currently available:
- DocSharp.Binary: convert Office 97-2003 binary documents (doc, xls, ppt) to OpenXML documents (docx, xlsx, pptx). This is a fork of the abandoned b2xtranslator project, with critical fixes applied.
Note: pre-97 formats and XLSB are very different and not supported.
- DocSharp.Docx: convert DOCX to RTF, HTML, Markdown and plain text (.txt). Possible applications include generating Open XML documents in C# and exporting for other editors/services, or loading Microsoft Word documents in a RichTextBox / RichEditBox control.
- DocSharp.Markdown: convert Markdown to DOCX or RTF using custom Markdig renderers.
Packages can be installed via NuGet.
The optional extra packages DocSharp.ImageSharp, DocSharp.SystemDrawing and DocSharp.MagickNET make it possible to convert images that are otherwise unsupported (e.g. GIF / TIFF for DOCX -> RTF or WMF / EMF / TIFF for DOCX -> Markdown/HTML). Each of these has pros and cons; the choice depends on your requirements. More information can be found in the Wiki.
The codebase also contains a few experimental converters that are not ready yet and are not published on NuGet:
- DocSharp.Renderer: provides basic DOCX and XLSX to PDF/images/SVG/XPS conversion using QuestPDF.
- DocSharp.Epub: provides basic EPUB to DOCX (via HTML) conversion.
- RTF to DOCX converter class in the DocSharp.Docx project
There is no common DOM to manipulate or generate documents, because this library is mainly intended for conversion. Some helper methods on top of the Open XML SDK and format-specific writers are available, but they are mostly intended for internal use; they could be extended or improved in the future.
For document creation and manipulation, consider using the Open XML SDK itself or other recommended libraries. Some of these are used in the sample app to test two-step conversions, compare results, or generate documents in multiple formats with the same code.
DocSharp provides methods that accept or return a WordprocessingDocument directly (in addition to a file path, Stream or byte array), as well as a SaveTo extension method for WordprocessingDocument.
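As a minimal sketch of the document-creation side, the snippet below builds a one-paragraph DOCX in memory using only the Open XML SDK (the `DocumentFormat.OpenXml` NuGet package). It does not call any DocSharp API; `MinimalDocx` and `Create` are hypothetical names chosen for illustration. The resulting bytes, or a WordprocessingDocument opened from them, could then be handed to a converter; see the Wiki for the actual converter classes and method names.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml;

// Hypothetical helper for illustration; not part of DocSharp or the Open XML SDK.
public static class MinimalDocx
{
    // Builds a DOCX containing a single paragraph and returns the package bytes.
    public static byte[] Create(string text)
    {
        using (var stream = new MemoryStream())
        {
            using (var document = WordprocessingDocument.Create(
                stream, WordprocessingDocumentType.Document))
            {
                // Every Word document needs a main part holding the body.
                MainDocumentPart mainPart = document.AddMainDocumentPart();
                mainPart.Document = new Document(
                    new Body(
                        new Paragraph(
                            new Run(
                                new Text(text)))));

                // Write the element tree into the part before the package is closed.
                mainPart.Document.Save();
            }

            // The package has been closed, so the stream now holds the complete file.
            return stream.ToArray();
        }
    }
}
```

Calling `MinimalDocx.Create("Hello")` returns a byte array that can be written to a `.docx` file or wrapped in a `MemoryStream` for further processing.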
- Binary formats: most doc/xls/ppt features were supported by the original project, but exceptions occurred when using .NET (rather than .NET Framework) or loading specific documents. The most noticeable issues have been fixed, but more work is needed to make the library reliable; if you find other bugs, you are welcome to open an issue (please attach a sample file if the issue only occurs for specific documents).
- DOCX, RTF, Markdown: supported elements vary depending on input and output formats, see Supported features for an overview.
- Supported targets are .NET 8, 9, 10 and .NET Framework 4.6.2 (the oldest .NET Framework version that is still supported).
- DocSharp.SystemDrawing is for Windows only (.NET Framework or net*-windows), as System.Drawing.Common is based on GDI+ and only supported on Windows since .NET 6.
- DocSharp.ImageSharp is cross-platform for .NET 8+, as ImageSharp is fully managed C# code but does not support .NET Framework.
- DocSharp.MagickNET is cross-platform for both .NET and .NET Framework, but Magick.NET bundles many native libraries that might not work on non-desktop platforms (Android / iOS /
- DocSharp.Renderer depends on QuestPDF, which currently supports Windows x64 / x86, macOS x64 / ARM64 and Linux x64 / ARM64. Windows ARM64, Android and iOS are not supported yet, due to a custom Skia build. In addition, XPS generation is only supported on Windows.
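The platform notes for the three optional image packages can be restated as a small lookup. This is only a sketch that encodes the constraints listed above; `ImagePackageCompatibility` and `For` are hypothetical names, not part of DocSharp, and the result says nothing about which package is preferable.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; it only restates the requirements above.
public static class ImagePackageCompatibility
{
    // Returns the optional image packages that can run on the given target.
    public static IReadOnlyList<string> For(bool isWindows, bool isNetFramework)
    {
        var packages = new List<string>();

        // System.Drawing.Common is based on GDI+, so this package is Windows-only.
        if (isWindows)
        {
            packages.Add("DocSharp.SystemDrawing");
        }

        // ImageSharp is fully managed but does not support .NET Framework.
        if (!isNetFramework)
        {
            packages.Add("DocSharp.ImageSharp");
        }

        // Magick.NET targets both runtimes; its bundled native libraries
        // might still fail on non-desktop platforms.
        packages.Add("DocSharp.MagickNET");

        return packages;
    }
}
```

For example, `ImagePackageCompatibility.For(isWindows: false, isNetFramework: false)` yields DocSharp.ImageSharp and DocSharp.MagickNET, while a .NET Framework application on Windows gets DocSharp.SystemDrawing and DocSharp.MagickNET.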
For usage examples, refer to the project Wiki or the sample app.
- Finish and publish experimental converters
- Support more elements and attributes, and fix issues on edge cases
- Reduce code duplication, cleanup
- Async functions/progress callback (some tasks such as downloading images referenced in Markdown may take some time)
- Improve support for right-to-left and complex script languages
- Make converters thread-safe
Dependencies:
- Open XML SDK
- Markdig - for DocSharp.Markdown
- ImageSharp and VectSharp - for DocSharp.ImageSharp
- System.Drawing.Common and SVG.NET - for DocSharp.SystemDrawing
- CoreJ2K - for JPEG2000 support in both DocSharp.ImageSharp and DocSharp.SystemDrawing
- Magick.NET-Q8-AnyCPU - for DocSharp.MagickNET
- QuestPDF - for DocSharp.Renderer
- EpubCore, Html2OpenXml, PreMailer.Net, AngleSharp - for DocSharp.Epub (AngleSharp is a dependency of Html2OpenXml and PreMailer.Net)
Forked:
Others:
- Html2OpenXml for image header decoding and unit conversions
- dwml_cs for Office Math (OMML) to LaTeX conversion
- addFormula2docx for Office Math (OMML) to MathML conversion
- RtfPipe, FridaysForks.RtfPipe, RtfConverter for part of the RTF parsing logic.
Other recommended libraries (some of these are used in the sample app; they are not installed as dependencies of the DocSharp packages):
- Read, write, manipulate documents:
  - OfficeIMO - DOCX, XLSX, PPTX, Markdown, CSV; can also merge, compare and convert some formats
  - Clippit - DOCX, XLSX, PPTX; can also merge, compare and convert some formats
  - ShapeCrawler - PPTX; can also render slides to images
  - ClosedXML - XLSX
  - Sylvan.Data.Excel - XLSX, XLS, XLSB
  - NPOI - DOCX, XLSX, XLS; partial port of Apache POI
  - FluentNPOI - XLSX, XLS; HTML/PDF export
- Extract data:
  - GustavoHennig/b2xtranslator - DOC prior to Office 97
  - ExcelDataReader - XLS (pre-97 too), XLSB, XLSX, CSV
  - PdfPig, Tabula.Csv - PDF
  - OpenMcdf - Microsoft Compound format
- Generate documents:
  - PDF, XPS, SVG, images: QuestPDF, FossPDF.NET
  - PDF and RTF: PdfSharp / MigraDoc
  - PDF and XLSX: PdfRpt.Core
  - PDF, RTF, HTML: iTextSharp.LGPLv2.Core
  - DOCX: SharpDocx, DocxTemplater, MiniWord
  - XLSX: MiniExcel, ClosedXML.Report
  - XLSX, ODS, CSV: FreeDataExports
- Convert or render documents:
DocSharp is licensed under the MIT license and can be used for both open source and commercial projects.
DocSharp.ImageSharp and DocSharp.MagickNET are licensed under the Apache 2.0 license.
ImageSharp has a dual license; please visit its repository for more information.
VectSharp is used under LGPL in this project (GPL packages are not used).
DocSharp.Renderer is itself licensed under MIT, but it depends on QuestPDF, which has additional requirements for companies and may require purchasing a license. Please check the QuestPDF repository for information on the Community and Commercial licenses.
- If you know how to fix a bug, feel free to open a Pull Request.
- To implement a new feature, please open an issue or discussion to propose it before working on the pull request.
- If you find the library useful, adding a star is highly appreciated. Stars are a way to guide other developers towards helpful libraries and tools.
- This is a hobby project. If you wish, you are welcome to donate to support its further development (sponsor links for GitHub, LiberaPay, Ko-Fi, BuyMeACoffee and Thanks.dev are available on the repository page).