Pure C# library to convert between document formats (Office 97-2003, Open XML, RTF, Markdown)

Licenses: MIT (LICENSE), Apache-2.0 (LICENSE-Apache)

manfromarce/DocSharp

DocSharp

DocSharp is a pure C# library to convert between document formats without Office interop or native dependencies (except for some special packages, see requirements).

The following packages are currently available:

  • DocSharp.Binary: convert Office 97-2003 binary documents (doc, xls, ppt) to Open XML documents (docx, xlsx, pptx). This is a fork of the abandoned b2xtranslator project, with critical fixes added.
    Note: pre-97 formats and XLSB are very different and not supported.
  • DocSharp.Docx: convert DOCX to RTF, HTML, Markdown and plain text (.txt). Possible applications include generating Open XML documents in C# and exporting for other editors/services, or loading Microsoft Word documents in a RichTextBox / RichEditBox control.
  • DocSharp.Markdown: convert Markdown to DOCX or RTF using custom Markdig renderers.

Packages can be installed via NuGet.

The optional extra packages DocSharp.ImageSharp, DocSharp.SystemDrawing and DocSharp.MagickNET make it possible to convert otherwise unsupported images (e.g. GIF / TIFF for DOCX -> RTF, or WMF / EMF / TIFF for DOCX -> Markdown/HTML). Each has pros and cons, so the choice depends on your requirements. More information can be found in the Wiki.

The codebase also contains a few experimental converters that are not ready and not yet published on NuGet:

  • DocSharp.Renderer: provides basic DOCX and XLSX to PDF/images/SVG/XPS conversion using QuestPDF.
  • DocSharp.Epub: provides basic EPUB to DOCX (via HTML) conversion.
  • RTF to DOCX converter class in the DocSharp.Docx project

There is no common DOM to manipulate or generate documents; this library is mainly for conversion. Some helper methods on top of the Open XML SDK and format-specific writers are available, but they are mostly intended for internal use, although they could be extended or improved in the future.
For document creation and manipulation, consider using the Open XML SDK itself or other recommended libraries. Some of these are used in the sample app to test two-step conversions, compare results, or generate documents in multiple formats with the same code.
DocSharp provides methods to accept/return a WordprocessingDocument directly (in addition to file path / Stream / byte array), and a SaveTo extension method for WordprocessingDocument.
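
As a minimal sketch of the "create with the Open XML SDK, then convert" workflow mentioned above, the snippet below builds a one-paragraph DOCX in memory using only Open XML SDK calls. The class and method names (`OpenXmlSketch`, `BuildDocx`) are hypothetical and chosen for illustration. The DocSharp converter call itself is deliberately left out, since its exact signature is not given here; see the Wiki for the actual API.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration; not part of DocSharp.
public static class OpenXmlSketch
{
    // Builds a DOCX containing a single paragraph and returns it as a byte array.
    public static byte[] BuildDocx(string text)
    {
        using (var stream = new MemoryStream())
        {
            using (var document = WordprocessingDocument.Create(
                       stream, DocumentFormat.OpenXml.WordprocessingDocumentType.Document))
            {
                MainDocumentPart mainPart = document.AddMainDocumentPart();
                mainPart.Document = new Document(
                    new Body(new Paragraph(new Run(new Text(text)))));

                // At this point `document` is an open WordprocessingDocument,
                // i.e. the kind of object the text above says DocSharp can accept.
            } // Disposing the document writes the package to the stream.

            return stream.ToArray();
        }
    }
}
```

The resulting byte array (or the stream, or the open document) is then one of the input forms listed above for the converters.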

Supported features

  • Binary formats: most doc/xls/ppt features were supported by the original project, but exceptions occurred when using .NET (rather than .NET Framework) or loading specific documents. The most noticeable issues have been fixed, but more work is needed to make the library reliable; if you find other bugs, you are welcome to open an issue (please attach a sample file if the issue only occurs for specific documents).
  • DOCX, RTF, Markdown: supported elements vary depending on input and output formats, see Supported features for an overview.

Requirements

  • Supported targets are .NET 8, 9, 10 and .NET Framework 4.6.2 (the minimum .NET Framework version still supported).
  • DocSharp.SystemDrawing is for Windows only (.NET Framework or net*-windows), as System.Drawing.Common is based on GDI+ and only supported on Windows since .NET 6.
  • DocSharp.ImageSharp is cross-platform for .NET 8+, as ImageSharp is fully managed C# code but does not support .NET Framework.
  • DocSharp.MagickNET is cross-platform for both .NET and .NET Framework, but Magick.NET bundles many native libraries that might not work on non-desktop platforms (Android / iOS /
  • DocSharp.Renderer depends on QuestPDF, which currently supports Windows x64 / x86, macOS x64 / ARM64 and Linux x64 / ARM64. Windows ARM64, Android and iOS are not supported yet, due to a custom Skia build. In addition, XPS generation is only supported on Windows.
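
The platform constraints for the three optional image packages can be summarized as a small decision helper. This is only a sketch of the rules listed above: `ImagePackageSketch` and `UsablePackages` are hypothetical names and not part of DocSharp. It lists the candidates without ranking them, and it does not model the caveat about Magick.NET's native libraries on non-desktop platforms.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of DocSharp.
public static class ImagePackageSketch
{
    // Returns the optional image packages that the requirements describe as
    // usable for the given OS / runtime combination.
    public static IReadOnlyList<string> UsablePackages(bool isWindows, bool isNetFramework)
    {
        var packages = new List<string>();

        // System.Drawing.Common is based on GDI+: Windows only
        // (.NET Framework or a net*-windows target).
        if (isWindows)
            packages.Add("DocSharp.SystemDrawing");

        // ImageSharp is fully managed, but does not support .NET Framework.
        if (!isNetFramework)
            packages.Add("DocSharp.ImageSharp");

        // Magick.NET works on both .NET and .NET Framework.
        packages.Add("DocSharp.MagickNET");

        return packages;
    }
}
```

For example, `ImagePackageSketch.UsablePackages(isWindows: false, isNetFramework: false)` yields DocSharp.ImageSharp and DocSharp.MagickNET, which matches the cross-platform options described above.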

Usage

You can refer to the project Wiki or sample app.

Roadmap

  • Finish and publish experimental converters
  • Support more elements and attributes, and fix issues on edge cases
  • Reduce code duplication, cleanup
  • Async functions/progress callback (some tasks such as downloading images referenced in Markdown may take some time)
  • Improve support for right-to-left and complex script languages
  • Make converters thread-safe

Credits

Dependencies:

Forked:

Others:

Other recommended libraries (some of these are used in the sample app, not dependencies when installing packages):

License

DocSharp is licensed under the MIT license and can be used for both open source and commercial projects.

DocSharp.ImageSharp and DocSharp.MagickNET are licensed under the Apache 2.0 license.
ImageSharp has a dual license; please visit its repository for more information. VectSharp is used under LGPL in this project (GPL packages are not used).

DocSharp.Renderer is itself licensed under MIT, but depends on QuestPDF, which has additional requirements for companies and may require purchasing a license. Please check the QuestPDF repository for information on the Community and Commercial licenses.

Contribute

  • If you know how to fix a bug, feel free to open a Pull Request.
  • To implement a new feature, please open an issue or discussion to propose it before working on the pull request.
  • If you find the library useful, adding a star is highly appreciated. Stars are a way to guide other developers towards helpful libraries and tools.
  • This is a hobby project. If you wish, you are welcome to donate to support its further development (sponsor links for GitHub, LiberaPay, Ko-Fi, BuyMeACoffee and Thanks.dev are available on the repo page).
