Skip to content

Feature request for Open XML SDK - Exclude unsupported XML characters during file generation #1532

@tauheedul

Description

@tauheedul

I have encountered an scenario where a user has entered data into the database which included control characters (most likely from a 3rd party system).

Subequently when attempting to export the data to an OpenXML spreadsheet directly from the database e.g. creating a spreadsheet using a DataTable with no pre-processing, the Open XML plugin fails because of the invalid hex characters detected.

Suggested solution
Upon saving a datatable and while parsing each column value, detect Invalid Xml Characters and exclude them from the output file or transform them to an XML equivalent character where possible.

Something like this: https://stackoverflow.com/questions/8170739/dealing-with-invalid-xml-hexadecimal-characters

The workaround
The workaround for us is to find every field at source and exclude invalid characters being added to the database, but this isn't feasable for a large system.

The other solution is to manually map the data table and exclude special characters at that point, but that adds additional overhead for large datasets because the table will be iterated again when OpenXML opens the dataset.

Example error
image

image

  • Exception message

'�', hexadecimal value 0x02, is an invalid character.

  • Exception stacktrace

    at System.Xml.XmlUtf8RawTextWriter.InvalidXmlChar(Int32 ch, Byte* pDst, Boolean entitize)
    at System.Xml.XmlUtf8RawTextWriter.WriteElementTextBlock(Char* pSrc, Char* pSrcEnd)
    at System.Xml.XmlUtf8RawTextWriter.WriteString(String text)
    at System.Xml.XmlWellFormedWriter.WriteString(String text)
    at DocumentFormat.OpenXml.OpenXmlLeafTextElement.WriteContentTo(XmlWriter w)
    at DocumentFormat.OpenXml.OpenXmlElement.WriteTo(XmlWriter xmlWriter)
    at DocumentFormat.OpenXml.OpenXmlCompositeElement.WriteContentTo(XmlWriter w)
    at DocumentFormat.OpenXml.OpenXmlElement.WriteTo(XmlWriter xmlWriter)
    at DocumentFormat.OpenXml.OpenXmlCompositeElement.WriteContentTo(XmlWriter w)
    at DocumentFormat.OpenXml.OpenXmlElement.WriteTo(XmlWriter xmlWriter)
    at DocumentFormat.OpenXml.OpenXmlCompositeElement.WriteContentTo(XmlWriter w)
    at DocumentFormat.OpenXml.OpenXmlElement.WriteTo(XmlWriter xmlWriter)
    at DocumentFormat.OpenXml.OpenXmlCompositeElement.WriteContentTo(XmlWriter w)
    at DocumentFormat.OpenXml.OpenXmlPartRootElement.WriteTo(XmlWriter xmlWriter)
    at DocumentFormat.OpenXml.OpenXmlPartRootElement.Save(Stream stream)
    at DocumentFormat.OpenXml.OpenXmlPartRootElement.SaveToPart(OpenXmlPart openXmlPart)
    at DocumentFormat.OpenXml.OpenXmlPartRootElement.Save()
    at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.TrySavePartContent(OpenXmlPart part)
    at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.SavePartContents(Boolean save)
    at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.Dispose(Boolean disposing)
    at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.Dispose()
    at TestOpenXmlSpreadsheet.Helper.ExportToExcelDownload(DataSet dataSet) in C:\Users\user\Source\repos\TestOpenXmlSpreadsheet\TestOpenXmlSpreadsheet\Form1.cs:line 118
    at TestOpenXmlSpreadsheet.Form1.btnTest_Click(Object sender, EventArgs e) in C:\Users\user\Source\repos\TestOpenXmlSpreadsheet\TestOpenXmlSpreadsheet\Form1.cs:line 39
    at System.Windows.Forms.Control.OnClick(EventArgs e)
    at System.Windows.Forms.Button.OnClick(EventArgs e)
    at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
    at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
    at System.Windows.Forms.Control.WndProc(Message& m)
    at System.Windows.Forms.ButtonBase.WndProc(Message& m)
    at System.Windows.Forms.Button.WndProc(Message& m)
    at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
    at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, WM msg, IntPtr wparam, IntPtr lparam)

Reproducible test
My test file generation logic is derived from the following sample code
https://www.codeindotnet.com/export-dataset-to-excel-in-c-net-core-openxml/

A single column datatable has a record added with the invalid text input by the user.
This is passed into the OpenXML file creation logic and the failure occurs after the Workbook.Save() is invoked.

This example is produced on .NET 5 with 3.0.0 Beta 3
I have also reproduced this on .NET Framework 4.5 with 2.20.0

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions