Open
Description
Hi @joshy,
I have been using your library to convert a batch of RTF documents to plain text before doing further processing to segment the content. The documents contain numbered headers, which we use to determine where one section ends and the next starts. However, I noticed that the numbering is sometimes missing from the output of the rtf_to_text function. For example, given this snippet:
{\\listtext\\pard\\plain\\rtlch\\af3\\afs20\\alang18441\\ab\\ltrch\\f3\\fs20\\lang18441\\langnp18441\\langfe18441\\langfenp18441\\b 10.\\tab}\\pard\\ltrpar\\s19\\itap0\\widctlpar\\qj\\fi-720\\li720\\ri43\\lin720\\rin43\\tx720\\tx1440\\tx2160\\tx2880\\tx3600\\tx4320\\tx5040\\tx5760\\tx6480\\tx7200\\tx7920\\tx8640\\tx9360\\tx10080\\ls23\\ilvl0\\plain\\rtlch\\af3\\afs20\\alang18441\\ab\\ltrch\\f3\\fs20\\lang18441\\langnp18441\\langfe18441\\langfenp18441\\b Trade and other receivables\\tab\\tab\\par\\trowd\\irow0\\irowband0\\trgaph108
it returns just "Trade and other receivables", but I expected "10. Trade and other receivables".
I'm not sure of the root cause, but I think it is due to the \listtext control word. If I simply remove \listtext from the RTF string, the numbering is retained in the converted plain text.
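A minimal sketch of that workaround, using only the standard library. The helper name `drop_listtext_keyword` and the shortened sample string are made up for illustration, and the sample uses single backslashes as they would appear in a raw RTF file. Whether stripping the control word is appropriate for every document is an assumption, not something confirmed by the library.

```python
import re

# Matches the control word \listtext only (not a longer word that merely
# starts with "listtext"), plus the single optional space that delimits it.
_LISTTEXT = re.compile(r"\\listtext(?![A-Za-z])[ ]?")


def drop_listtext_keyword(rtf: str) -> str:
    """Remove the \\listtext control word, keeping the rest of its group."""
    return _LISTTEXT.sub("", rtf)


# Shortened, hypothetical version of the snippet from this report.
sample = r"{\listtext\pard\plain\b 10.\tab}\pard\plain\b Trade and other receivables\par"
cleaned = drop_listtext_keyword(sample)
print(cleaned)
```

The cleaned string is what I then pass to rtf_to_text in place of the original.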
Could you advise what the correct behaviour should be?