How to Customize the HTML Output Generated by export_to_html()? #2657
Replies: 2 comments
Hi @Aneesabegumj! I'm Dosu and I’m helping the docling team. Docling’s export_to_html() generates HTML using a serializer that builds the structure (DOCTYPE, html, head, body) and applies default CSS for either a single-column or split-page layout. You can customize the output in several ways:
Example for custom CSS:

from docling_core.types.doc.document import ContentLayer
from docling_core.transforms.serializer.html import HTMLParams

custom_css = """
<style>
body { background: #222; color: #eee; }
/* your custom styles here */
</style>
"""

params = HTMLParams(
    css_styles=custom_css,
    layers={ContentLayer.BODY, ContentLayer.FURNITURE},
    output_style="single_column",
)

# Note: this call forwards only params.layers and params.output_style;
# css_styles is set on params but is not passed to export_to_html() here.
html = result.document.export_to_html(
    included_content_layers=params.layers,
    split_page_view=(params.output_style == "split_page"),
    html_head=None,  # or your own head HTML
    # ...other params as needed
)

There’s no built-in support for custom HTML templates or arbitrary layout changes beyond these parameters. For advanced customization (like changing the HTML structure), you’ll need to post-process the generated HTML [source].
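As a rough illustration of the post-processing route, the sketch below adds a custom style block to an already generated HTML string using only the Python standard library. The helper name inject_css and the sample HTML are hypothetical and are not part of Docling; the sample string merely stands in for whatever export_to_html() returns.

```python
import re


def inject_css(html: str, css: str) -> str:
    """Return html with a <style> block containing css placed just before </head>.

    Hypothetical helper, not a Docling API. If no </head> tag is found,
    the style block is prepended to the document instead.
    """
    style_block = f"<style>\n{css}\n</style>\n"
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match is None:
        return style_block + html
    # Insert at the end of <head> so the rules come after any default styles.
    return html[: match.start()] + style_block + html[match.start():]


custom_css = "body { background: #222; color: #eee; }"

# Stand-in for the string returned by result.document.export_to_html()
sample_html = (
    "<!DOCTYPE html>\n<html>\n"
    "<head><title>Doc</title></head>\n"
    "<body><p>Hi</p></body>\n</html>"
)

styled_html = inject_css(sample_html, custom_css)
```

Placing the block at the end of the head means its rules appear later in the cascade than styles declared earlier in the head, so they take precedence over earlier rules of equal specificity.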
Docling’s HTML output is mostly auto-generated from its internal document model, so the structure and styling are not meant to be highly customizable by default. Here’s how it works:
So yes, you can customize the output, but it requires manual post-processing or building a small custom HTML generator around result.document. Hope this helps 😊
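One possible shape for such a small custom generator is sketched below. It works on plain (label, text) pairs so that it runs without Docling installed; the function name render_items, the tag mapping, and the sample items are all hypothetical. How the pairs are collected from result.document is an assumption noted in the comments, not something confirmed in this thread.

```python
from html import escape
from typing import Iterable, Tuple

# Hypothetical mapping from item labels to HTML tags; anything else becomes <p>.
TAG_BY_LABEL = {
    "title": "h1",
    "section_header": "h2",
}


def render_items(items: Iterable[Tuple[str, str]], css: str = "") -> str:
    """Build a complete HTML page from (label, text) pairs with your own layout."""
    parts = []
    for label, text in items:
        tag = TAG_BY_LABEL.get(label, "p")
        # Escape both the class name and the text so the markup stays valid.
        parts.append(f'<{tag} class="{escape(label)}">{escape(text)}</{tag}>')
    body = "\n".join(parts)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<style>{css}</style>\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>"
    )


# Assumption: with Docling, the pairs might be gathered along these lines:
#   [(item.label.value, item.text)
#    for item, _level in result.document.iterate_items()
#    if hasattr(item, "text")]
items = [
    ("title", "Annual Report"),
    ("section_header", "Summary"),
    ("text", "Revenue < costs & rising"),
]

page = render_items(items, css="body { font-family: sans-serif; }")
```

Because the page skeleton and tag choices live in your own code, the structure can be changed freely, at the cost of handling tables, lists, and images yourself.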
Hi Team,
I am converting Scanned/Readable PDFs to HTML using Docling. Here is the relevant part of my code:
pdf_opts = PdfFormatOption(
    ocr=True,
    dpi=300,
    ocr_engine="rapidocr"
)
converter = DocumentConverter(
    allowed_formats=[InputFormat.PDF],
    format_options={InputFormat.PDF: pdf_opts}
)
result = converter.convert(pdf_path)
html_text = result.document.export_to_html()
The HTML output works, but it uses a default style/layout.
I want to understand:
Is there any built-in support for customizing the HTML output?
Or should I post-process the generated HTML manually?
If you need any further details from my side, please feel free to contact me at: aneesa.j@fintags.in