If you set the ``include_page_breaks`` kwarg to ``True``, the output will include page breaks. This is only supported for ``.pptx``, ``.html``, ``.pdf``,
``.png``, and ``.jpg``.
@@ -306,6 +306,41 @@ Examples:
elements = partition_email(text=text, include_headers=True)

``partition_epub``
---------------------

The ``partition_epub`` function processes e-books in EPUB3 format. The function
first converts the document to HTML using ``pandoc`` and then calls ``partition_html``.
You'll need `pandoc <https://pandoc.org/installing.html>`_ installed on your system
to use ``partition_epub``.

Examples:

.. code:: python

  from unstructured.partition.epub import partition_epub

  elements = partition_epub(filename="example-docs/winter-sports.epub")
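
The conversion step described above can be pictured with a minimal sketch. It is not the library's implementation: the helper names ``build_pandoc_command`` and ``epub_to_html`` are hypothetical, and the sketch assumes the ``pandoc`` command-line tool is called directly with its ``--from`` and ``--to`` options.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pandoc_command(epub_path: str) -> List[str]:
    """Build a pandoc invocation that converts an EPUB file to HTML on stdout.

    Hypothetical helper for illustration only.
    """
    return ["pandoc", "--from", "epub", "--to", "html", epub_path]


def epub_to_html(epub_path: str) -> Optional[str]:
    """Return the HTML produced by pandoc, or None if pandoc is not on PATH.

    Hypothetical helper for illustration only.
    """
    if shutil.which("pandoc") is None:
        # pandoc is a system dependency; without it no conversion is possible.
        return None
    result = subprocess.run(
        build_pandoc_command(epub_path),
        capture_output=True,  # collect the generated HTML from stdout
        text=True,            # decode stdout/stderr as str
        check=True,           # raise CalledProcessError if pandoc fails
    )
    return result.stdout
```

If ``epub_to_html`` returns a string, that HTML could then be handed to ``partition_html`` (for example through its ``text`` argument), which mirrors the two-stage flow that ``partition_epub`` is described as performing.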

``partition_md``
---------------------

The ``partition_md`` function parses Markdown files. The
following example shows how to use ``partition_md``.

Examples:

.. code:: python

  from unstructured.partition.md import partition_md