Commit f30e6e6

Sample data and code for the article on MarkItDown
1 parent 16ad784 commit f30e6e6

12 files changed

+78
-0
lines changed

python-markitdown/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# Python MarkItDown: Convert Documents Into LLM-Ready Markdown

This folder provides the code examples for the Real Python tutorial [Python MarkItDown: Convert Documents Into LLM-Ready Markdown](https://realpython.com/python-markitdown/).
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
from pathlib import Path

from markitdown import MarkItDown


def main(
    input_dir,
    output_dir="output",
    target_formats=(".docx", ".xlsx", ".pdf"),
):
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    # Create the output folder (and any missing parents) if needed.
    output_path.mkdir(parents=True, exist_ok=True)

    md = MarkItDown()

    for file_path in input_path.glob("*"):
        if file_path.suffix in target_formats:
            try:
                result = md.convert(file_path)
            except Exception as e:
                print(f"✗ Error converting {file_path.name}: {e}")
                # Skip the write step: no result exists for this file.
                continue

            output_file = output_path / f"{file_path.stem}.md"
            output_file.write_text(result.markdown, encoding="utf-8")
            print(f"✓ Converted {file_path.name} → {output_file.name}")


if __name__ == "__main__":
    main("data", "output")
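
The loop in this script compares `file_path.suffix` against `target_formats` exactly, so a file named `REPORT.PDF` would be skipped. As a minimal sketch of one way to loosen that, the helper below (the name `iter_target_files` is hypothetical and not part of the commit) filters a folder case-insensitively using only the standard library:

```python
from pathlib import Path


def iter_target_files(input_dir, target_formats=(".docx", ".xlsx", ".pdf")):
    """Yield files in input_dir whose suffix matches, ignoring case."""
    wanted = {suffix.lower() for suffix in target_formats}
    # Sort for a stable processing order across platforms.
    for file_path in sorted(Path(input_dir).glob("*")):
        # Skip directories and compare lowercased suffixes.
        if file_path.is_file() and file_path.suffix.lower() in wanted:
            yield file_path
```

In the script, `for file_path in iter_target_files(input_dir, target_formats):` could then stand in for the `glob("*")` loop and the suffix check, assuming that behavior is what you want.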

python-markitdown/convert_files.py

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("./data/sample_DOCX.docx")
print(result)

python-markitdown/data/pep8.docx

33.5 KB
Binary file not shown.
81 KB
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
First Name,Last Name,Department,Position,Start Date
Alice,Johnson,Marketing,Marketing Coordinator,1/15/2022
Bob,Williams,Human Resources,HR Generalist,6/1/2021
Carol,Davis,Engineering,Software Engineer,3/20/2023
David,Brown,Sales,Sales Representative,9/10/2022
Eve,Miller,Finance,Financial Analyst,11/5/2021
Frank,Garcia,Customer Service,Customer Support Specialist,7/1/2023
Grace,Rodriguez,Research & Development,Research Scientist,4/25/2022
Henry,Martinez,Operations,Operations Manager,2/14/2021
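
For orientation, tabular data like this is typically represented in Markdown as a pipe-delimited table. The sketch below is a standard-library illustration of that shape only. It is not MarkItDown's converter, and the library's actual output may differ in spacing and escaping. The function name `csv_to_markdown` is hypothetical.

```python
import csv
import io


def csv_to_markdown(csv_text):
    """Render CSV text as a pipe-delimited Markdown table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    # Note: cells containing "|" are not escaped in this sketch.
    lines.extend("| " + " | ".join(row) + " |" for row in body)
    return "\n".join(lines)


sample = "First Name,Last Name,Department\nAlice,Johnson,Marketing\n"
print(csv_to_markdown(sample))
```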
12.1 KB
Binary file not shown.
50.4 KB
Binary file not shown.
7.93 KB
Binary file not shown.
397 KB
