Commit 3fc61d4

Merge pull request #10 from sfallah/sf/deepseek-ocr-test-script
Python test script for DeepSeek-OCR: runs OCR on the test-1.jpeg newspaper image and checks the result against the expected reference model output for the Free OCR and Markdown prompts.
2 parents 6c36c03 + dc2066e commit 3fc61d4

4 files changed

+318
-0
lines changed

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
<|ref|>title<|/ref|><|det|>[[61, 255, 907, 533]]<|/det|>
# MEN WALK ON MOON
ASTRONAUTS LAND ON PLAIN;
COLLECT ROCKS, PLANT FLAG

<|ref|>text<|/ref|><|det|>[[56, 559, 268, 629]]<|/det|>
Voice From Moon:
Eagle Has Landed'

<|ref|>text<|/ref|><|det|>[[74, 645, 262, 675]]<|/det|>
EAGLE (the lunar surface, Houston, Truesquily)
Base here, The Eagle has landed.

<|ref|>text<|/ref|><|det|>[[74, 675, 262, 720]]<|/det|>
BOOTHROOM: Lounge, Truesquily, we enjoy you on the ground. You've got a bunch of guys about to toss bikes. We're breaking again. Thanks a lot.

<|ref|>text<|/ref|><|det|>[[74, 720, 262, 750]]<|/det|>
TRAVELLING MADE: Time you. BOOTHROOM: You're looking good here.

<|ref|>text<|/ref|><|det|>[[74, 750, 262, 780]]<|/det|>
TRAVELLING MADE: A very smooth touchdown. BEDROOM: Eagle, you are very far. I'll. (The first sign in the lunar appearance) (Over.)

<|ref|>text<|/ref|><|det|>[[74, 780, 262, 810]]<|/det|>
TRAVELLING MADE: Eagle, stay for I'll. BOOTHROOM: Bumper and we are you waiting the cue.

<|ref|>text<|/ref|><|det|>[[74, 810, 262, 830]]<|/det|>
TRAVELLING MADE: Eagle, and service mobility.

<|ref|>text<|/ref|><|det|>[[74, 830, 262, 850]]<|/det|>
How do you read me?

<|ref|>text<|/ref|><|det|>[[74, 850, 262, 880]]<|/det|>
TRAVELLING COLUMBIA, he has landed Truesquily. Base, Eagle is at Truesquily. I read you first by. Over.

<|ref|>text<|/ref|><|det|>[[74, 880, 262, 900]]<|/det|>
COLUMBIA: Yes, I heard the whole thing.

<|ref|>text<|/ref|><|det|>[[74, 900, 262, 920]]<|/det|>
BOOTHROOM: Well, it's a good show.

<|ref|>text<|/ref|><|det|>[[74, 920, 262, 940]]<|/det|>
COLUMBIA: Fantastic.

<|ref|>text<|/ref|><|det|>[[74, 940, 262, 960]]<|/det|>
TRAVELLING MADE: I'll read that.

<|ref|>text<|/ref|><|det|>[[74, 960, 262, 980]]<|/det|>
APOLLO CONTROL: The most major sky to sky will be for the 23 event, that is at 21 minutes 26 sec-

<|ref|>text<|/ref|><|det|>[[74, 980, 262, 990]]<|/det|>
tion of lunar descent.

<|ref|>image<|/ref|><|det|>[[270, 545, 697, 990]]<|/det|>


<|ref|>text<|/ref|><|det|>[[715, 559, 911, 629]]<|/det|>
A Powdery Surface
Is Closely Explored

<|ref|>text<|/ref|><|det|>[[733, 645, 851, 665]]<|/det|>
BY JOHN NOBLE WILFORD

<|ref|>text<|/ref|><|det|>[[715, 669, 911, 700]]<|/det|>
HOUSTON, Monday, July 21—New hires landed and walked on the moon.

<|ref|>text<|/ref|><|det|>[[715, 700, 911, 750]]<|/det|>
Two Americans, astronauts of Apollo 11, steered their Eagle-shaped lunar module safely and smoothly to the lunar landing yesterday at 4:17:40 P.M., Eastern day-light time.

<|ref|>text<|/ref|><|det|>[[715, 750, 911, 780]]<|/det|>
Neil A. Armstrong, the 38-year-old civilian commander, radioed to earth and the landing team here.

<|ref|>text<|/ref|><|det|>[[715, 780, 911, 830]]<|/det|>
"Boom, Truesquily! Base here. The Eagle has landed," the first man to reach the moon—Neil Armstrong and his engineer, Capt. Charles E. Alder, of the Jet Propulsion Laboratory, the space agency's rocket and space program manager.

<|ref|>text<|/ref|><|det|>[[715, 830, 911, 880]]<|/det|>
About six and a half hours later, Mr. Armstrong opened the landing craft's hatch, stepped slowly down the ladder and descended as he pointed his first landing footguard on the lunar crater.

<|ref|>text<|/ref|><|det|>[[715, 880, 911, 920]]<|/det|>
"That's one small step for man, one giant leap for mankind."

<|ref|>text<|/ref|><|det|>[[715, 920, 911, 960]]<|/det|>
His first step on the moon came on 10:56:29 P.M., as a television camera recorded the craft's transmitted his every word to an aerial and excited audiences of hundreds of millions of people on earth.

<|ref|>text<|/ref|><|det|>[[749, 960, 861, 974]]<|/det|>
Testable Slope Test Soil
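
The Markdown reference output above interleaves grounding tags of the form `<|ref|>label<|/ref|><|det|>[[a, b, c, d]]<|/det|>` with the recognized text. As a rough sketch of how such output could be split into labeled regions for inspection, the snippet below uses only the standard library. The names `Region`, `TAG`, and `parse_regions` are hypothetical helpers, not part of this commit, and the four integers are treated as opaque values because the fixture does not state their coordinate convention.

```python
import re
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    label: str             # e.g. "title", "text", "image"
    box: Tuple[int, ...]   # the integers inside [[...]], kept as-is
    text: str              # content between this tag and the next one


# Matches one grounding tag: <|ref|>label<|/ref|><|det|>[[a, b, c, d]]<|/det|>
TAG = re.compile(
    r"<\|ref\|>(?P<label>.*?)<\|/ref\|>"
    r"<\|det\|>\[\[(?P<box>[^\]]*)\]\]<\|/det\|>"
)


def parse_regions(markdown: str) -> List[Region]:
    """Split grounded OCR output into (label, box, text) regions."""
    matches = list(TAG.finditer(markdown))
    regions = []
    for i, m in enumerate(matches):
        # A region's text runs until the next tag, or to the end of the input.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        box = tuple(int(v) for v in m.group("box").split(","))
        regions.append(Region(m.group("label"), box, markdown[m.end():end].strip()))
    return regions


# Two regions copied from the fixture above.
sample = (
    "<|ref|>title<|/ref|><|det|>[[61, 255, 907, 533]]<|/det|>\n"
    "# MEN WALK ON MOON\n"
    "\n"
    "<|ref|>image<|/ref|><|det|>[[270, 545, 697, 990]]<|/det|>\n"
)
regions = parse_regions(sample)
for region in regions:
    print(region.label, region.box, repr(region.text))
```

Splitting by region makes it easier to see whether a low similarity score comes from the recognized text or from differing tags and boxes, since the two can then be compared separately.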
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
MEN WALK ON MOON
ASTRONAUTS LAND ON PLAIN;
COLLECT ROCKS, PLANT FLAG

Voice From Moon:
'Eagle Has Landed'

A Powder Surface
Is Closely Explored

By JOHN NOBLE WILFORD
NOVEMBER, Monday, July 21—New York Herald and
wished on the moon.

Two American astronauts of Apollo 11, steered their
frigate Eagle toward the moon's surface and smoothly to
the lunar landing yesterday at 4:17:40 P.M., Eastern day-
light time.

Neil A. Armstrong, the 38-year-old civilian commander,
landed on the soft sand of the moon's surface here.

"Beautiful, Triumph!" he said. "The Eagle has landed."

The first man to reach the moon—Neil Armstrong and
his co-pilot, Charles E. "Pete" Conrad, 26, of the Pentagon,
brought their ship to rest on a level, rock-strewn plain near
the moon's surface. The two men and two of the three
astronauts on board, Armstrong, Conrad and Edwin E.
Aldrin, 38, of Houston, stepped slowly down the ladder
and descended as he pointed his first full-flaming footpad
at the lunar crater.

"That's one small step for man, one giant leap for
mankind."

His first step on the moon came at 10:56:20 P.M., as
a television camera rolled the earth's thousandth line every
second to an aerial and studied audiences of hundreds of
millions of people on earth.

Textile Slope Test Soil
Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
#!/usr/bin/env python3
"""
Test script to compare llama.cpp mtmd-cli output with stored reference output
from the HuggingFace implementation of the DeepSeek-OCR model, using embedding
similarity.
"""

import argparse
import subprocess
import sys
from pathlib import Path

from sentence_transformers import SentenceTransformer
from sentence_transformers import util


def run_mtmd_deepseek_ocr(
    model_path: str,
    mmproj_path: str,
    image_path: str,
    bin_path: str,
    prompt: str = "Free OCR."
) -> str:
    """
    Run inference using llama.cpp mtmd-cli and return its stdout as text.
    """
    cmd = [
        bin_path,
        "-m", model_path,
        "--mmproj", mmproj_path,
        "--image", image_path,
        # "-p", "<|grounding|>Convert the document to markdown.",
        "-p", prompt,
        "--chat-template", "deepseek-ocr",
        "--temp", "0",
        "-n", "1024",
        # "--verbose"
    ]

    print(f"Running llama.cpp command: {' '.join(cmd)}")

    # Capture raw bytes and decode manually so that invalid UTF-8 in the
    # model output is replaced instead of raising.
    result = subprocess.run(
        cmd,
        capture_output=True,
        text=False,
        timeout=300
    )

    if result.returncode != 0:
        stderr = result.stderr.decode('utf-8', errors='replace')
        print(f"llama.cpp stderr: {stderr}")
        raise RuntimeError(f"llama-mtmd-cli failed with code {result.returncode}")

    output = result.stdout.decode('utf-8', errors='replace').strip()
    print(f"llama.cpp output length: {len(output)} chars")
    return output


def compute_embedding_similarity(text1: str, text2: str, model_name: str) -> float:
    """
    Compute cosine similarity between two texts using an embedding model.
    """
    print(f"Loading embedding model: {model_name}")

    # Use sentence-transformers for easier embedding extraction
    embed_model = SentenceTransformer(model_name)

    print("Computing embeddings...")
    embeddings = embed_model.encode([text1, text2], convert_to_numpy=True)

    # util.cos_sim returns a 1x1 tensor for two single vectors
    similarity = util.cos_sim(embeddings[0], embeddings[1])[0][0]
    return float(similarity)


def read_expected_output(file_path: str) -> str:
    """
    Read expected OCR output from a file located next to this script.
    """
    cur_path = Path(__file__).parent
    expected_path = str(cur_path / file_path)
    with open(expected_path, "r", encoding="utf-8") as f:
        return f.read().strip()


def main():
    ap = argparse.ArgumentParser(description="Compare llama.cpp and HuggingFace DeepSeek-OCR outputs")
    ap.add_argument("--llama-model", default="gguf_models/deepseek-ai/deepseek-ocr-f16.gguf",
                    help="Path to llama.cpp GGUF model")
    ap.add_argument("--mmproj", default="gguf_models/deepseek-ai/mmproj-deepseek-ocr-f16.gguf",
                    help="Path to mmproj GGUF file")
    ap.add_argument("--image", default="test-1.jpeg",
                    help="Path to test image")
    ap.add_argument("--llama-bin", default="build/bin/llama-mtmd-cli",
                    help="Path to llama-mtmd-cli binary")
    ap.add_argument("--embedding-model", default="Qwen/Qwen3-Embedding-0.6B",
                    help="Embedding model for similarity computation")
    ap.add_argument("--threshold", type=float, default=0.7,
                    help="Minimum similarity threshold for pass")
    args = ap.parse_args()

    # Resolve paths relative to this script's location
    # parent of the script directory + image
    mtmd_dir = Path(__file__).parent.parent
    args.image = str(mtmd_dir / args.image)
    # project directory + llama model
    args.llama_model = str(mtmd_dir.parent.parent / args.llama_model)
    # project directory + mmproj
    args.mmproj = str(mtmd_dir.parent.parent / args.mmproj)
    # project directory + llama-mtmd-cli binary
    args.llama_bin = str(mtmd_dir.parent.parent / args.llama_bin)

    # Validate paths
    if not Path(args.image).exists():
        print(f"Error: Image not found: {args.image}")
        sys.exit(1)
    if not Path(args.llama_model).exists():
        print(f"Error: Model not found: {args.llama_model}")
        sys.exit(1)
    if not Path(args.mmproj).exists():
        print(f"Error: mmproj not found: {args.mmproj}")
        sys.exit(1)
    if not Path(args.llama_bin).exists():
        print(f"Error: llama-mtmd-cli binary not found: {args.llama_bin}")
        sys.exit(1)

    print("=" * 60)
    print("DeepSeek-OCR: llama.cpp vs HuggingFace Comparison")
    print("=" * 60)

    # Run llama.cpp inference
    print("\n[2/3] Running llama.cpp implementation...")
    llama_free_ocr = run_mtmd_deepseek_ocr(
        args.llama_model,
        args.mmproj,
        args.image,
        args.llama_bin
    )

    llama_md_ocr = run_mtmd_deepseek_ocr(
        args.llama_model,
        args.mmproj,
        args.image,
        args.llama_bin,
        prompt="<|grounding|>Convert the document to markdown."
    )

    expected_free_ocr = read_expected_output("test-1-extracted.txt")
    expected_md_ocr = read_expected_output("test-1-extracted.md")

    # Compute similarity
    print("\n[3/3] Computing embedding similarity...")
    free_ocr_similarity = compute_embedding_similarity(
        expected_free_ocr,
        llama_free_ocr,
        args.embedding_model
    )

    md_ocr_similarity = compute_embedding_similarity(
        expected_md_ocr,
        llama_md_ocr,
        args.embedding_model
    )

    # Results
    print("\n" + "=" * 60)
    print("RESULTS")
    print("=" * 60)
    print(f"\nReference Model output:\n{'-' * 40}")
    print(expected_free_ocr)
    print(f"\nDeepSeek-OCR output:\n{'-' * 40}")
    print(llama_free_ocr)
    print(f"\n{'=' * 60}")
    print(f"Cosine Similarity: {free_ocr_similarity:.4f}")
    print(f"Threshold: {args.threshold}")
    print(f"Result: {'PASS' if free_ocr_similarity >= args.threshold else 'FAIL'}")
    print("=" * 60)

    # Markdown OCR results
    print(f"\nReference Model Markdown output:\n{'-' * 40}")
    print(expected_md_ocr)
    print(f"\nDeepSeek-OCR Markdown output:\n{'-' * 40}")
    print(llama_md_ocr)
    print(f"\n{'=' * 60}")
    print(f"Cosine Similarity (Markdown): {md_ocr_similarity:.4f}")
    print(f"Threshold: {args.threshold}")
    print(f"Result: {'PASS' if md_ocr_similarity >= args.threshold else 'FAIL'}")
    print("=" * 60)

    # Exit non-zero when either comparison fails so callers can detect it
    if free_ocr_similarity < args.threshold or md_ocr_similarity < args.threshold:
        sys.exit(1)


if __name__ == "__main__":
    main()
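
The pass/fail decision in the script rests on the cosine similarity between two embedding vectors, which the script delegates to sentence-transformers. As a minimal sketch of that computation without the library, the snippet below works on plain Python lists. The vectors are made-up toy values rather than real embeddings, and `cosine_similarity` and `verdict` are hypothetical helper names; only the 0.7 default mirrors the script's `--threshold`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (a . b) / (|a| * |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def verdict(similarity: float, threshold: float = 0.7) -> str:
    """Mirror the script's PASS/FAIL rule: pass when similarity >= threshold."""
    return "PASS" if similarity >= threshold else "FAIL"


# Toy 3-dimensional vectors standing in for embeddings (illustrative only).
reference = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]   # a scaled copy of the reference
orthogonal = [3.0, 0.0, -1.0]      # dot product with the reference is 0

for name, vec in [("same_direction", same_direction), ("orthogonal", orthogonal)]:
    sim = cosine_similarity(reference, vec)
    print(f"{name}: similarity={sim:.4f} -> {verdict(sim)}")
```

Because cosine similarity ignores vector length, the score reflects direction in embedding space only. That is why a fixed threshold can be used even when the two OCR outputs differ in length.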
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
sentence-transformers
transformers
tokenizers
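
The test script in this commit resolves its default paths relative to its own location rather than the current working directory: the image against the directory two levels up from the script file, the model, mmproj, and binary against the directory two levels above that, and the expected outputs against the script's own directory. The sketch below traces that arithmetic with `PurePosixPath` so it runs without touching the filesystem. The script location `/repo/tools/mtmd/tests/test-script.py` is a hypothetical placeholder chosen only for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical script location; the real file name and directory may differ.
script = PurePosixPath("/repo/tools/mtmd/tests/test-script.py")

mtmd_dir = script.parent.parent        # directory above the script's directory
project_dir = mtmd_dir.parent.parent   # two levels further up

image = mtmd_dir / "test-1.jpeg"
model = project_dir / "gguf_models/deepseek-ai/deepseek-ocr-f16.gguf"
llama_bin = project_dir / "build/bin/llama-mtmd-cli"
expected = script.parent / "test-1-extracted.txt"

# Joining with an absolute path discards the left-hand side, so an
# absolute path given on the command line is used unchanged.
override = project_dir / "/models/custom.gguf"

for name, path in [("image", image), ("model", model), ("llama_bin", llama_bin),
                   ("expected", expected), ("override", override)]:
    print(f"{name:9} {path}")
```

Relative values passed via `--image`, `--llama-model`, `--mmproj`, or `--llama-bin` are therefore interpreted relative to these base directories, not relative to the shell's working directory.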
