@@ -64,38 +64,40 @@ def extract_chart_llm(
  # Create Mistral client
  client = Mistral(api_key=api_key)

- # Craft extraction prompt - chain of thought for precision
- prompt = """You are extracting data from a chart. Be EXTREMELY precise.
-
- TASK: Extract the X,Y coordinates of EVERY data point marker in this chart.
+ # Two-pass extraction for better accuracy on dense charts
+ # Pass 1: Analyze and describe what you see
+ # Pass 2: Extract data points one by one
+
+ prompt = """You are a precise chart data extraction AI.

- STEP 1 - ANALYZE AXES:
- First, identify the axis ranges by reading the tick labels.
+ TASK: Extract ALL data points from this chart with maximum precision.

- STEP 2 - LOCATE MARKERS:
- For line charts: find every dot/marker on the line (not the line itself, the markers).
- For scatter plots: find every dot.
- For bar charts: measure the height of each bar.
+ ANALYSIS PHASE - Before extracting, observe:
+ 1. What type of chart is this? (line/scatter/bar)
+ 2. X-axis: What is the range? What are the gridlines?
+ 3. Y-axis: What is the range? What are the gridlines?
+ 4. How many data points/markers are visible? Count them carefully.

- STEP 3 - READ VALUES:
- For EACH marker, look at its position and read:
- - X: What X gridline or tick is it at or between?
- - Y: What Y gridline is the marker at? If between gridlines, estimate precisely.
+ EXTRACTION PHASE - For EACH visible marker:
+ - Look at its horizontal position → determine X value
+ - Look at its vertical position → determine Y value
+ - Do NOT smooth or interpolate - real data is often irregular

- CRITICAL: Do NOT interpolate or assume patterns. Each point may have a UNIQUE value.
- Many charts have irregular data - do not assume smooth curves.
+ IMPORTANT FOR LINE CHARTS:
+ - Count the actual markers/dots on the line, not just the line endpoints
+ - Each marker may have a DIFFERENT Y value - do not assume a pattern
+ - If markers are dense (close together), take extra care to read each one

- Return JSON only:
+ Output ONLY valid JSON:
 {
  "chart_type": "line" or "scatter" or "bar",
  "x_label": "axis label",
  "y_label": "axis label",
- "data": [{"x": val, "y": val}, ...]
+ "point_count": number of data points you counted,
+ "data": [{"x": value, "y": value}, ...]
 }

- Example for irregular data:
- {"data": [{"x": 0, "y": 5}, {"x": 1, "y": 8}, {"x": 2, "y": 12}, {"x": 3, "y": 15}]}
- Note: each Y is different and not following a pattern."""
+ VERIFICATION: Your data array length should match point_count."""

  try:
  # Direct extraction with pixtral (OCR doesn't work for charts)
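
The new prompt asks the model to make its `data` array length match `point_count`, but the hunk shown here does not check that on the Python side. Below is a minimal sketch of such a check, assuming the reply is available as a plain JSON string. `validate_extraction` is a hypothetical helper and not part of this commit; the field names come from the prompt's JSON template.

```python
import json


def validate_extraction(raw: str) -> dict:
    """Parse the model's reply and check it against the prompt's JSON template.

    Hypothetical helper for illustration; not part of the commit.
    """
    # json.JSONDecodeError is a ValueError subclass, so callers can catch ValueError.
    result = json.loads(raw)

    if result.get("chart_type") not in {"line", "scatter", "bar"}:
        raise ValueError(f"unexpected chart_type: {result.get('chart_type')!r}")

    data = result.get("data")
    if not isinstance(data, list):
        raise ValueError("'data' must be a list of {x, y} objects")
    for i, point in enumerate(data):
        if not isinstance(point, dict) or "x" not in point or "y" not in point:
            raise ValueError(f"data[{i}] is missing 'x' or 'y'")

    # Mirrors the prompt's VERIFICATION line: array length should match point_count.
    if result.get("point_count") != len(data):
        raise ValueError(
            f"point_count={result.get('point_count')!r} but data has {len(data)} points"
        )
    return result
```

A mismatch could be treated as a signal to retry the request or flag the chart for manual review. The sketch assumes the reply is bare JSON; if the model wraps it in extra text, that would need to be stripped before parsing.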