Commit abb48e2

feat: improve LLM prompt with two-pass extraction and point counting
1 parent bb762a2 commit abb48e2

1 file changed: +23 -21 lines changed
chart2csv/core/llm_extraction.py

Lines changed: 23 additions & 21 deletions
@@ -64,38 +64,40 @@ def extract_chart_llm(
     # Create Mistral client
     client = Mistral(api_key=api_key)
 
-    # Craft extraction prompt - chain of thought for precision
-    prompt = """You are extracting data from a chart. Be EXTREMELY precise.
-
-TASK: Extract the X,Y coordinates of EVERY data point marker in this chart.
+    # Two-pass extraction for better accuracy on dense charts
+    # Pass 1: Analyze and describe what you see
+    # Pass 2: Extract data points one by one
+
+    prompt = """You are a precise chart data extraction AI.
 
-STEP 1 - ANALYZE AXES:
-First, identify the axis ranges by reading the tick labels.
+TASK: Extract ALL data points from this chart with maximum precision.
 
-STEP 2 - LOCATE MARKERS:
-For line charts: find every dot/marker on the line (not the line itself, the markers).
-For scatter plots: find every dot.
-For bar charts: measure the height of each bar.
+ANALYSIS PHASE - Before extracting, observe:
+1. What type of chart is this? (line/scatter/bar)
+2. X-axis: What is the range? What are the gridlines?
+3. Y-axis: What is the range? What are the gridlines?
+4. How many data points/markers are visible? Count them carefully.
 
-STEP 3 - READ VALUES:
-For EACH marker, look at its position and read:
-- X: What X gridline or tick is it at or between?
-- Y: What Y gridline is the marker at? If between gridlines, estimate precisely.
+EXTRACTION PHASE - For EACH visible marker:
+- Look at its horizontal position → determine X value
+- Look at its vertical position → determine Y value
+- Do NOT smooth or interpolate - real data is often irregular
 
-CRITICAL: Do NOT interpolate or assume patterns. Each point may have a UNIQUE value.
-Many charts have irregular data - do not assume smooth curves.
+IMPORTANT FOR LINE CHARTS:
+- Count the actual markers/dots on the line, not just the line endpoints
+- Each marker may have a DIFFERENT Y value - do not assume a pattern
+- If markers are dense (close together), take extra care to read each one
 
-Return JSON only:
+Output ONLY valid JSON:
 {
 "chart_type": "line" or "scatter" or "bar",
 "x_label": "axis label",
 "y_label": "axis label",
-"data": [{"x": val, "y": val}, ...]
+"point_count": number of data points you counted,
+"data": [{"x": value, "y": value}, ...]
 }
 
-Example for irregular data:
-{"data": [{"x": 0, "y": 5}, {"x": 1, "y": 8}, {"x": 2, "y": 12}, {"x": 3, "y": 15}]}
-Note: each Y is different and not following a pattern."""
+VERIFICATION: Your data array length should match point_count."""
 
     try:
         # Direct extraction with pixtral (OCR doesn't work for charts)
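
The new prompt asks the model to report point_count and states that the data array length should match it, but the hunk shown here does not include code that checks this on the caller's side. Below is a minimal sketch of such a check, assuming the model's reply is available as a JSON string. The function name validate_extraction and the count_matches key are hypothetical and not part of this commit.

```python
import json


def validate_extraction(raw: str) -> dict:
    """Parse the model's JSON reply and check it against the prompt's schema.

    Hypothetical helper for illustration; not part of the commit.
    """
    result = json.loads(raw)
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object at the top level")

    data = result.get("data")
    if not isinstance(data, list):
        raise ValueError("'data' must be a list of {x, y} objects")

    # Every entry should carry both coordinates requested by the prompt.
    for i, point in enumerate(data):
        if not isinstance(point, dict) or "x" not in point or "y" not in point:
            raise ValueError(f"data[{i}] is missing 'x' or 'y'")

    # Mirrors the prompt's VERIFICATION line: len(data) should equal point_count.
    point_count = result.get("point_count")
    result["count_matches"] = (
        isinstance(point_count, int) and point_count == len(data)
    )
    return result
```

A caller could treat count_matches being False as a signal to retry the request or flag the chart for review, since it suggests the model counted one number of markers and extracted another.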
