Commit afdd4c9

committed
w
1 parent 57d2276 commit afdd4c9

9 files changed (+1121, -23 lines)

.claude/commands/mathml-general-exam.md

Lines changed: 66 additions & 20 deletions
@@ -15,35 +15,29 @@ You are going to OCR a general exam PDF file to accessible HTML with MathML usin
 ## Workflow

 ### Step 1: Upload PDF to Mathpix
-Use the Mathpix v3/pdf API endpoint to upload the PDF:
+Use the Mathpix v3/pdf API endpoint to upload the PDF (SINGLE LINE - no backslashes):
 ```bash
-curl -X POST "https://api.mathpix.com/v3/pdf" \
-  -H "app_id: $MATHPIX_APP_ID" \
-  -H "app_key: $MATHPIX_API_KEY" \
-  -F "file=@<PDF_PATH>" \
-  -F 'options_json={"conversion_formats": {"html.zip": true, "tex.zip": true}}'
+curl -X POST "https://api.mathpix.com/v3/pdf" -H "app_id: $MATHPIX_APP_ID" -H "app_key: $MATHPIX_API_KEY" -F "file=@<PDF_PATH>" -F 'options_json={"conversion_formats": {"html.zip": true, "tex.zip": true}}'
 ```

 Extract the `pdf_id` from the response.

 ### Step 2: Check conversion status
-Poll the status endpoint until conversion is complete:
+Poll the status endpoint until conversion is complete (SINGLE LINE - no backslashes):
 ```bash
-curl -X GET "https://api.mathpix.com/v3/pdf/<PDF_ID>" \
-  -H "app_id: $MATHPIX_APP_ID" \
-  -H "app_key: $MATHPIX_API_KEY"
+curl -X GET "https://api.mathpix.com/v3/pdf/<PDF_ID>" -H "app_id: $MATHPIX_APP_ID" -H "app_key: $MATHPIX_API_KEY"
 ```

 Wait until `"status":"completed"`.

 ### Step 3: Download and Extract TeX Format
-Download the tex.zip format (NOT .html, as it uses SVG):
+Download the tex.zip format (NOT .html, as it uses SVG) - SINGLE LINE for curl, then unzip:
 ```bash
-curl -X GET "https://api.mathpix.com/v3/pdf/<PDF_ID>.tex.zip" \
-  -H "app_id: $MATHPIX_APP_ID" \
-  -H "app_key: $MATHPIX_API_KEY" \
-  -o /tmp/output.tex.zip
+curl -X GET "https://api.mathpix.com/v3/pdf/<PDF_ID>.tex.zip" -H "app_id: $MATHPIX_APP_ID" -H "app_key: $MATHPIX_API_KEY" -o /tmp/output.tex.zip
+```

+Then extract:
+```bash
 cd /tmp && unzip -o output.tex.zip
 ```

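The three API steps above can be chained in a single script. This is a minimal sketch, not part of the commit: it assumes `MATHPIX_APP_ID` and `MATHPIX_API_KEY` are exported, that the upload and status responses carry flat `"pdf_id"` and `"status"` string fields (as Steps 1 and 2 imply), and that `"error"` is the failure status. The helper names, the 5-second poll interval, and the 120-try limit are hypothetical, and `json_field` is crude `grep`/`sed` extraction rather than a JSON parser. The `curl` calls stay on one line, matching the commit's no-backslash convention.

```shell
#!/usr/bin/env bash
# Sketch of Steps 1-3 as one script (hypothetical helper names).
API="https://api.mathpix.com/v3/pdf"

# Hypothetical helper: print the FIRST string value for key $2 in JSON text $1.
# Crude grep/sed extraction; it does not understand nesting or escapes.
json_field() {
  printf '%s' "$1" |
    grep -o "\"$2\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" |
    head -n 1 |
    sed 's/^"[^"]*"[[:space:]]*:[[:space:]]*"\(.*\)"$/\1/'
}

# Hypothetical helper: poll the status endpoint every 5 s, up to 120 tries.
wait_for_conversion() {
  local pdf_id="$1" tries=0 resp state
  while [ "$tries" -lt 120 ]; do
    resp=$(curl -s -X GET "$API/$pdf_id" -H "app_id: $MATHPIX_APP_ID" -H "app_key: $MATHPIX_API_KEY")
    state=$(json_field "$resp" status)
    if [ "$state" = "completed" ]; then return 0; fi
    # Treating "error" as the failure status is an assumption.
    if [ "$state" = "error" ]; then echo "conversion failed: $resp" >&2; return 1; fi
    tries=$((tries + 1))
    sleep 5
  done
  echo "timed out waiting for $pdf_id" >&2
  return 1
}

# Upload (Step 1), wait (Step 2), then download and unzip tex.zip (Step 3).
# Only the pdf_id is written to stdout; diagnostics go to stderr.
convert_pdf() {
  local resp pdf_id
  resp=$(curl -s -X POST "$API" -H "app_id: $MATHPIX_APP_ID" -H "app_key: $MATHPIX_API_KEY" -F "file=@$1" -F 'options_json={"conversion_formats": {"html.zip": true, "tex.zip": true}}')
  pdf_id=$(json_field "$resp" pdf_id)
  if [ -z "$pdf_id" ]; then echo "upload failed: $resp" >&2; return 1; fi
  wait_for_conversion "$pdf_id" || return 1
  curl -s -X GET "$API/${pdf_id}.tex.zip" -H "app_id: $MATHPIX_APP_ID" -H "app_key: $MATHPIX_API_KEY" -o /tmp/output.tex.zip || return 1
  (cd /tmp && unzip -q -o output.tex.zip) || return 1
  echo "$pdf_id"
}
```

Called as `pdf_id=$(convert_pdf exam.pdf)`, it prints only the `pdf_id` (the unzip runs quietly), so the Step 6 image check can follow directly with `ls -la "/tmp/$pdf_id/images/"`.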
@@ -102,8 +96,60 @@ Use this entity mapping:
 - Greek letters: α (&alpha;), β (&beta;), γ (&gamma;), δ (&delta;), ε (&epsilon;), η (&eta;), θ (&theta;), λ (&lambda;), μ (&mu;), ν (&nu;), π (&pi;), σ (&sigma;), τ (&tau;), φ (&phi;), ω (&omega;), Γ (&Gamma;), Δ (&Delta;), Θ (&Theta;), Λ (&Lambda;), Σ (&Sigma;), Φ (&Phi;), Ω (&Omega;)
 - Other: ∞ (&infin;), × (&times;), ⋅ (&sdot;), ± (&plusmn;), ∠ (&ang;), ⊕ (&oplus;), ⊗ (&otimes;)

-### Step 6: Add H2 Problem Headings
-**CRITICAL**: After post-processing, you MUST manually add H2 headings for each problem.
+### Step 6: Handle Images (Diagrams, Figures)
+**IMPORTANT**: Many exams contain diagrams (commutative diagrams, geometric figures, knot diagrams, etc.) that are extracted by Mathpix.
+
+1. **Check for extracted images**:
+```bash
+ls -la /tmp/<PDF_ID>/images/
+```
+
+2. **If images exist**:
+- Create the images directory if it doesn't exist:
+```bash
+mkdir -p <EXAM_DIR>/images
+```
+
+- Copy ALL image files to the exam images directory:
+```bash
+cp /tmp/<PDF_ID>/images/*.jpg <EXAM_DIR>/images/
+```
+
+- **Update ALL image paths in the HTML**:
+- Find all `<img src="...">` tags in the HTML
+- Change from `<img src="FILENAME"` to `<img src="images/FILENAME.jpg"`
+- Add proper alt text describing what the diagram shows
+- Add styling for responsive images:
+```html
+<img src="images/FILENAME.jpg" alt="Descriptive alt text here" style="max-width: 100%; height: auto; display: block; margin: 1em auto;" />
+```
+
+3. **Common exam diagrams to look for**:
+- Commutative diagrams (arrows between mathematical objects)
+- Pushout/pullback squares
+- Geometric figures (Möbius bands, knots, surfaces)
+- Graphs and plots
+- Function diagrams
+
+4. **Alt text guidelines**:
+- Be descriptive but concise
+- Examples:
+- "Commutative diagram showing maps between groups A1, A2, B1, B2, and C"
+- "Trefoil knot diagram"
+- "Möbius band diagram showing the curve γ as its boundary"
+- "Pushout diagram showing the construction of Xf"
+
+**Example transformation:**
+```html
+<!-- Before: Broken image path -->
+<img src="2025_11_13_abc123-1" alt="image" />
+
+<!-- After: Fixed path with descriptive alt text -->
+<img src="images/2025_11_13_abc123-1.jpg" alt="Commutative diagram showing the exact sequence" style="max-width: 100%; height: auto; display: block; margin: 1em auto;" />
+```
+
+### Step 7: Add H2 Problem Headings
+**CRITICAL**: After handling images, you MUST manually add H2 headings for each problem.

 1. Read the processed HTML file
 2. Identify each problem in the exam (usually numbered 1, 2, 3, etc.)
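The path rewrite in the image-handling step is mechanical enough to script; the alt text is not. A minimal sketch with hypothetical helper names, assuming (like the step's own `cp` glob and example) that Mathpix emits bare, extension-less `src` values, that every image is a `.jpg`, and that `src` is the first attribute of each `<img>`. It skips any `src` containing `/` or `.`, so already-fixed paths are left alone, and it avoids `sed -i` because GNU and BSD `sed` disagree on that flag's arguments. Descriptive alt text and the responsive `style` attribute still have to be added by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite bare Mathpix image references in an HTML file,
#   <img src="NAME"  ->  <img src="images/NAME.jpg"
# Only src values with no "/" or "." are touched, so re-running is harmless.
fix_img_paths() {
  local html="$1" tmp rc=0
  tmp=$(mktemp) || return 1
  # Write to a temp file, then copy back over the original; copying (not mv)
  # keeps the HTML file's permissions and avoids non-portable `sed -i`.
  if sed -E 's#<img src="([^"/.]+)"#<img src="images/\1.jpg"#g' "$html" > "$tmp"; then
    cat "$tmp" > "$html" || rc=1
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}

# Hypothetical helper: print each images/... path referenced by the HTML
# that does not exist under the exam directory (run after copying images).
missing_images() {
  local html="$1" exam_dir="$2" src
  grep -o '<img src="images/[^"]*"' "$html" | sed 's/^<img src="//; s/"$//' |
    while IFS= read -r src; do
      [ -f "$exam_dir/$src" ] || printf '%s\n' "$src"
    done
}
```

Running `missing_images exam.html <EXAM_DIR>` after the copy prints nothing when every rewritten path resolves to a copied file.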
@@ -133,10 +179,10 @@ Use this entity mapping:
 - Use the pattern: `<h2 class="unnumbered" id="problem-N">Problem N</h2>`
 - The ID should match the problem number for anchor linking

-### Step 7: Save to final location
+### Step 8: Save to final location
 Save the processed HTML file next to the original PDF with the same name but .html extension.

-### Step 8: Add accessible HTML link to the generals page
+### Step 9: Add accessible HTML link to the generals page
 After saving the HTML file, you MUST update the link in `graduate/general_exams.md` to follow accessibility best practices:

 1. Read the file `graduate/general_exams.md`
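The heading pattern and the save-location rule above can also be expressed as small shell helpers. These names are hypothetical and the block is only a sketch: deciding where each heading belongs still requires reading the HTML, and `html_path_for` assumes a lowercase `.pdf` extension. The duplicate-id check guards the anchor-linking requirement, since two identical `problem-N` ids would make the anchors ambiguous.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the H2 heading for problem N, following
# the pattern <h2 class="unnumbered" id="problem-N">Problem N</h2>.
problem_heading() {
  printf '<h2 class="unnumbered" id="problem-%s">Problem %s</h2>\n' "$1" "$1"
}

# Hypothetical helper: the final HTML sits next to the PDF with the same
# name, the trailing ".pdf" replaced by ".html".
html_path_for() {
  printf '%s\n' "${1%.pdf}.html"
}

# Hypothetical helper: print any id="problem-N" attribute that occurs more
# than once in the file, since duplicate ids break anchor links.
duplicate_problem_ids() {
  grep -o 'id="problem-[0-9]*"' "$1" | sort | uniq -d
}
```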
@@ -165,7 +211,7 @@ After saving the HTML file, you MUST update the link in `graduate/general_exams.
 - Clearly labeling the PDF as "for printing" to indicate its purpose
 - Using ARIA labels to communicate that PDFs may have accessibility limitations

-### Step 9: Final Review - Read Both Files
+### Step 10: Final Review - Read Both Files
 After completing all processing steps, you MUST read both the original PDF and the generated HTML file to provide a final quality assessment:

 ```bash
