docs/index.md
Lines changed: 1 addition & 15 deletions
@@ -7,29 +7,15 @@ Welcome to the refreshed docs for GlossAPI, the GFOSS pipeline for turning acade
 - [Quickstart Recipes](quickstart.md) — common extraction/OCR flows in copy-paste form.
 - [Lightweight PDF Corpus](lightweight_corpus.md) — 20 one-page PDFs for smoke testing without Docling or GPUs.
 
-## Understand the architecture
-- [Architecture Overview](architecture/index.md) — the end-to-end staged model and why it exists.
-- [Core Design Principles](architecture/core_design_principles.md) — the design constraints that shape the pipeline.
-- [Docling Throughput and Batching](architecture/docling_throughput_and_batching.md) — how throughput and stability trade off.
-- [Failure Recovery and Skiplist](architecture/docling_failure_recovery_and_skiplist.md) — how the pipeline survives problematic PDFs.
-- [Greek Text Validation](architecture/greek_text_validation.md) — why extraction success is not enough for Greek corpora.
-- [Metadata, Artifacts, and Run Diagnostics](architecture/metadata_artifacts_and_run_diagnostics.md) — how provenance and operational state are retained.
-- [Artifact Layout and Stage Handoffs](architecture/artifact_layout_and_stage_handoffs.md) — how folders, filenames, and metadata glue the stages together.
-- [Resumability, Recovery, and Retention](architecture/resumability_recovery_and_retention.md) — how the current design supports reruns and where storage pressure appears.
-- [DeepSeek-Only Upgrade Roadmap](architecture/deepseek_only_upgrade_roadmap.md) — the staged simplification plan for OCR and dependency upgrades.
-
 ## Learn the pipeline
 - [Pipeline Overview](pipeline.md) explains each stage and the emitted artifacts.
 - [OCR & Math Enrichment](ocr_and_math_enhancement.md) covers DeepSeek OCR remediation and Docling-based enrichment.
 - [Multi-GPU & Benchmarking](multi_gpu.md) shares scaling and scheduling tips.
-- [Stage Reference](stages/index.md) breaks down each pipeline stage as a contract.
 
 ## Configure and debug
 - [Configuration](configuration.md) lists all environment knobs.
 - [Troubleshooting](troubleshooting.md) captures the most common pitfalls.