- RAG upgraded to dual-backend: FAISS+sentence-transformers (semantic) with TF-IDF fallback
- OCR fallback (pytesseract) for image-based PDF manuals in document_reader
- New generate_diagnostic_report_docx tool for structured Word reports
- 3 new optional dependency groups: vector-search, ocr, docx
- search_documentation now reports active backend in response
- 27 MCP tools (was 26)
- Updated all docs, CHANGELOG, README, GitHub Pages, INSTALL, CITATION
CHANGELOG.md: 30 additions & 1 deletion
@@ -5,6 +5,35 @@ All notable changes to the Predictive Maintenance MCP Server project will be doc
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [0.7.0] - 2025-07-14
+
+### Added
+- **FAISS vector search** — `search_documentation` now uses FAISS + sentence-transformers for semantic retrieval when installed (`pip install predictive-maintenance-mcp[vector-search]`). Falls back to TF-IDF keyword search when not installed. Dual-backend `DocumentIndex` in `src/rag.py`.
+- **OCR for scanned PDFs** — `document_reader.extract_text_from_pdf()` automatically falls back to Tesseract OCR for pages with empty/minimal text. Requires optional `pytesseract` + `pdf2image` + Poppler.
+- **DOCX diagnostic reports** — New `generate_diagnostic_report_docx` MCP tool and `save_diagnostic_report_docx()` in report generator. Creates structured Word documents with statistics tables, FFT/envelope peaks, bearing frequencies, ISO evaluation, and diagnostic summary. Requires optional `python-docx`.
+- **New optional dependency groups** in `pyproject.toml`: `vector-search`, `ocr`, `docx`. The `full` extra now includes all of them.
+- **Overlapping chunking** — New `chunk_text()` helper in RAG module for character-level overlapping chunks alongside paragraph-aware chunking.
+
+### Changed
+- **`search_documentation`** now reports active backend (`faiss` or `tfidf`) in response
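
The changelog describes `chunk_text()` only as producing character-level overlapping chunks. A minimal sketch of that idea follows, assuming a signature with `chunk_size` and `overlap` parameters; both names and their defaults are hypothetical, and the actual helper in `src/rag.py` may differ.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical signature: parameter names and defaults are assumptions,
    not taken from the project's source.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With `chunk_size=4` and `overlap=2`, each chunk repeats the last two characters of the previous one, so a phrase that straddles a chunk boundary still appears whole in at least one chunk.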
CITATION.cff: 4 additions & 4 deletions
@@ -1,8 +1,8 @@
 cff-version: 1.2.0
 message: "If you use this software, please cite it as below."
 title: "Predictive Maintenance MCP Server: An open-source framework for integrating Large Language Models with predictive maintenance and fault diagnosis workflows"
+| `vector-search` | `faiss-cpu`, `sentence-transformers` | Semantic document search (FAISS). Falls back to TF-IDF when not installed. |
+| `ocr` | `pytesseract`, `Pillow`, `pdf2image` | OCR for scanned/image-based PDF manuals. Requires [Poppler](https://github.com/ossamamehmood/Poppler-windows/releases) on system PATH. |
+| `docx` | `python-docx` | Generate structured Word (.docx) diagnostic reports. |
+| `full` | All of the above | Install all optional features at once. |
+
+> **Note**: `vector-search` pulls in PyTorch (~2 GB). For lightweight installs, skip it — TF-IDF keyword search works well for technical documentation.
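
The fallback described in the table (FAISS when the `vector-search` extra is installed, TF-IDF otherwise) is commonly implemented by probing for the optional packages before choosing a backend. The sketch below illustrates that pattern only. `pick_backend`, `search_response`, and the response keys are hypothetical names; the project's `DocumentIndex` may select and report its backend differently.

```python
from importlib.util import find_spec


def pick_backend(required=("faiss", "sentence_transformers")) -> str:
    """Return "faiss" when every optional module is importable, else "tfidf".

    Hypothetical helper. find_spec only checks that a module can be located;
    it does not import it, so a broken install could still fail later.
    """
    if all(find_spec(name) is not None for name in required):
        return "faiss"
    return "tfidf"


def search_response(query: str, results: list) -> dict:
    """Attach the active backend name to a search result payload (assumed shape)."""
    return {"query": query, "backend": pick_backend(), "results": results}
```

Probing with `find_spec` avoids paying the PyTorch import cost just to decide which backend to use.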
@@ -173,7 +177,11 @@ This project serves two audiences. Pick the door that fits you:

 </details>
 - **📁 Multi-Format Support** — Load signals from CSV, MAT (MATLAB), WAV, NPY, and Parquet files
-- **🚀 Zero Configuration** — Works out of the box with sample data, auto-detects sampling rates from metadata
+- **🔎 RAG Document Search** — Vector search (FAISS + sentence-transformers) with TF-IDF fallback over machine manuals and bearing catalogs. Auto-cached.
+- **📝 DOCX Reports** — Generate structured Word diagnostic reports alongside interactive HTML (requires `python-docx`)
+- **🔍 OCR for Scanned PDFs** — Automatic OCR fallback (Tesseract) for image-based equipment manuals
+- **⚡ LLM-Optimised Output** — Tool responses return compact summaries (top peaks, statistics) instead of raw arrays, keeping LLM context windows lean
+- **🚀 Zero Configuration** — Works out of the box with sample data, auto-detects sampling rates from metadata

 ---

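
The OCR fallback listed above is described in the changelog as applying to pages with empty or minimal text. A minimal sketch of that per-page decision follows. The `min_chars` threshold and the injected `ocr_page` callable are assumptions that stand in for the real Tesseract call, so the example runs without `pytesseract` installed.

```python
from typing import Callable, List


def extract_with_ocr_fallback(
    page_texts: List[str],
    ocr_page: Callable[[int], str],
    min_chars: int = 20,
) -> List[str]:
    """Keep embedded text where present; call OCR only for near-empty pages.

    page_texts: text already extracted from each PDF page.
    ocr_page:   hypothetical callable that runs OCR on the page at an index.
    min_chars:  assumed threshold for what counts as "minimal text".
    """
    result = []
    for index, text in enumerate(page_texts):
        if len(text.strip()) >= min_chars:
            result.append(text)  # the embedded text layer is good enough
        else:
            result.append(ocr_page(index))  # likely a scanned page
    return result
```

Running OCR only on pages that fail the threshold keeps text-based manuals fast, since rasterising and recognising a page is far slower than reading its text layer.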
@@ -393,12 +401,13 @@ Tools perform **computations and generate outputs**:
 - **`generate_fft_report`** — Interactive FFT spectrum HTML report with peak table
 - **`generate_envelope_report`** — Envelope analysis report with bearing fault markers
 - **`generate_iso_report`** — ISO 20816-3 evaluation with zone visualization
+- **`generate_diagnostic_report_docx`** — Structured Word (.docx) diagnostic report (requires `python-docx`)
 - **`generate_pca_visualization_report`** — 2D/3D PCA projection report for anomaly exploration
 - **`generate_feature_comparison_report`** — Feature-level comparison report across signals/classes
 - **`list_html_reports`** — List all generated reports with metadata
 - **`get_report_info`** — Get report details without loading full HTML

-> 💡 **All reports are interactive Plotly visualizations saved to `reports/` directory**
+> 💡 **HTML reports are interactive Plotly visualizations saved to `reports/`. DOCX reports are structured Word documents for stakeholders.**

 </details>

@@ -410,6 +419,7 @@ Tools perform **computations and generate outputs**:
 - **`calculate_bearing_characteristic_frequencies`** — Calculate BPFO/BPFI/BSF/FTF from geometry
 - **`read_manual_excerpt`** — Read manual text excerpt (configurable page limit)
 - **`search_bearing_catalog`** — Search bearing geometry in local catalog (20+ common bearings)
+- **`search_documentation`** — Semantic search across machine manuals and bearing catalogs (FAISS vector search or TF-IDF fallback)

 **MCP Resources:**
 - `manual://list` — Browse available manuals
@@ -442,7 +452,9 @@ The `skills/` directory contains pre-built guided workflows that orchestrate mul
 | [**quick-screening**](skills/quick-screening/SKILL.md) | 5 | Fast health screening with clear Healthy/Suspicious/Critical classification |
 | [**report-generation**](skills/report-generation/SKILL.md) | 6 | Professional HTML report generation with composite multi-report option |

-> 💡 Skills are standalone markdown files that any MCP-compatible LLM client can use as system instructions to coordinate multi-step diagnostic workflows.
+> ⚠️ **Skills are Claude / GitHub Copilot-specific.** They use [SKILL.md with YAML frontmatter](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/skills), a convention recognised by Claude.ai, Claude Code, and Copilot agents. Other LLM clients can still read them as plain markdown, but automatic skill invocation requires a Claude or Copilot-compatible host.
+>
+> **MCP tools, resources, and HTML reports are LLM-agnostic** — they work with any MCP-compatible client (ChatGPT, Ollama, LM Studio, etc.).

 ---

@@ -480,10 +492,11 @@ The system follows a **hybrid MCP architecture** combining Resources (direct dat
 - 📦 **pypdf migration** — Replaced deprecated PyPDF2 with pypdf
-- ▶️ **`python -m` support** — Run as `python -m predictive_maintenance_mcp`
-- 🧹 **Consolidated metadata reads** — ISO evaluation no longer double-reads metadata files
+- 🔎 **FAISS vector search** — Semantic document retrieval with sentence-transformers (TF-IDF fallback when not installed)
+- 🔍 **OCR for scanned PDFs** — Automatic Tesseract OCR fallback for image-based equipment manuals
+- 📝 **DOCX diagnostic reports** — Structured Word documents with statistics, peaks, ISO evaluation, and diagnostic summary
+- ⚡ **Compact FFT output** — Top-20 peaks + RMS/stats instead of full arrays (~200 KB → ~2 KB)

 ### 🔮 Planned Enhancements

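
The "Compact FFT output" entry above replaces full spectrum arrays with a short summary. The sketch below shows one way such a summary could be built with the standard library alone. `summarize_spectrum` and its output keys are hypothetical, and it simply keeps the largest-amplitude bins instead of running true local-maximum peak detection, which the project may do differently.

```python
import heapq
import math


def summarize_spectrum(freqs, amplitudes, signal, top_n=20):
    """Return a small dict (largest bins + RMS) instead of full arrays.

    Hypothetical helper: name, arguments, and output keys are assumptions.
    """
    if len(freqs) != len(amplitudes):
        raise ValueError("freqs and amplitudes must have the same length")

    # Largest-amplitude bins, highest first (not true local-maximum peaks).
    top = heapq.nlargest(top_n, zip(amplitudes, freqs))

    # Root mean square of the time-domain signal.
    rms = math.sqrt(sum(x * x for x in signal) / len(signal)) if signal else 0.0

    return {
        "peaks": [{"frequency_hz": f, "amplitude": a} for a, f in top],
        "rms": rms,
        "n_bins": len(freqs),
    }
```

Because the payload size depends on `top_n` and not on the spectrum length, the response stays small however long the input signal is.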
@@ -718,8 +731,9 @@ Each item below links to an open issue where you can **discuss, contribute, or c
 | ✅ Done | **Parquet/MAT/WAV/NPY data format support** | v0.5.0 | — |
 | 🔴 High | **Customizable ISO report thresholds** | Open | [Good First Issue](https://github.com/LGDiMaggio/predictive-maintenance-mcp/issues) |
 | 🔴 High | **Docker image for zero-install setup** | Open | [Help Wanted](https://github.com/LGDiMaggio/predictive-maintenance-mcp/issues) |
-| 🟡 Medium | **Vector search for large documents** (ChromaDB/FAISS) | Planned | [Discuss](https://github.com/LGDiMaggio/predictive-maintenance-mcp/discussions) |
-| 🟡 Medium | **OCR for scanned PDF manuals** (Tesseract) | Planned | [Discuss](https://github.com/LGDiMaggio/predictive-maintenance-mcp/discussions) |
+| ✅ Done | **Vector search for large documents** (FAISS + sentence-transformers) | v0.7.0 | — |
+| ✅ Done | **OCR for scanned PDF manuals** (Tesseract) | v0.7.0 | — |
@@ -766,7 +780,7 @@ If you use this server in your research or projects, please cite:
 title = {Predictive Maintenance MCP Server: An open-source framework for integrating Large Language Models with predictive maintenance and fault diagnosis workflows},