
Commit 715d724

feat: FAISS vector search, OCR for scanned PDFs, DOCX reports (v0.7.0)

- RAG upgraded to dual-backend: FAISS+sentence-transformers (semantic) with TF-IDF fallback
- OCR fallback (pytesseract) for image-based PDF manuals in document_reader
- New generate_diagnostic_report_docx tool for structured Word reports
- 3 new optional dependency groups: vector-search, ocr, docx
- search_documentation now reports active backend in response
- 27 MCP tools (was 26)
- Updated all docs, CHANGELOG, README, GitHub Pages, INSTALL, CITATION
1 parent 0c34464 commit 715d724

15 files changed (+892 / -58 lines)

CHANGELOG.md

Lines changed: 30 additions & 1 deletion
@@ -5,6 +5,35 @@ All notable changes to the Predictive Maintenance MCP Server project will be doc
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.7.0] - 2025-07-14
+
+### Added
+- **FAISS vector search** — `search_documentation` now uses FAISS + sentence-transformers for semantic retrieval when installed (`pip install predictive-maintenance-mcp[vector-search]`). Falls back to TF-IDF keyword search when not installed. Dual-backend `DocumentIndex` in `src/rag.py`.
+- **OCR for scanned PDFs** — `document_reader.extract_text_from_pdf()` automatically falls back to Tesseract OCR for pages with empty/minimal text. Requires optional `pytesseract` + `pdf2image` + Poppler.
+- **DOCX diagnostic reports** — New `generate_diagnostic_report_docx` MCP tool and `save_diagnostic_report_docx()` in report generator. Creates structured Word documents with statistics tables, FFT/envelope peaks, bearing frequencies, ISO evaluation, and diagnostic summary. Requires optional `python-docx`.
+- **New optional dependency groups** in `pyproject.toml`: `vector-search`, `ocr`, `docx`. The `full` extra now includes all of them.
+- **Overlapping chunking** — New `chunk_text()` helper in RAG module for character-level overlapping chunks alongside paragraph-aware chunking.
+
+### Changed
+- **`search_documentation`** now reports active backend (`faiss` or `tfidf`) in response
+- **27 MCP tools** (was 26) — added `generate_diagnostic_report_docx`
+- Version bumped to 0.7.0
+
+## [0.6.0] - 2025-07-08
+
+### Added
+- **RAG-based document search** — New `search_documentation` MCP tool using TF-IDF indexing over machine manuals and bearing catalogs (`src/rag.py`)
+- **`SpectralPeak` model** — Structured representation for individual FFT peaks (frequency, magnitude, dB, annotation)
+
+### Changed
+- **Compact FFT output** — `analyze_fft` now returns top-20 peaks + RMS/stats instead of full frequency/magnitude arrays (~200 KB → ~2 KB per call), eliminating LLM context overflow
+- **Compact signal resource** — `read_signal_file` returns metadata + statistics only (no raw samples), preventing large JSON payloads
+- **Server instructions** updated with output-efficiency policy and RAG documentation guidance
+- **`pypdf`** promoted from optional to required dependency
+
+### Fixed
+- LLM "output too long" errors caused by full-array serialisation in `FFTResult`
+
 ## [0.5.0] - 2026-02-16
 
 ### Added
@@ -138,7 +167,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## Roadmap
 
-### Planned for v0.6.0
+### Planned for v0.7.0
 - **📦 Docker image** for zero-install setup
 - **📏 Customizable ISO report thresholds**
 - Multi-signal comparison tools
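The changelog describes `chunk_text()` only as a helper for character-level overlapping chunks; its signature is not part of this diff. A minimal sketch of the technique, assuming hypothetical `chunk_size` and `overlap` parameters measured in characters (not necessarily what `src/rag.py` uses):

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical sketch: parameter names and defaults are assumptions,
    not the signature used in src/rag.py.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only windows
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With a non-zero overlap, a sentence that straddles one window boundary still appears intact in the neighbouring chunk, at the cost of indexing some characters twice.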

CITATION.cff

Lines changed: 4 additions & 4 deletions
@@ -1,8 +1,8 @@
 cff-version: 1.2.0
 message: "If you use this software, please cite it as below."
 title: "Predictive Maintenance MCP Server: An open-source framework for integrating Large Language Models with predictive maintenance and fault diagnosis workflows"
-version: 0.5.0
-date-released: 2026-02-16
+version: 0.7.0
+date-released: 2025-07-14
 authors:
   - family-names: Di Maggio
     given-names: Luigi Gianpio
@@ -51,8 +51,8 @@ preferred-citation:
   authors:
     - family-names: Di Maggio
       given-names: Luigi Gianpio
-  year: 2025
-  version: 0.5.0
+  year: 2026
+  version: 0.7.0
   repository-code: "https://github.com/LGDiMaggio/predictive-maintenance-mcp"
   license: MIT
   doi: "10.5281/zenodo.17611542"

INSTALL.md

Lines changed: 28 additions & 0 deletions
@@ -234,6 +234,34 @@ python validate_server.py
 - `scikit-learn>=1.7.2` - Machine learning
 - `plotly>=5.24.0` - Interactive visualizations
 - `pydantic>=2.12.0` - Data validation
+- `pypdf>=4.0` - PDF text extraction
+
+### Optional Extras
+
+Install any combination using pip extras:
+
+```bash
+# Semantic vector search (FAISS + sentence-transformers)
+pip install predictive-maintenance-mcp[vector-search]
+
+# OCR for scanned PDF manuals (Tesseract)
+pip install predictive-maintenance-mcp[ocr]
+
+# DOCX diagnostic report generation
+pip install predictive-maintenance-mcp[docx]
+
+# Everything (all optional features)
+pip install predictive-maintenance-mcp[full]
+```
+
+| Extra | Packages | Purpose |
+|-------|----------|---------|
+| `vector-search` | `faiss-cpu`, `sentence-transformers` | Semantic document search (FAISS). Falls back to TF-IDF when not installed. |
+| `ocr` | `pytesseract`, `Pillow`, `pdf2image` | OCR for scanned/image-based PDF manuals. Requires [Poppler](https://github.com/ossamamehmood/Poppler-windows/releases) on system PATH. |
+| `docx` | `python-docx` | Generate structured Word (.docx) diagnostic reports. |
+| `full` | All of the above | Install all optional features at once. |
+
+> **Note**: `vector-search` pulls in PyTorch (~2 GB). For lightweight installs, skip it — TF-IDF keyword search works well for technical documentation.
 
 ### Development Dependencies
 - `pytest>=8.0.0` - Testing
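Because these extras are optional at runtime, code has to detect which ones are importable before choosing a code path. A minimal sketch of such a check, assuming the import names `faiss`, `sentence_transformers`, `pytesseract`, `PIL`, `pdf2image` and `docx` for the packages in the table; `check_extras()` and `search_backend()` are hypothetical helpers, not functions from this project. Besides Poppler (whose `pdftoppm` is what `pdf2image` calls by default), `pytesseract` also needs the Tesseract executable on PATH, so the sketch checks both binaries:

```python
import importlib.util
import shutil
from typing import Dict, List

# Import names for the packages each extra installs (assumed mapping:
# Pillow imports as PIL, python-docx imports as docx).
EXTRA_MODULES: Dict[str, List[str]] = {
    "vector-search": ["faiss", "sentence_transformers"],
    "ocr": ["pytesseract", "PIL", "pdf2image"],
    "docx": ["docx"],
}

# External executables the OCR path needs besides the Python packages.
OCR_BINARIES = ["tesseract", "pdftoppm"]


def module_available(name: str) -> bool:
    """True if `name` is importable; find_spec does not execute the module."""
    return importlib.util.find_spec(name) is not None


def check_extras() -> Dict[str, bool]:
    """Report, per extra, whether all of its runtime requirements are present."""
    status = {
        extra: all(module_available(m) for m in modules)
        for extra, modules in EXTRA_MODULES.items()
    }
    status["ocr"] = status["ocr"] and all(shutil.which(b) is not None for b in OCR_BINARIES)
    return status


def search_backend() -> str:
    """Mirror the documented fallback: FAISS when installed, else TF-IDF."""
    return "faiss" if check_extras()["vector-search"] else "tfidf"
```

Checking with `find_spec` keeps start-up cheap: heavy packages such as `sentence_transformers` are only imported once the corresponding feature is actually used.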

README.md

Lines changed: 30 additions & 16 deletions
@@ -59,6 +59,10 @@ This project is built around the **Model Context Protocol (MCP)** — an open st
 │ │ ML Anomaly │ │ Manual/PDF │ │ Bearing │ │
 │ │ Detection │ │ Reader │ │ Catalog │ │
 │ └────────────────┘ └───────────────┘ └───────────────┘ │
+│ ┌────────────────┐ ┌───────────────┐ │
+│ │ RAG Document │ │ DOCX Report │ │
+│ │ Search (FAISS) │ │ Generation │ │
+│ └────────────────┘ └───────────────┘ │
 └──────────────┬───────────────────────────────────────────────┘
 
 
@@ -173,7 +177,11 @@ This project serves two audiences. Pick the door that fits you:
 
 </details>
 - **📁 Multi-Format Support** — Load signals from CSV, MAT (MATLAB), WAV, NPY, and Parquet files
-- **🚀 Zero Configuration** — Works out of the box with sample data, auto-detects sampling rates from metadata
+- **🔎 RAG Document Search** — Vector search (FAISS + sentence-transformers) with TF-IDF fallback over machine manuals and bearing catalogs. Auto-cached.
+- **📝 DOCX Reports** — Generate structured Word diagnostic reports alongside interactive HTML (requires `python-docx`)
+- **🔍 OCR for Scanned PDFs** — Automatic OCR fallback (Tesseract) for image-based equipment manuals
+- **⚡ LLM-Optimised Output** — Tool responses return compact summaries (top peaks, statistics) instead of raw arrays, keeping LLM context windows lean
+- **🚀 Zero Configuration** — Works out of the box with sample data, auto-detects sampling rates from metadata
 
 ---
 
@@ -393,12 +401,13 @@ Tools perform **computations and generate outputs**:
 - **`generate_fft_report`** — Interactive FFT spectrum HTML report with peak table
 - **`generate_envelope_report`** — Envelope analysis report with bearing fault markers
 - **`generate_iso_report`** — ISO 20816-3 evaluation with zone visualization
+- **`generate_diagnostic_report_docx`** — Structured Word (.docx) diagnostic report (requires `python-docx`)
 - **`generate_pca_visualization_report`** — 2D/3D PCA projection report for anomaly exploration
 - **`generate_feature_comparison_report`** — Feature-level comparison report across signals/classes
 - **`list_html_reports`** — List all generated reports with metadata
 - **`get_report_info`** — Get report details without loading full HTML
 
-> 💡 **All reports are interactive Plotly visualizations saved to `reports/` directory**
+> 💡 **HTML reports are interactive Plotly visualizations saved to `reports/`. DOCX reports are structured Word documents for stakeholders.**
 
 </details>
 
@@ -410,6 +419,7 @@ Tools perform **computations and generate outputs**:
 - **`calculate_bearing_characteristic_frequencies`** — Calculate BPFO/BPFI/BSF/FTF from geometry
 - **`read_manual_excerpt`** — Read manual text excerpt (configurable page limit)
 - **`search_bearing_catalog`** — Search bearing geometry in local catalog (20+ common bearings)
+- **`search_documentation`** — Semantic search across machine manuals and bearing catalogs (FAISS vector search or TF-IDF fallback)
 
 **MCP Resources:**
 - `manual://list` — Browse available manuals
@@ -442,7 +452,9 @@ The `skills/` directory contains pre-built guided workflows that orchestrate mul
 | [**quick-screening**](skills/quick-screening/SKILL.md) | 5 | Fast health screening with clear Healthy/Suspicious/Critical classification |
 | [**report-generation**](skills/report-generation/SKILL.md) | 6 | Professional HTML report generation with composite multi-report option |
 
-> 💡 Skills are standalone markdown files that any MCP-compatible LLM client can use as system instructions to coordinate multi-step diagnostic workflows.
+> ⚠️ **Skills are Claude / GitHub Copilot-specific.** They use [SKILL.md with YAML frontmatter](https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/skills), a convention recognised by Claude.ai, Claude Code, and Copilot agents. Other LLM clients can still read them as plain markdown, but automatic skill invocation requires a Claude or Copilot-compatible host.
+>
+> **MCP tools, resources, and HTML reports are LLM-agnostic** — they work with any MCP-compatible client (ChatGPT, Ollama, LM Studio, etc.).
 
 ---
 
@@ -480,10 +492,11 @@ The system follows a **hybrid MCP architecture** combining Resources (direct dat
 │ │ TOOLS (Analysis & Processing) │ │
 │ │ • FFT, Envelope, ISO 20816-3 │ │
 │ │ • ML Anomaly Detection │ │
-│ │ • Report Generation (HTML) │ │
+│ │ • Report Generation (HTML + DOCX) │ │
 │ │ • Manual Spec Extraction │ │
 │ │ • Bearing Frequency Calculation │ │
 │ │ • Bearing Catalog Search │ │
+│ │ • RAG Document Search (FAISS / TF-IDF) │ │
 │ └──────────────────────────────────────────────────────┘ │
 └────────────────────┬────────────────────────────────────────┘
 
@@ -523,8 +536,8 @@ The system follows a **hybrid MCP architecture** combining Resources (direct dat
 
 **Key Features:**
 - **4 MCP Resources** — Direct read access to signals and manuals
-- **25 MCP Tools** — Complete diagnostic workflow (analysis, plotting, ML, reporting, manuals)
-- **4 MCP Prompts** — Guided diagnostic workflows
+- **27 MCP Tools** — Complete diagnostic workflow (analysis, plotting, ML, reporting incl. DOCX, manuals, RAG search)
+- **3 Copilot Skills** — Guided diagnostic workflows (Claude / Copilot-specific)
 - **Hybrid Architecture** — Resources for reading, Tools for processing
 - **Local-First** — All data stays on your machine (privacy-preserving)
 
@@ -601,6 +614,7 @@ All analysis tools generate **interactive HTML reports** with Plotly visualizati
 | 🔊 **FFT Spectrum** | `generate_fft_report()` | Frequency analysis, peak detection, harmonic markers |
 | 🎯 **Envelope Analysis** | `generate_envelope_report()` | Bearing fault frequencies, modulation detection |
 | 📏 **ISO 20816-3** | `generate_iso_report()` | Vibration severity zones, compliance assessment |
+| 📝 **Diagnostic DOCX** | `generate_diagnostic_report_docx()` | Word document with stats, peaks, ISO, diagnosis |
 
 All reports include:
 - Interactive Plotly charts (pan/zoom/hover)
@@ -628,7 +642,7 @@ Generate FFT report for baseline_1.csv
 | [Ollama Guide](docs/OLLAMA_GUIDE.md) | Engineers | Use with local LLMs (fully air-gapped) |
 | [CHANGELOG.md](CHANGELOG.md) | Everyone | Version history |
 | [data/README.md](data/README.md) | Everyone | Dataset documentation |
-| [skills/](skills/) | 🤖 LLM Clients | Copilot Skills — guided diagnostic workflows (bearing, screening, reporting) |
+| [skills/](skills/) | 🤖 Claude / Copilot | Copilot Skills — guided diagnostic workflows (bearing, screening, reporting) |
 
 ---
 
@@ -701,13 +715,12 @@ npx @modelcontextprotocol/inspector python -m predictive_maintenance_mcp
 
 ## 🚀 Roadmap
 
-### ✨ Recent: v0.5.0 — Code Quality & Multi-Format Support
+### ✨ Recent: v0.7.0 — Vector Search, OCR & DOCX Reports
 
-- 📂 **Multi-format signal loading** — CSV, MAT, WAV, NPY, Parquet via unified `load_signal_data()`
-- 🔧 **ML code deduplication** — 4 helper functions reduce ~163 statements
-- 📦 **pypdf migration** — Replaced deprecated PyPDF2 with pypdf
-- ▶️ **`python -m` support** — Run as `python -m predictive_maintenance_mcp`
-- 🧹 **Consolidated metadata reads** — ISO evaluation no longer double-reads metadata files
+- 🔎 **FAISS vector search** — Semantic document retrieval with sentence-transformers (TF-IDF fallback when not installed)
+- 🔍 **OCR for scanned PDFs** — Automatic Tesseract OCR fallback for image-based equipment manuals
+- 📝 **DOCX diagnostic reports** — Structured Word documents with statistics, peaks, ISO evaluation, and diagnostic summary
+- **Compact FFT output** — Top-20 peaks + RMS/stats instead of full arrays (~200 KB → ~2 KB)
 
 ### 🔮 Planned Enhancements
 
@@ -718,8 +731,9 @@ Each item below links to an open issue where you can **discuss, contribute, or c
 | ✅ Done | **Parquet/MAT/WAV/NPY data format support** | v0.5.0 ||
 | 🔴 High | **Customizable ISO report thresholds** | Open | [Good First Issue](https://github.com/LGDiMaggio/predictive-maintenance-mcp/issues) |
 | 🔴 High | **Docker image for zero-install setup** | Open | [Help Wanted](https://github.com/LGDiMaggio/predictive-maintenance-mcp/issues) |
-| 🟡 Medium | **Vector search for large documents** (ChromaDB/FAISS) | Planned | [Discuss](https://github.com/LGDiMaggio/predictive-maintenance-mcp/discussions) |
-| 🟡 Medium | **OCR for scanned PDF manuals** (Tesseract) | Planned | [Discuss](https://github.com/LGDiMaggio/predictive-maintenance-mcp/discussions) |
+| ✅ Done | **Vector search for large documents** (FAISS + sentence-transformers) | v0.7.0 ||
+| ✅ Done | **OCR for scanned PDF manuals** (Tesseract) | v0.7.0 ||
+| ✅ Done | **DOCX diagnostic reports** (python-docx) | v0.7.0 ||
 | 🟡 Medium | **Multi-signal trending** — Compare historical data | Planned | [Discuss](https://github.com/LGDiMaggio/predictive-maintenance-mcp/discussions) |
 | 🟢 Future | **Real-time streaming** — Live vibration monitoring | Concept ||
 | 🟢 Future | **Dashboard** — Multi-asset fleet monitoring | Concept ||
@@ -766,7 +780,7 @@ If you use this server in your research or projects, please cite:
 title = {Predictive Maintenance MCP Server: An open-source framework for integrating Large Language Models with predictive maintenance and fault diagnosis workflows},
 author = {Di Maggio, Luigi Gianpio},
 year = {2025},
-version = {0.5.0},
+version = {0.7.0},
 url = {https://github.com/LGDiMaggio/predictive-maintenance-mcp},
 doi = {10.5281/zenodo.17611542}
 }
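When the `vector-search` extra is missing, `search_documentation` falls back to TF-IDF keyword search and reports `tfidf` as its backend. The project's own `DocumentIndex` is not shown in this diff, so the following is an independent, dependency-free sketch of TF-IDF retrieval with cosine ranking. It uses the smoothed IDF formula that scikit-learn's `TfidfVectorizer` applies by default; `TfidfIndex`, `search_documentation_sketch()` and the response shape are hypothetical:

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfIndex:
    """Keyword index: L2-normalised TF-IDF vectors ranked by cosine similarity."""

    def __init__(self, chunks: List[str]) -> None:
        self.chunks = chunks
        counts = [Counter(tokenize(chunk)) for chunk in chunks]
        n_docs = len(counts)
        # Document frequency: in how many chunks each term appears.
        doc_freq = Counter(term for c in counts for term in c)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, always > 0.
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
        self.vectors = [self._weigh(c) for c in counts]

    def _weigh(self, counts: Counter) -> Dict[str, float]:
        # Terms unseen at index time get weight 0, so they cannot match anything.
        vec = {t: tf * self.idf.get(t, 0.0) for t, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def search(self, query: str, top_k: int = 3) -> List[Tuple[float, str]]:
        q = self._weigh(Counter(tokenize(query)))
        scored = []
        for vec, chunk in zip(self.vectors, self.chunks):
            # Dot product of two unit vectors equals their cosine similarity.
            score = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if score > 0:
                scored.append((score, chunk))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


def search_documentation_sketch(index: TfidfIndex, query: str, top_k: int = 3) -> dict:
    """Hypothetical response shape that names the active backend."""
    hits = index.search(query, top_k)
    return {"backend": "tfidf", "results": [{"score": round(s, 3), "text": c} for s, c in hits]}
```

Keyword scoring only matches exact tokens (a query for "lubrication" will not find "lubricate"), which is the gap the optional FAISS backend with sentence-transformers embeddings is meant to close.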

SECURITY.md

Lines changed: 3 additions & 3 deletions
@@ -4,10 +4,10 @@
 
 | Version | Supported |
 | ------- | ------------------ |
+| 0.6.x | :white_check_mark: |
 | 0.5.x | :white_check_mark: |
-| 0.4.x | :white_check_mark: |
-| 0.3.x | :x: |
-| < 0.3 | :x: |
+| 0.4.x | :x: |
+| < 0.4 | :x: |
 
 ## Reporting a Vulnerability
 
docs/index.html

Lines changed: 5 additions & 5 deletions
@@ -38,7 +38,7 @@
 "description": "Open-source MCP server for AI-powered predictive maintenance, bearing diagnostics, vibration analysis, and ISO 20816 compliance.",
 "applicationCategory": "EngineeringApplication",
 "operatingSystem": "Windows, macOS, Linux",
-"softwareVersion": "0.5.0",
+"softwareVersion": "0.7.0",
 "license": "https://opensource.org/licenses/MIT",
 "url": "https://lgdimaggio.github.io/predictive-maintenance-mcp/",
 "codeRepository": "https://github.com/LGDiMaggio/predictive-maintenance-mcp",
@@ -667,7 +667,7 @@
 <div class="container">
 <div class="hero-badge">
 <span class="pulse"></span>
-v0.5.0 — Multi-format support, ML dedup &amp; more
+v0.7.0 — Vector search, OCR &amp; DOCX reports
 </div>
 
 <h1>
@@ -709,7 +709,7 @@ <h1>
 <div class="container">
 <div class="stats-grid">
 <div class="stat-card">
-<div class="stat-num" data-target="25">0</div>
+<div class="stat-num" data-target="27">0</div>
 <div class="stat-label">MCP Tools</div>
 </div>
 <div class="stat-card">
@@ -765,7 +765,7 @@ <h3>ML Anomaly Detection</h3>
 <div class="feature-card reveal">
 <div class="feature-icon green">📄</div>
 <h3>Interactive HTML Reports</h3>
-<p>Publication-quality reports with Plotly charts, auto-generated summaries. Shareable files for ops teams and management.</p>
+<p>Publication-quality reports with Plotly charts and structured DOCX diagnostics. Shareable files for ops teams and management.</p>
 </div>
 <div class="feature-card reveal">
 <div class="feature-icon purple">📁</div>
@@ -903,7 +903,7 @@ <h2>Reports &amp; Visualizations</h2>
 </figure>
 <figure class="screenshot-card reveal">
 <img src="https://raw.githubusercontent.com/LGDiMaggio/predictive-maintenance-mcp/main/assets/MCPserver.png"
-alt="MCP server architecture showing 25 tools, 4 resources, 4 prompts"
+alt="MCP server architecture showing 26 tools, 4 resources, 4 prompts"
 loading="lazy" width="600" height="400">
 <figcaption>MCP server architecture overview</figcaption>
 </figure>
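The landing page now advertises structured DOCX diagnostics. How `save_diagnostic_report_docx()` lays out its document is not shown in this diff; a minimal sketch of one section (a heading plus a two-column statistics table) written with `python-docx` might look like this. `stats_rows()` and `save_stats_docx()` are hypothetical names, and `docx` is imported inside the writer so the module still loads when the `docx` extra is not installed:

```python
from typing import Dict, List, Tuple


def stats_rows(stats: Dict[str, float], unit: str = "") -> List[Tuple[str, str]]:
    """Format statistics as (metric, value) rows with 3 decimals and an optional unit."""
    return [(name, f"{value:.3f} {unit}".strip()) for name, value in stats.items()]


def save_stats_docx(path: str, title: str, stats: Dict[str, float], unit: str = "") -> None:
    """Write a heading and a two-column statistics table to a .docx file."""
    from docx import Document  # provided by python-docx (the `docx` extra)

    doc = Document()
    doc.add_heading(title, level=1)
    table = doc.add_table(rows=1, cols=2)
    header = table.rows[0].cells
    header[0].text = "Metric"
    header[1].text = "Value"
    for name, value in stats_rows(stats, unit):
        cells = table.add_row().cells  # append one row per statistic
        cells[0].text = name
        cells[1].text = value
    doc.save(path)
```

Keeping the row formatting in a pure function separates what goes into the report from how `python-docx` renders it, so the formatting can be tested without Word or the optional dependency.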

pyproject.toml

Lines changed: 15 additions & 2 deletions
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "predictive-maintenance-mcp"
-version = "0.5.0"
+version = "0.7.0"
 description = "Proof of Concept: AI-Powered Predictive Maintenance & Fault Diagnosis MCP Server - Industrial machinery condition monitoring, vibration analysis, bearing diagnostics, and ML-based anomaly detection through Model Context Protocol"
 readme = "README.md"
 authors = [
@@ -51,6 +51,7 @@ dependencies = [
 "pandas>=2.3.3",
 "plotly>=5.24.0",
 "pydantic>=2.12.0",
+"pypdf>=4.0",
 "scikit-learn>=1.7.2",
 "scipy>=1.16.2",
 ]
@@ -81,8 +82,20 @@ ml = [
 viz = [
 "plotly>=5.20",
 ]
+vector-search = [
+"faiss-cpu>=1.7",
+"sentence-transformers>=2.0",
+]
+ocr = [
+"pytesseract>=0.3",
+"Pillow>=10.0",
+"pdf2image>=1.16",
+]
+docx = [
+"python-docx>=1.0",
+]
 full = [
-"predictive-maintenance-mcp[docs,ml,viz]",
+"predictive-maintenance-mcp[docs,ml,viz,vector-search,ocr,docx]",
 ]
 dev = [
 "pytest>=8.0.0",
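The `ocr` extra above pulls in `pytesseract`, `Pillow` and `pdf2image`. The changelog says `extract_text_from_pdf()` falls back to OCR only for pages with empty or minimal text; a minimal sketch of that per-page decision follows, assuming a 20-character threshold and the hypothetical helper names `merge_page_texts()` and `extract_pdf_text()`. The OCR step is passed in as a callable so the decision logic can be tested without Tesseract or Poppler installed:

```python
from typing import Callable, List

OcrPage = Callable[[int], str]  # maps a 0-based page index to OCR text


def merge_page_texts(native: List[str], ocr_page: OcrPage, min_chars: int = 20) -> List[str]:
    """Keep each page's embedded text unless it is (nearly) empty; OCR only those pages."""
    return [
        text if len(text.strip()) >= min_chars else ocr_page(i)
        for i, text in enumerate(native)
    ]


def extract_pdf_text(pdf_path: str, min_chars: int = 20) -> List[str]:
    """Text layer via pypdf, with Tesseract OCR for pages that have no usable text."""
    from pypdf import PdfReader  # a required dependency of the project

    native = [page.extract_text() or "" for page in PdfReader(pdf_path).pages]

    def tesseract_page(index: int) -> str:
        # Imported lazily: only needed when some page lacks a text layer.
        import pytesseract
        from pdf2image import convert_from_path  # needs Poppler on PATH

        # Rasterise just this page (pdf2image page numbers are 1-based).
        images = convert_from_path(pdf_path, first_page=index + 1, last_page=index + 1)
        return "\n".join(pytesseract.image_to_string(img) for img in images)

    return merge_page_texts(native, tesseract_page, min_chars)
```

OCR-ing only the pages that need it keeps mixed manuals (typed body text plus scanned drawings or tables) fast, since rasterising and recognising a page costs far more than reading its text layer.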

server.json

Lines changed: 2 additions & 2 deletions
@@ -3,12 +3,12 @@
 "name": "io.github.LGDiMaggio/predictive-maintenance-mcp",
 "title": "Predictive Maintenance",
 "description": "Industrial vibration analysis, bearing fault diagnosis, ISO 20816, and ML anomaly detection",
-"version": "0.5.0",
+"version": "0.7.0",
 "packages": [
 {
 "registryType": "pypi",
 "identifier": "predictive-maintenance-mcp",
-"version": "0.5.0",
+"version": "0.7.0",
 "transport": {
 "type": "stdio"
 }

src/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 Package name: predictive_maintenance_mcp (mapped from src/ directory).
 """
 
-__version__ = "0.5.0"
+__version__ = "0.7.0"
 __author__ = "Luigi Gianpio Di Maggio"
 __license__ = "MIT"
 
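This release bumps the version string by hand in several places (`pyproject.toml`, both fields of `server.json`, `src/__init__.py`, `CITATION.cff`, `docs/index.html`), which is easy to let drift. A minimal, hypothetical consistency check over three of those files, using only the standard library (regular expressions instead of a TOML parser, so it also runs on Pythons without `tomllib`); `collect_versions()` and `versions_agree()` are illustrative names, not part of the project:

```python
import json
import re
from pathlib import Path
from typing import Dict, Pattern

# First `version = "..."` at the start of a line (assumed to be the [project] table).
PYPROJECT_RE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)
INIT_RE = re.compile(r'^__version__\s*=\s*"([^"]+)"', re.MULTILINE)


def _find(pattern: Pattern[str], path: Path) -> str:
    match = pattern.search(path.read_text(encoding="utf-8"))
    if match is None:
        raise ValueError(f"no version string found in {path}")
    return match.group(1)


def collect_versions(root: Path) -> Dict[str, str]:
    """Map each version location to the string it currently contains."""
    versions = {
        "pyproject.toml": _find(PYPROJECT_RE, root / "pyproject.toml"),
        "src/__init__.py": _find(INIT_RE, root / "src" / "__init__.py"),
    }
    server = json.loads((root / "server.json").read_text(encoding="utf-8"))
    versions["server.json"] = server["version"]
    for i, pkg in enumerate(server.get("packages", [])):
        versions[f"server.json packages[{i}]"] = pkg["version"]
    return versions


def versions_agree(root: Path) -> bool:
    """True when every collected location carries the same version string."""
    return len(set(collect_versions(root).values())) == 1
```

Run from a test or a pre-release hook, a check like this fails as soon as one of the files is missed during a bump.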
