Skip to content

Latest commit

 

History

History
1531 lines (1227 loc) · 54.5 KB

File metadata and controls

1531 lines (1227 loc) · 54.5 KB

Machinery Diagnostics - Complete Examples

This guide provides step-by-step examples of complete diagnostic workflows using the Machinery Diagnostics MCP Server.

Table of Contents


Example 1: Simple FFT Analysis

Objective

Analyze the frequency content of a vibration signal to identify dominant frequencies.

Scenario

A bearing in healthy condition is operating. You want to analyze the baseline frequency spectrum to establish normal operating characteristics.

Real Dataset Information

  • Signal: real_train/baseline_1.csv
  • Sampling Rate: 97,656 Hz (from metadata)
  • Shaft Speed: 1500 RPM (25 Hz) - from metadata
  • Signal Duration: 6.0 seconds
  • Condition: Healthy baseline
  • Note: Bearing type not specified in metadata

Step-by-Step

Step 1: List available signals

In Claude Desktop:

List all available signals in the dataset

Step 2: Run FFT analysis

Analyze the FFT spectrum of real_train/baseline_1.csv

Note: The LLM will auto-detect sampling rate from metadata (97656 Hz).
If metadata is missing, you must provide sampling_rate explicitly.

Step 3: Interpret results

The tool will analyze the signal and return dominant frequencies:

{
  "peaks": [
    {"frequency": 25.0, "magnitude": 0.68},
    {"frequency": 50.0, "magnitude": 0.42},
    {"frequency": 75.0, "magnitude": 0.25}
  ],
  "signal_duration": 6.0,
  "analyzed_segment_duration": 1.0,
  "frequency_resolution": 1.0,
  "sampling_rate": 97656
}

Important Notes:

  • The analysis uses 1.0 second random segment by default (not full 6.0s signal)
  • Random segment extracted for efficiency (different segment each time unless seed specified)
  • Frequency resolution: 1 Hz (1/1.0s) - excellent for most diagnostics
  • To analyze full signal, specify segment_duration=None
  • To get reproducible results, specify random_seed parameter

Interpretation:

  • 25 Hz: Shaft rotation frequency (1500 RPM from metadata)
  • 50 Hz, 75 Hz: Harmonics of shaft frequency (2×, 3×)
  • Clean spectrum: No bearing fault frequencies detected

Step 4: Generate professional HTML report

Generate FFT report for real_train/baseline_1.csv

This creates:

  • HTML file (fft_spectrum_real_train_baseline_1.html) in reports/ directory
  • Interactive Plotly chart with zoom, pan, and hover features
  • Embedded metadata for LLM to read without opening file

No PNG files are generated - only HTML reports. To share static images:

  • Open HTML in browser and use screenshot/export feature
  • Or implement PNG export separately using kaleido library

Expected Outcome

✅ Identified shaft rotation frequency (25 Hz) and harmonics
✅ No unusual frequencies detected
✅ Vibration spectrum confirms healthy baseline operation
✅ Professional HTML report generated in reports/ directory
✅ LLM understood signal was analyzed using 1.0s random segment (default)


Example 2: Bearing Fault Detection

Objective

Detect and localize bearing faults using envelope analysis.

Scenario

You are investigating potential bearing faults in an operating machine. The vibration signal shows symptoms that may indicate outer race damage. You need to confirm the fault type and localize it using envelope analysis.

Critical Note: To perform accurate diagnosis, you MUST provide:

  • Shaft speed (Hz or RPM)
  • Bearing characteristic frequencies (BPFO, BPFI, BSF, FTF) in Hz
  • Or bearing geometry if frequencies are unknown

The LLM should request this information if not provided.

Real Dataset Information

  • Signal: real_train/OuterRaceFault_1.csv
  • Sampling Rate: 97,656 Hz (from metadata)
  • Shaft Speed: 1500 RPM (25 Hz) - from metadata
  • Signal Duration: 6.0 seconds
  • Fault Type: Outer race fault (known from experiment)
  • Bearing Characteristic Frequencies (from metadata):
    • BPFO: 81.125 Hz (Ball Pass Frequency Outer race)
    • BPFI: 118.875 Hz (Ball Pass Frequency Inner race)
    • BSF: 63.91 Hz (Ball Spin Frequency)
    • FTF: 14.8375 Hz (Fundamental Train Frequency / Cage frequency)

Important: These frequencies are provided as absolute values in Hz, not as multiples of shaft speed. When calling tools, use these exact Hz values.

Step-by-Step

Step 1: Perform envelope analysis

Perform envelope analysis on real_train/OuterRaceFault_1.csv with:
- Filter band: 2000 to 8000 Hz

Note: If you don't provide sampling_rate, the LLM will auto-detect it from metadata.
The analysis will use 1.0 second random segment by default.
For reproducible results, specify random_seed parameter.
To analyze full signal, set segment_duration=None.

Step 2: Examine envelope spectrum

The tool returns:

{
  "envelope_peaks": [
    {"frequency": 81.125, "magnitude": 0.94},
    {"frequency": 162.25, "magnitude": 0.52},
    {"frequency": 243.375, "magnitude": 0.31}
  ],
  "filter_band": [2000, 8000],
  "dominant_frequency": 81.125,
  "analyzed_segment_duration": 1.0,
  "signal_duration": 6.0
}

Step 3: Compare with bearing frequencies

You must provide the bearing frequencies to compare:

Compare detected peaks with these bearing frequencies:
- BPFO: 81.125 Hz
- BPFI: 118.875 Hz
- BSF: 63.91 Hz
- FTF: 14.8375 Hz
Detected Peak Theoretical Match Fault Type
81.125 Hz 81.125 Hz (BPFO) Outer race fault
162.25 Hz 162.25 Hz (2×BPFO) Harmonic
243.375 Hz 243.375 Hz (3×BPFO) Harmonic

Step 4: Generate professional envelope report

Generate envelope report for real_train/OuterRaceFault_1.csv with bearing frequencies:
- BPFO: 81.125 Hz
- BPFI: 118.875 Hz
- BSF: 63.91 Hz
- FTF: 14.8375 Hz
- Shaft speed: 25 Hz

This creates:

  • HTML file in reports/ directory with filtered signal, envelope, and spectrum plots
  • Automatic highlighting of bearing frequency peaks with vertical markers
  • Embedded metadata for LLM analysis

No PNG files are generated. For static images, open HTML in browser and export manually.

Diagnosis

🔴 OUTER RACE BEARING FAULT DETECTED

Evidence:

  • Strong peak at BPFO (81.125 Hz) with 94% magnitude
  • Clear harmonics (2×BPFO at 162.25 Hz, 3×BPFO at 243.375 Hz)
  • Pattern consistent with localized outer race defect
  • High signal-to-noise ratio confirms advanced fault stage

Recommendation:

  • Immediate action: Schedule bearing replacement within 1-2 weeks
  • Monitoring: Increase inspection frequency to daily
  • Operation: Reduce load and speed if possible until replacement
  • Root cause: Investigate lubrication, contamination, or installation errors

Expected Outcome

✅ Bearing fault localized to outer race
✅ Severity assessed (advanced stage)
✅ Actionable maintenance plan created
✅ Professional HTML report in reports/ directory
✅ LLM confirmed analysis used 1.0s random segment from 6.0s signal


Example 3: ISO 20816-3 Compliance Check

Objective

Evaluate vibration severity according to international standard ISO 20816-3.

Scenario

You need to evaluate vibration severity according to international standard ISO 20816-3 for compliance and condition monitoring baseline establishment.

Critical Note: ISO 20816-3 only defines Machine Group 1 and 2:

  • Group 1: Large machines (> 300 kW)
  • Group 2: Medium machines (15-300 kW)

There is no Group 3 in the standard. If user provides invalid group, LLM should request clarification.

Real Dataset Information

  • Signal: real_train/baseline_1.csv
  • Sampling Rate: 97,656 Hz (from metadata)
  • Shaft Speed: 1500 RPM (25 Hz) - from metadata
  • Signal Duration: 6.0 seconds
  • Condition: Healthy baseline
  • Machine Classification: For this example, assume Group 2 (medium machine, 15-300 kW)

Step-by-Step

Step 1: Evaluate against ISO standard

Evaluate real_train/baseline_1.csv against ISO 20816-3 with:
- Machine group: 2
- Support type: rigid

Note: LLM will auto-detect sampling_rate from metadata.
If not available, you must provide it explicitly.
ISO evaluation uses the FULL signal to calculate RMS (not segments).

Step 2: Review results

{
  "rms_velocity": 1.8,
  "unit": "mm/s",
  "severity_zone": "A",
  "machine_group": 2,
  "support_type": "rigid",
  "thresholds": {
    "zone_A_B": 2.3,
    "zone_B_C": 3.5,
    "zone_C_D": 5.5
  },
  "interpretation": "New machine condition - acceptable",
  "signal_duration": 6.0
}

Step 3: Interpret zones

Zone A (🟢): 0 - 2.3 mm/s     → New machine condition (CURRENT: 1.8 mm/s)
Zone B (🟡): 2.3 - 3.5 mm/s   → Acceptable for long-term operation
Zone C (🟠): 3.5 - 5.5 mm/s   → Unsatisfactory, short-term operation only
Zone D (🔴): > 5.5 mm/s       → Unacceptable, damage may occur

Step 4: Generate ISO compliance report

Generate ISO 20816-3 report for real_train/baseline_1.csv with:
- Machine group: 2
- Support type: rigid

This creates:

  • HTML file in reports/ directory with zone visualization and RMS velocity indicator
  • Color-coded zones (A=Green, B=Yellow, C=Orange, D=Red)
  • Current RMS velocity position on zone chart
  • Severity interpretation and recommendations

No PNG file is generated. For documentation, open HTML and export manually.

Step 5: Document for compliance

Create baseline report:

Vibration Assessment Report - Baseline Measurement
--------------------------------------------------
Machine: Test bearing rig
Standard: ISO 20816-3
Date: [Measurement Date]
Dataset: real_train/baseline_1.csv

Measurement:
- Signal Duration: 6.0 seconds
- Analysis: Full signal RMS (ISO standard requires complete signal)
- RMS Velocity: 1.8 mm/s
- Severity Zone: A (Green)
- Status: NEW MACHINE CONDITION

Classification:
- Machine Group: 2 (Medium machine, 15-300 kW)
- Support Type: Rigid
- Zone A Threshold: 0 - 2.3 mm/s

Result:
✅ Baseline measurement within Zone A (optimal condition)
✅ Suitable as reference for future condition monitoring
✅ No action required

Recommendation:
- Use this measurement as baseline for future comparisons
- Monitor quarterly for trend analysis
- Re-evaluate if RMS increases by >0.5 mm/s

Expected Outcome

✅ ISO 20816-3 compliance documented
✅ Baseline measurement in Zone A (optimal)
✅ Reference established for future monitoring
✅ Professional HTML report generated in reports/ directory
✅ LLM communicated that ISO uses full signal RMS (not segments)


Example 4: Complete Bearing Diagnosis

Objective

Perform comprehensive bearing diagnostics using the guided workflow prompt.

Scenario

You need to perform comprehensive bearing diagnostics using the guided workflow prompt. A vibration signal shows symptoms that require complete diagnostic workup with severity assessment and recommendations.

Critical Note: The LLM should request all necessary parameters if not provided:

  • Shaft speed (Hz or RPM)
  • Bearing characteristic frequencies (BPFO, BPFI, BSF, FTF) in Hz
  • Machine classification for ISO evaluation
  • Filter bands for envelope analysis (if not using defaults)

Real Dataset Information

  • Signal: real_train/InnerRaceFault_vload_1.csv
  • Sampling Rate: 48,828 Hz (from metadata)
  • Shaft Speed: 1500 RPM (25 Hz) - from metadata
  • Signal Duration: 3.0 seconds
  • Load Condition: Variable load test scenario (load = 0 N from metadata)
  • Fault Type: Inner race fault (known from experiment)
  • Bearing Characteristic Frequencies (from metadata):
    • BPFO: 81.125 Hz (Outer race)
    • BPFI: 118.875 Hz (Inner race fault)
    • BSF: 63.91 Hz (Ball spin frequency)
    • FTF: 14.8375 Hz (Cage frequency)

Important: Frequencies are in absolute Hz, not multiples of shaft speed.

Machine Classification

  • Machine Type: Assume Group 2 (medium machine, 15-300 kW)
  • Support: Rigid
  • Application: Fault diagnosis under variable load

Step-by-Step

Step 1: Initiate guided diagnosis

Run complete bearing diagnosis on real_train/InnerRaceFault_vload_1.csv with:
- Shaft frequency: 25 Hz
- Bearing frequencies:
  BPFO: 81.125 Hz
  BPFI: 118.875 Hz
  BSF: 63.91 Hz
  FTF: 14.8375 Hz
- Machine group: 2
- Support type: rigid

Note: LLM will auto-detect sampling_rate (48828 Hz) from metadata.
The signal is 3.0 seconds long.
FFT and Envelope will use 1.0s random segments by default.
ISO will use the full 3.0s signal for RMS calculation.

Step 2: Workflow executes automatically

The diagnose_bearing prompt performs 6 steps:

📋 STEP 1: Parameter Validation
✅ All required parameters present
✅ Bearing frequencies verified from metadata
✅ Signal duration: 3.0 seconds
✅ FFT/Envelope will use 1.0s random segments
✅ ISO will use full 3.0s signal

📊 STEP 2: ISO 20816-3 Severity Evaluation
Machine parameters:
- Group 2: Medium machine (15-300 kW) ✅
- Support: Rigid ✅

ISO Result (full signal RMS):
- RMS Velocity: 4.2 mm/s
- Zone: C (Orange)
- Interpretation: Unsatisfactory - plan maintenance immediately

🔍 STEP 3: FFT Analysis (1.0s random segment)
Primary frequencies detected:
- 25 Hz: Shaft rotation (1500 RPM)
- 50 Hz: 2× shaft (normal harmonic)
- 118.875 Hz: BPFI (⚠️ INNER RACE FAULT!)
- 237.75 Hz: 2×BPFI (strong harmonic)

📈 STEP 4: Envelope Analysis (1.0s random segment)
Envelope spectrum peaks:
- 118.875 Hz: Strong peak (BPFI) - magnitude 0.91
- 237.75 Hz: 2×BPFI harmonic - magnitude 0.58
- 356.625 Hz: 3×BPFI harmonic - magnitude 0.34

🎯 STEP 5: Fault Pattern Recognition
Pattern Match:
- Peak at BPFI: ✅ CONFIRMED (118.875 Hz)
- Harmonics present: ✅ YES (2×, 3×BPFI clearly visible)
- Sidebands: ✅ Modulation sidebands at ±25 Hz (shaft frequency)
- Pattern: Localized inner race defect with load-dependent modulation

Diagnosis: INNER RACE BEARING FAULT (Variable Load Condition)

💡 STEP 6: Recommendations

FAULT DETECTED: Inner Race Bearing Defect
Severity: HIGH (Zone C + Strong BPFI pattern + Harmonics)

Immediate Actions:
1. ⚠️ URGENT: Schedule bearing replacement within 1 week
2. Reduce operating speed and load immediately
3. Implement daily vibration monitoring
4. Prepare spare bearing and maintenance crew

Short-term (1-2 weeks):
5. Perform bearing replacement during next available window
6. Inspect bearing visually for fault confirmation
7. Document fault characteristics for root cause analysis

Long-term Actions:
8. Investigate root cause:
   - Check lubrication quality and quantity
   - Verify bearing installation (preload, alignment)
   - Review operating conditions (load cycles, temperature)
9. Improve predictive maintenance procedures
10. Consider automated condition monitoring system

Safety Notes:
⚠️ Variable load conditions accelerate fault progression
⚠️ Monitor closely - fault may deteriorate rapidly
⚠️ Prepare for emergency shutdown if vibration increases

Diagnosis Summary

Aspect Finding Severity
ISO 20816-3 Zone C (4.2 mm/s) 🟠 Unsatisfactory
FFT Analysis BPFI at 118.875 Hz + harmonics 🔴 Fault confirmed
Envelope Strong BPFI (0.91) + 2×, 3× harmonics 🔴 Advanced stage
Modulation Sidebands at ±25 Hz 🔴 Load-sensitive fault
Fault Type Inner race defect 🔴 Critical
Recommendation Replace within 1 week ⚠️ URGENT

Professional Reports Generated

The workflow automatically creates HTML reports in reports/ directory:

  • FFT Report (fft_spectrum_real_train_InnerRaceFault_vload_1.html)
    • Shows shaft frequency (25 Hz), harmonics, and BPFI peak (118.875 Hz)
  • Envelope Report (envelope_analysis_real_train_InnerRaceFault_vload_1.html)
    • Highlights BPFI with automatic bearing frequency markers
    • Displays filtered signal (2-8 kHz), envelope, and spectrum
  • ISO Report (iso_20816_real_train_InnerRaceFault_vload_1.html)
    • Zone C indication with color-coded severity chart

No PNG files are generated. For documentation, open HTML and export manually.

Expected Outcome

✅ Comprehensive diagnosis completed
✅ Fault localized to inner race (BPFI signature at 118.875 Hz)
✅ Severity assessed (HIGH/URGENT - Zone C + strong harmonics)
✅ Detailed action plan with timeline
✅ Professional HTML reports for documentation
✅ Root cause investigation guidance provided
✅ LLM communicated analysis strategy (1.0s segments for FFT/Envelope, full signal for ISO)


Example 5: Working with Different Segment Durations

Objective

Understand when and how to use different segment durations for optimal results.

Important: The default segment duration is 1.0 seconds (random) for FFT/Envelope analysis, and 0.1 seconds for ML feature extraction. This affects frequency resolution, analysis speed, and results.

Understanding Segment Duration

For FFT/Envelope Analysis:

  • Default: 1.0s random segment (efficient, good resolution)
  • Frequency Resolution = 1 / segment_duration
    • 1.0s segment → 1 Hz resolution
    • 2.0s segment → 0.5 Hz resolution
    • Full signal → Maximum resolution (depends on signal length)

For ML Anomaly Detection:

  • Default: 0.1s segments with 50% overlap
  • Shorter segments capture transient features better
  • More segments = better statistical representation

Scenario Comparison

Scenario A: Standard Bearing Analysis (Default 1.0s Works Well)

Real Dataset:

  • Signal: real_train/baseline_1.csv
  • Sampling Rate: 97,656 Hz (from metadata)
  • Signal Duration: 6.0 seconds
  • Shaft Speed: 1500 RPM (25 Hz)
  • Bearing Frequencies: BPFO = 81.125 Hz, BPFI = 118.875 Hz

Requirements:

  • Quick screening for bearing fault frequencies (~80-120 Hz range)
  • Standard diagnostic workflow
  • Need balance between speed and resolution

Solution: Use default 1.0s random segment

Generate FFT report for real_train/baseline_1.csv

Note: LLM will auto-detect sampling rate (97656 Hz) from metadata
Default segment_duration = 1.0s random segment
Frequency resolution = 1 Hz (excellent for bearing diagnostics)

Results:

  • Frequency resolution: 1 Hz (excellent for detecting 81 Hz and 119 Hz peaks)
  • Processing time: Fast
  • Detected peaks: 25 Hz (shaft), 50 Hz (2× harmonic), 75 Hz (3× harmonic)
  • ✅ Perfect for routine bearing diagnostics

Scenario B: Detailed Frequency Analysis (Need Better Resolution)

Real Dataset:

  • Signal: real_train/InnerRaceFault_vload_1.csv
  • Sampling Rate: 48,828 Hz (from metadata)
  • Signal Duration: 3.0 seconds
  • Fault Type: Inner race fault with BPFI = 118.875 Hz

Requirements:

  • Detect modulation sidebands around BPFI (±25 Hz spacing)
  • Need to distinguish 118.875 Hz from nearby frequencies
  • Detailed analysis for research or validation

Solution: Use longer segment (2.0s)

Generate FFT report for real_train/InnerRaceFault_vload_1.csv with:
- segment_duration: 2.0

LLM will inform: "Using 2.0s random segment for 0.5 Hz frequency resolution"
Auto-detects sampling_rate: 48828 Hz from metadata

Results:

  • Frequency resolution: 0.5 Hz (can resolve sidebands at 118.875 ± 25 Hz)
  • Processing time: Moderate
  • Detected: BPFI at 118.875 Hz + sidebands + harmonics
  • ✅ Excellent resolution for sideband analysis

Scenario C: Complete Signal Analysis (Full Signal)

Real Dataset:

  • Signal: real_train/OuterRaceFault_1.csv
  • Sampling Rate: 97,656 Hz (from metadata)
  • Signal Duration: 6.0 seconds
  • Fault Type: Outer race fault with BPFO = 81.125 Hz

Requirements:

  • Maximum frequency information for publication
  • Detailed envelope spectrum analysis
  • No performance constraints

Solution: Use full signal

Generate envelope report for real_train/OuterRaceFault_1.csv with:
- segment_duration: None
- bearing_frequencies: BPFO=81.125, BPFI=118.875, BSF=63.91, FTF=14.8375

LLM informs: "Analyzing full 6.0s signal for maximum resolution (0.167 Hz)"
Auto-detects sampling_rate: 97656 Hz from metadata

Results:

  • Frequency resolution: 0.167 Hz (maximum detail)
  • Processing time: Slower
  • Clear BPFO peak at 81.125 Hz with harmonics up to 6×BPFO
  • ✅ Maximum detail for research/documentation

Scenario D: Machine Learning Feature Extraction

Real Dataset:

  • Signals: real_train/baseline_1.csv, real_train/baseline_2.csv
  • Sampling Rates: 97,656 Hz (auto-detected per file)
  • Signal Durations: 6.0 seconds each
  • Purpose: Train anomaly detection model

Requirements:

  • Extract features from multiple healthy baselines
  • Need statistical representation across signals
  • Capture transient characteristics

Solution: Use default 0.1s segments with overlap

Train anomaly detection model on:
- real_train/baseline_1.csv
- real_train/baseline_2.csv

LLM specifies: "Extracting features from 0.1s segments with 50% overlap"
Auto-detects sampling rates from metadata files

Results:

  • Many segments: 6.0s × 2 files → ~240 segments total
  • Good statistical representation from multiple baselines
  • Captures transient features for anomaly detection
  • ✅ Optimal for ML training with real data

Segment Duration Recommendations

Application Tool Segment Duration Frequency Resolution Reason
Quick FFT screening analyze_fft 1.0s (default random) 1 Hz Fast, adequate resolution
Routine monitoring analyze_fft 1.0s random 1 Hz Balances speed and accuracy
Bearing diagnostics analyze_envelope 1.0s random 1 Hz Good resolution for BPFO/BPFI
Low-speed machines analyze_fft 2.0-5.0s 0.2-0.5 Hz Better low-frequency resolution
Gear analysis analyze_fft 2.0-5.0s 0.2-0.5 Hz Resolves sidebands
Research/validation analyze_fft None (full) Maximum Complete information
ML feature extraction extract_features 0.1s (default) N/A Captures transients
ML training train_anomaly_model 0.1s + 50% overlap N/A Statistical representation

Critical Guidelines for LLM

  1. Always inform user about segment strategy:

    • FFT/Envelope: "Analyzing 1.0s random segment from 6.0s signal (1 Hz resolution)"
    • ML: "Extracting features from 0.1s segments with 50% overlap → 120 segments"
    • ISO: "Using full signal for RMS calculation (ISO standard requirement)"
  2. Suggest appropriate duration if user needs different resolution:

    • "For better frequency resolution, try segment_duration=2.0 (0.5 Hz resolution)"
    • "For low-speed analysis, consider segment_duration=5.0 (0.2 Hz resolution)"
    • "For complete analysis, use segment_duration=None (full signal)"
  3. Explain random segment strategy:

    • "Random segment ensures representative sample without analyzing full signal"
    • "For reproducible results, specify random_seed parameter"
    • "Different random segment each analysis unless seed is fixed"
  4. Clarify ML segment strategy:

    • "ML uses 0.1s segments (not 1.0s) to capture transient features"
    • "50% overlap provides more training samples without full re-computation"
    • "6.0s signal → ~120 segments → better statistical model"
  5. Warn about computational cost:

    • "Using full signal (segment_duration=None) will take longer"
    • "Full signal analysis may produce larger output"
    • "For batch processing, stick to default 1.0s random segments"

Example 6: Machine Learning-Based Anomaly Detection

Objective

Train an ML model on healthy machine data to automatically detect anomalies in new measurements using real bearing data.

Scenario

You have vibration data from bearing tests. Some signals are from healthy bearings (baseline), others from faulty bearings. You want to build a model that automatically detects when a bearing develops a fault.

Real Dataset Available

Training Data (Healthy baselines):

  • real_train/baseline_1.csv - Healthy bearing, 6.0s, 97656 Hz
  • real_train/baseline_2.csv - Healthy bearing, 6.0s, 97656 Hz

Validation Data (Mix of healthy and faulty):

  • real_train/baseline_1.csv (portion) - Should classify as healthy
  • real_train/InnerRaceFault_vload_1.csv - Should detect as anomaly

Test Data (Real-world testing):

  • real_test/baseline_3.csv - Healthy baseline test
  • real_test/InnerRaceFault_vload_6.csv - Inner race fault test
  • real_test/OuterRaceFault_vload_6.csv - Outer race fault test

Step-by-Step

Step 1: Prepare training data

Use healthy baseline signals for training:

Train anomaly detection model on these healthy signals:
- real_train/baseline_1.csv
- real_train/baseline_2.csv

Use parameters:
- Model type: OneClassSVM
- PCA variance: 0.95
- Segment duration: 0.1 seconds (ML default, not 1.0s!)
- Model name: "bearing_health_model"

For validation, include:
- Faulty: real_train/InnerRaceFault_vload_1.csv

Note: LLM will auto-detect sampling rates from metadata files.
The tool automatically reads {filename}_metadata.json for each signal.
Different files may have different sampling rates (97656 Hz vs 48828 Hz).
If metadata is missing, tool will raise an error requesting sampling_rate.
0.1s segments with 50% overlap → many segments for statistical representation.

IMPORTANT: Including fault_signal_files enables SEMI-SUPERVISED hyperparameter tuning:
- With fault data → Manual hyperparameter search with validation scoring (balanced accuracy)
- Without fault data → UNSUPERVISED mode with automatic parameters
- NOTE: Model is ALWAYS trained only on healthy data (unsupervised)
- Fault data is used ONLY for hyperparameter tuning (post-training validation)
- LLM should suggest including fault data when available for better parameter selection

Validation Strategy Selection:

The tool supports TWO validation strategies for healthy data:

Validation Strategy 1: EXPLICIT Validation Files ⭐ RECOMMENDED for separate test set

Provide healthy_validation_files explicitly:
- healthy_validation_files: ["real_test/baseline_3.csv"]
- fault_signal_files: ["real_train/InnerRaceFault_vload_1.csv"]
  • Behavior: Uses provided validation files AS-IS (no splitting)
  • Model: Trains on 100% of training data (baseline_1.csv, baseline_2.csv)
  • Validation: Tests on SEPARATE validation set (baseline_3.csv)
  • Use case: When you have dedicated validation/test datasets
  • Advantage: Proper train/test separation, no data leakage

Validation Strategy 2: AUTO-SPLIT (80/20) - Default if no validation files

Do NOT provide healthy_validation_files:
- fault_signal_files: ["real_train/InnerRaceFault_vload_1.csv"]
  • Behavior: Automatically splits training data 80/20
  • Model: Retrains on 80% of training data
  • Validation: Tests on remaining 20% of training data
  • Use case: When you don't have separate validation data
  • Advantage: Automatic validation without extra data

LLM Prompt Examples:

Option 1 (Explicit validation files):

Train anomaly detection model:
- Training: real_train/baseline_1.csv, real_train/baseline_2.csv
- Validation (healthy): real_test/baseline_3.csv
- Validation (fault): real_train/InnerRaceFault_vload_1.csv
- Model: OneClassSVM
- Model name: "bearing_health_model"

Option 2 (Auto-split):

Train anomaly detection model:
- Training: real_train/baseline_1.csv, real_train/baseline_2.csv
- Validation (fault): real_train/InnerRaceFault_vload_1.csv
- Model: OneClassSVM
- Model name: "bearing_health_model"

Note: The tool will automatically split training data 80/20 for validation.

Training Mode Selection:

The tool automatically selects training strategy based on available data:

Option A: UNSUPERVISED (Novelty Detection) - No fault data

Train model with ONLY healthy signals (no fault data provided)
  • Training: OneClassSVM/LOF trained ONLY on healthy data
  • Parameters: Automatically calculated based on sample size
  • OneClassSVM: nu = adaptive (1/√n), gamma = 'scale'
  • LocalOutlierFactor: n_neighbors = √n, contamination = 0.1
  • Use case: When you only have healthy baseline data
  • Advantage: No need for fault examples
  • Learning type: Pure unsupervised (one-class learning)

Option B: SEMI-SUPERVISED (With Validation) - With fault data ⭐ RECOMMENDED

Train model with healthy signals + fault signals for hyperparameter tuning
  • Training: Still ONLY on healthy data (unsupervised)
  • Hyperparameter tuning: Uses validation set (healthy + fault) post-training
  • OneClassSVM: Tests nu=[0.01,0.05,0.1,0.2] × gamma=['scale','auto',0.001,0.01,0.1]
  • LocalOutlierFactor: Tests n_neighbors=[10,20,30,50] × contamination=[0.05,0.1,0.15,0.2]
  • Scoring: Balanced accuracy on validation (healthy specificity + fault sensitivity)
  • Use case: When you have both healthy AND fault examples
  • Advantage: Better parameter selection, higher accuracy
  • Learning type: Semi-supervised (train unsupervised, tune with labels)

CRITICAL: This is NOT supervised learning!

  • Model fit() is called ONLY on healthy data (no labels)
  • Fault data is used ONLY after training for hyperparameter selection
  • This is called "semi-supervised" because labels are used indirectly for tuning

LLM Recommendation Strategy:

User provides only healthy signals:
  → LLM: "I can train in unsupervised mode with automatic parameters. 
          Do you have any fault examples? Including them would enable 
          semi-supervised hyperparameter tuning for better performance."

User provides healthy + fault signals:
  → LLM: "Training in SEMI-SUPERVISED mode with hyperparameter optimization. 
          Model will be trained only on healthy data, then parameters will be 
          tuned using validation set for optimal anomaly detection."

Step 2: Review training results

The tool will return different information based on training mode:

SEMI-SUPERVISED Mode Output (with fault validation):

{
  "model_path": "models/bearing_health_model_model.pkl",
  "scaler_path": "models/bearing_health_model_scaler.pkl",
  "pca_path": "models/bearing_health_model_pca.pkl",
  "training_mode": "semi-supervised",
  "training_samples": 190,
  "features_per_sample": 17,
  "pca_components": 8,
  "pca_variance_explained": 0.956,
  "segment_duration": 0.1,
  "overlap_ratio": 0.5,
  "best_params": {
    "kernel": "rbf",
    "nu": 0.05,
    "gamma": 0.01
  },
  "validation_accuracy": 0.93,
  "validation_details": "Healthy: 46/48 correct (95.8%), Fault: 54/59 detected (91.5%)",
  "validation_metrics": {
    "healthy_correct": 46,
    "healthy_total": 48,
    "healthy_accuracy": 0.958,
    "fault_detected": 54,
    "fault_total": 59,
    "fault_accuracy": 0.915,
    "overall_accuracy": 0.93
  }
}

UNSUPERVISED Mode Output (no fault validation):

{
  "model_path": "models/bearing_health_model_model.pkl",
  "scaler_path": "models/bearing_health_model_scaler.pkl",
  "pca_path": "models/bearing_health_model_pca.pkl",
  "training_mode": "unsupervised",
  "training_samples": 120,
  "features_per_sample": 17,
  "pca_components": 8,
  "pca_variance_explained": 0.956,
  "segment_duration": 0.1,
  "overlap_ratio": 0.5,
  "best_params": {
    "kernel": "rbf",
    "nu": 0.0913,
    "gamma": "scale",
    "mode": "unsupervised_auto"
  }
}

Interpretation:

  • ✅ Model trained on 190 segments (80% used, 20% reserved for validation)
  • ✅ 17 time-domain features extracted per segment
  • ✅ PCA reduced to 8 components (95.6% variance retained)
  • SUPERVISED: Parameters optimized via grid search → nu=0.05, gamma=0.01
  • UNSUPERVISED: Parameters auto-calculated → nu=0.0913 (based on √n), gamma='scale'
  • Validation (supervised only):
    • Healthy validation: 46/48 correct (95.8%) - Low false positive rate ✓
    • Fault detection: 54/59 detected (91.5%) - High sensitivity ✓
    • Overall balanced accuracy: 93% (combines both metrics)

Important: LLM must inform user about:

  • Training mode: "Training in SUPERVISED mode with GridSearchCV" or "Training in UNSUPERVISED mode with automatic parameters"
  • Validation strategy: "Using 80/20 split: 80% for training, 20% healthy + fault signals for validation"
  • Sampling rate detection: "Auto-detected 97656 Hz from baseline_1_metadata.json"
  • Number of training files vs number of segments
  • Segment duration used (0.1s for ML, not 1.0s for FFT!)
  • Signal durations (baseline_1: 6.0s, baseline_2: 6.0s → total 12.0s → 238 segments → 190 training + 48 validation)
  • Parameter optimization: "Tested 20 parameter combinations, best: nu=0.05, gamma=0.01" (supervised) or "Auto-calculated nu=0.0913 based on sample size" (unsupervised)
  • Balanced validation: "Tested on BOTH healthy (specificity) and fault (sensitivity) data"
  • If metadata missing: "ERROR: No metadata found - please provide sampling_rate or create metadata file"

Step 3: Predict anomalies in test data

Test the model on unseen data:

Predict anomalies in real_test/baseline_3.csv using bearing_health_model

Expected: Should classify as healthy (low anomaly ratio)

Step 4: Review predictions for healthy test signal

Results:

{
  "num_segments": 30,
  "anomaly_count": 2,
  "anomaly_ratio": 0.067,
  "overall_health": "Healthy",
  "confidence": "High",
  "signal_duration": 6.0,
  "segment_duration": 0.2,
  "predictions": [1, 1, 1, 1, 1, 1, 1, -1, 1, 1, ...],
  "anomaly_scores": [-0.23, -0.18, -0.15, ..., 0.82, -0.12, ...]
}

Interpretation:

  • ✅ 30 segments analyzed (6.0s signal / 0.2s segments)
  • ✅ 2 anomalous segments detected (6.7% of signal)
  • Overall: Healthy (< 10% anomalies)
  • Action: Continue normal operation

Step 5: Test on faulty bearing

Predict anomalies in real_test/InnerRaceFault_vload_6.csv using bearing_health_model

Expected: Should detect high anomaly ratio

Results:

{
  "num_segments": 30,
  "anomaly_count": 24,
  "anomaly_ratio": 0.80,
  "overall_health": "Faulty",
  "confidence": "High",
  "signal_duration": 6.0,
  "segment_duration": 0.2
}

Interpretation:

  • ✅ 30 segments analyzed
  • ⚠️ 24 anomalous segments (80% of signal)
  • 🔴 Overall: FAULTY (> 30% anomalies)
  • 🔴 Action: Immediate inspection and bearing replacement

Step 6: Test on different fault type

Predict anomalies in real_test/OuterRaceFault_vload_6.csv using bearing_health_model

Expected: Should also detect as faulty

Results show high anomaly ratio → Model generalizes to different fault types!

Step 7: Visualize PCA Space (NEW TOOL!)

To understand how the model separates healthy from faulty bearings in 2D space:

Option A - Predictions Only (No Ground Truth):

Generate PCA visualization report for bearing_health_model with test signals:
- real_test/baseline_3.csv
- real_test/InnerRaceFault_vload_6.csv
- real_test/OuterRaceFault_vload_6.csv

Option B - With Validation (Ground Truth Provided):

Generate PCA visualization report for bearing_health_model with test signals:
- real_test/baseline_3.csv
- real_test/InnerRaceFault_vload_6.csv
- real_test/OuterRaceFault_vload_6.csv
And true labels:
- baseline_3.csv: healthy
- InnerRaceFault_vload_6.csv: faulty
- OuterRaceFault_vload_6.csv: faulty

This creates an interactive HTML report showing:

  • 2D PCA scatter plot (PC1 vs PC2)
  • Green dots: Segments predicted as healthy
  • Red X markers: Segments predicted as anomaly
  • Variance explained by each component
  • Hover information with segment details

IMPORTANT:

  • Without true_labels: Legend shows "Predicted: Healthy/Anomaly" (model predictions only)
  • With true_labels: Legend shows "True: X, Predicted: Y" for validation + accuracy metrics
  • Never assumes predictions = ground truth
  • Use validation mode to check model accuracy against known labels

Validation Metrics (when true_labels provided):

  • Overall accuracy across all segments
  • Per-file accuracy breakdown
  • Shows which files had prediction errors

The report is saved to reports/pca_visualization_bearing_health_model.html

Step 8: Compare Features with Violin Plots (NEW TOOL!)

To understand which time-domain features distinguish healthy from faulty bearings:

Generate feature comparison report with signal groups:
- Healthy: ["real_train/baseline_1.csv", "real_train/baseline_2.csv"]
- Inner Fault: ["real_train/InnerRaceFault_vload_1.csv", "real_train/InnerRaceFault_vload_2.csv"]
- Outer Fault: ["real_train/OuterRaceFault_1.csv", "real_train/OuterRaceFault_2.csv"]
Use segment_duration: 0.1 seconds

This creates an interactive HTML report showing:

  • Violin plots for all 17 time-domain features
  • Color-coded by signal group (Healthy, Inner Fault, Outer Fault)
  • Distribution comparison showing which features are most discriminative
  • Grid layout (3 columns × N rows)

The report is saved to reports/feature_comparison_Healthy_vs_Inner_Fault_vs_Outer_Fault.html

Key insights from feature comparison:

  • Kurtosis and Crest Factor: Higher in faulty bearings (impulsive signals)
  • RMS: Generally higher in faulty bearings
  • Entropy: Often lower in faulty bearings (more periodic)
  • Shape Factor: Can distinguish between fault types

Expected Outcome

Automated monitoring system that:

  • Detects anomalies using only healthy baseline training data
  • Provides early warning (detects faults before failure)
  • Generalizes to different fault types (inner race, outer race)
  • Uses real experimental bearing data
  • Supports multiple sampling rates (each file auto-detected)
  • Works without specifying sampling rate (metadata required)
  • Analyzes signals in 0.1s segments (ML default)

Model Performance:

  • Training: 2 healthy baselines (baseline_1, baseline_2)
  • Balanced validation: Tests BOTH healthy (specificity) and fault (sensitivity) data
  • Validation accuracy: ~93% overall (95.8% healthy correct, 91.5% fault detected)
  • Testing: Correctly identifies healthy and faulty test bearings
  • Low false positive rate: 95.8% of healthy segments correctly classified

Visualization Tools:

  • PCA scatter plots: See how model separates healthy from faulty in 2D space
  • Violin plots: Compare 17 time-domain features across groups
  • Both saved as interactive HTML reports

LLM Communication:

  • Informed user about sampling rate detection per file
  • Explained training data size (190 training + 48 healthy validation + 59 fault validation)
  • Clarified validation strategy: "80/20 split + fault validation for balanced metrics"
  • Reported both specificity (healthy accuracy) and sensitivity (fault detection)
  • Handled multi-rate training automatically

When to Use ML Approach

Use Case Traditional Analysis ML Approach
Single machine diagnosis ✅ Better (interpretable) ⚠️ Overkill
Fleet monitoring (>5 machines) ⚠️ Labor intensive ✅ Automated
Trend detection ⚠️ Manual comparison ✅ Automatic alerts
Unknown fault patterns ❌ May miss novel faults ✅ Detects deviations
Root cause diagnosis ✅ Clear (FFT, Envelope) ❌ Black box
Early warning ⚠️ Requires expertise ✅ Automated screening

Recommendation: Use ML for screening and monitoring, then use traditional analysis (FFT, Envelope, ISO) for root cause diagnosis when anomalies are detected.

Critical Guidelines for LLM

  1. Always handle sampling rate correctly (CRITICAL UPDATE!):

    • ML tools (train_anomaly_model, extract_features_from_signal): Auto-detect PER FILE from metadata
    • Each training/validation file can have different sampling rate!
    • FFT/Envelope/ISO: Auto-detect from signal's own metadata
    • If metadata exists → "Auto-detected sampling rate: 97656 Hz from baseline_1_metadata.json"
    • If metadata missing → RAISE ERROR requesting user to provide sampling_rate or create metadata
    • NEVER use silent default - always inform or error
    • User can override by providing explicit sampling_rate parameter (applies to all files)
  2. Multi-rate training communication:

    • "Auto-detected sampling rates: baseline_1 (97656 Hz), baseline_2 (97656 Hz), InnerFault (48828 Hz)"
    • "Training model with files at different sampling rates - features extracted correctly per file"
    • "Model saved with 'multi_rate_training': true in metadata"
  3. Training mode selection (NEW - CRITICAL!):

    • Ask user if they have fault data: "Do you have any fault signal examples? Including them enables supervised optimization for better accuracy."
    • If user provides ONLY healthy signals:
      • "Training in UNSUPERVISED mode (novelty detection)"
      • "Using automatic parameters: nu=0.0913 (adaptive), gamma='scale'"
      • "Parameters calculated based on your sample size (120 segments)"
    • If user provides healthy + fault signals:
      • "Training in SUPERVISED mode with GridSearchCV optimization"
      • "Testing 20 parameter combinations to find the best for your data..."
      • "Best parameters found: nu=0.05, gamma=0.01 (validation accuracy: 95%)"
    • Always inform about training mode before starting
  4. Parameter optimization communication:

    • SUPERVISED: "Optimizing parameters via grid search... tested nu × gamma combinations"
    • UNSUPERVISED: "Using automatic parameters based on sample size: nu = 1/√n = 0.0913"
    • Explain WHY: "Supervised mode finds better parameters when you have fault examples"
  5. Always request necessary information:

    • If bearing frequencies needed → "Please provide BPFO, BPFI, BSF, FTF"
    • For ISO evaluation → "What machine group? (1 or 2 only)"
    • If no metadata AND no sampling_rate → "Please provide sampling_rate or create metadata file"
    • NEW: "Do you have fault signal examples? They would enable supervised parameter optimization"
  6. Communicate segment analysis:

    • "Signal is X.X seconds long"
    • "Analyzing in 0.1s segments (ML default) → N segments"
    • "Using 50% overlap → total M segments for training"
  7. Explain model training:

    • "Auto-detected sampling rates from metadata files"
    • "Training on N files → M total segments"
    • "17 time-domain features per segment"
    • "PCA reduced to K components (X% variance)"
    • NEW: Training mode and parameter strategy
  8. Interpret results with context:

    • "Anomaly ratio 5% → Healthy (threshold: 10%)"
    • "Anomaly ratio 80% → Faulty (threshold: 30%)"
    • "Recommend traditional analysis to identify fault type"
  9. Use new visualization tools:

    • After training: "Would you like to visualize the PCA space?"
    • For feature analysis: "Generate feature comparison report to see which features distinguish healthy from faulty"
    • Reports saved as interactive HTML in reports/ directory
  10. PCA visualization labels (CRITICAL!):

  • WITHOUT true_labels: Labels show "Predicted: Healthy/Anomaly" (model predictions only)
  • WITH true_labels: Labels show "True: X, Predicted: Y" + accuracy metrics
  • NEVER assume predictions = ground truth unless user explicitly provides true_labels
  • Communication: "The PCA plot shows model PREDICTIONS. Provide true_labels for validation."
  • Example validation: "Overall accuracy: 95% (114/120 segments correct)"

Example 7: Machine Documentation Reader

Objective

Extract bearing specifications and technical data from equipment manuals to enable accurate fault diagnosis.

Scenario

You have a vibration signal showing potential bearing faults, but you don't know the bearing specifications (type, geometry, characteristic frequencies). You have the equipment manual as a PDF. You need to extract this information to perform accurate diagnosis.

Test Manual Available

A test pump manual is included in resources/machine_manuals/:

  • File: test_pump_manual.pdf (also available as .txt for testing)
  • Content: Complete pump specifications including:
    • Bearings: SKF 6205-2RS (drive end), NSK 6206 (non-drive end)
    • Bearing geometry: 9 balls, ball diameter, pitch diameter
    • Operating speeds: 1475 RPM (rated), 3000 RPM (max)
    • Power: 15 kW, 20 HP
    • Mechanical seal: Type 21
    • Maintenance schedules

Step-by-Step

Step 1: List available manuals

List all available machine manuals

Expected response:

{
  "manuals": [
    {
      "filename": "test_pump_manual.pdf",
      "type": "PDF",
      "size_mb": 0.08,
      "uri": "manual://read/test_pump_manual.pdf"
    }
  ],
  "total_manuals": 1
}

Step 2: Extract structured specifications

Extract specifications from test_pump_manual.pdf

IMPORTANT - LLM Guidelines:

  • The tool returns ONLY data extracted from the manual
  • DO NOT add information not present in the extraction results
  • If a specification is not found, state "Not found in manual"
  • DO NOT invent bearing geometries or frequencies

Response:

{
  "manual_file": "test_pump_manual.pdf",
  "bearings": [
    "SKF 6205-2RS",
    "NSK 6206",
    "6205",
    "6206"
  ],
  "rpm_values": [1475.0, 3000.0],
  "power_ratings": "15.0 KW, 20.0 HP",
  "text_excerpt": "CENTRIFUGAL PUMP MANUAL\nModel: CP-150..."
}

Interpretation:

  • ✅ Found 2 bearing types: SKF 6205-2RS (drive end), NSK 6206 (non-drive end)
  • ✅ Operating speeds: 1475 RPM (rated), 3000 RPM (maximum)
  • ✅ Power ratings: 15 kW (20 HP)

Step 3: Calculate bearing frequencies (Option A - Known Geometry)

If bearing geometry is in the manual:

Calculate bearing frequencies for SKF 6205 at 1475 RPM with:
- Number of balls: 9
- Ball diameter: 7.94 mm
- Pitch diameter: 34.55 mm
- Contact angle: 0° (deep groove bearing)

IMPORTANT - LLM Guidelines:

  • This tool REQUIRES exact bearing geometry
  • DO NOT guess or estimate geometry
  • ONLY calculate with geometry from manual or user
  • If geometry unknown, suggest checking manual or catalog

Result:

{
  "shaft_frequency_hz": 24.58,
  "BPFO": 85.20,
  "BPFI": 136.05,
  "BSF": 101.32,
  "FTF": 9.47
}

Step 4: Calculate frequencies (Option B - Catalog Lookup)

If geometry is NOT in manual but bearing designation is known:

Look up bearing 6205 in catalog and calculate frequencies at 1475 RPM

The system has a local catalog with common bearings (6205, 6206):

{
  "6205": {
    "num_balls": 9,
    "ball_diameter_mm": 7.94,
    "pitch_diameter_mm": 34.55,
    "bore_mm": 25,
    "outer_diameter_mm": 52,
    "width_mm": 15
  }
}

Step 5: Read full manual for additional context

For questions not answered by structured extraction:

Read the maintenance section from test_pump_manual.pdf

IMPORTANT - LLM Guidelines:

  • Base ALL answers EXCLUSIVELY on returned text
  • DO NOT add information not in the text
  • If not found, state "Not found in manual"
  • ALWAYS cite: "According to test_pump_manual.pdf..."

Response (excerpt):

MAINTENANCE SCHEDULE
-------------------
- Bearing lubrication: Every 6 months (lithium-based grease)
- Mechanical seal inspection: Every 3 months
- Impeller check: Annually
- Alignment check: Every 12 months
...

Step 6: Answer specific questions

User can now ask:

"What type of mechanical seal is used?"
→ "According to test_pump_manual.pdf, the pump uses a Type 21 mechanical seal 
   with carbon/ceramic seal faces."

"How often should I lubricate the bearings?"
→ "According to the manual, bearing lubrication should be performed every 
   6 months using lithium-based grease."

"How many impeller vanes?"
→ "According to test_pump_manual.pdf, the impeller is a closed-type bronze 
   impeller with 5 vanes."

Step 7: Complete workflow integration

Now combine manual data with signal analysis:

1. Extract bearing info from manual: SKF 6205-2RS at 1475 RPM
2. Calculate characteristic frequencies: BPFO = 85.20 Hz, BPFI = 136.05 Hz
3. Analyze vibration signal: real_train/OuterRaceFault_1.csv
4. Compare envelope spectrum peaks with bearing frequencies
5. Diagnose: Peak at 81.125 Hz matches BPFO → Outer race fault confirmed

Architecture: Hybrid Approach

The system uses 3 complementary methods:

1. MCP Resources (Primary - Full Text Access)

"Read the pump manual and tell me about the bearings"
  • Claude reads FULL PDF text directly via manual://read/{filename}
  • Can answer ANY question (not limited to pre-defined patterns)
  • Understands context: "nominal" vs "maximum" RPM, "drive end" vs "non-drive end"

2. Structured Extraction (Hints - Fast Lookup)

extract_manual_specs("test_pump_manual.pdf")
  • Regex-based extraction provides quick hints
  • Bearings: SKF 6205-2RS, NSK 6206
  • RPM: 1475, 3000
  • Power: 15 kW, 20 HP
  • Cached for repeated access

3. Catalog Lookup (Fallback - Missing Geometry)

lookup_bearing_in_catalog("6205")
  • Local database with common bearings
  • Provides geometry when not in manual
  • Extensible: add more bearings to catalog

Expected Outcome

Zero-knowledge diagnosis enabled

  • No need to know bearing type beforehand
  • Extract specs automatically from manual
  • Calculate frequencies from geometry
  • Complete diagnostic workflow

Flexible question answering

  • ANY question about manual content
  • Not limited to pre-defined patterns
  • Multiple RPM values handled correctly
  • Context-aware (rated vs maximum speed)

LLM stays grounded in data

  • Answers based ONLY on manual content
  • No hallucinations or assumptions
  • Clear when information is missing
  • Always cites source: "According to manual..."

Critical Guidelines for LLM

  1. NEVER invent specifications:

    • If bearing geometry not in manual → "Bearing geometry not found in manual. Check manufacturer catalog or measure physically."
    • If RPM not specified → "Operating speed not found in manual."
    • DO NOT use "typical" or "standard" values without user confirmation
  2. ALWAYS cite the manual:

    • ✅ "According to test_pump_manual.pdf, the bearing is SKF 6205-2RS"
    • ❌ "The bearing is probably a 6205 based on the pump size"
  3. Distinguish between different RPM values:

    • Manual may list: rated speed, maximum speed, minimum speed
    • Ask user which to use: "Manual shows 1475 RPM (rated) and 3000 RPM (max). Which operating speed should I use?"
  4. Request missing information:

    • "Bearing designation found (SKF 6205-2RS) but geometry not in manual. Should I look it up in the catalog or would you prefer to provide the geometry?"
  5. Use structured extraction as hints, not truth:

    • Regex extraction may find false positives (e.g., page numbers)
    • Always verify by reading full text if critical
    • Structured extraction is for quick screening only
  6. Combine methods intelligently:

    • Start with structured extraction (fast)
    • Use full text for ambiguous cases
    • Fall back to catalog for missing geometry
    • Ask user if catalog doesn't have bearing

Pro Tips

Tip 1: Always Check Metadata First

Before analysis, check if metadata files exist (e.g., baseline_1_metadata.json):

  • Sampling rate: Auto-detected from metadata (no need to specify!)
  • Bearing frequencies: May be provided in metadata
  • Signal duration: Documented in metadata
  • Shaft speed: Often documented
  • LLM will inform: "Auto-detected sampling rate: 97656 Hz from metadata" or "No metadata found, using default 10000 Hz"
  • User can always override with explicit parameters

Tip 2: Understanding Frequency Specifications

Bearing frequencies can be specified in two ways:

  • Absolute Hz (e.g., BPFO = 81.125 Hz) - used in our metadata
  • Multiples of shaft speed (e.g., BPFO = 3.245 × shaft_freq)

Always clarify with user which format they're providing. Our metadata uses absolute Hz values.

Tip 3: Bearing Type is Often Unknown

Don't assume specific bearing type unless explicitly stated. Instead:

  • Use "test bearing" or "bearing under test"
  • Rely on metadata for characteristic frequencies
  • Focus on fault detection, not bearing identification

Tip 4: Segment Duration Awareness

Different tools use different default segment durations:

  • ML tools (train_anomaly_model, extract_features): 0.1s segments (captures transients)
  • FFT/Envelope: 1.0s random segment (good frequency resolution, 1 Hz)
  • ISO 20816-3: Full signal (standard requirement)
  • LLM must always inform user about segment duration used
  • Explain frequency resolution (1 Hz for 1.0s, 10 Hz for 0.1s)
  • Suggest longer segments for low-speed machines
  • Mention signal duration vs analyzed segment duration

Tip 5: ISO 20816-3 Machine Groups

Only Group 1 and Group 2 exist in the standard:

  • Group 1: Large machines (> 300 kW)
  • Group 2: Medium machines (15-300 kW)
  • No Group 3 - if user asks, request clarification on machine power

Tip 6: No PNG Files Generated

Reports are HTML-only:

  • Professional interactive Plotly charts
  • Saved to reports/ directory
  • LLM should NOT claim PNG files are created
  • For static images: user must open HTML and export manually

Tip 7: Combine Multiple Analysis Methods

For best results, use:

  1. ISO 20816-3 → Overall severity assessment
  2. FFT → Identify primary frequencies
  3. Envelope → Bearing fault localization
  4. ML Anomaly Detection → Automated screening

Tip 8: Request Missing Parameters

LLM should inform about auto-detection and only request if truly needed:

  • Sampling rate: "I'll auto-detect from metadata" (rarely needs to ask)
  • Shaft speed: "What is the shaft speed?" (if needed for diagnosis and not in metadata)
  • Bearing frequencies: "Do you have BPFO, BPFI, BSF, FTF?" (if not in metadata)
  • Machine group: "What machine group for ISO? (1 or 2)" (always ask, not in metadata)
  • Segment duration: Use tool-specific defaults (0.1s ML, 1.0s FFT/Envelope)

Tip 9: Trend Analysis

Save analysis results over time to track:

  • RMS velocity trends (ISO evaluation)
  • Peak amplitude changes (FFT/Envelope)
  • Appearance of new frequencies
  • Anomaly ratio progression (ML models)

Tip 10: Safety First

If ISO returns Zone D or envelope shows strong BPFI/BPFO:

  • Stop operation immediately
  • Inspect bearing visually
  • Do not restart until repaired
  • High anomaly ratios (>50%) also warrant immediate inspection

Next Steps

  • Practice: Try these examples with your own signals
  • Learn: Study the diagnostic reasoning in each example
  • Contribute: Share your diagnostic workflows in GitHub Discussions

Need help? Open an issue or start a discussion on GitHub!

Found this useful? ⭐ Star the repository to show support!