Commit 156835d

Merge branch 'master' of https://github.com/ahhyoushh/FileSense
2 parents 2696055 + 68caf75 commit 156835d

1 file changed

Lines changed: 3 additions & 3 deletions

wiki/metrics.md

@@ -118,7 +118,7 @@ The RL agent has demonstrated that **Policy C** (Generation Disabled) provides o
 
 ### 2.5 Reference Model Comparison
 
-We conducted a head-to-head comparison of three embedding models to determine the optimal balance between speed and accuracy for the FileSense pipeline.
+I conducted a head-to-head comparison of three embedding models to determine the optimal balance between speed and accuracy for the FileSense pipeline.
 
 **Models Tested:**
 1. **all-mpnet-base-v2** (110M params) - *The previous gold standard*
@@ -139,7 +139,7 @@ We conducted a head-to-head comparison of three embedding models to determine th
 * **Robustness:** `bge-base` solved all edge cases where the other models failed (e.g., noisy PDF text extraction in `Ray optics.pdf` and `chem work.pdf`).
 * **Confidence:** The similarity distribution shifted significantly higher (0.60+), reducing the system's reliance on fallback mechanisms.
 
-**Conclusion:** We have officially switched the default model to **BAAI/bge-base-en-v1.5** as of Dec 2025.
+**Conclusion:** switched the default model to **BAAI/bge-base-en-v1.5** as of Dec 2025.
 
 ---
 
@@ -238,7 +238,7 @@ The consistency of keyword superiority across NCERT and STEM datasets (academic
 1. **Dataset Size:** Evaluation limited to <100 files per dataset
 2. **Domain Coverage:** Primarily academic content
 3. **Language:** English-only evaluation
-4. **Model:** Single embedding model tested (all-mpnet-base-v2)
+
 
 ---
 