`_posts/2025-09-12-debugging-numeric-comparisons-llms.md`

- [3. Part I — Geometry & Emergence](#3-part-i--geometry--emergence)
- [4. Part II — Readout vs Representation](#4-part-ii--readout-vs-representation)
- [5. Part III — Causal Edits (Patching & Ablations)](#5-part-iii--causal-edits-patching--ablations)
- [6. Limitations & Possible Next steps](#6-limitations--possible-next-steps)
- [7. Appendix](#7-appendix)
- [8. References](#8-references)
- [9. Disclaimer](#9-disclaimer)

---

- (e) The work currently analyses only one model, **Gemma-2-2b-it**, and only its instruction-tuned variant. A useful follow-up would be to compare how the last layer behaves in the base (non-instruction-tuned) model versus the instruction-tuned one.
- (f) The analysis does not cover comparisons involving negative numbers.

## 7. Appendix

### A. Harmful neurons

The next step is to multiply this score by +1 if the correct answer is "Yes" and by -1 otherwise, so that the score is aligned with the ground truth.
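
The scoring and sign-alignment steps can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the code from the repository: it assumes each neuron's score is activation times gradient, h_j · ∂m/∂h_j, where the metric m is assumed to be logit("Yes") - logit("No"), and that activations and gradients have already been collected into arrays of shape (n_examples, n_neurons). The helpers `truth_aligned_attribution` and `rank_harmful_neurons` are hypothetical names. Under these assumptions, a positive aligned score means the neuron pushes toward the correct answer, so the most negative mean scores mark candidate harmful neurons.

```python
import numpy as np


def truth_aligned_attribution(h: np.ndarray, grad: np.ndarray, is_yes: np.ndarray) -> np.ndarray:
    """Per-example, per-neuron attribution, sign-aligned with the ground truth.

    h:      activations, shape (n_examples, n_neurons)
    grad:   gradient of the ASSUMED metric logit("Yes") - logit("No")
            with respect to h, same shape as h
    is_yes: bool array, shape (n_examples,), True where the correct answer is "Yes"
    """
    # Activation x gradient: how strongly neuron j fires, weighted by how
    # sensitive the (assumed) Yes-vs-No logit difference is to it.
    attribution = h * grad
    # +1 if the correct answer is "Yes", -1 otherwise; broadcast across neurons.
    sign = np.where(is_yes, 1.0, -1.0)[:, None]
    # After alignment, positive means "pushes toward the correct answer".
    return attribution * sign


def rank_harmful_neurons(aligned: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: indices of the k neurons with the most negative mean aligned score."""
    mean_score = aligned.mean(axis=0)
    # argsort is ascending, so the most negative (most harmful) come first.
    return np.argsort(mean_score)[:k]


if __name__ == "__main__":
    # Toy data: 2 examples, 2 neurons (illustrative values, not results).
    h = np.array([[1.0, 2.0], [0.5, -1.0]])
    grad = np.array([[0.5, -1.0], [2.0, 1.0]])
    is_yes = np.array([True, False])

    aligned = truth_aligned_attribution(h, grad, is_yes)
    print(aligned)
    print(rank_harmful_neurons(aligned, 1))  # -> [1]
```

In the toy example, neuron 1 has the most negative mean aligned score (-0.5 versus -0.25 for neuron 0), so it is ranked first. Averaging over examples before ranking is one choice; inspecting per-example scores can reveal neurons that only hurt a subset of comparisons.
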

### D. Code & Data Availability

All code, notebooks, and datasets used in this analysis are available in the `sprint1` branch of the **som_numeric_comparison** repository on GitHub:

## 8. References

- Alain, G., & Bengio, Y. (2016). Understanding intermediate layers using linear classifier probes. arXiv:1610.01644. https://arxiv.org/abs/1610.01644

## 9. Disclaimer

I did this research in only about 15 hours, so a lot remains unexplored and the quality of the work could be improved significantly. Writing it up took much longer than I expected (roughly 7 hours of refining).