Summary
Several diseases in HPOA genes_to_disease.txt show unexpectedly high gene association counts starting in the August 2025 release. These changes correlate with an unexpected expansion in the upstream MedGen mim2gene_medgen file, where OMIM:615777 expanded from 1 entry to 571 entries between April 2025 and November 2025.
Impact
Affected in recent HPOA releases:
- OMIM:615777: 571 genes (expected: 1, XYLT1)
- OMIM:131300: showed 571 genes in the August and September 2025 releases (expected: 1, TGFB1); corrected in the October 2025 release
- 22 diseases gained more than 10 gene associations in the August 2025 release (v2025-08-11) relative to v2025-05-06
- 2,043 additional gene-disease associations in total across those 22 diseases
Bug Report Origin
A user reported that the Monarch Initiative page for MONDO:0007542 (Camurati-Engelmann disease / OMIM:131300) showed over 500 gene associations when OMIM lists only TGFB1.
Detailed Analysis
OMIM:615777 Timeline
HPOA genes_to_disease.txt:
| Date Range | Version | Gene Count | Gene(s) |
|---|---|---|---|
| 2023-09-01 to 2025-05-06 | Multiple | 1 | XYLT1 |
| 2025-08-11 | v2025-08-11 | 571 | Multiple |
| 2025-09-01 | v2025-09-01 | 571 | Multiple |
| 2025-10-22 | v2025-10-22 | 571 | Multiple |
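
The per-disease gene counts in the table above can be reproduced with a short script. The following is a minimal sketch, assuming `genes_to_disease.txt` is tab-separated with a header row containing `gene_symbol` and `disease_id` columns; the column names and the sample rows are assumptions for illustration, so check them against the actual release file.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def genes_per_disease(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each disease_id to the set of gene symbols listed for it."""
    # Assumed header: ncbi_gene_id, gene_symbol, association_type, disease_id, source
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    genes: Dict[str, Set[str]] = defaultdict(set)
    for row in reader:
        genes[row["disease_id"]].add(row["gene_symbol"])
    return dict(genes)


# Illustrative rows only: GENE_A and GENE_B are placeholders, not release data.
sample = [
    "ncbi_gene_id\tgene_symbol\tassociation_type\tdisease_id\tsource",
    "NCBIGene:7040\tTGFB1\tMENDELIAN\tOMIM:131300\tplaceholder",
    "NCBIGene:0\tGENE_A\tUNKNOWN\tOMIM:615777\tplaceholder",
    "NCBIGene:0\tGENE_B\tUNKNOWN\tOMIM:615777\tplaceholder",
]
counts = {disease: len(g) for disease, g in genes_per_disease(sample).items()}
print(counts)
```

For a real release, pass an open file handle instead of `sample`, for example `genes_per_disease(open("genes_to_disease.txt", encoding="utf-8"))`.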
MedGen mim2gene_medgen (upstream source):
| Date | Number of Entries | Type (all entries) |
|---|---|---|
| 2024-10-04 | 1 | phenotype |
| 2025-01-11 | 1 | phenotype |
| 2025-04-14 | 1 | phenotype |
| 2025-11-06 (current) | 571 | phenotype |
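
The MedGen entry counts and the type check can be obtained the same way for each snapshot. This is a sketch under the assumption that `mim2gene_medgen` is tab-separated, starts with a `#`-prefixed header line, and carries the MIM number in the first column and the type in the third; the function name and sample rows are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def entries_for_mim(lines: Iterable[str], mim_number: str) -> Tuple[int, Counter]:
    """Return (row count, Counter of type values) for one MIM number."""
    mim = mim_number.split(":")[-1]  # accept "OMIM:615777" or "615777"
    n_rows = 0
    types: Counter = Counter()
    for line in lines:
        # Skip the assumed "#MIM number ..." header and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == mim:
            n_rows += 1
            types[fields[2]] += 1  # assumed position of the type column
    return n_rows, types


# Illustrative rows only: GeneID 0 is a placeholder, not MedGen data.
sample = [
    "#MIM number\tGeneID\ttype\tSource\tMedGenCUI\tComment",
    "131300\t7040\tphenotype\tplaceholder\tplaceholder\t-",
    "615777\t0\tphenotype\tplaceholder\tplaceholder\t-",
    "615777\t0\tphenotype\tplaceholder\tplaceholder\t-",
]
n_rows, types = entries_for_mim(sample, "OMIM:615777")
print(n_rows, dict(types))
```

Running this against each archived snapshot gives one row of the table above per snapshot date.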
OMIM:131300 Timeline (Camurati-Engelmann disease)
HPOA genes_to_disease.txt:
| Date Range | Version | Gene Count | Gene(s) |
|---|---|---|---|
| 2023-09-01 to 2025-05-06 | Multiple | 1 | TGFB1 |
| 2025-08-11 | v2025-08-11 | 571 | Multiple |
| 2025-09-01 | v2025-09-01 | 571 | Multiple |
| 2025-10-22 | v2025-10-22 | 1 | TGFB1 |
MedGen mim2gene_medgen (upstream source):
| Date | Number of Entries | GeneID | Type |
|---|---|---|---|
| All observed versions | 1 | 7040 (TGFB1) | phenotype |
Other Affected Diseases
Diseases with >100 gene association changes during the analysis period:
| Disease ID | Min Genes | Max Genes | Change |
|---|---|---|---|
| OMIM:615777 | 1 | 571 | 570 |
| OMIM:131300 | 1 | 571 | 570 |
| OMIM:180100 | 1 | 311 | 310 |
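
A table like the one above can be derived from per-release counts by taking the minimum and maximum for each disease. The sketch below assumes the counts have already been collected into a dictionary keyed by release version; `count_ranges` and the `threshold` parameter are hypothetical names, and the example values are the ones reported in the timelines above.

```python
from typing import Dict, List, Tuple


def count_ranges(
    counts_by_release: Dict[str, Dict[str, int]], threshold: int = 100
) -> List[Tuple[str, int, int, int]]:
    """Return (disease, min, max, change) for diseases whose change exceeds threshold."""
    diseases = set().union(*counts_by_release.values())
    rows = []
    for disease in diseases:
        # Releases that do not list the disease are skipped rather than counted as 0.
        observed = [c[disease] for c in counts_by_release.values() if disease in c]
        lowest, highest = min(observed), max(observed)
        if highest - lowest > threshold:
            rows.append((disease, lowest, highest, highest - lowest))
    # Largest change first, ties broken by disease ID.
    return sorted(rows, key=lambda r: (-r[3], r[0]))


counts_by_release = {
    "v2025-05-06": {"OMIM:615777": 1, "OMIM:131300": 1},
    "v2025-08-11": {"OMIM:615777": 571, "OMIM:131300": 571},
    "v2025-09-01": {"OMIM:615777": 571, "OMIM:131300": 571},
    "v2025-10-22": {"OMIM:615777": 571, "OMIM:131300": 1},
}
for row in count_ranges(counts_by_release):
    print(row)
```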
Impact Analysis
Comparing HPOA v2025-05-06 (before) to v2025-08-11 (after):
- 22 diseases gained >10 gene associations
- Total of 2,043 additional associations across these diseases
- The specific number 571 appears in both:
  - MedGen's unexpected expansion for OMIM:615777
  - Multiple diseases in HPOA (OMIM:615777, OMIM:131300)
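
The before/after comparison can be expressed as a small function over two sets of per-disease counts. This is a sketch, not the exact procedure used for the numbers above: `compare_releases` and `min_gain` are hypothetical names, `EXAMPLE:1` is a made-up disease ID, and diseases missing from the earlier release are treated as starting from 0, which is an assumption that may overstate gains for newly added diseases.

```python
from typing import Dict, Tuple


def compare_releases(
    before: Dict[str, int], after: Dict[str, int], min_gain: int = 10
) -> Tuple[Dict[str, int], int]:
    """Return diseases that gained more than min_gain genes, and the summed gain."""
    gains: Dict[str, int] = {}
    for disease, n_after in after.items():
        # Assumption: a disease absent from `before` had 0 associations.
        gain = n_after - before.get(disease, 0)
        if gain > min_gain:
            gains[disease] = gain
    return gains, sum(gains.values())


# OMIM counts are taken from the timelines in this report; EXAMPLE:1 is hypothetical.
before = {"OMIM:615777": 1, "OMIM:131300": 1, "EXAMPLE:1": 5}
after = {"OMIM:615777": 571, "OMIM:131300": 571, "EXAMPLE:1": 6}
gains, total = compare_releases(before, after)
print(len(gains), total)
```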
Observed Patterns
- MedGen expansion: OMIM:615777 went from 1 entry to 571 entries in MedGen between April 2025 and November 2025
- Timing correlation: The 571-gene count first appears in HPOA v2025-08-11, which would have been built after the MedGen expansion
- Shared count: The exact number 571 appearing in multiple contexts suggests these issues may be related
- Partial recovery: OMIM:131300 returned to the correct value in HPOA v2025-10-22, but OMIM:615777 still shows 571 genes
- Type field: All entries we examined in MedGen have type="phenotype" (both the stable single entries and the expanded 571 entries)
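
One possible follow-up for the shared-count observation is to check whether the 571 genes listed for the two diseases are the same genes, not just the same number of genes. The sketch below is a suggested check that is not part of the analysis reported here; `compare_gene_sets` is a hypothetical helper and the gene names in the example are placeholders.

```python
from typing import Dict, Set, Union


def compare_gene_sets(a: Set[str], b: Set[str]) -> Dict[str, Union[int, float]]:
    """Summarize the overlap between two gene sets."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: 1.0 means identical sets; two empty sets are treated as identical.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Placeholder gene names, for illustration only.
genes_615777 = {"GENE_A", "GENE_B", "GENE_C"}
genes_131300 = {"GENE_B", "GENE_C", "GENE_D"}
print(compare_gene_sets(genes_615777, genes_131300))
```

A Jaccard index of 1.0 between the OMIM:615777 and OMIM:131300 gene sets in v2025-08-11 would support a common cause; a low value would suggest the matching count is a coincidence.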
Data Sources
Analysis period:
- HPOA releases: September 2023 - October 2025 (20 releases)
- MedGen snapshots: October 2024 - November 2025 (6 versions via archive.org)
Files analyzed:
- genes_to_disease.txt from https://github.com/obophenotype/human-phenotype-ontology/releases
- mim2gene_medgen from https://ftp.ncbi.nlm.nih.gov/gene/DATA/mim2gene_medgen