
Unexpected Gene-Disease Association Counts in Recent MedGen/HPOA Releases #1226

@kevinschaper


Summary

Several diseases in HPOA genes_to_disease.txt show unexpectedly high gene association counts starting with the August 2025 release (v2025-08-11). These changes coincide with an unexpected expansion in the upstream MedGen mim2gene_medgen file, where OMIM:615777 grew from 1 entry to 571 entries between the April 2025 and November 2025 snapshots.

Impact

Affected in recent HPOA releases:

  • OMIM:615777: 571 genes (expected: 1, XYLT1)
  • OMIM:131300: showed 571 genes in the August and September 2025 releases (expected: 1, TGFB1); corrected in the October 2025 release (v2025-10-22)
  • 22 diseases gained more than 10 gene associations each in the August 2025 release
  • 2,043 additional gene-disease associations in total across those 22 diseases

Bug Report Origin

A user reported that the Monarch Initiative page for MONDO:0007542 (Camurati-Engelmann disease / OMIM:131300) showed over 500 gene associations, although OMIM lists only TGFB1.

Detailed Analysis

OMIM:615777 Timeline

HPOA genes_to_disease.txt:

| Date Range | Version | Gene Count | Gene(s) |
|---|---|---|---|
| 2023-09-01 to 2025-05-06 | Multiple | 1 | XYLT1 |
| 2025-08-11 | v2025-08-11 | 571 | Multiple |
| 2025-09-01 | v2025-09-01 | 571 | Multiple |
| 2025-10-22 | v2025-10-22 | 571 | Multiple |
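
Per-release gene counts like those in the table above can be reproduced by counting distinct genes per disease_id in each genes_to_disease.txt release. The sketch below is a minimal illustration, not the analysis script used for this report. It assumes a tab-separated file with a header row containing `gene_symbol` and `disease_id` columns. The function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def genes_per_disease(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each disease_id to the set of gene symbols associated with it.

    Assumes a tab-separated genes_to_disease.txt whose header row includes
    'gene_symbol' and 'disease_id' columns (assumed layout).
    """
    genes: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        # Using a set means duplicate rows for the same gene count once.
        genes[row["disease_id"]].add(row["gene_symbol"])
    return dict(genes)


def gene_count(lines: Iterable[str], disease_id: str) -> int:
    """Number of distinct genes for one disease (0 if the disease is absent)."""
    return len(genes_per_disease(lines).get(disease_id, set()))
```

For example, `with open("genes_to_disease.txt", newline="") as fh: print(gene_count(fh, "OMIM:615777"))`. Running this over each release in turn produces one row of the timeline per release.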

MedGen mim2gene_medgen (upstream source):

| Snapshot Date | Number of Entries | Type (all entries) |
|---|---|---|
| 2024-10-04 | 1 | phenotype |
| 2025-01-11 | 1 | phenotype |
| 2025-04-14 | 1 | phenotype |
| 2025-11-06 (current) | 571 | phenotype |
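
The entry counts and types above can be checked per snapshot by counting rows per MIM number. The following is a minimal sketch with several assumptions. It treats mim2gene_medgen as tab-separated, with the MIM number in the first column and the type in the third. It assumes MIM numbers appear without the `OMIM:` prefix and that header lines start with `#`. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def _data_rows(lines: Iterable[str]):
    """Yield split fields for non-blank, non-header lines (assumed '#' header)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield line.rstrip("\n").split("\t")


def entries_per_mim(lines: Iterable[str]) -> Counter:
    """Count rows per MIM number (first column, assumed) in one snapshot."""
    return Counter(fields[0].strip() for fields in _data_rows(lines))


def types_for_mim(lines: Iterable[str], mim: str) -> Set[str]:
    """Distinct values of the type column (assumed 3rd) for one MIM number."""
    return {
        fields[2].strip()
        for fields in _data_rows(lines)
        if fields[0].strip() == mim and len(fields) > 2
    }
```

For example, `with open("mim2gene_medgen") as fh: print(entries_per_mim(fh)["615777"])`. Applying this to each archived snapshot yields the entry-count column of the table.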

OMIM:131300 Timeline (Camurati-Engelmann disease)

HPOA genes_to_disease.txt:

| Date Range | Version | Gene Count | Gene(s) |
|---|---|---|---|
| 2023-09-01 to 2025-05-06 | Multiple | 1 | TGFB1 |
| 2025-08-11 | v2025-08-11 | 571 | Multiple |
| 2025-09-01 | v2025-09-01 | 571 | Multiple |
| 2025-10-22 | v2025-10-22 | 1 | TGFB1 |

MedGen mim2gene_medgen (upstream source):

| Snapshot Date | Number of Entries | GeneID | Type |
|---|---|---|---|
| All observed versions | 1 | 7040 (TGFB1) | phenotype |

Other Affected Diseases

Diseases whose gene count changed by more than 100 during the analysis period:

| Disease ID | Min Genes | Max Genes | Change |
|---|---|---|---|
| OMIM:615777 | 1 | 571 | 570 |
| OMIM:131300 | 1 | 571 | 570 |
| OMIM:180100 | 1 | 311 | 310 |
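
The Min Genes, Max Genes, and Change columns above can be derived from per-release gene counts. This sketch is illustrative and is not necessarily how these figures were computed. It assumes the counts are already available as a mapping from release label to `{disease_id: gene_count}`. The function name and the choice to skip, rather than zero-fill, diseases missing from a release are both assumptions.

```python
from typing import Dict, List, Tuple


def count_ranges(
    counts_by_release: Dict[str, Dict[str, int]],
    threshold: int = 100,
) -> List[Tuple[str, int, int, int]]:
    """Return (disease_id, min, max, change) for diseases whose gene count
    varies by more than `threshold` across releases.

    A disease missing from a release is skipped for that release rather
    than counted as 0 (a design choice, not taken from the report).
    """
    per_disease: Dict[str, List[int]] = {}
    for counts in counts_by_release.values():
        for disease_id, n in counts.items():
            per_disease.setdefault(disease_id, []).append(n)

    rows: List[Tuple[str, int, int, int]] = []
    for disease_id, values in per_disease.items():
        lo, hi = min(values), max(values)
        if hi - lo > threshold:
            rows.append((disease_id, lo, hi, hi - lo))
    # Largest change first, then disease ID for a stable order.
    rows.sort(key=lambda r: (-r[3], r[0]))
    return rows
```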

Impact Analysis

Comparing HPOA v2025-05-06 (before) to v2025-08-11 (after):

  • 22 diseases gained more than 10 gene associations each
  • These 22 diseases gained 2,043 associations in total
  • The exact number 571 appears in both:
    • MedGen's expanded entry count for OMIM:615777
    • The gene counts of multiple diseases in HPOA (OMIM:615777, OMIM:131300)
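
The before/after comparison above can be sketched as a diff of per-disease gene counts between two releases. This is a minimal illustration under stated assumptions, not the report's own code. The inputs are assumed to be `{disease_id: gene_count}` dicts for v2025-05-06 and v2025-08-11. A disease absent from the earlier release is treated as having 0 genes, which is an assumption. The function name is hypothetical.

```python
from typing import Dict, Tuple


def gained_associations(
    before: Dict[str, int],
    after: Dict[str, int],
    min_gain: int = 10,
) -> Tuple[Dict[str, int], int]:
    """Compare per-disease gene counts between two releases.

    Returns ({disease_id: gain} for diseases that gained more than
    `min_gain` genes, total gain summed over those diseases).
    Diseases missing from `before` are treated as 0 genes (assumption).
    """
    gains: Dict[str, int] = {}
    for disease_id, n_after in after.items():
        gain = n_after - before.get(disease_id, 0)
        if gain > min_gain:
            gains[disease_id] = gain
    return gains, sum(gains.values())
```

With real release data, `len(gains)` would correspond to the number of diseases gaining more than 10 associations, and the second return value to the total number of additional associations.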

Observed Patterns

  1. MedGen expansion: OMIM:615777 went from 1 entry to 571 entries in MedGen between April 2025 and November 2025

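  2. Timing correlation: The 571-gene count first appears in HPOA v2025-08-11. This is consistent with the MedGen expansion having occurred before that release was built, although the snapshots examined only place the expansion somewhere between April and November 2025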

  3. Shared count: The exact number 571 appearing in multiple contexts suggests these issues may be related

  4. Partial recovery: OMIM:131300 returned to the correct value in HPOA v2025-10-22, but OMIM:615777 still shows 571 genes

  5. Type field: All entries we examined in MedGen have type="phenotype" (both the stable single entries and the expanded 571 entries)
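
Pattern 3 rests on matching counts. One way to test it more directly is to compare the gene sets themselves, for example whether OMIM:615777 and OMIM:131300 carry the same 571 genes in v2025-08-11. The sketch below assumes the gene sets for a single release have already been loaded into a `{disease_id: set_of_gene_symbols}` dict. The function name is hypothetical. Identical sets would support a shared upstream cause but would not by themselves prove it.

```python
from typing import Dict, Set


def compare_gene_sets(genes: Dict[str, Set[str]], a: str, b: str) -> Dict[str, int]:
    """Summarize overlap between the gene sets of two diseases in one release.

    `genes` maps disease_id -> set of gene symbols (assumed input shape).
    Missing diseases are treated as having an empty gene set.
    """
    set_a, set_b = genes.get(a, set()), genes.get(b, set())
    return {
        "size_a": len(set_a),
        "size_b": len(set_b),
        "shared": len(set_a & set_b),
        "only_a": len(set_a - set_b),
        "only_b": len(set_b - set_a),
    }
```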

Data Sources

Analysis period:

  • HPOA releases: September 2023 - October 2025 (20 releases)
  • MedGen snapshots: October 2024 - November 2025 (6 versions via archive.org)

Files analyzed:
