Enhance histopheno endpoint with frequency statistics per bin

## Summary

The `/v3/api/histopheno/{id}` endpoint currently returns only association **counts** per anatomical system bin. We want to enhance it to also return **frequency statistics** derived from two complementary data sources:

1. **Disease-level frequency**: the average of `frequency_computed_sortable_float` over `DiseaseToPhenotypicFeatureAssociation` records (the field is normalized from `has_count`/`has_total`, `has_percentage`, or `frequency_qualifier`)
2. **Case-level frequency**: computed as (distinct cases with a phenotype in the bin) / (total cases for the disease), using `CaseToPhenotypicFeatureAssociation` records linked to the disease via a join on `CaseToDiseaseAssociation`

## Motivation

The current histopheno chart shows how many phenotype associations fall into each anatomical system, but says nothing about how *common* those phenotypes are. A bin with 500 associations where each phenotype is "Very rare" tells a different story than one with 500 where each is "Very frequent." Adding frequency data makes the visualization much more informative.

Case-level frequency from phenopacket data provides a complementary, independent signal: what fraction of individual patients with this disease have phenotypes in each system.

## Experimental findings

We validated that Solr's JSON Facet API supports all needed aggregations efficiently. A single Solr request can return both disease-level and case-level stats for all 20 bins using `excludeTags`/`{!tag=...}` to run the case join query in a separate domain from the disease filter.

### Per-bin stats available from Solr

| Stat | Source | Solr function |
|------|--------|---------------|
| `count` | Disease associations | query facet count (existing) |
| `avg_freq` | Disease associations | `avg(frequency_computed_sortable_float)` — only over docs with data |
| `num_with_freq` | Disease associations | `countvals(frequency_computed_sortable_float)` |
| `distinct_cases` | Case associations | `unique(subject)` via join |
| `total_cases` | Case associations | `unique(subject)` on full case result set |
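A rough sketch of how these stats might be assembled into one JSON Facet request is shown here. Only the aggregation functions, the `{!tag=...}`/`excludeTags` pattern, and the `{!join from=subject to=subject}` local params come from the experiments above. The helper names, the `category`/`subject`/`object` filter clauses, and the shape of the bin queries are assumptions for illustration and would need to be checked against the real schema and `build_histopheno_query`.

```python
import json

FREQ_FIELD = "frequency_computed_sortable_float"
DISEASE_TAG = "disease"  # arbitrary tag name, chosen for this sketch


def disease_bin_facet(bin_query: str) -> dict:
    """Query facet for one bin, with disease-level stats nested inside it."""
    return {
        "type": "query",
        "q": bin_query,
        "facet": {
            "avg_freq": f"avg({FREQ_FIELD})",
            "num_with_freq": f"countvals({FREQ_FIELD})",
        },
    }


def case_domain(disease_id: str) -> dict:
    """Domain that drops the tagged disease filter and selects the disease's cases.

    The filter clauses are assumed field names, not confirmed schema.
    """
    join = (
        "{!join from=subject to=subject}"
        f'category:"biolink:CaseToDiseaseAssociation" AND object:"{disease_id}"'
    )
    return {
        "excludeTags": DISEASE_TAG,
        "filter": ['category:"biolink:CaseToPhenotypicFeatureAssociation"', join],
    }


def case_facet(query: str, stat_name: str, disease_id: str) -> dict:
    """Query facet that counts distinct case subjects in the case domain."""
    return {
        "type": "query",
        "q": query,
        "domain": case_domain(disease_id),
        "facet": {stat_name: "unique(subject)"},
    }


def sketch_histopheno_request(disease_id: str, bins: dict[str, str]) -> dict:
    """Build Solr params carrying disease and case facets in a single request."""
    facets = {"all_cases": case_facet("*:*", "total_cases", disease_id)}
    for name, bin_query in bins.items():
        facets[name] = disease_bin_facet(bin_query)
        facets[f"{name}_cases"] = case_facet(bin_query, "distinct_cases", disease_id)
    return {
        "q": "*:*",
        "rows": 0,
        "fq": [
            f'{{!tag={DISEASE_TAG}}}category:"biolink:DiseaseToPhenotypicFeatureAssociation"'
            f' AND subject:"{disease_id}"'
        ],
        "json.facet": json.dumps(facets),
    }
```

Calling `sketch_histopheno_request("MONDO:0020121", {"musculature": "<bin query>"})` yields three facets: `musculature` (count, `avg_freq`, `num_with_freq`), `musculature_cases` (`distinct_cases`), and `all_cases` (`total_cases`).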

### Example output (MONDO:0020121, Duchenne muscular dystrophy)

```
Bin                    Assoc  AvgFreq  Cases/136  Case%
musculature             2135   48.3%    135/136   99.3%
nervous_system          1042   43.4%     95/136   69.9%
head_neck                608   40.2%     79/136   58.1%
skeletal_system          524   39.2%     91/136   66.9%
eye                      300   42.4%     18/136   13.2%
metabolism_homeostasis   229   60.0%     77/136   56.6%
blood                    204   64.6%     75/136   55.1%
respiratory              161   32.9%     50/136   36.8%
```

### Performance (warmed, local)

| Query type | QTime |
|------------|-------|
| Current (traditional facet) | ~2ms |
| JSON facet with avg + countvals | ~5ms |
| Combined disease + case (single request) | ~12ms |
| Adding percentile/median | +~20ms (t-digest computation) |

Adding a percentile costs roughly 3x (about +20ms on top of the ~12ms combined request). Median is arguably more informative than mean for this data, because the distribution is multimodal, clustered at fixed qualifier values like 0.05, 0.30, and 0.80. It could be made optional if server load is a concern.
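To illustrate why the mean can mislead on clustered data, here is a small sketch with made-up frequency values (not taken from any real bin): three annotations at 0.05 and two at 0.80.

```python
from statistics import mean, median

# Hypothetical bin: three "Occasional" (0.05) and two "Very frequent" (0.80) annotations.
freqs = [0.05, 0.05, 0.05, 0.80, 0.80]

bin_mean = mean(freqs)      # lands between the clusters, near 0.35
bin_median = median(freqs)  # the middle value of the sorted list, 0.05

print(f"mean={bin_mean:.2f} median={bin_median:.2f}")
```

The mean lands near 0.35, a value that none of the annotations actually carries, while the median reports the value of the larger cluster.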

## Data coverage

- ~327K of 15.3M association docs have `frequency_computed_sortable_float`
- For a well-annotated disease like Duchenne, ~65-85% of associations per bin have frequency data
- 168K `CaseToPhenotypicFeatureAssociation` records exist across ~8K cases
- Case availability is disease-dependent (136 cases for Duchenne, 0 for many diseases)

## Implementation approach

- Switch from traditional Solr facet queries to the JSON Facet API in `build_histopheno_query`
- Use `excludeTags`/`{!tag=...}` pattern to run disease and case facets in a single request
- Extend `HistoBin` model with optional frequency fields (`frequency_mean`, `frequency_count`, `case_count`, `total_cases`, etc.)
- Case stats naturally return 0/absent for diseases without case data
- Consider whether frequency stats should be opt-in via query parameter or always returned
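A minimal sketch of the extended bin model follows, written as a plain dataclass so it stays self-contained. The class name `HistoBinSketch` and the `case_frequency` property are hypothetical, and the existing `id`/`label`/`count` fields are assumed rather than copied from the real `HistoBin`. The optional field names come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistoBinSketch:
    # Assumed existing fields
    id: str
    label: str
    count: int
    # Proposed optional frequency fields; None means "no data for this bin"
    frequency_mean: Optional[float] = None
    frequency_count: Optional[int] = None
    case_count: Optional[int] = None
    total_cases: Optional[int] = None

    @property
    def case_frequency(self) -> Optional[float]:
        """Fraction of the disease's cases with a phenotype in this bin."""
        if not self.total_cases or self.case_count is None:
            return None  # no case data for this disease
        return self.case_count / self.total_cases


# Values from the Duchenne example output (musculature row)
musculature = HistoBinSketch(
    id="musculature", label="musculature", count=2135,
    frequency_mean=0.483, case_count=135, total_cases=136,
)
no_cases = HistoBinSketch(id="eye", label="eye", count=300)
```

Leaving the new fields as `None` by default keeps responses for diseases without frequency or case data identical in shape to today's output, whichever opt-in decision is made.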

## Technical notes

- Solr `avg()` automatically excludes documents without the field (averages only over docs with frequency data)
- `countvals()` is a lightweight alternative to nested `{type:query}` for counting docs with a field
- The join query `{!join from=subject to=subject}` links cases to diseases efficiently (~4ms warmed)
- Frequency values cluster at the fixed values assigned to each qualifier (0.01=Very rare, 0.05=Occasional, 0.30=Frequent, 0.80=Very frequent, 1.0=Obligate), mixed with precise fractions from `has_count`/`has_total`
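For context on where those clusters come from, the normalization into `frequency_computed_sortable_float` could look roughly like the sketch below. The qualifier values are the ones listed above. The precedence order, the 0-100 scale for `has_percentage`, and the use of qualifier labels instead of term IDs are assumptions, and the function name is hypothetical.

```python
from typing import Optional

# Qualifier values as listed in the technical notes
QUALIFIER_VALUES = {
    "Very rare": 0.01,
    "Occasional": 0.05,
    "Frequent": 0.30,
    "Very frequent": 0.80,
    "Obligate": 1.0,
}


def normalize_frequency(
    has_count: Optional[int] = None,
    has_total: Optional[int] = None,
    has_percentage: Optional[float] = None,
    frequency_qualifier: Optional[str] = None,
) -> Optional[float]:
    """Return a 0-1 frequency, preferring the most precise source (assumed order)."""
    if has_count is not None and has_total:
        return has_count / has_total
    if has_percentage is not None:
        return has_percentage / 100.0  # assumes a 0-100 scale
    if frequency_qualifier is not None:
        return QUALIFIER_VALUES.get(frequency_qualifier)
    return None  # record carries no frequency information
```

Records with no frequency source return `None` and would be skipped by `avg()` and `countvals()`, since those only consider docs that have the field.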
