Labels: enhancement (New feature or request)
Description
Currently, bootstrap_cube() and calculate_bootstrap_ci() allow uncertainty estimation for groups with very small sample sizes. This can lead to:
- undefined jackknife estimates (BCa acceleration), which currently raise an error,
- degenerate or misleading bootstrap distributions,
- confidence intervals that are numerically computable but statistically meaningless.
In particular:
- BCa intervals are mathematically undefined for groups with too few observations.
- Percentile, normal, and basic intervals are technically computable but unreliable when group sizes are very small.
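To make the BCa failure mode concrete: the acceleration constant is usually estimated from leave-one-out (jackknife) estimates of the statistic, as a = Σ(θ̄ − θᵢ)³ / (6 [Σ(θ̄ − θᵢ)²]^(3/2)). The block below is an illustrative Python sketch of that textbook formula, not the package's own code; `jackknife_acceleration` and `mean` are hypothetical names, and a plain list of numbers stands in for a group's observations.

```python
import math


def mean(xs):
    """Arithmetic mean, used here as a stand-in statistic."""
    return sum(xs) / len(xs)


def jackknife_acceleration(values, statistic):
    """Jackknife estimate of the BCa acceleration constant.

    Returns NaN when no leave-one-out sample exists (n < 2) or when all
    leave-one-out estimates coincide, which makes the formula 0/0.
    """
    n = len(values)
    if n < 2:
        return math.nan  # removing the only observation leaves nothing

    # Leave-one-out estimates of the statistic.
    loo = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    if max(loo) == min(loo):
        return math.nan  # no variation: numerator and denominator are both 0

    mean_loo = sum(loo) / n
    diffs = [mean_loo - t for t in loo]
    numerator = sum(d ** 3 for d in diffs)
    denominator = 6.0 * sum(d ** 2 for d in diffs) ** 1.5
    return numerator / denominator
```

With a single observation, or with a few identical observations, the function returns NaN, which is the situation the BCa code path currently hits. A group of two or three distinct values does yield a number, but it rests on only two or three leave-one-out estimates.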
To ensure statistically defensible output, a fail-safe mechanism should be implemented.
Proposed behaviour
Global rule:
- If a group has fewer than 4 original observations,
→ all confidence intervals (perc, bca, norm, basic) must be returned as NA, and
→ a warning must be issued.
This rule applies consistently across:
- bootstrap generation (bootstrap_cube()), and
- uncertainty estimation (calculate_bootstrap_ci()).
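The global rule could look roughly like the following. This is a hedged Python sketch of the logic only, not the package's implementation: `calculate_ci_with_guard`, `MIN_GROUP_SIZE`, and the `compute_ci` callback are assumptions made for illustration, and NaN pairs stand in for NA.

```python
import math
import warnings

MIN_GROUP_SIZE = 4  # threshold proposed in this issue
CI_TYPES = ("perc", "bca", "norm", "basic")
NA_INTERVAL = (math.nan, math.nan)  # stand-in for NA lower/upper limits


def calculate_ci_with_guard(replicates_by_group, group_sizes, compute_ci):
    """Compute intervals per group, returning NA for groups that are too small.

    `compute_ci(replicates, ci_type)` is a hypothetical callback that returns
    a (lower, upper) tuple; it is never called for undersized groups.
    """
    results = {}
    too_small = []
    for group, replicates in replicates_by_group.items():
        if group_sizes[group] < MIN_GROUP_SIZE:
            # Same outcome for every interval type, including BCa.
            results[group] = {ci_type: NA_INTERVAL for ci_type in CI_TYPES}
            too_small.append(group)
        else:
            results[group] = {
                ci_type: compute_ci(replicates, ci_type) for ci_type in CI_TYPES
            }

    if too_small:
        # One warning per call, listing every affected group.
        warnings.warn(
            f"{len(too_small)} group(s) have fewer than {MIN_GROUP_SIZE} "
            f"observations; confidence intervals set to NA: {sorted(too_small)}"
        )
    return results
```

Collecting the affected groups and warning after the loop keeps the message to one per call, however many groups fall below the threshold.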
Implementation notes
- Group size (n_group) should be:
  - computed during bootstrapping, and
  - carried through to downstream CI calculations.
- CI calculation should short-circuit early for invalid groups, rather than:
  - failing deep inside BCa acceleration logic, or
  - silently returning misleading intervals.
- BCa-specific failures (e.g. undefined acceleration) should become secondary once the group-size rule is enforced.
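One way to carry the group size through is to record it next to the replicates when they are generated, so that downstream code never has to reconstruct it from resampled data. The Python sketch below illustrates that data flow under simplifying assumptions: `bootstrap_groups` is a hypothetical function, each group is a plain list of numbers, and the returned dictionary layout is invented for this example.

```python
import random


def mean(xs):
    """Arithmetic mean, used here as a stand-in statistic."""
    return sum(xs) / len(xs)


def bootstrap_groups(data_by_group, statistic, n_boot=1000, seed=None):
    """Bootstrap each group and store its original size with the replicates."""
    rng = random.Random(seed)
    out = {}
    for group, values in data_by_group.items():
        n_group = len(values)  # size of the ORIGINAL sample, not a resample
        replicates = [
            statistic(rng.choices(values, k=n_group))  # resample with replacement
            for _ in range(n_boot)
        ]
        out[group] = {"n_group": n_group, "replicates": replicates}
    return out
```

A CI routine that receives this structure can test `n_group` before doing any interval arithmetic, which is what lets it return early instead of failing inside the BCa acceleration step.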
Rationale
- Resampling methods cannot create information that is not present in the data.
- For very small groups, uncertainty estimates are dominated by numerical artefacts rather than sampling variability.
- Returning NA with an explicit warning is more honest and policy-safe than producing fragile confidence limits.
This behaviour aligns with:
- best practices in bootstrap inference (Efron & Tibshirani),
- reproducibility and transparency requirements for indicator reporting,
- downstream policy and monitoring use cases.
Acceptance criteria
- Groups with n_group < 4 return NA for all CI types
- A clear warning is issued once per call
- Behaviour is documented
- Unit tests cover small-group cases