Skip to content

refactor: define the geometric distribution via a sum of Dirac masses#36378

Open
EtienneC30 wants to merge 11 commits intoleanprover-community:masterfrom
EtienneC30:rfc_geom
Open

refactor: define the geometric distribution via a sum of Dirac masses#36378
EtienneC30 wants to merge 11 commits intoleanprover-community:masterfrom
EtienneC30:rfc_geom

Conversation

@EtienneC30
Copy link
Member

@EtienneC30 EtienneC30 commented Mar 8, 2026

Change the definition of geometricMeasure p to be
Measure.sum (fun n ↦ ENNReal.ofReal ((1 - p) ^ n * p)) • (.dirac n))
instead of using PMF. This allows to directly use API for measures instead of having to develop an API for PMF, which anyway is defined as a sum of Dirac masses.

Also add some results about integrals against the geometricMeasure.


Open in Gitpod

@EtienneC30 EtienneC30 added t-measure-probability Measure theory / Probability theory blocked-by-other-PR This PR depends on another PR (this label is automatically managed by a bot) labels Mar 8, 2026
@github-actions github-actions bot added the large-import Automatically added label for PRs with a significant increase in transitive imports label Mar 8, 2026
@github-actions
Copy link

github-actions bot commented Mar 8, 2026

PR summary 9d0ea7dd98

Import changes exceeding 2%

% File
+45.36% Mathlib.Probability.Distributions.Geometric

Import changes for modified files

Dependency changes

File Base Count Head Count Change
Mathlib.Probability.Distributions.Geometric 1605 2333 +728 (+45.36%)
Import changes for all files
Files Import difference
Mathlib.Probability.Distributions.Geometric 728

Declarations diff

+ _root_.HasSum.isProbabilityMeasure_sum_dirac
+ geometricMeasure_eq
+ geometricMeasure_nonneg
+ geometricMeasure_pos
+ geometricMeasure_real_singleton
+ geometricMeasure_real_singleton_pos
+ geometricMeasure_singleton
+ hasSum_integral_geometricMeasure
+ hasSum_one_geometricMeasure
+ integrable_geometricMeasure_iff
+ integral_geometricMeasure
+ integral_geometricMeasure'

You can run this locally as follows
## summary with just the declaration names:
./scripts/pr_summary/declarations_diff.sh <optional_commit>

## more verbose report:
./scripts/pr_summary/declarations_diff.sh long <optional_commit>

The doc-module for scripts/pr_summary/declarations_diff.sh contains some details about this script.


No changes to technical debt.

You can run this locally as

./scripts/reporting/technical-debt-metrics.sh pr_summary
  • The relative value is the weighted sum of the differences with weight given by the inverse of the current value of the statistic.
  • The absolute value is the relative value divided by the total sum of the inverses of the current values (i.e. the weighted average of the differences).

@mathlib-dependent-issues mathlib-dependent-issues bot removed the blocked-by-other-PR This PR depends on another PR (this label is automatically managed by a bot) label Mar 9, 2026
@mathlib-dependent-issues
Copy link

Comment on lines +40 to +44
noncomputable def geometricMeasure (p : ℝ) : Measure ℕ := if 0 < p ∧ p ≤ 1
then
Measure.sum (fun n ↦ ENNReal.ofReal ((1 - p) ^ n * p) • .dirac n)
else
.dirac 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make the p parameter take values in unitInterval. This is what was done for setBer and so far I have been happy with this decision.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if its generalising the definition so we don't have to define a version of the measure for each type. Perhaps you could define it on any Semiring using the emedding of ℕ into any Semiring. I am not sure this is a good idea but I think its worth considering.

The types I could mainly forsee being used are Nat, Int, Real, ENat or ENNReal where the latter two might be relevant in the context of stopping times. For these cases the p = 0 junk value should be dirac Top. You could use a silly trick like adding the sSup class and defining the p = 0 junk value to be dirac of the sup over all elements of Nat casted to the ambient space.

If this causes too much of a hassle then I wouldn't bother but its possible it could avoid a lot of duplication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make the p parameter take values in unitInterval. This is what was done for setBer and so far I have been happy with this decision.

I am a bit reluctant to do that because I don't like having to deal with coercions. For instance it seems more complicated/verbose to prove

lemma geometricMeasure_pos {p : unitInterval} (h1 : p ≠ 0) (h2 : p ≠ 1) (n : ℕ) :
    0 < (1 - p : ℝ) ^ n * p :=

especially because I cannot use tactics directly like linarith for instance. I am not sure there is much interest in restricting the parameter, is there? Likewise I am wondering if I should not make the parameter real in #36374.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what you mean by "adding the sSup class" but I guess I can just assume SupSet R for the domain since all the interesting cases satisfy it, maybe that's what you meant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I go back and forth on this but I had a discussion about a related thing a few weeks ago with @b-mehta (where I argued the opposite) and I am coming around to the idea things should take values in their natural type and its up to the API and tactics to accommodate. The reason for this is that I think this leads to nicer statements (you don't have to carry around extra hypothesis, deal with junk values, for example in the setBer case you can say that certain probabilities are monotone or continuous in p rather than monotoneOn or continuousOn). Proofs can be made nicer with tactics but statements can't.

In particular I recently made linarith and Rify work for NNReal. I would definitely keep the other one NNReal and let me know if that causes you any troubles because there should be a way to fix it!

It would also be easy to get these to work for unitInterval but not entirely clear its ideal to just keep special casing things (but maybe it is?). Anyway because I am of two minds I won't push this but its something to think about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general pattern in most of mathlib is that it's more convenient to define things with an explicit parameter like this, that way you don't need to add a bunch of lemmas about what happens if you do take a pushforward. (A current change in this way is moving definitions which take in a RingHomClass to instead take in a RingHom). And importantly, it's not the case that we'd need to make a version of this for other types as suggested above.

Ok I'll undo the change. When you say

it's not the case that we'd need to make a version of this for other types as suggested above

Do you mean that we should just always use a push-forward and not write new definitions for different types? This means will need to add lemmas about the push-forward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means will need to add lemmas about the push-forward.

I think that if one wants to talk about the geometric distribution in ENat, then doing the Measure.map of the Nat version should suffice, and then the lemmas we'll need split into two categories:

a) lemmas about geometric distribution on Nat
b) lemmas about Measure.map

a) is what you've already provided in this file, and b) are already in mathlib! Perhaps there are a few exceptions, but I think these are few and bounded.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going under the assumption that we will want at least a Nat and Real version anyway. Note that arguably the Real version should be the privileged measure (if it weren't for the fact that pullbacks are much worse than push forwards) because you cant talk about Moments, Characteristic Functions, CLT etc... on Nats. Of course we could state these by mapping the measure but this does seem a bit unpleasant to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we might still want a version for Real, and use push-forward for ENat and ENNReal. But if we do want that I think it's ok to duplicate things rather than having one general version that can cause other issues Bhavik pointed out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course we could state these by mapping the measure but this does seem a bit unpleasant to me.

I think this is actually more pleasant! For a very simple example, consider that Nat.factorial could be valued in semirings, but we instead take it to be nat-valued and cast where necessary: the extra polymorphism is "fake" generality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

large-import Automatically added label for PRs with a significant increase in transitive imports t-measure-probability Measure theory / Probability theory

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants