refactor: define the geometric distribution via a sum of Dirac masses by EtienneC30 · Pull Request #36378 · leanprover-community/mathlib4

EtienneC30 · 2026-03-08T20:43:54Z

Change the definition of geometricMeasure p to be
Measure.sum (fun n ↦ ENNReal.ofReal ((1 - p) ^ n * p) • .dirac n)
instead of using PMF. This makes it possible to use the API for measures directly instead of having to develop an API for PMF, which is in any case defined as a sum of Dirac masses.
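For reference, here is a short sketch of why this sum of Dirac masses is a probability measure when 0 < p ≤ 1. The symbols \mu_p and q = 1 - p are shorthand introduced here, not names from the PR, and the exact statements of geometricMeasure_singleton and hasSum_one_geometricMeasure may differ from what is written below.

```latex
\begin{align*}
\mu_p &= \sum_{n \in \mathbb{N}} q^n p \, \delta_n, \qquad q = 1 - p \in [0, 1), \\
\mu_p(\{n\}) &= q^n p, \\
\mu_p(\mathbb{N}) &= \sum_{n = 0}^{\infty} q^n p = \frac{p}{1 - q} = \frac{p}{p} = 1.
\end{align*}
```

For p = 0 every weight vanishes, so the sum would be the zero measure; this is why the definition needs a junk value outside 0 < p ≤ 1.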

Also add some results about integrals against the geometricMeasure.
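As an illustration of what integrating against a sum of Dirac masses amounts to, the following sketch assumes 0 < p ≤ 1 and writes \mu_p for geometricMeasure p (notation introduced here). It is the expected shape of the results, not a transcription of integrable_geometricMeasure_iff or integral_geometricMeasure, whose precise statements are not shown in this thread.

```latex
\begin{align*}
f \text{ integrable w.r.t. } \mu_p
  &\iff \sum_{n = 0}^{\infty} (1 - p)^n p \, |f(n)| < \infty, \\
\int f \, d\mu_p &= \sum_{n = 0}^{\infty} (1 - p)^n p \, f(n)
  \quad \text{for integrable } f, \\
\int n \, d\mu_p(n) &= p \sum_{n = 0}^{\infty} n (1 - p)^n
  = p \cdot \frac{1 - p}{p^2} = \frac{1 - p}{p}.
\end{align*}
```

The last line uses the identity \sum_n n q^n = q / (1 - q)^2 for |q| < 1 and recovers the usual mean of the geometric distribution counting failures before the first success.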

depends on: [Merged by Bors] - feat: integrability with respect to a countable sum of measures #36355

github-actions · 2026-03-08T20:45:00Z

PR summary 9d0ea7dd98

Import changes exceeding 2%

%	File
+45.36%	`Mathlib.Probability.Distributions.Geometric`

Import changes for modified files

Dependency changes

File	Base Count	Head Count	Change
Mathlib.Probability.Distributions.Geometric	1605	2333	+728 (+45.36%)

Import changes for all files

Files	Import difference
`Mathlib.Probability.Distributions.Geometric`	728

Declarations diff

+ _root_.HasSum.isProbabilityMeasure_sum_dirac
+ geometricMeasure_eq
+ geometricMeasure_nonneg
+ geometricMeasure_pos
+ geometricMeasure_real_singleton
+ geometricMeasure_real_singleton_pos
+ geometricMeasure_singleton
+ hasSum_integral_geometricMeasure
+ hasSum_one_geometricMeasure
+ integrable_geometricMeasure_iff
+ integral_geometricMeasure
+ integral_geometricMeasure'

You can run this locally as follows

## summary with just the declaration names:
./scripts/pr_summary/declarations_diff.sh <optional_commit>

## more verbose report:
./scripts/pr_summary/declarations_diff.sh long <optional_commit>

The doc-module for scripts/pr_summary/declarations_diff.sh contains some details about this script.

No changes to technical debt.

You can run this locally as

./scripts/reporting/technical-debt-metrics.sh pr_summary

The relative value is the weighted sum of the differences with weight given by the inverse of the current value of the statistic.
The absolute value is the relative value divided by the total sum of the inverses of the current values (i.e. the weighted average of the differences).

mathlib-dependent-issues · 2026-03-09T14:19:01Z

This PR/issue depends on:

~~[Merged by Bors] - feat: integrability with respect to a countable sum of measures #36355~~
By Dependent Issues (🤖). Happy coding!

DavidLedvinka · 2026-03-09T19:06:09Z

Mathlib/Probability/Distributions/Geometric.lean

+noncomputable def geometricMeasure (p : ℝ) : Measure ℕ := if 0 < p ∧ p ≤ 1
+  then
+    Measure.sum (fun n ↦ ENNReal.ofReal ((1 - p) ^ n * p) • .dirac n)
+  else
+    .dirac 0


I would make the p parameter take values in unitInterval. This is what was done for setBer and so far I have been happy with this decision.

I also wonder whether it's worth generalising the definition so we don't have to define a version of the measure for each type. Perhaps you could define it on any Semiring using the embedding of ℕ into any Semiring. I am not sure this is a good idea, but I think it's worth considering.

The types I can mainly foresee being used are Nat, Int, Real, ENat or ENNReal, where the latter two might be relevant in the context of stopping times. For these cases the p = 0 junk value should be dirac Top. You could use a silly trick like adding the sSup class and defining the p = 0 junk value to be the dirac of the sup over all elements of Nat cast to the ambient space.

If this causes too much of a hassle then I wouldn't bother, but it's possible it could avoid a lot of duplication.

> I would make the p parameter take values in unitInterval. This is what was done for setBer and so far I have been happy with this decision.

I am a bit reluctant to do that because I don't like having to deal with coercions. For instance it seems more complicated/verbose to prove

lemma geometricMeasure_pos {p : unitInterval} (h1 : p ≠ 0) (h2 : p ≠ 1) (n : ℕ) : 0 < (1 - p : ℝ) ^ n * p :=

especially because I cannot directly use tactics such as linarith. I am not sure there is much interest in restricting the parameter, is there? Likewise, I am wondering whether I should make the parameter real in #36374.
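For comparison, a minimal sketch of the same positivity statement with a real parameter. The hypotheses 0 < p and p < 1 are an assumption made here to replace p ≠ 0 and p ≠ 1 on unitInterval; this is not the lemma from the PR, only an illustration of why no coercion handling is needed in the real-valued form.

```lean
import Mathlib

-- Term-mode proof: `sub_pos.mpr hp1 : 0 < 1 - p`.
example {p : ℝ} (hp0 : 0 < p) (hp1 : p < 1) (n : ℕ) :
    0 < (1 - p) ^ n * p :=
  mul_pos (pow_pos (sub_pos.mpr hp1) n) hp0

-- Tactic-mode proof: `linarith` applies directly to the real hypotheses.
example {p : ℝ} (hp0 : 0 < p) (hp1 : p < 1) (n : ℕ) :
    0 < (1 - p) ^ n * p := by
  have h : 0 < 1 - p := by linarith
  exact mul_pos (pow_pos h n) hp0
```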

I am not sure what you mean by "adding the sSup class" but I guess I can just assume SupSet R for the domain since all the interesting cases satisfy it, maybe that's what you meant.

I go back and forth on this, but I had a discussion about a related thing a few weeks ago with @b-mehta (where I argued the opposite), and I am coming around to the idea that things should take values in their natural type and that it's up to the API and tactics to accommodate this. The reason is that I think this leads to nicer statements: you don't have to carry around extra hypotheses or deal with junk values, and in the setBer case, for example, you can say that certain probabilities are monotone or continuous in p rather than monotoneOn or continuousOn. Proofs can be made nicer with tactics but statements can't.

In particular, I recently made linarith and Rify work for NNReal. I would definitely keep the other one NNReal, and let me know if that causes you any trouble, because there should be a way to fix it!

It would also be easy to get these to work for unitInterval, but it's not entirely clear that it's ideal to keep special-casing things (but maybe it is?). Anyway, because I am of two minds I won't push this, but it's something to think about.

The general pattern in most of mathlib is that it's more convenient to define things with an explicit parameter like this; that way you don't need to add a bunch of lemmas about what happens if you do take a pushforward. (A current change in this direction is moving definitions which take in a RingHomClass to instead take in a RingHom.) And importantly, it's not the case that we'd need to make a version of this for other types as suggested above.

Ok I'll undo the change. When you say

> it's not the case that we'd need to make a version of this for other types as suggested above

Do you mean that we should just always use a push-forward and not write new definitions for different types? This means we will need to add lemmas about the push-forward.

> This means we will need to add lemmas about the push-forward.

I think that if one wants to talk about the geometric distribution in ENat, then doing the Measure.map of the Nat version should suffice, and then the lemmas we'll need split into two categories:

a) lemmas about geometric distribution on Nat
b) lemmas about Measure.map

a) is what you've already provided in this file, and b) are already in mathlib! Perhaps there are a few exceptions, but I think these are few and bounded.
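A minimal sketch of the push-forward approach described above, written for an arbitrary measure on ℕ and for the target type ℝ so that it does not depend on the final signature of geometricMeasure. The name pushToReal is hypothetical and introduced only for illustration; Measure.map_apply and measurable_from_nat are existing Mathlib lemmas, so category b) needs nothing new here.

```lean
import Mathlib

open MeasureTheory

/-- Hypothetical helper: push a measure on `ℕ` forward to `ℝ` along the cast. -/
noncomputable def pushToReal (μ : Measure ℕ) : Measure ℝ :=
  μ.map (fun n : ℕ ↦ (n : ℝ))

-- The measure of a measurable set is computed on `ℕ` via the preimage.
-- Every function out of `ℕ` is measurable (`measurable_from_nat`).
example (μ : Measure ℕ) (s : Set ℝ) (hs : MeasurableSet s) :
    pushToReal μ s = μ ((fun n : ℕ ↦ (n : ℝ)) ⁻¹' s) :=
  Measure.map_apply measurable_from_nat hs
```

Applying this with μ taken to be the Nat-valued geometric measure would give a Real-valued version without a second definition.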

I was going under the assumption that we will want at least a Nat and a Real version anyway. Note that arguably the Real version should be the privileged measure (if it weren't for the fact that pullbacks are much worse than push-forwards) because you can't talk about moments, characteristic functions, the CLT, etc. on Nats. Of course we could state these by mapping the measure but this does seem a bit unpleasant to me.

I agree that we might still want a version for Real, and use the push-forward for ENat and ENNReal. But if we do want that, I think it's ok to duplicate things rather than having one general version that can cause the other issues Bhavik pointed out.

> Of course we could state these by mapping the measure but this does seem a bit unpleasant to me.

I think this is actually more pleasant! For a very simple example, consider that Nat.factorial could be valued in semirings, but we instead take it to be nat-valued and cast where necessary: the extra polymorphism is "fake" generality.
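To make the Nat.factorial comparison concrete, here is a small sketch of the "Nat-valued, cast where necessary" pattern. The example statement is chosen here for illustration; it only relies on the existing lemma Nat.factorial_succ and the exact_mod_cast tactic.

```lean
import Mathlib

-- `Nat.factorial` is ℕ-valued; a statement in ℝ is obtained by casting,
-- and the proof reduces to the ℕ-valued lemma.
example (n : ℕ) :
    ((n + 1).factorial : ℝ) = ((n : ℝ) + 1) * (n.factorial : ℝ) := by
  exact_mod_cast Nat.factorial_succ n
```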


EtienneC30 added 4 commits March 8, 2026 11:49

feat: integrability with respect to a countable sum of measures

4527eaf

fix

7c74b5a

Merge branch 'master' into rfc_geom

2454a9b

refactor: define the geometric distribution via a sum of Dirac masses

c8c176c

EtienneC30 added the t-measure-probability (Measure theory / Probability theory) and blocked-by-other-PR (This PR depends on another PR; this label is automatically managed by a bot) labels Mar 8, 2026

github-actions bot added the large-import (Automatically added label for PRs with a significant increase in transitive imports) label Mar 8, 2026

EtienneC30 added 3 commits March 9, 2026 11:53

typo

8173514

author

1828f7d

Merge branch 'master' into rfc_geom

a33458f

mathlib-dependent-issues bot removed the blocked-by-other-PR (This PR depends on another PR; this label is automatically managed by a bot) label Mar 9, 2026

DavidLedvinka reviewed Mar 9, 2026


EtienneC30 added 4 commits March 9, 2026 20:45

simplify

4d34db3

use Semiring + unitInterval

72359c0

reverse semiring

0253db8

add small lemma

f833591

3 participants

EtienneC30 commented Mar 8, 2026 • edited by RemyDegenne
