refactor: define the geometric distribution via a sum of Dirac masses#36378
refactor: define the geometric distribution via a sum of Dirac masses#36378EtienneC30 wants to merge 11 commits intoleanprover-community:masterfrom
Conversation
PR summary 9d0ea7dd98Import changes exceeding 2%
|
| File | Base Count | Head Count | Change |
|---|---|---|---|
| Mathlib.Probability.Distributions.Geometric | 1605 | 2333 | +728 (+45.36%) |
Import changes for all files
| Files | Import difference |
|---|---|
Mathlib.Probability.Distributions.Geometric |
728 |
Declarations diff
+ _root_.HasSum.isProbabilityMeasure_sum_dirac
+ geometricMeasure_eq
+ geometricMeasure_nonneg
+ geometricMeasure_pos
+ geometricMeasure_real_singleton
+ geometricMeasure_real_singleton_pos
+ geometricMeasure_singleton
+ hasSum_integral_geometricMeasure
+ hasSum_one_geometricMeasure
+ integrable_geometricMeasure_iff
+ integral_geometricMeasure
+ integral_geometricMeasure'
You can run this locally as follows
## summary with just the declaration names:
./scripts/pr_summary/declarations_diff.sh <optional_commit>
## more verbose report:
./scripts/pr_summary/declarations_diff.sh long <optional_commit>The doc-module for scripts/pr_summary/declarations_diff.sh contains some details about this script.
No changes to technical debt.
You can run this locally as
./scripts/reporting/technical-debt-metrics.sh pr_summary
- The
relativevalue is the weighted sum of the differences with weight given by the inverse of the current value of the statistic. - The
absolutevalue is therelativevalue divided by the total sum of the inverses of the current values (i.e. the weighted average of the differences).
|
This PR/issue depends on: |
| noncomputable def geometricMeasure (p : ℝ) : Measure ℕ := if 0 < p ∧ p ≤ 1 | ||
| then | ||
| Measure.sum (fun n ↦ ENNReal.ofReal ((1 - p) ^ n * p) • .dirac n) | ||
| else | ||
| .dirac 0 |
There was a problem hiding this comment.
I would make the p parameter take values in unitInterval. This is what was done for setBer and so far I have been happy with this decision.
There was a problem hiding this comment.
I also wonder if its generalising the definition so we don't have to define a version of the measure for each type. Perhaps you could define it on any Semiring using the emedding of ℕ into any Semiring. I am not sure this is a good idea but I think its worth considering.
The types I could mainly forsee being used are Nat, Int, Real, ENat or ENNReal where the latter two might be relevant in the context of stopping times. For these cases the p = 0 junk value should be dirac Top. You could use a silly trick like adding the sSup class and defining the p = 0 junk value to be dirac of the sup over all elements of Nat casted to the ambient space.
If this causes too much of a hassle then I wouldn't bother but its possible it could avoid a lot of duplication.
There was a problem hiding this comment.
I would make the
pparameter take values inunitInterval. This is what was done forsetBerand so far I have been happy with this decision.
I am a bit reluctant to do that because I don't like having to deal with coercions. For instance it seems more complicated/verbose to prove
lemma geometricMeasure_pos {p : unitInterval} (h1 : p ≠ 0) (h2 : p ≠ 1) (n : ℕ) :
0 < (1 - p : ℝ) ^ n * p :=
especially because I cannot use tactics directly like linarith for instance. I am not sure there is much interest in restricting the parameter, is there? Likewise I am wondering if I should not make the parameter real in #36374.
There was a problem hiding this comment.
I am not sure what you mean by "adding the sSup class" but I guess I can just assume SupSet R for the domain since all the interesting cases satisfy it, maybe that's what you meant.
There was a problem hiding this comment.
I go back and forth on this but I had a discussion about a related thing a few weeks ago with @b-mehta (where I argued the opposite) and I am coming around to the idea things should take values in their natural type and its up to the API and tactics to accommodate. The reason for this is that I think this leads to nicer statements (you don't have to carry around extra hypothesis, deal with junk values, for example in the setBer case you can say that certain probabilities are monotone or continuous in p rather than monotoneOn or continuousOn). Proofs can be made nicer with tactics but statements can't.
In particular I recently made linarith and Rify work for NNReal. I would definitely keep the other one NNReal and let me know if that causes you any troubles because there should be a way to fix it!
It would also be easy to get these to work for unitInterval but not entirely clear its ideal to just keep special casing things (but maybe it is?). Anyway because I am of two minds I won't push this but its something to think about.
There was a problem hiding this comment.
The general pattern in most of mathlib is that it's more convenient to define things with an explicit parameter like this, that way you don't need to add a bunch of lemmas about what happens if you do take a pushforward. (A current change in this way is moving definitions which take in a RingHomClass to instead take in a RingHom). And importantly, it's not the case that we'd need to make a version of this for other types as suggested above.
Ok I'll undo the change. When you say
it's not the case that we'd need to make a version of this for other types as suggested above
Do you mean that we should just always use a push-forward and not write new definitions for different types? This means will need to add lemmas about the push-forward.
There was a problem hiding this comment.
This means will need to add lemmas about the push-forward.
I think that if one wants to talk about the geometric distribution in ENat, then doing the Measure.map of the Nat version should suffice, and then the lemmas we'll need split into two categories:
a) lemmas about geometric distribution on Nat
b) lemmas about Measure.map
a) is what you've already provided in this file, and b) are already in mathlib! Perhaps there are a few exceptions, but I think these are few and bounded.
There was a problem hiding this comment.
I was going under the assumption that we will want at least a Nat and Real version anyway. Note that arguably the Real version should be the privileged measure (if it weren't for the fact that pullbacks are much worse than push forwards) because you cant talk about Moments, Characteristic Functions, CLT etc... on Nats. Of course we could state these by mapping the measure but this does seem a bit unpleasant to me.
There was a problem hiding this comment.
I agree that we might still want a version for Real, and use push-forward for ENat and ENNReal. But if we do want that I think it's ok to duplicate things rather than having one general version that can cause other issues Bhavik pointed out.
There was a problem hiding this comment.
Of course we could state these by mapping the measure but this does seem a bit unpleasant to me.
I think this is actually more pleasant! For a very simple example, consider that Nat.factorial could be valued in semirings, but we instead take it to be nat-valued and cast where necessary: the extra polymorphism is "fake" generality.
Change the definition of
geometricMeasure pto beMeasure.sum (fun n ↦ ENNReal.ofReal ((1 - p) ^ n * p)) • (.dirac n))instead of using
PMF. This allows to directly use API for measures instead of having to develop an API forPMF, which anyway is defined as a sum of Dirac masses.Also add some results about integrals against the geometricMeasure.