Purity, Seeding and Caches #1442

Hazelfire · 2022-12-06T00:47:14Z

This proposal covers how Squiggle should handle purity, caching, and seeding. The basic proposal is:

Distribution constructors and sample functions should be impure
All other operations should be pure

The main motivation is to let Squiggle cache previous results more effectively. In my models, I often compose large lambda functions out of many smaller ones and call them at various stages to output intermediate results (example here: https://observablehq.com/@hazelfire/givewells-givedirectly-cost-effectiveness-analysis). Currently, the main performance bottleneck is converting distributions that are not sample sets into sample sets, particularly beta distributions and point set distributions (apparently the sampling process for beta distributions is especially slow). It would be beneficial for Squiggle to cache intermediate results that it has already calculated from exactly the same inputs. However, this would be difficult to do without a principled approach to purity.

Semantically, it means the following:

baseline = 1 to 10
increase = log(baseline + (2 to 5)) - log(baseline)
increase

This always returns a distribution with all of its mass above 0, because both uses of baseline refer to the same underlying samples. I think this is intuitive. It also means that this is semantically different from:

increase = log((1 to 10) + (2 to 5)) - log((1 to 10))
increase

This does not necessarily have all of its probability mass above 0, because each (1 to 10) is constructed separately.
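To make the difference concrete, here is a minimal Python sketch of the two readings. It is not Squiggle code: it assumes that `a to b` means a lognormal with 5th and 95th percentiles at `a` and `b`, and the helper `lognormal_from_90ci` is a hypothetical name introduced only for this illustration.

```python
import math
import random

def lognormal_from_90ci(low, high, n, rng):
    """Draw n samples from a lognormal whose 5th/95th percentiles are low/high (assumed meaning of `low to high`)."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * 1.6448536269514722)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]

N = 1000
rng = random.Random(0)

# Pure reading: `baseline` is sampled once and both uses see the same samples.
baseline = lognormal_from_90ci(1, 10, N, rng)
bump = lognormal_from_90ci(2, 5, N, rng)
shared = [math.log(b + d) - math.log(b) for b, d in zip(baseline, bump)]

# Inlined reading: each `(1 to 10)` is a fresh, independent draw.
left = lognormal_from_90ci(1, 10, N, rng)
right = lognormal_from_90ci(1, 10, N, rng)
independent = [math.log(l + d) - math.log(r) for l, d, r in zip(left, bump, right)]

shared_all_positive = min(shared) > 0
independent_has_negative = min(independent) < 0
```

In the shared case every sample is log(b + d) - log(b) with d > 0, so it cannot be negative. In the independent case the right-hand draw can exceed the left-hand sum, which pushes some samples below 0.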

I would implement this by having every symbolic distribution hold a seed that is used for all operations on that distribution.
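A rough Python sketch of that idea, under stated assumptions: `SymbolicLognormal`, `make_lognormal`, and `to_samples` are hypothetical names for illustration and do not correspond to Squiggle's actual internals. The constructor is the only impure step (it draws a seed), and sampling from the resulting value is pure.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class SymbolicLognormal:
    mu: float
    sigma: float
    seed: int  # fixed once, when the distribution is constructed

    def to_samples(self, n):
        # Pure: a fresh generator is built from the stored seed on every call,
        # so the same distribution always yields the same samples.
        rng = random.Random(self.seed)
        return tuple(rng.lognormvariate(self.mu, self.sigma) for _ in range(n))

def make_lognormal(mu, sigma, seed_source):
    """Impure constructor: consumes one seed from the shared seed source."""
    return SymbolicLognormal(mu, sigma, seed_source.getrandbits(64))

seed_source = random.Random(42)
baseline = make_lognormal(1.0, 0.5, seed_source)
other = make_lognormal(1.0, 0.5, seed_source)

same = baseline.to_samples(100) == baseline.to_samples(100)
different = baseline.to_samples(100) != other.to_samples(100)
```

Two uses of `baseline` stay perfectly correlated, while a second constructor call with identical parameters gets its own seed and therefore its own samples.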

I might go one step further and say that the following code should behave the same way:

baseline = 1 to 10
increase = log(PointSet.fromDist(baseline) + (2 to 5)) - log(PointSet.fromDist(baseline))
increase

In this case, the seed would have to be passed on (or converted in a pure way) to the point set distribution that comes from PointSet.fromDist. Both conversions would then use the same seed and produce the same sample set distribution.

The benefit is that distribution operations can be cached when they are performed many times. For instance, PointSet -> Sample Set conversions could be cached so that the code runs faster if the same conversion is done repeatedly.
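As a sketch of how a seeded, pure conversion makes caching safe, the Python example below memoizes a point set to sample set conversion. The `PointSet` class and `to_sample_set` function are hypothetical, and the conversion is simplified to a weighted choice over the support points rather than whatever interpolation Squiggle actually uses.

```python
import random
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class PointSet:
    xs: tuple       # support points
    weights: tuple  # unnormalised density at each point
    seed: int       # inherited from the distribution it was built from

@lru_cache(maxsize=None)
def to_sample_set(point_set, n):
    """Pure conversion: the same point set, seed and n always give the same samples."""
    rng = random.Random(point_set.seed)
    return tuple(rng.choices(point_set.xs, weights=point_set.weights, k=n))

ps = PointSet(xs=(1.0, 2.0, 3.0), weights=(0.2, 0.5, 0.3), seed=7)
first = to_sample_set(ps, 1000)
second = to_sample_set(ps, 1000)  # served from the cache, no resampling
hits = to_sample_set.cache_info().hits
```

Because the seed is part of the value, the cache key fully determines the output, so returning the cached result is indistinguishable from recomputing it.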

berekuk (Maintainer) · 2022-12-14T19:48:41Z

This is important to get right, and tricky, but so far seems like a really good idea (I haven't thought about it much yet).

Will we eventually need a clone function to reset the seed, if you need an uncorrelated sample set?
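One way such a clone function could stay pure is to derive the new seed deterministically from the old one. The Python sketch below is only an illustration of that option: `SeededDist`, `clone`, and the `label` argument are hypothetical and not part of any existing Squiggle API.

```python
import hashlib
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SeededDist:
    mu: float
    sigma: float
    seed: int

    def to_samples(self, n):
        # Same seed, same samples: sampling is pure.
        rng = random.Random(self.seed)
        return tuple(rng.gauss(self.mu, self.sigma) for _ in range(n))

def clone(dist, label):
    """Copy dist with a new seed derived from the old seed and a label (hypothetical helper)."""
    digest = hashlib.sha256(f"{dist.seed}:{label}".encode()).digest()
    return replace(dist, seed=int.from_bytes(digest[:8], "big"))

baseline = SeededDist(0.0, 1.0, seed=123)
fresh = clone(baseline, "uncorrelated-copy")
```

The clone keeps the same parameters but samples from a different pseudo-random stream, and calling `clone` again with the same label returns an equal value, so it could be cached like any other pure operation.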

If all conversions are cached, then the boundary between different types of distributions becomes pretty vague: it's almost the same object, used as a pointset or a sampleset depending on the context.

Maybe we should express this in syntax? log(baseline as PointSet + (2 to 5)) - log(baseline as PointSet) or something like that.

Or maybe we don't want to allow explicit casting to pointsets and samplesets at all? Right now we're in a weird situation where you can sometimes explicitly convert a dist, but functions on generic dists also sometimes guess which conversion is necessary, and sometimes they guess suboptimally. Things would be more straightforward if all functions acted on all dists (whenever that's possible) and guessed the approach, but we could give hints on which approach to choose. (This is different from the as PointSet syntax, because here we give hints to the function and don't cast values.) I'm having a hard time thinking of a good syntax for this in the case of infix arithmetic operations, though, and there might be other reasons why this won't work.
