Start the sequence at zero and add scrambling

First of all, thank youfor writing Sobol.jl. I have been learning about Quasi Monte Carlo (**QMC**) recently and I'm so glad that there is a Sobol module in Julia. I want to suggest a fairly significant API change, but I hope to convince you that it's a very good idea.

**1) Include the initial $(0,0,...)$ point in the Sobol sequence**

The first value in the Sobol sequence is zero. [This recent paper by Art Owen](https://arxiv.org/abs/2008.08051) shows that the common practice of dropping the $(0,0,...)$ point (as done in Sobol.jl) ruins the digital net property of Sobol and makes its convergence worse. What makes Sobol a "digital net" is the fact that if you take the first $2^n$ Sobol points (in any dimension) you can partition your unit interval / square / hypercube into $2^k (k \le n)$ even slices and each slice will get its fair share $2^(n-k)$ of the points. This is the key to its value in QMC. If you skip the $(0,0,...)$ and draw $2^n$ points, the result is not a digital net. Let me illustrate:

```julia
using PyPlot

for x in 0:(1/4):1
    for y in 0:(1/4):1
        plot([x,x],[0,1],"C1--")
        plot([0,1],[y,y],"C1--")
    end
end

dim = 2
seq = SobolSeq(dim)

# Include the (0,0) point, as per Owen (2021).
plot([0],[0],"ko")

for j in 1:(N-1)
    local x,y = next!(seq)
    plot([x],[y],"C0o")
end

# The 'extra' point.
x,y = next!(seq)
plot([x],[y],"ro")
```

Notice how Sobol.jl leaves the bottom-left square empty and instead puts two points in another square. So that ruins the digital net, and the convergence. You can fix this by skipping the next $2^n - 1$ so that the *next* $2^n$ points do make a digital net. I suspect that this might be the origin of the common practice of skipping $2^n - 1$ points. This also addresses the concern that $(0,0,...)$ is a corner case that could lead to weird results. But as Owen's paper shows (Figure 1), an even better solution is to scramble the points.

**2) Add support for scrambled Sobol points**

Back in the 90's [Owen proposed scrambling digital nets](https://www.sciencedirect.com/science/article/pii/S0885064X98904873) to create a hybrid method that brings the best features of MC and QMC. In Randomized QMC (RQMC) you scramble the points in a way that preserves the digital net but creates a random instance and removes the corner points like $(0,0,...)$. [Figure 1 of Owen and Rudolf (2021)](https://epubs.siam.org/doi/10.1137/20M1320535) has a nice illustration of MC vs QMC vs RQMC.

Since Owen's proposal, several authors have proposed various ways to scramble the points and have shown empirically that RQMC has better convergence than QMC. But more importantly, RQMC recovers a feature from MC that is lost in QMC: The ability to measure uncertainty. In MC, if you draw $N$ points to compute some integral, you can repeat the experiment many times to measure the variance of that result. This is lost in QMC because the points are not random. But RQMC lets you do something similar by generating multiple scrambles.

A scrambled Sobol is already implemented in [scipy.stats.qmc.Sobol](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.qmc.Sobol.html) and in Owen's [R package (rsobol)](https://artowen.su.domains/code/). Unfortunately I was not able to figure out either method works, so I could not translate them to Julia. It doesn't look complicated at all; I just don't understand SciPy, R, or Sobol direction vectors well enough. They don't seem to be doing exactly the same thing, but they both seem to be doing something short and clever with the direction vectors.

I should mention that the R code from Owen is BSD-licensed, so if you can read R, you can just take his method and convert it to Julia.

Notice that the SciPy and R implementations both default to scrambled points and you have to explicitly ask for unscrambled points.

**3) Don't produce Sobol points one at a time, draw $2^n$ at a time**

At this point I hope I have convinced you that drawing anything other than a power of 2 worth of Sobol points ruins the digital net, and misses what makes Sobol powerful. In Owen's article he says

> Using 1000 points of a Sobol’ sequence may well be less accurate than using 512. 

The SciPy API strongly encourages you to draw $2^n$ points, and while they allow you to draw a different number, they give you a big warning to not do that. Owen's R code goes even farther --- it doesn't have an option to draw anything but a power of two. The input parameters are the exponent and the dimensionality. So I think Sobol.jl should do the same thing. Ditch the `next!()` method and instead draw a power of two.

I realize that this was a long request and that I'm suggesting breaking the API, but I hope I have made a convincing case that this would turn Sobol.jl into a very powerful tool for anyone computing high-dimensional integrals with RQMC methods. Not coincidentally, this is precisely what I am hoping to do with it.

Let me know what you think.

References:

* https://arxiv.org/abs/2008.08051
* https://www.sciencedirect.com/science/article/pii/S0885064X98904873
* https://epubs.siam.org/doi/10.1137/20M1320535
* https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.qmc.Sobol.html
* https://artowen.su.domains/code/

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Start the sequence at zero and add scrambling #31

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Start the sequence at zero and add scrambling #31

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions