Commit f6f9908

Update README.md (#13) [skip ci]
* Update README.md * Update README.md * Update README.md * Update README.md * Update README.md
1 parent f64b724 commit f6f9908

1 file changed

+111
-9
lines changed

README.md

Lines changed: 111 additions & 9 deletions
@@ -12,26 +12,128 @@ Overview
1212

1313
Filecache for stan models
1414

15-
Installation / Usage
15+
Installation
1616
--------------------
1717

18-
This package has not been pushed to pypi, so cannot be installed using `pip`
18+
You can install this package from PyPI using pip:
1919

20-
Instead, you can install from pip using "extended" git syntax:
21-
22-
$ pip install git+git://github.com/hammerlab/stancache
20+
$ pip install stancache
2321

2422
Or clone the repo & run setup.py:
2523

2624
$ git clone https://github.com/hammerlab/stancache.git
2725
$ python setup.py install
28-
26+
27+
Introduction
28+
------------
29+
30+
This is a filecache for [pystan](https://pystan.readthedocs.io/en/latest/) models fit to data. Each pystan model fit to data consists of two parts: the compiled model code and the result of MCMC sampling of that model given data. Both model compilation and model sampling can be time-consuming, so each is cached as a separate [pickled](https://docs.python.org/3/library/pickle.html) object on the filesystem.
31+
32+
This separation lets you (for example) compile a model once and run it several times, caching the result each time. You might be testing the model on different samples of data, using different initializations, or passing in different parameters.
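
As a rough illustration of the two-stage idea (not stancache's actual implementation), the sketch below caches a "compile" step and a "sample" step as separate pickles. The helper `cached_call`, the stand-in functions `fake_compile` and `fake_sample`, and the key scheme are all hypothetical and exist only for this example.

```python
import hashlib
import os
import pickle
import tempfile


def cached_call(cache_dir, prefix, func, **kwargs):
    """Hypothetical helper: return func(**kwargs), reusing a pickle on disk if present."""
    # Simplified cache key (an assumption): hash of the sorted keyword arguments' repr.
    key = hashlib.md5(repr(sorted(kwargs.items())).encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, "{}.{}.pkl".format(prefix, key))
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = func(**kwargs)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = {"compile": 0, "sample": 0}


def fake_compile(model_code):
    # Stand-in for the slow compilation step.
    calls["compile"] += 1
    return {"compiled": model_code}


def fake_sample(model, seed):
    # Stand-in for the slow MCMC sampling step.
    calls["sample"] += 1
    return {"model": model, "draws": [seed, seed + 1]}


cache_dir = tempfile.mkdtemp()
for seed in (1, 2, 1):
    # The "compiled model" is computed once and then read back from disk.
    model = cached_call(cache_dir, "model", fake_compile,
                        model_code="parameters { real mu; }")
    # Each distinct seed gets its own cached "fit"; the repeated seed is a cache hit.
    fit = cached_call(cache_dir, "fit", fake_sample, model=model, seed=seed)

print(calls)
```

Here the compile step runs once and the sample step runs once per distinct seed, which is the behaviour the separation is meant to allow.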
33+
34+
Loading pickled pystan fit objects into memory is also safer using `cached_stan_fit()`, since this ensures that the compiled model is unpickled before the fit object.
35+
36+
Getting started
37+
---------------
38+
39+
### Configuration
40+
41+
The configuration uses Python's [configparser](https://docs.python.org/2/library/configparser.html) module, so you can either load a `config.ini` file from disk or set the configuration in code.
42+
43+
By default, `stancache` looks for a config file at `~/.stancache.ini`. To load a different file, use `stancache.config.load_config('/another/config/file.ini')`.
44+
45+
Currently, the config settings include:
46+
47+
* `CACHE_DIR` (defaults to `.cached_models`)
48+
* `SEED` (seed value passed to `pystan.stan` for reproducible research)
49+
* `SET_SEED` (boolean; whether to also set `random.seed` system-wide, in addition to stan_seed)
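
For orientation, here is a minimal sketch of reading settings like these with Python's `configparser`. The section name `[main]`, the file layout, and the values are assumptions made for this example. They are not taken from stancache, so check `defaults.py` for the real names.

```python
import configparser
import os
import tempfile

# Hypothetical file contents: the [main] section and the values are assumptions.
ini_text = """
[main]
CACHE_DIR = /tmp/stancache-demo
SEED = 1234
SET_SEED = true
"""

path = os.path.join(tempfile.mkdtemp(), "stancache.ini")
with open(path, "w") as f:
    f.write(ini_text)

parser = configparser.ConfigParser()
parser.read(path)

# Option lookups are case-insensitive by default; fallbacks cover missing keys.
settings = {
    "CACHE_DIR": parser.get("main", "CACHE_DIR", fallback=".cached_models"),
    "SEED": parser.getint("main", "SEED"),
    "SET_SEED": parser.getboolean("main", "SET_SEED", fallback=True),
}
print(settings)
```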
50+
51+
You can use `config.set_value(NAME=value)` to modify a setting.
52+
53+
For example, you might want to set up a shared NFS mount containing fitted models for your collaborators:
54+
55+
```python
56+
from stancache import config
57+
config.set_value(CACHE_DIR='/mnt/trial-analyses/cohort1/stancache')
58+
```
59+
60+
An updated list of configuration defaults is available in [defaults.py](https://github.com/hammerlab/stancache/blob/master/stancache/defaults.py).
61+
62+
### Fitting cached models
63+
64+
Once you have configured your settings, use `stancache.cached_stan_fit` to fit your model, like so:
65+
66+
```python
67+
from stancache import stancache
68+
fit1 = stancache.cached_stan_fit(file='/path/to/model.stan', data=dict(), chains=4, iter=100)
69+
```
70+
71+
The options to `cached_stan_fit` are the same as those to `pystan.stan` (see [pystan.stan documentation](https://pystan.readthedocs.io/en/latest/api.html#pystan.stan)).
72+
73+
Also see `?stancache.cached_stan_fit` (in IPython or Jupyter) for more details.
74+
75+
### Caching other items
76+
77+
The caching is very sensitive to anything that would change the returned object, such as the order of the data elements within your dictionary. It is not sensitive to things that would not (such as whether you supply the Stan code as a file or as a string containing the same code).
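
To see why element order can matter, consider a cache key derived from a serialized form of the data. The sketch below uses a hypothetical `data_key` helper (an MD5 of the dictionary's `repr`). This is an assumption for illustration and not necessarily how stancache computes its keys.

```python
import hashlib


def data_key(data):
    # Hypothetical key function: hash of the dict's repr, which follows insertion order.
    return hashlib.md5(repr(data).encode("utf-8")).hexdigest()


a = {"J": 2, "y": [1.0, 2.0]}
b = {"y": [1.0, 2.0], "J": 2}

same_content = (a == b)                       # equal as dictionaries
same_key = (data_key(a) == data_key(b))       # but the keys differ


def normalized(data):
    # Rebuilding the dict with sorted keys gives a stable element order.
    return dict(sorted(data.items()))


same_key_after_sorting = (data_key(normalized(a)) == data_key(normalized(b)))
print(same_content, same_key, same_key_after_sorting)
```

If cache misses are surprising, building the data dictionary in a consistent order is a reasonable first thing to check.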
78+
79+
In practice, we find it helpful to cache data-preparation steps as well, especially when simulating data. There is thus a `stancache.cached()` wrapper function for this purpose, which caches objects _other_ than `pystan.stan` objects using the same file-cache settings.
80+
81+
A fairly common set-up for us is to fit a set of models in a distributed execution environment, then review the model results in a set of Jupyter notebooks. In this case, we set `cache_only=True` when loading model results into the notebook, which forces a failure if the cached result is not available.
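
The idea behind `cache_only` can be sketched generically as follows. The helper `load_or_compute` and the choice of `FileNotFoundError` are hypothetical. The exception stancache actually raises is not documented here.

```python
import os
import pickle
import tempfile


def load_or_compute(path, compute, cache_only=False):
    """Hypothetical helper illustrating a cache_only switch."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    if cache_only:
        # Fail fast rather than silently recomputing an expensive result.
        raise FileNotFoundError("no cached result at {}".format(path))
    result = compute()
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result


path = os.path.join(tempfile.mkdtemp(), "fit.pkl")

# With an empty cache, cache_only=True refuses to compute.
try:
    load_or_compute(path, lambda: "posterior draws", cache_only=True)
    missing_cache_failed = False
except FileNotFoundError:
    missing_cache_failed = True

# A normal call populates the cache; a later cache_only call then succeeds.
first = load_or_compute(path, lambda: "posterior draws")
second = load_or_compute(path, lambda: "never called", cache_only=True)
```

In a notebook this keeps an accidental cache miss from triggering a long refit.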
82+
2983
Contributing
3084
------------
3185

3286
TBD
3387

34-
Example
35-
-------
88+
Examples
89+
--------
3690

37-
TBD
91+
For example (borrowing from [pystan's docs](https://pystan.readthedocs.io/en/latest/getting_started.html)):
92+
93+
```python
94+
from stancache import stancache
95+
96+
schools_code = """
97+
data {
98+
int<lower=0> J; // number of schools
99+
real y[J]; // estimated treatment effects
100+
real<lower=0> sigma[J]; // s.e. of effect estimates
101+
}
102+
parameters {
103+
real mu;
104+
real<lower=0> tau;
105+
real eta[J];
106+
}
107+
transformed parameters {
108+
real theta[J];
109+
for (j in 1:J)
110+
theta[j] <- mu + tau * eta[j];
111+
}
112+
model {
113+
eta ~ normal(0, 1);
114+
y ~ normal(theta, sigma);
115+
}
116+
"""
117+
118+
schools_dat = {'J': 8,
119+
'y': [28, 8, -3, 7, -1, 1, 18, 12],
120+
'sigma': [15, 10, 16, 11, 9, 11, 10, 18]}
121+
122+
# fit model to data
123+
fit = stancache.cached_stan_fit(model_code=schools_code, data=schools_dat,
124+
iter=1000, chains=4)
125+
126+
# load fit model from cache
127+
fit2 = stancache.cached_stan_fit(model_code=schools_code, data=schools_dat,
128+
iter=1000, chains=4)
129+
```
130+
131+
In addition, there are a number of publicly accessible Jupyter notebooks using [stancache](http://github.com/hammerlab/stancache).
132+
133+
These include:
134+
135+
* [survivalstan-examples](http://github.com/jburos/survivalstan-examples)
136+
* [immune-infiltrate-explorations](http://github.com/hammerlab/immune-infiltrate-explorations)
137+
- e.g. [model-single-origin-samples/0.830 model3 by cell_type (n=500).ipynb](http://nbviewer.jupyter.org/github/hammerlab/immune-infiltrate-explorations/blob/master/model-single-origin-samples/0.830%20model3%20by%20cell_type%20%28n%3D500%29.ipynb)
138+
139+
If you know of other examples, please let us know and we will add them to this list.
