-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathREADME.Rmd
More file actions
121 lines (93 loc) · 4.79 KB
/
README.Rmd
File metadata and controls
121 lines (93 loc) · 4.79 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
---
output: github_document
always_allow_html: true
---
# Sparse Policy Tree <img src='logo.png' align="right" height="150">
An R / Rust Package to implement an exhaustive tree search for
policy-learning. Aims to extend and speed up work done with the `policytree`
package.
# Usage
There's just one function, `sparse_policy_tree()`. Use it just as you would the
`policy_tree()` function from the `policytree` package.
Trees aren't guaranteed to be exactly the same as those produced by
`policytree` in that some leaves may be left unpruned (working on this), but
they should give the same predictions.
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
library(sparsepolicytree)
library(policytree)
```
```{r}
n <- 400
p <- 4
d <- 3
depth <- 2
# Classification task taken from policytree tests
# Continuous X
X <- round(matrix(rnorm(n * p), n, p),2)
colnames(X) <- letters[1:ncol(X)]
Y <- matrix(0, n, d)
best.tree <- policytree:::make_tree(X, depth = depth, d = d)
best.action <- policytree:::predict_test_tree(best.tree, X)
Y[cbind(1:n, best.action)] <- 100 * runif(n)
best.reward <- sum(Y[cbind(1:n, best.action)])
tree <- sparse_policy_tree(X,Y,2)
tree
```
# Installation:
## Installing Rust:
You must have [Rust](https://www.rust-lang.org/tools/install) installed to
compile this package. The rust website provides an excellent installation
script that has never caused me any issues.
On Linux, you can install Rust with:
```{sh, eval=F}
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
On Windows, I use the rust installation wizard, found
[here](https://forge.rust-lang.org/infra/other-installation-methods.html).
## Installing Package from Github:
Once you install rust, you should be able to install the package with:
```{r, eval=F}
devtools::install_github("Yale-Medicaid/sparsepolicytree")
```
# Benchmarks:
Below are the results from a series of benchmarks to gauge the speed of the
package. These were run on an old 4-core, 8-thread server that I have access
to, and I think should be pretty representative of the speed a user can expect
to see if they run the algorithm on all cores of a modern data-science laptop.
| Number of Observations | Number of Distinct Predictor Values |Number of Predictors| Number of Treatments | Time |
|------------------------|-------------------------------------|--------------------|----------------------|------|
| 10^2 | 2 |30 |20 |0.003s|
| 10^3 | 2 |30 |20 |0.015s|
| 10^4 | 2 |30 |20 |0.15s |
| 10^5 | 2 |30 |20 |1.57s |
| 10^6 | 2 |30 |20 |17s |
| 10^2 | 30 |30 |20 |0.052s|
| 10^3 | 30 |30 |20 |0.240s|
| 10^4 | 30 |30 |20 |3.041s|
| 10^5 | 30 |30 |20 |100s |
| 10^6 | 30 |30 |20 |21 min|
| 10^2 | 10^2 |30 |20 |0.41s |
| 10^3 | 10^3 |30 |20 |40s |
| 10^4 | 10^4 |30 |20 |70 min|
sparse_policy_tree is dominant when the number of values a variable can take is
small (say, under 200), but performs poorly when the number of unique values is
large. For dense variables, you are generally better off using the policytree
package, which is more developed,and will return faster while running on a only
single core.
## Limiting the Number of Threads:
To constrain the number of cores the program uses, you can set the
`RAYON_NUM_THREADS` variable before running the search. At present, this
variable is read at the construction of the multithreading thread pool, and so
must be set once each R session. In a future version, I'll work to include a
fix so that the number of threads can be included in an argument to the
`sparse_policy_tree` function.
Here's an example of how to set the function to run on an single core in R:
```{r, eval=F}
Sys.setenv(RAYON_NUM_THREADS=1)
```