opam-version: "2.0"
authors: "Francois Berenger"

homepage: "https://github.com/UnixJunkie/orf"
bug-reports: "https://github.com/UnixJunkie/orf/issues"
dev-repo: "git+https://github.com/UnixJunkie/orf.git"
license: "LGPL-2.1-or-later WITH OCaml-LGPL-linking-exception"
build: ["dune" "build" "-p" name "-j" jobs]
depends: [
  "batteries" {>= "3.2.0"}
  "cpm" {>= "11.0.0"}
  "dolog" {>= "4.0.0"}
  "dune" {>= "2.8"}
  "line_oriented"
  "minicli"
  "molenc" {>= "16.15.0"}
  "ocaml" {>= "4.12"}
  "parany" {>= "11.0.0"}
]
depopts: [
  "conf-gnuplot"
]
synopsis: "OCaml Random Forests"
description: """
Random Forests (RFs) can be used for classification or regression
modeling.

Random Forests are one of the workhorses of modern machine
learning. Notably, they are highly resistant to over-fitting the
training set, are fast to train, predict quickly, parallelize well
and give you a reasonable model even without tuning the model's
default hyper-parameters. In other words, it is hard to shoot
yourself in the foot while training or using a Random Forest model.
In comparison, with deep neural networks it is very easy to shoot
yourself in the foot.

Using out-of-bag (OOB) samples, you can even get an idea of a RF's
performance without the need for a held-out (test) data-set.
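
For intuition, here is a minimal, self-contained OCaml sketch of how
OOB samples arise (bootstrap_and_oob is a hypothetical helper written
for this description, not part of orf's API): each tree is trained on
a bootstrap sample drawn with replacement, and the rows that were
never drawn form that tree's OOB set.

  (* Hypothetical helper, not orf's API: draw a bootstrap sample of
     indices in [0, n-1]; the indices never drawn are the out-of-bag
     (OOB) samples for that tree. *)
  let bootstrap_and_oob (n : int) : int array * int list =
    let drawn = Array.make n false in
    let sample =
      Array.init n (fun _ ->
          let i = Random.int n in
          drawn.(i) <- true;
          i) in
    let oob =
      List.filter (fun i -> not drawn.(i)) (List.init n (fun i -> i)) in
    (sample, oob)

Each draw misses a given row with probability (1 - 1/n), so about
(1 - 1/n)^n ~ 1/e ~ 36.8% of the rows are OOB for each tree;
evaluating every tree on its own OOB rows estimates generalization
performance at no extra cost.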

One drawback is that RFs, being an ensemble model, cannot predict
values outside of the range seen in the training set (a serious
limitation if you are trying to optimize something in order to
discover outliers relative to your training set samples).
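
For intuition, a minimal OCaml sketch (hypothetical code written for
this description, not orf's API) of why this holds: a regression
forest's prediction is an average of per-tree outputs, each itself an
average of training targets, so it can never leave the range of the
training targets.

  (* Hypothetical illustration: averaging values that lie in
     [min; max] yields a value that also lies in [min; max],
     so an ensemble average cannot extrapolate. *)
  let average (xs : float list) : float =
    List.fold_left (+.) 0.0 xs /. float_of_int (List.length xs)

  let () =
    (* assumed per-tree outputs; all training targets lie in
       [1.0; 3.0], hence so does the forest's prediction *)
    let prediction = average [1.0; 2.5; 3.0] in
    assert (prediction >= 1.0 && prediction <= 3.0);
    Printf.printf "prediction: %g\n" prediction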

For the moment, this implementation only considers sparse vectors of
integers as features; i.e. categorical variables will need to be
one-hot encoded (a sketch follows below).
For classification, the dependent variable must be an integer
(encoding a class label).
For regression, the dependent variable must be a float.
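
For illustration, a minimal OCaml sketch of one-hot encoding a
categorical variable into a sparse integer vector (one_hot and the
(index, value) pair representation are assumptions made for this
example, not necessarily orf's exact input format):

  (* Hypothetical encoding: map a categorical level to a sparse
     (feature_index, value) list with a single active entry. *)
  let levels = ["red"; "green"; "blue"]

  let one_hot (x : string) : (int * int) list =
    let rec index i = function
      | [] -> invalid_arg "one_hot: unknown level"
      | l :: rest -> if l = x then i else index (i + 1) rest in
    [(index 0 levels, 1)]

  (* e.g. one_hot "green" = [(1, 1)] *)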

Bibliography
============

Breiman, L. (1996). Bagging Predictors. Machine Learning, 24(2),
123-140.

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.

Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely Randomized
Trees. Machine Learning, 63(1), 3-42."""
url {
  src: "https://github.com/UnixJunkie/orf/archive/refs/tags/v1.0.1.tar.gz"
  checksum: "sha256=7e3977bf99284fca63144dad27bdb5f024e59425188b58246b89bf4770f43791"
}