opam-version: "2.0"
authors: "Francois Berenger"

homepage: "https://github.com/UnixJunkie/orf"
bug-reports: "https://github.com/UnixJunkie/orf/issues"
dev-repo: "git+https://github.com/UnixJunkie/orf.git"
license: "LGPL-2.1-or-later WITH OCaml-LGPL-linking-exception"
build: ["dune" "build" "-p" name "-j" jobs]
depends: [
  "batteries" {>= "3.2.0"}
  "cpm" {>= "11.0.0"}
  "dolog" {>= "4.0.0"}
  "dune" {>= "2.8"}
  "line_oriented"
  "minicli"
  "molenc" {>= "16.15.0"}
  "ocaml" {>= "4.12"}
  "parany" {>= "11.0.0"}
]
depopts: [
  "conf-gnuplot"
]
synopsis: "OCaml Random Forests"
description: """
Random Forests (RFs) can be used for classification or regression
modeling.

Random Forests are one of the workhorses of modern machine
learning. Notably, they are highly resistant to over-fitting the
training set, are fast to train, predict quickly, parallelize well
and give you a reasonable model even without tuning the model's
default hyper-parameters. In other words, it is hard to shoot
yourself in the foot while training or using a Random Forest model.
In comparison, with deep neural networks it is very easy to shoot
yourself in the foot.

Using out-of-bag (OOB) samples, you can even get an idea of a RF's
performance without the need for a held-out (test) data-set.
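
For intuition, here is a minimal, self-contained OCaml sketch of how
OOB samples arise (bootstrap_and_oob is a hypothetical helper written
for this description, not part of orf's API): each tree is trained on
a bootstrap sample drawn with replacement, and the rows that were
never drawn form that tree's OOB set.

  (* Hypothetical helper, not orf's API: draw a bootstrap sample of
     indices in [0, n-1]; the indices never drawn are the out-of-bag
     (OOB) samples for that tree. *)
  let bootstrap_and_oob (n : int) : int array * int list =
    let drawn = Array.make n false in
    let sample =
      Array.init n (fun _ ->
          let i = Random.int n in
          drawn.(i) <- true;
          i) in
    let oob =
      List.filter (fun i -> not drawn.(i)) (List.init n (fun i -> i)) in
    (sample, oob)

Each draw misses a given row with probability (1 - 1/n), so about
(1 - 1/n)^n ~ 1/e ~ 36.8% of the rows are OOB for each tree;
evaluating every tree on its own OOB rows estimates generalization
performance at no extra cost.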

One drawback is that RFs, being an ensemble model, cannot predict
values outside of the range seen in the training set (a serious
limitation if you are trying to optimize something in order to
discover outliers relative to your training set samples).
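
For intuition, a minimal OCaml sketch (hypothetical code written for
this description, not orf's API) of why this holds: a regression
forest's prediction is an average of per-tree outputs, each itself an
average of training targets, so it can never leave the range of the
training targets.

  (* Hypothetical illustration: averaging values that lie in
     [min; max] yields a value that also lies in [min; max],
     so an ensemble average cannot extrapolate. *)
  let average (xs : float list) : float =
    List.fold_left (+.) 0.0 xs /. float_of_int (List.length xs)

  let () =
    (* assumed per-tree outputs; all training targets lie in
       [1.0; 3.0], hence so does the forest's prediction *)
    let prediction = average [1.0; 2.5; 3.0] in
    assert (prediction >= 1.0 && prediction <= 3.0);
    Printf.printf "prediction: %g\n" prediction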

For the moment, this implementation only considers sparse vectors of
integers as features; i.e. categorical variables will need to be
one-hot encoded (a sketch follows below).
For classification, the dependent variable must be an integer
(encoding a class label).
For regression, the dependent variable must be a float.
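
For illustration, a minimal OCaml sketch of one-hot encoding a
categorical variable into a sparse integer vector (one_hot and the
(index, value) pair representation are assumptions made for this
example, not necessarily orf's exact input format):

  (* Hypothetical encoding: map a categorical level to a sparse
     (feature_index, value) list with a single active entry. *)
  let levels = ["red"; "green"; "blue"]

  let one_hot (x : string) : (int * int) list =
    let rec index i = function
      | [] -> invalid_arg "one_hot: unknown level"
      | l :: rest -> if l = x then i else index (i + 1) rest in
    [(index 0 levels, 1)]

  (* e.g. one_hot "green" = [(1, 1)] *)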

Bibliography
============

Breiman, L. (1996). Bagging Predictors. Machine Learning, 24(2),
123-140.

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.

Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely Randomized
Trees. Machine Learning, 63(1), 3-42."""
url {
  src: "https://github.com/UnixJunkie/orf/archive/refs/tags/v1.0.1.tar.gz"
  checksum: "sha256=7e3977bf99284fca63144dad27bdb5f024e59425188b58246b89bf4770f43791"
}