
Commit a75da9d

new file: packages/orf/orf.1.0.1/opam
compiles w/ ocaml-5 and many other versions probably
1 parent 25c8373 commit a75da9d

File tree

1 file changed: +63 -0 lines changed

  • packages/orf/orf.1.0.1

packages/orf/orf.1.0.1/opam

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
opam-version: "2.0"
authors: "Francois Berenger"
maintainer: "[email protected]"
homepage: "https://github.com/UnixJunkie/orf"
bug-reports: "https://github.com/UnixJunkie/orf/issues"
dev-repo: "git+https://github.com/UnixJunkie/orf.git"
license: "LGPL-2.1-or-later WITH OCaml-LGPL-linking-exception"
build: ["dune" "build" "-p" name "-j" jobs]
depends: [
  "batteries" {>= "3.2.0"}
  "cpm" {>= "6.0.0"}
  "dolog" {>= "4.0.0"}
  "dune" {>= "2.8"}
  "minicli"
  "ocaml"
  "parany" {>= "11.0.0"}
  "line_oriented"
]
depopts: [
  "conf-gnuplot"
]
synopsis: "OCaml Random Forests"
description: """
Random Forests (RFs) can do classification or regression modeling.

Random Forests are one of the workhorses of modern machine
learning. Notably, they do not over-fit to the training set, are
fast to train, predict fast, parallelize well and give you a
reasonable model even without tuning the default hyper-parameters.
In other words, it is hard to shoot yourself in the foot while
training or exploiting a Random Forest model. In comparison, with
deep neural networks it is very easy to shoot yourself in the foot.

Using out-of-bag (OOB) samples, you can even get an idea of an RF's
performance, without the need for a held-out (test) data-set.

Their only drawback is that RFs, being an ensemble model, cannot
predict values outside of the range covered by the training set
(a serious limitation if you are trying to optimize or minimize
something in order to discover outliers relative to your training
set samples).

For the moment, this implementation only considers a sparse vector
of integers as features; i.e. categorical variables will need to be
one-hot-encoded.
For classification, the dependent variable must be an integer
(encoding a class label).
For regression, the dependent variable must be a float.

Bibliography
============

Breiman, Leo. (1996). Bagging Predictors. Machine Learning, 24(2),
123-140.

Breiman, Leo. (2001). Random Forests. Machine Learning, 45(1), 5-32.

Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely Randomized
Trees. Machine Learning, 63(1), 3-42."""
url {
  src: "https://github.com/UnixJunkie/orf/archive/refs/tags/v1.0.1.tar.gz"
  checksum: "md5=d41d8cd98f00b204e9800998ecf8427e"
}
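The description says features must be given as a sparse vector of integers, with categorical variables one-hot-encoded, an integer label for classification, and a float target for regression. Below is a minimal OCaml sketch of that data preparation; the sparse_vec type and the one_hot helper are hypothetical illustrations, not orf's actual API.

(* Illustrative sketch (not orf's API): encode a categorical value
   into the sparse integer feature vector the description requires. *)

(* sparse integer feature vector: (feature_index, value) pairs;
   absent indices are implicitly 0 *)
type sparse_vec = (int * int) list

(* position of [v] in [categories], used as the "hot" feature index *)
let index_of (categories : string list) (v : string) : int =
  let rec go i = function
    | [] -> invalid_arg ("unknown category: " ^ v)
    | c :: rest -> if c = v then i else go (i + 1) rest
  in
  go 0 categories

(* one-hot encode a categorical value, starting at feature [offset]:
   only the matching category's index appears, with value 1 *)
let one_hot ~offset (categories : string list) (v : string) : sparse_vec =
  [ (offset + index_of categories v, 1) ]

let () =
  let colors = [ "red"; "green"; "blue" ] in
  (* a classification sample: sparse features + integer class label;
     for regression the dependent variable would be a float instead *)
  let features = one_hot ~offset:0 colors "green" in
  let label = 1 in
  List.iter (fun (i, v) -> Printf.printf "feature %d = %d\n" i v) features;
  Printf.printf "class label = %d\n" label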

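The OOB performance estimate mentioned in the description rests on a standard property of bagging (Breiman, 1996): each tree trains on a bootstrap sample drawn with replacement, so a given training sample is left out of a size-n bootstrap with probability (1 - 1/n)^n, which tends to exp(-1) as n grows. Roughly 37% of the samples are therefore "out of bag" for each tree and can serve as a built-in validation set. A small OCaml check of that limit:

(* probability that a given sample is NOT drawn into a bootstrap
   sample of size n (sampling with replacement): (1 - 1/n)^n;
   this tends to exp(-1) ~ 0.368 as n grows *)
let oob_fraction n =
  (1.0 -. 1.0 /. float_of_int n) ** float_of_int n

let () =
  List.iter
    (fun n -> Printf.printf "n = %5d  oob = %.4f\n" n (oob_fraction n))
    [ 10; 100; 1000; 10000 ];
  Printf.printf "limit:  exp(-1) = %.4f\n" (exp (-1.0))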
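The extrapolation limitation in the description also follows from the model's structure: a regression tree's leaf predicts the mean of some training targets, and the forest averages the leaf predictions, so every prediction is a convex combination of training targets and must lie within their [min; max] range. A toy OCaml demonstration of that bound (the simulated leaves are made up for illustration, not orf code):

(* why an RF regressor cannot extrapolate: averaging averages of
   training targets can never leave the targets' [min; max] range *)
let mean xs = List.fold_left ( +. ) 0.0 xs /. float_of_int (List.length xs)

let () =
  let targets = [ 1.0; 2.5; 3.0; 4.2; 5.0 ] in
  let lo = List.fold_left min infinity targets in
  let hi = List.fold_left max neg_infinity targets in
  (* simulate leaves: arbitrary non-empty subsets of the targets *)
  let leaves = [ [ 1.0; 2.5 ]; [ 3.0 ]; [ 4.2; 5.0 ]; [ 2.5; 3.0; 4.2 ] ] in
  let forest_prediction = mean (List.map mean leaves) in
  Printf.printf "prediction = %.3f, within [%.1f; %.1f] = %b\n"
    forest_prediction lo hi
    (forest_prediction >= lo && forest_prediction <= hi)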