Update dev documentation

kefniark · kefniark · commit 251b89f9e7bb · 2022-01-22T18:59:15.000+09:00
diff --git a/.npmignore b/.npmignore
@@ -11,3 +11,6 @@ utils
 .eslintrc
 .prettierignore
 .prettierrc
+
+tsconfig.json
+.gitattributes
diff --git a/Readme.md b/Readme.md
@@ -81,27 +81,8 @@ To summary in one sentence:
 
 [More Benchmark Information](./docs/benchmark.md)
 
---
+---
 
 ## Developer
 
-```sh
-# Install
-yarn
-
-# Build
-yarn build
-
-# Test
-yarn test
-
-# Lint / Auto-fix code style problems
-yarn lint
-
-# Optional, used to generate src/profiles/* data from language dataset
-# Warning: This step is time consuming and require to install big datasets (described in ./docs/dev.md)
-yarn train
-
-# Optional, used to generate benchmark data/bench/*
-yarn bench
-```
+If you want to **contribute** or **open a PR**, we recommend taking a look [at the dev documentation](./docs/dev.md).
diff --git a/docs/dev.md b/docs/dev.md
@@ -1,30 +1,60 @@
 # Development
 
-## Setup
-
-To be able to train the model
-
-- Download the [Tatoeba sentence export](https://downloads.tatoeba.org/exports/sentences.tar.bz2)
-- Extract in `data/tatoeba.csv`
-
-- Download the [UDHR](https://unicode.org/udhr/assemblies/udhr_txt.zip)
-- Extract in `data/udhr/`
-
 ## Commands
 
 ```sh
-# install deps
+# Install
 yarn
 
-# train and generate language profiles
-yarn train
-
-# build the library
+# Build
 yarn build
 
-# code style linting
+# Test
+yarn test
+
+# Lint / Auto-fix code style problems
 yarn lint
+```
 
-# test
-yarn test
+---
+
+## Install issues
+
+For the moment, the library has a lot of dev-dependencies that are used purely for the benchmark process.
+Some of those libraries need to compile native code, which can be problematic (gcc, gyp, python, ...).
+
+If you run into those issues, one of the easiest solutions is to remove the problematic dependencies from `package.json` and then try to install again.
+
+[See this example](https://github.com/komodojp/tinyld/issues/10#issuecomment-1019085476)
+
+Removing them will only cause issues with `yarn bench`; everything else should still work normally.
+
+---
+
+## Optional
+
+### 1. Generate profiles (`yarn train`)
+
+This step requires a lot of data and time, so it's optional, and the results are stored directly in git.
+
+This will analyse a lot of text in different languages and build statistics to identify the best features for each language.
+
+To be able to train the model, you first need to have the datasets locally.
+
+```
+Download Datasets
+  - Download the [Tatoeba sentence export](https://downloads.tatoeba.org/exports/sentences.tar.bz2)
+  - Extract in `data/tatoeba.csv`
+  - Download the [UDHR](https://unicode.org/udhr/assemblies/udhr_txt.zip)
+  - Extract in `data/udhr/`
+
+Run yarn train
+  - For each language, it will build statistics for words and n-grams
+  - This goes through a massive amount of data and will take time, so prepare a few coffees
+
+Once your profile files are generated, you can run `yarn build` to get a build that includes the new data
 ```
+
+### 2. Generate benchmark data (`yarn bench`)
+
+This step requires a bit of time: it runs a lot of different tests against a set of libraries to generate the benchmark page and diagrams.
diff --git a/package.json b/package.json
@@ -67,8 +67,7 @@
     "test:unit": "uvu tests",
     "test:dependencies": "yarn audit --level high || echo \"Run 'yarn update' to interactively update dependencies for this project\"",
     "test:lint": "eslint --ext .js,.ts ./ && prettier --config .prettierrc --ignore-path .prettierignore --check \"**/*.{ts,js}\"",
-    "test:types": "tsc --noEmit",
-    "update": "yarn upgrade-interactive"
+    "test:types": "tsc --noEmit"
   },
   "devDependencies": {
     "@types/node": "^16.4.13",