README.md: 6 additions & 4 deletions
@@ -8,12 +8,14 @@ You can install Sherlock by cloning this repository, and run `pip install .`.
## Demonstration of usage
-The notebooks in `notebooks/` prefixed with `01-data processing.ipynb` and `02-1-train-and-test-sherlock.ipynb` can be used to reproduce the results, and demonstrate the usage of Sherlock (from data preprocessing to model training and evaluation). The `00-WIP-use-sherlock-out-of-the-box.ipynb` notebook demonstrates usage of the readily trained model for a given table (WIP).
+The `00-use-sherlock-out-of-the-box.ipynb` notebook demonstrates usage of the readily trained model for a given table (WIP).
+
+The notebooks in `notebooks/` prefixed with `01-data processing.ipynb` and `02-1-train-and-test-sherlock.ipynb` can be used to reproduce the results, and demonstrate the usage of Sherlock (from data preprocessing to model training and evaluation).
## Data
The raw data (corresponding to annotated table columns) can be downloaded using the `download_data()` function in the `helpers` module.
-This will download 3.6GB of data into the `data` directory. Use the `01-data-preprocessing.ipynb` notebook to preprocess this data. Each column is then represented by a feature vector of dimensions 1x1588. The extracted features per column are based on "paragraph" embeddings (full column), word embeddings (aggregated from each column cell), character count statistics (e.g. average number of "." in a column's cells) and column-level statistics (e.g. column entropy).
+This will download +/- 500MB of data into the `data` directory. Use the `01-data-preprocessing.ipynb` notebook to preprocess this data. Each column is then represented by a feature vector of dimensions 1x1588. The extracted features per column are based on "paragraph" embeddings (full column), word embeddings (aggregated from each column cell), character count statistics (e.g. average number of "." in a column's cells) and column-level statistics (e.g. column entropy).
## The Sherlock model
@@ -36,14 +38,14 @@ The notebook `02-1-train-and-test-sherlock.ipynb` illustrates how Sherlock, as c
├── model_files <- Files with trained model weights and specification.
├── sherlock_model.json
└── sherlock_weights.h5
-
+
├── notebooks <- Notebooks demonstrating data preprocessing and train/test of Sherlock.