diff --git a/README.md b/README.md index d6113734..2638a516 100644 --- a/README.md +++ b/README.md @@ -49,7 +49,7 @@ This bash script runs the post correction using a pre-trained OCRs should be used, models for these OCR steps are required and must be configured in an according configuration file (see ocrd-tool.json). -Arguments: +This tool accepts the following Arguments: * `--parameter` path to configuration file * `--input-file-grp` name of the master-OCR file group * `--output-file-grp` name of the post-correction file group @@ -62,7 +62,7 @@ This tool is used to align the master OCR with any additional support OCRs. It accepts a comma-separated list of input file groups, which it aligns in order. -Arguments: +This tool accepts the following Arguments: * `--parameter` path to configuration file * `--input-file-grp` comma seperated list of the input file groups; first input file group is the master OCR @@ -72,8 +72,10 @@ Arguments: ### ocrd-cis-train.sh Script to train a model from a list of ground-truth archives (see -ocrd-tool.json) for the post correction. The tool somewhat mimics the -behaviour of other ocrd tools: +ocrd-tool.json) for the post correction. + +The tool somewhat mimics the behaviour of other ocrd tools and accepts +the following Arguments: * `--mets` for the workspace * `--log-level` is passed to other tools * `--parameter` is used as configuration @@ -85,10 +87,12 @@ Helper tool to get the path of the installed data files. Usage: path to th default 3-grams language model file. ### ocrd-cis-wer -Helper tool to calculate the word error rate aligned ocr files. It -writes a simple JSON-formated stats file to the given output file group. +Helper tool to calculate the word error rate of aligned ocr files. It +writes a simple JSON-formated stats file to the given output file +group. 
-Arguments: +This tool accepts the following Arguments: + * `--parameter` set configuration file * `--input-file-grp` input file group of aligned ocr results with their respective ground truth. * `--output-file-grp` name of the file group for the stats file diff --git a/data/docs/ocrd-cis-align/authors.md b/data/docs/ocrd-cis-align/authors.md new file mode 100644 index 00000000..cbdb0060 --- /dev/null +++ b/data/docs/ocrd-cis-align/authors.md @@ -0,0 +1,5 @@ +# Authors +1. Christoph Weber +2. Florian Fink +3. Robert Sachunsky +4. Tobias Englmeier diff --git a/data/docs/ocrd-cis-align/copyright.md b/data/docs/ocrd-cis-align/copyright.md new file mode 100644 index 00000000..83cbbb55 --- /dev/null +++ b/data/docs/ocrd-cis-align/copyright.md @@ -0,0 +1,22 @@ +# License +MIT License + +Copyright (c) 2018 2018 Centrum für Informations- und Sprachverarbeitung (CIS) + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. 
diff --git a/data/docs/ocrd-cis-align/description.md b/data/docs/ocrd-cis-align/description.md new file mode 100644 index 00000000..9c68dbec --- /dev/null +++ b/data/docs/ocrd-cis-align/description.md @@ -0,0 +1,5 @@ +# Description of ocrd-cis-align {#description .concept} +Aligns tokens of multiple input file groups to one output file group. +This tool is used to align the master OCR with any additional support +OCRs. It accepts a comma-separated list of input file groups, which +it aligns in order. diff --git a/data/docs/ocrd-cis-align/glossary.xml b/data/docs/ocrd-cis-align/glossary.xml new file mode 100644 index 00000000..02bcee21 --- /dev/null +++ b/data/docs/ocrd-cis-align/glossary.xml @@ -0,0 +1,42 @@ + + + + Glossar + + + diff --git a/data/docs/ocrd-cis-align/inputFormatDescription.md b/data/docs/ocrd-cis-align/inputFormatDescription.md new file mode 100644 index 00000000..767d563c --- /dev/null +++ b/data/docs/ocrd-cis-align/inputFormatDescription.md @@ -0,0 +1 @@ +# Input format {#inputFormatDescription .reference} diff --git a/data/docs/ocrd-cis-align/installation.md b/data/docs/ocrd-cis-align/installation.md new file mode 100644 index 00000000..08ab18a8 --- /dev/null +++ b/data/docs/ocrd-cis-align/installation.md @@ -0,0 +1,4 @@ +# Installation of ocrd-cis-align {#installation .task} +1. Initialize virtualenv: `python3 -m venv path/to/dir` (optional) +2. Install ocrd_cis: `make install` +3. 
Test the installation: `make test` (optional) diff --git a/data/docs/ocrd-cis-align/name.md b/data/docs/ocrd-cis-align/name.md new file mode 100644 index 00000000..3cd5e368 --- /dev/null +++ b/data/docs/ocrd-cis-align/name.md @@ -0,0 +1 @@ +# ocrd-cis-align diff --git a/data/docs/ocrd-cis-align/option.md b/data/docs/ocrd-cis-align/option.md new file mode 100644 index 00000000..6b07e685 --- /dev/null +++ b/data/docs/ocrd-cis-align/option.md @@ -0,0 +1,8 @@ +# Options for ocrd-cis-align {#option .reference} +This tool accepts the following Arguments: +* `--parameter` path to configuration file +* `--input-file-grp` comma seperated list of the input file groups; +first input file group is the master OCR +* `--output-file-grp` name of the file group for the aligned result +* `--log-level` set log level +* `--mets` path to METS file in workspace diff --git a/data/docs/ocrd-cis-align/outputFormatDescription.md b/data/docs/ocrd-cis-align/outputFormatDescription.md new file mode 100644 index 00000000..89486585 --- /dev/null +++ b/data/docs/ocrd-cis-align/outputFormatDescription.md @@ -0,0 +1 @@ +# Output format {#outputFormatDescription .reference} diff --git a/data/docs/ocrd-cis-align/parameters.md b/data/docs/ocrd-cis-align/parameters.md new file mode 100644 index 00000000..19b1ea55 --- /dev/null +++ b/data/docs/ocrd-cis-align/parameters.md @@ -0,0 +1,5 @@ +# Parameters {#parameters .reference} +The tool ocrd-cis-align accepts the following configuration parameters: +```json +{} +``` diff --git a/data/docs/ocrd-cis-align/release_notes.md b/data/docs/ocrd-cis-align/release_notes.md new file mode 100644 index 00000000..d7de275f --- /dev/null +++ b/data/docs/ocrd-cis-align/release_notes.md @@ -0,0 +1 @@ +# Release notes diff --git a/data/docs/ocrd-cis-align/reporting.md b/data/docs/ocrd-cis-align/reporting.md new file mode 100644 index 00000000..e71d72c5 --- /dev/null +++ b/data/docs/ocrd-cis-align/reporting.md @@ -0,0 +1,2 @@ +# Reporting +Reports any bugs/problems at the 
[issues page](https://github.com/cisocrgroup/ocrd_cis/issues) diff --git a/data/docs/ocrd-cis-align/tool.md b/data/docs/ocrd-cis-align/tool.md new file mode 100644 index 00000000..8636efca --- /dev/null +++ b/data/docs/ocrd-cis-align/tool.md @@ -0,0 +1,2 @@ +# Tool ocrd-cis-align {#Tool .concept} +Align multiple OCRs and/or GTs diff --git a/data/docs/ocrd-cis-align/topicmap.xml b/data/docs/ocrd-cis-align/topicmap.xml new file mode 100644 index 00000000..a3c13f10 --- /dev/null +++ b/data/docs/ocrd-cis-align/topicmap.xml @@ -0,0 +1,18 @@ + + + + + + + + + + + + + + + + + + diff --git a/data/docs/ocrd-cis-align/troubleshooting.xml b/data/docs/ocrd-cis-align/troubleshooting.xml new file mode 100644 index 00000000..2d113697 --- /dev/null +++ b/data/docs/ocrd-cis-align/troubleshooting.xml @@ -0,0 +1,29 @@ + + + + Troubleshooting + + diff --git a/data/docs/ocrd-cis-post-correct.sh/authors.md b/data/docs/ocrd-cis-post-correct.sh/authors.md new file mode 100644 index 00000000..cbdb0060 --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/authors.md @@ -0,0 +1,5 @@ +# Authors +1. Christoph Weber +2. Florian Fink +3. Robert Sachunsky +4. 
Tobias Englmeier diff --git a/data/docs/ocrd-cis-post-correct.sh/copyright.md b/data/docs/ocrd-cis-post-correct.sh/copyright.md new file mode 100644 index 00000000..83cbbb55 --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/copyright.md @@ -0,0 +1,22 @@ +# License +MIT License + +Copyright (c) 2018 2018 Centrum für Informations- und Sprachverarbeitung (CIS) + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/data/docs/ocrd-cis-post-correct.sh/description.md b/data/docs/ocrd-cis-post-correct.sh/description.md new file mode 100644 index 00000000..20e543f4 --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/description.md @@ -0,0 +1,5 @@ +# Description of ocrd-cis-post-correct.sh {#description .concept} +This bash script runs the post correction using a pre-trained +[model](http://cis.lmu.de/~finkf/model.zip). If additional support +OCRs should be used, models for these OCR steps are required and must +be configured in an according configuration file (see ocrd-tool.json). 
diff --git a/data/docs/ocrd-cis-post-correct.sh/glossary.xml b/data/docs/ocrd-cis-post-correct.sh/glossary.xml new file mode 100644 index 00000000..02bcee21 --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/glossary.xml @@ -0,0 +1,42 @@ + + + + Glossar + + + diff --git a/data/docs/ocrd-cis-post-correct.sh/inputFormatDescription.md b/data/docs/ocrd-cis-post-correct.sh/inputFormatDescription.md new file mode 100644 index 00000000..767d563c --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/inputFormatDescription.md @@ -0,0 +1 @@ +# Input format {#inputFormatDescription .reference} diff --git a/data/docs/ocrd-cis-post-correct.sh/installation.md b/data/docs/ocrd-cis-post-correct.sh/installation.md new file mode 100644 index 00000000..4c22504d --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/installation.md @@ -0,0 +1,4 @@ +# Installation of ocrd-cis-post-correct.sh {#installation .task} +1. Initialize virtualenv: `python3 -m venv path/to/dir` (optional) +2. Install ocrd_cis: `make install` +3. 
Test the installation: `make test` (optional) diff --git a/data/docs/ocrd-cis-post-correct.sh/name.md b/data/docs/ocrd-cis-post-correct.sh/name.md new file mode 100644 index 00000000..570c5151 --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/name.md @@ -0,0 +1 @@ +# ocrd-cis-post-correct.sh diff --git a/data/docs/ocrd-cis-post-correct.sh/option.md b/data/docs/ocrd-cis-post-correct.sh/option.md new file mode 100644 index 00000000..294bf986 --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/option.md @@ -0,0 +1,7 @@ +# Options for ocrd-cis-post-correct.sh {#option .reference} +This tool accepts the following Arguments: +* `--parameter` path to configuration file +* `--input-file-grp` name of the master-OCR file group +* `--output-file-grp` name of the post-correction file group +* `--log-level` set log level +* `--mets` path to METS file in workspace diff --git a/data/docs/ocrd-cis-post-correct.sh/outputFormatDescription.md b/data/docs/ocrd-cis-post-correct.sh/outputFormatDescription.md new file mode 100644 index 00000000..89486585 --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/outputFormatDescription.md @@ -0,0 +1 @@ +# Output format {#outputFormatDescription .reference} diff --git a/data/docs/ocrd-cis-post-correct.sh/parameters.md b/data/docs/ocrd-cis-post-correct.sh/parameters.md new file mode 100644 index 00000000..7c38ab60 --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/parameters.md @@ -0,0 +1,5 @@ +# Parameters {#parameters .reference} +The tool ocrd-cis-post-correct.sh accepts the following configuration parameters: +```json +null +``` diff --git a/data/docs/ocrd-cis-post-correct.sh/release_notes.md b/data/docs/ocrd-cis-post-correct.sh/release_notes.md new file mode 100644 index 00000000..d7de275f --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/release_notes.md @@ -0,0 +1 @@ +# Release notes diff --git a/data/docs/ocrd-cis-post-correct.sh/reporting.md b/data/docs/ocrd-cis-post-correct.sh/reporting.md new file mode 100644 index 
00000000..e71d72c5 --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/reporting.md @@ -0,0 +1,2 @@ +# Reporting +Reports any bugs/problems at the [issues page](https://github.com/cisocrgroup/ocrd_cis/issues) diff --git a/data/docs/ocrd-cis-post-correct.sh/tool.md b/data/docs/ocrd-cis-post-correct.sh/tool.md new file mode 100644 index 00000000..e45ae45f --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/tool.md @@ -0,0 +1,2 @@ +# Tool ocrd-cis-post-correct.sh {#Tool .concept} +null diff --git a/data/docs/ocrd-cis-post-correct.sh/topicmap.xml b/data/docs/ocrd-cis-post-correct.sh/topicmap.xml new file mode 100644 index 00000000..a3c13f10 --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/topicmap.xml @@ -0,0 +1,18 @@ + + + + + + + + + + + + + + + + + + diff --git a/data/docs/ocrd-cis-post-correct.sh/troubleshooting.xml b/data/docs/ocrd-cis-post-correct.sh/troubleshooting.xml new file mode 100644 index 00000000..2d113697 --- /dev/null +++ b/data/docs/ocrd-cis-post-correct.sh/troubleshooting.xml @@ -0,0 +1,29 @@ + + + + Troubleshooting + + diff --git a/data/docs/ocrd-cis-train.sh/authors.md b/data/docs/ocrd-cis-train.sh/authors.md new file mode 100644 index 00000000..cbdb0060 --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/authors.md @@ -0,0 +1,5 @@ +# Authors +1. Christoph Weber +2. Florian Fink +3. Robert Sachunsky +4. 
Tobias Englmeier diff --git a/data/docs/ocrd-cis-train.sh/copyright.md b/data/docs/ocrd-cis-train.sh/copyright.md new file mode 100644 index 00000000..83cbbb55 --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/copyright.md @@ -0,0 +1,22 @@ +# License +MIT License + +Copyright (c) 2018 2018 Centrum für Informations- und Sprachverarbeitung (CIS) + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/data/docs/ocrd-cis-train.sh/description.md b/data/docs/ocrd-cis-train.sh/description.md new file mode 100644 index 00000000..3648ddc3 --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/description.md @@ -0,0 +1,3 @@ +# Description of ocrd-cis-train.sh {#description .concept} +Script to train a model from a list of ground-truth archives (see +ocrd-tool.json) for the post correction. 
diff --git a/data/docs/ocrd-cis-train.sh/glossary.xml b/data/docs/ocrd-cis-train.sh/glossary.xml new file mode 100644 index 00000000..02bcee21 --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/glossary.xml @@ -0,0 +1,42 @@ + + + + Glossar + + + diff --git a/data/docs/ocrd-cis-train.sh/inputFormatDescription.md b/data/docs/ocrd-cis-train.sh/inputFormatDescription.md new file mode 100644 index 00000000..767d563c --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/inputFormatDescription.md @@ -0,0 +1 @@ +# Input format {#inputFormatDescription .reference} diff --git a/data/docs/ocrd-cis-train.sh/installation.md b/data/docs/ocrd-cis-train.sh/installation.md new file mode 100644 index 00000000..1f53c798 --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/installation.md @@ -0,0 +1,4 @@ +# Installation of ocrd-cis-train.sh {#installation .task} +1. Initialize virtualenv: `python3 -m venv path/to/dir` (optional) +2. Install ocrd_cis: `make install` +3. Test the installation: `make test` (optional) diff --git a/data/docs/ocrd-cis-train.sh/name.md b/data/docs/ocrd-cis-train.sh/name.md new file mode 100644 index 00000000..8ac81f55 --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/name.md @@ -0,0 +1 @@ +# ocrd-cis-train.sh diff --git a/data/docs/ocrd-cis-train.sh/option.md b/data/docs/ocrd-cis-train.sh/option.md new file mode 100644 index 00000000..d4350786 --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/option.md @@ -0,0 +1,7 @@ +# Options for ocrd-cis-train.sh {#option .reference} +The tool somewhat mimics the behaviour of other ocrd tools and accepts +the following Arguments: +* `--mets` for the workspace +* `--log-level` is passed to other tools +* `--parameter` is used as configuration +* `--output-file-grp` defines the output file group for the model diff --git a/data/docs/ocrd-cis-train.sh/outputFormatDescription.md b/data/docs/ocrd-cis-train.sh/outputFormatDescription.md new file mode 100644 index 00000000..89486585 --- /dev/null +++ 
b/data/docs/ocrd-cis-train.sh/outputFormatDescription.md @@ -0,0 +1 @@ +# Output format {#outputFormatDescription .reference} diff --git a/data/docs/ocrd-cis-train.sh/parameters.md b/data/docs/ocrd-cis-train.sh/parameters.md new file mode 100644 index 00000000..abdb8b55 --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/parameters.md @@ -0,0 +1,5 @@ +# Parameters {#parameters .reference} +The tool ocrd-cis-train.sh accepts the following configuration parameters: +```json +null +``` diff --git a/data/docs/ocrd-cis-train.sh/release_notes.md b/data/docs/ocrd-cis-train.sh/release_notes.md new file mode 100644 index 00000000..d7de275f --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/release_notes.md @@ -0,0 +1 @@ +# Release notes diff --git a/data/docs/ocrd-cis-train.sh/reporting.md b/data/docs/ocrd-cis-train.sh/reporting.md new file mode 100644 index 00000000..e71d72c5 --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/reporting.md @@ -0,0 +1,2 @@ +# Reporting +Reports any bugs/problems at the [issues page](https://github.com/cisocrgroup/ocrd_cis/issues) diff --git a/data/docs/ocrd-cis-train.sh/tool.md b/data/docs/ocrd-cis-train.sh/tool.md new file mode 100644 index 00000000..da241f2e --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/tool.md @@ -0,0 +1,2 @@ +# Tool ocrd-cis-train.sh {#Tool .concept} +null diff --git a/data/docs/ocrd-cis-train.sh/topicmap.xml b/data/docs/ocrd-cis-train.sh/topicmap.xml new file mode 100644 index 00000000..a3c13f10 --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/topicmap.xml @@ -0,0 +1,18 @@ + + + + + + + + + + + + + + + + + + diff --git a/data/docs/ocrd-cis-train.sh/troubleshooting.xml b/data/docs/ocrd-cis-train.sh/troubleshooting.xml new file mode 100644 index 00000000..2d113697 --- /dev/null +++ b/data/docs/ocrd-cis-train.sh/troubleshooting.xml @@ -0,0 +1,29 @@ + + + + Troubleshooting + + diff --git a/data/docs/ocrd-cis-wer/authors.md b/data/docs/ocrd-cis-wer/authors.md new file mode 100644 index 00000000..cbdb0060 --- /dev/null +++ 
b/data/docs/ocrd-cis-wer/authors.md @@ -0,0 +1,5 @@ +# Authors +1. Christoph Weber +2. Florian Fink +3. Robert Sachunsky +4. Tobias Englmeier diff --git a/data/docs/ocrd-cis-wer/copyright.md b/data/docs/ocrd-cis-wer/copyright.md new file mode 100644 index 00000000..83cbbb55 --- /dev/null +++ b/data/docs/ocrd-cis-wer/copyright.md @@ -0,0 +1,22 @@ +# License +MIT License + +Copyright (c) 2018 2018 Centrum für Informations- und Sprachverarbeitung (CIS) + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/data/docs/ocrd-cis-wer/description.md b/data/docs/ocrd-cis-wer/description.md new file mode 100644 index 00000000..b8ff32e2 --- /dev/null +++ b/data/docs/ocrd-cis-wer/description.md @@ -0,0 +1,4 @@ +# Description of ocrd-cis-wer {#description .concept} +Helper tool to calculate the word error rate of aligned ocr files. It +writes a simple JSON-formated stats file to the given output file +group. 
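To make the statistic concrete, here is a minimal sketch of how a word error rate could be derived from token pairs that are already aligned. It is not the implementation of ocrd-cis-wer: the tab-separated `ocr<TAB>gt` input and the function name `wer_stats` are assumptions made for illustration, and the real tool works on aligned PAGE XML files. The key names mirror those of the tool's documented stats output.

```bash
#!/bin/bash

# Hypothetical helper, not part of ocrd_cis.
# Reads one aligned token pair per line from stdin ("ocr<TAB>gt")
# and prints word error rate statistics as a single JSON object.
wer_stats() {
    awk -F'\t' '
        # count every pair; a pair is correct if both tokens are identical
        { total++; if ($1 == $2) correct++ }
        END {
            incorrect = total - correct
            # avoid a division by zero on empty input
            rate = (total > 0) ? incorrect / total : 0
            printf "{\"totalWords\": %d, \"correctWords\": %d, \"incorrectWords\": %d, \"wordErrorRate\": %.2f}\n", total, correct, incorrect, rate
        }'
}

# three aligned tokens, one of them misrecognized
printf 'the\tthe\nquick\tquick\nbrovvn\tbrown\n' | wer_stats
```

The rate is simply the number of mismatching pairs divided by the total number of pairs, so one wrong token out of three gives roughly 0.33. Formatting the rate with a leading zero keeps the output valid JSON.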
diff --git a/data/docs/ocrd-cis-wer/glossary.xml b/data/docs/ocrd-cis-wer/glossary.xml new file mode 100644 index 00000000..02bcee21 --- /dev/null +++ b/data/docs/ocrd-cis-wer/glossary.xml @@ -0,0 +1,42 @@ + + + + Glossar + + + diff --git a/data/docs/ocrd-cis-wer/inputFormatDescription.md b/data/docs/ocrd-cis-wer/inputFormatDescription.md new file mode 100644 index 00000000..767d563c --- /dev/null +++ b/data/docs/ocrd-cis-wer/inputFormatDescription.md @@ -0,0 +1 @@ +# Input format {#inputFormatDescription .reference} diff --git a/data/docs/ocrd-cis-wer/installation.md b/data/docs/ocrd-cis-wer/installation.md new file mode 100644 index 00000000..bd00e27b --- /dev/null +++ b/data/docs/ocrd-cis-wer/installation.md @@ -0,0 +1,4 @@ +# Installation of ocrd-cis-wer {#installation .task} +1. Initialize virtualenv: `python3 -m venv path/to/dir` (optional) +2. Install ocrd_cis: `make install` +3. Test the installation: `make test` (optional) diff --git a/data/docs/ocrd-cis-wer/name.md b/data/docs/ocrd-cis-wer/name.md new file mode 100644 index 00000000..4c318e8c --- /dev/null +++ b/data/docs/ocrd-cis-wer/name.md @@ -0,0 +1 @@ +# ocrd-cis-wer diff --git a/data/docs/ocrd-cis-wer/option.md b/data/docs/ocrd-cis-wer/option.md new file mode 100644 index 00000000..468cc43d --- /dev/null +++ b/data/docs/ocrd-cis-wer/option.md @@ -0,0 +1,8 @@ +# Options for ocrd-cis-wer {#option .reference} +This tool accepts the following Arguments: +* `--parameter` set configuration file +* `--input-file-grp` input file group of aligned ocr results with +their respective ground truth. 
+* `--output-file-grp` name of the file group for the stats file +* `--log-level` set log level +* `--mets` path to METS file in workspace diff --git a/data/docs/ocrd-cis-wer/outputFormatDescription.md b/data/docs/ocrd-cis-wer/outputFormatDescription.md new file mode 100644 index 00000000..3f6637d1 --- /dev/null +++ b/data/docs/ocrd-cis-wer/outputFormatDescription.md @@ -0,0 +1,10 @@ +# Output format {#outputFormatDescription .reference} + +```json +{ + "totalWords": 3, + "correctWords": 2, + "incorrectWords": 1, + "wordErrorRate": 0.33 +} +``` diff --git a/data/docs/ocrd-cis-wer/parameters.md b/data/docs/ocrd-cis-wer/parameters.md new file mode 100644 index 00000000..622aadfe --- /dev/null +++ b/data/docs/ocrd-cis-wer/parameters.md @@ -0,0 +1,16 @@ +# Parameters {#parameters .reference} +The tool ocrd-cis-wer accepts the following configuration parameters: +```json +{ + "testIndex": { + "description": "text equiv index for the test/ocr tokens", + "type": "integer", + "default": 0 + }, + "gtIndex": { + "type": "integer", + "description": "text equiv index for the gt tokens", + "default": -1 + } +} +``` diff --git a/data/docs/ocrd-cis-wer/release_notes.md b/data/docs/ocrd-cis-wer/release_notes.md new file mode 100644 index 00000000..d7de275f --- /dev/null +++ b/data/docs/ocrd-cis-wer/release_notes.md @@ -0,0 +1 @@ +# Release notes diff --git a/data/docs/ocrd-cis-wer/reporting.md b/data/docs/ocrd-cis-wer/reporting.md new file mode 100644 index 00000000..e71d72c5 --- /dev/null +++ b/data/docs/ocrd-cis-wer/reporting.md @@ -0,0 +1,2 @@ +# Reporting +Report any bugs/problems at the [issues page](https://github.com/cisocrgroup/ocrd_cis/issues) diff --git a/data/docs/ocrd-cis-wer/tool.md b/data/docs/ocrd-cis-wer/tool.md new file mode 100644 index 00000000..329b2423 --- /dev/null +++ b/data/docs/ocrd-cis-wer/tool.md @@ -0,0 +1,2 @@ +# Tool ocrd-cis-wer {#Tool .concept} +calculate the word error rate for aligned page xml files diff --git a/data/docs/ocrd-cis-wer/topicmap.xml
b/data/docs/ocrd-cis-wer/topicmap.xml new file mode 100644 index 00000000..a3c13f10 --- /dev/null +++ b/data/docs/ocrd-cis-wer/topicmap.xml @@ -0,0 +1,18 @@ + + + + + + + + + + + + + + + + + + diff --git a/data/docs/ocrd-cis-wer/troubleshooting.xml b/data/docs/ocrd-cis-wer/troubleshooting.xml new file mode 100644 index 00000000..2d113697 --- /dev/null +++ b/data/docs/ocrd-cis-wer/troubleshooting.xml @@ -0,0 +1,29 @@ + + + + Troubleshooting + + diff --git a/data/misc/ocrd-cis-dita.sh b/data/misc/ocrd-cis-dita.sh new file mode 100644 index 00000000..27940567 --- /dev/null +++ b/data/misc/ocrd-cis-dita.sh @@ -0,0 +1,214 @@ +#!/bin/bash + +overwrite=false +tools="" +for arg in "$@"; do + case $arg in + -f|--force) + overwrite=true;; + *) + tools="$arg $tools" + ;; + esac +done + +for tool in $tools; do + dir="data/docs/$tool" + if [[ $overwrite == true ]]; then + rm -rf "$dir" + fi + mkdir -p "$dir" || exit 1 + + # topicmap + cat <<EOF > "$dir/topicmap.xml" + + + + + + + + + + + + + + + + + + +EOF + + # name + cat <<EOF > "$dir/name.md" +# $tool +EOF + + # simple description + cat <<EOF > "$dir/tool.md" +# Tool $tool {#Tool .concept} +$(cat ocrd-tool.json | jq -r ".tools.\"$tool\".description") +EOF + + # parameters + cat <<EOF > "$dir/parameters.md" +# Parameters {#parameters .reference} +The tool $tool accepts the following configuration parameters: +\`\`\`json +$(cat ocrd-tool.json | jq ".tools.\"$tool\".parameters") +\`\`\` +EOF + + # installation + cat <<EOF > "$dir/installation.md" +# Installation of $tool {#installation .task} +1. Initialize virtualenv: \`python3 -m venv path/to/dir\` (optional) +2. Install ocrd_cis: \`make install\` +3. Test the installation: \`make test\` (optional) +EOF + + # release notes + cat <<EOF > "$dir/release_notes.md" +# Release notes +EOF + + # Authors + cat <<EOF > "$dir/authors.md" +# Authors +1. Christoph Weber +2. Florian Fink +3. Robert Sachunsky +4. 
Tobias Englmeier +EOF + + # Reporting + cat <<EOF > "$dir/reporting.md" +# Reporting +Report any bugs/problems at the [issues page](https://github.com/cisocrgroup/ocrd_cis/issues) +EOF + + # Copyright + echo "# License" > "$dir/copyright.md" + cat LICENSE >> "$dir/copyright.md" + + # input format description + if [[ ! -f "$dir/inputFormatDescription.md" ]]; then + cat <<EOF > "$dir/inputFormatDescription.md" +# Input format {#inputFormatDescription .reference} +EOF + fi + + # output format description + if [[ ! -f "$dir/outputFormatDescription.md" ]]; then + cat <<EOF > "$dir/outputFormatDescription.md" +# Output format {#outputFormatDescription .reference} +EOF + fi + + # Troubleshooting + if [[ ! -f "$dir/troubleshooting.xml" ]]; then + cat <<EOF > "$dir/troubleshooting.xml" + + + + Troubleshooting + + +EOF + fi + + if [[ ! -f "$dir/glossary.xml" ]]; then + cat <<EOF > "$dir/glossary.xml" + + + + Glossar + + + +EOF + fi + + # generate description and options from README.md + blockn=0 + ofile="" + while read -r line; do + if echo "$line" | grep -qF -- "$tool"; then + # echo "setting blockn=1" + ofile="$dir/description.md" + echo "# Description of $tool {#description .concept}" > "$ofile" + blockn=1 + elif [[ $blockn == 1 ]] && [[ "$line" == "" ]]; then + # echo "setting blockn=2" + ofile="$dir/option.md" + echo "# Options for $tool {#option .reference}" > "$ofile" + blockn=2 + elif [[ $blockn == 2 ]] && [[ "$line" == "" ]]; then + # echo "setting blockn=0" + blockn=0 + elif [[ $blockn == 1 ]] || [[ $blockn == 2 ]]; then + # echo "$blockn $line"; + echo "$line" >> "$ofile" + fi + done < README.md +done diff --git a/ocrd_cis/ocrd-tool.json b/ocrd_cis/ocrd-tool.json index ae762948..b3d4b674 100644 --- a/ocrd_cis/ocrd-tool.json +++ b/ocrd_cis/ocrd-tool.json @@ -342,7 +342,8 @@ "steps": [ "postprocessing/alignment" ], - "description": "Align multiple OCRs and/or GTs" + "description": "Align multiple OCRs and/or GTs", + "parameters": {} }, "ocrd-cis-wer": { "executable": "ocrd-cis-wer",