-Convert images and scans to searchable and selectable (and merged) PDFs! The core logic resides in a Python script that extracts all the files from `todo`, transforms them with Tesseract via [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF), and loads them into `done`.
+Merge images and scans into searchable and selectable PDFs! The core logic resides in a Python script that transforms the files with Tesseract via [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF). For information about available options, see the [OCRmyPDF documentation](https://ocrmypdf.readthedocs.io/en/latest).
+
+A Bash script is provided to automate the installation of dependencies and the execution of the Python script. The Docker image provides a self-contained virtual environment that runs the Bash script in a container. The Google Colab notebook and GitHub Actions workflow both run the container in the cloud.
 
 > [!NOTE]
 > Files in subfolders will be merged in alphabetical order, but will still be available individually.
 
-I recommend you use either:
-
-- The Bash script, which runs the Python script
-- The Docker image, which runs the Bash script
-- A Google Colab or GitHub Actions server, both of which run the Docker image
-
-Read on to find out which is best for you! For more information about the options, see the [OCRmyPDF documentation](https://ocrmypdf.readthedocs.io/en/latest).
-
 ## Fast Start
 
-It's as easy as 1, 2, 3! Get up and going in no time with these options:
+Get up and going in no time with these options:
 
 ### Cloud: Google Colab Notebook
 
 Are you on mobile or simply want an easy and seamless experience?
 
-1. Open [Colab](https://colab.research.google.com/github/ipitio/ocr-pdf/blob/master/colab.ipynb)cell in [Chrome](https://stackoverflow.com/a/48777857)
+1. Open [Colab](https://colab.research.google.com/github/ipitio/ocr-pdf/blob/master/colab.ipynb) in [Chrome](https://stackoverflow.com/a/48777857)
 2. Run the cell and follow the prompts
-3. Find the OCR'd files in your [Drive](https://drive.google.com/drive/my-drive)`/ocr-pdf`
+3. Find the PDFs in your [Drive](https://drive.google.com/drive/my-drive)`/ocr-pdf`
 
 To add OCRmyPDF options, append them to the `run` command.
 
 ### Self-hosted
 
 Do you want to run it on your own machine, but don't want to clone the repo?
 
-1. Ensure you have Docker or Bash and cURL installed
-2. Make a new `pdf` folder and put your files in `pdf/todo`
-3. Run one of the following commands from the parent of `pdf`:
+1. Ensure you have Docker, or Bash and cURL, installed
+2. Make two new nested folders and put your files in them: `pdf/todo/*`
+3. Run one of the following from the outer `pdf` folder:
 
 #### Docker Container
 
 If you want to skip building an image, just use mine:
 
 ```bash
-docker run --rm -v ./pdf:/app/pdf ghcr.io/ipitio/ocr-pdf \
+docker run --rm -v .:/app/pdf ghcr.io/ipitio/ocr-pdf \
 bash predict.sh pdf [OCRmyPDF options]
 ```
 
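The self-hosted steps in the updated README (create `pdf/todo`, then run the container from the outer `pdf` folder) could be scripted roughly as below. This is a minimal sketch, not part of the repository: `prepare_input_folder` and `build_docker_command` are hypothetical helper names, and the only details taken from the diff are the image name, the `/app/pdf` mount point, and the `bash predict.sh pdf` entry command. The sketch mounts an absolute path instead of `.` on the assumption that this is equivalent when the command is run from the outer `pdf` folder.

```python
from pathlib import Path


def prepare_input_folder(parent):
    """Create the nested `pdf/todo` folders under `parent` and return `pdf`.

    Hypothetical helper: mirrors step 2 of the self-hosted instructions.
    """
    pdf_dir = Path(parent) / "pdf"
    (pdf_dir / "todo").mkdir(parents=True, exist_ok=True)
    return pdf_dir


def build_docker_command(pdf_dir, ocrmypdf_options=()):
    """Return the `docker run` argument list shown in the README.

    Assumption: `pdf_dir` is the outer `pdf` folder (the one containing
    `todo`), so mounting it matches `-v .:/app/pdf` run from that folder.
    """
    # Resolve to an absolute path so the bind mount does not depend on
    # the current working directory.
    host_dir = Path(pdf_dir).resolve()
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/app/pdf",
        "ghcr.io/ipitio/ocr-pdf",
        "bash", "predict.sh", "pdf",
        *ocrmypdf_options,  # extra OCRmyPDF options are appended verbatim
    ]
```

With Docker installed, the resulting list can be passed to `subprocess.run(build_docker_command(pdf_dir, ["--language", "eng"]), check=True)`; `--language` is a standard OCRmyPDF option used here only as an illustration.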
@@ -62,20 +56,20 @@ Don't want to install Docker? No problem!