Commit e5814bd

update readme
1 parent cf9227d commit e5814bd

1 file changed: +15 -21 lines changed

README.md

Lines changed: 15 additions & 21 deletions
@@ -4,55 +4,49 @@
 
 # ocr2pdf
 
-**OCRmyPDF and Merge it**
+**Merge images into actual PDFs with AI**
 
 ---
 
 [![build](https://github.com/ipitio/ocr-pdf/actions/workflows/publish.yml/badge.svg)](https://github.com/ipitio/ocr-pdf/actions/workflows/publish.yml) [![downloads](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fipitio.github.io%2Fbackage%2Fipitio%2Focr-pdf%2Focr-pdf.json&query=%24.downloads&logo=github&logoColor=959da5&labelColor=333a41&label=pulls)](https://github.com/ipitio/ocr-pdf/pkgs/container/ocr-pdf) [![size](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fipitio.github.io%2Fbackage%2Fipitio%2Focr-pdf%2Focr-pdf.json&query=%24.size&logo=github&logoColor=959da5&label=size&labelColor=333a41&color=indigo)](https://github.com/ipitio/backage/pkgs/container/backage) [![latest](https://img.shields.io/badge/dynamic/xml?url=https%3A%2F%2Fipitio.github.io%2Fbackage%2Fipitio%2Focr-pdf%2Focr-pdf.xml&query=%2Fbkg%2Fversion%5B.%2Flatest%5B.%3D%22true%22%5D%5D%2Ftags%5B.!%3D%22latest%22%5D&logo=github&logoColor=959da5&label=latest&labelColor=333a41&color=darkgreen)](https://github.com/ipitio/backage/pkgs/container/backage)
 
 </div>
 
-Convert images and scans to searchable and selectable (and merged) PDFs! The core logic resides in a Python script that extracts all the files from `todo`, transforms them with Tesseract via [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF), and loads them into `done`.
+Merge images and scans into searchable and selectable PDFs! The core logic resides in a Python script that transforms the files with Tesseract via [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF). For information about available options, see the [OCRmyPDF documentation](https://ocrmypdf.readthedocs.io/en/latest).
+
+A Bash script is provided to automate the installation of dependencies and the execution of the Python script. The Docker image provides a self-contained virtual environment that runs the Bash script in a container. The Google Colab notebook and GitHub Actions workflow both run the container in the cloud.
 
 > [!NOTE]
 > Files in subfolders will be merged in alphabetical order, but will still be available individually.
 
-I recommend you use either:
-
-- The Bash script, which runs the Python script
-- The Docker image, which runs the Bash script
-- A Google Colab or GitHub Actions server, both of which run the Docker image
-
-Read on to find out which is best for you! For more information about the options, see the [OCRmyPDF documentation](https://ocrmypdf.readthedocs.io/en/latest).
-
 ## Fast Start
 
-It's as easy as 1, 2, 3! Get up and going in no time with these options:
+Get up and going in no time with these options:
 
 ### Cloud: Google Colab Notebook
 
 Are you on mobile or simply want an easy and seamless experience?
 
-1. Open [Colab](https://colab.research.google.com/github/ipitio/ocr-pdf/blob/master/colab.ipynb) cell in [Chrome](https://stackoverflow.com/a/48777857)
+1. Open [Colab](https://colab.research.google.com/github/ipitio/ocr-pdf/blob/master/colab.ipynb) in [Chrome](https://stackoverflow.com/a/48777857)
 2. Run the cell and follow the prompts
-3. Find the OCR'd files in your [Drive](https://drive.google.com/drive/my-drive)`/ocr-pdf`
+3. Find the PDFs in your [Drive](https://drive.google.com/drive/my-drive)`/ocr-pdf`
 
 To add OCRmyPDF options, append them to the `run` command.
 
 ### Self-hosted
 
 Do you want to run it on your own machine, but don't want to clone the repo?
 
-1. Ensure you have Docker or Bash and cURL installed
-2. Make a new `pdf` folder and put your files in `pdf/todo`
-3. Run one of the following commands from the parent of `pdf`:
+1. Ensure you have Docker, or Bash and cURL, installed
+2. Make two new nested folders and put your files in them: `pdf/todo/*`
+3. Run one of the following from the outer `pdf` folder:
 
 #### Docker Container
 
 If you want to skip building an image, just use mine:
 
 ```bash
-docker run --rm -v ./pdf:/app/pdf ghcr.io/ipitio/ocr-pdf \
+docker run --rm -v .:/app/pdf ghcr.io/ipitio/ocr-pdf \
 bash predict.sh pdf [OCRmyPDF options]
 ```
 

@@ -62,20 +56,20 @@ Don't want to install Docker? No problem!
 
 ```bash
 curl -sSLNZ https://ipitio.github.io/ocr-pdf/src/predict.sh |\
-bash -s -- pdf [OCRmyPDF options]
+bash -s -- . [OCRmyPDF options]
 ```
 
 ## Quick Start
 
-It's still easy as 1, 2, 3!
+It's still as easy as 1, 2, 3!
 
 1. Fork and clone this repo
-2. Put your files in `pdf/todo`
+2. Put your files in `pdf/todo/`
 3. Complete one of the following from the root of the repo:
 
 ### Cloud: GitHub Actions Workflow
 
-Enable Actions and push your files:
+Enable Actions on GitHub, then push your files:
 
 ```bash
 git add .
