Commit 484d5c6

1. Add an explanation of why some files in the output and samples folders are ignored
2. Add a bash script
1 parent 9081d16 commit 484d5c6

3 files changed

+6
-10
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
+# We don't want to publish other people's documents due to copyright issues.
 output/*.txt
 samples/*.pdf
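
The two ignore patterns above each contain a slash, so Git treats them as relative to the directory holding the `.gitignore`, and `*` does not cross a `/`. A minimal Python sketch of that matching rule follows. It is a simplified illustration, not Git's implementation: the helper name `is_ignored` and the example paths are hypothetical, and it handles only patterns of this simple anchored form (no negation, no `**`, no trailing-slash rules).

```python
from fnmatch import fnmatchcase

# Patterns copied from the .gitignore in this commit.
IGNORE_PATTERNS = ["output/*.txt", "samples/*.pdf"]


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """Return True if a repo-relative POSIX path matches one of the patterns.

    Simplified on purpose: each pattern is compared component by component,
    so `*` can only match within a single path component.
    """
    parts = path.split("/")
    for pattern in patterns:
        pattern_parts = pattern.split("/")
        # The path must have exactly as many components as the pattern.
        if len(parts) == len(pattern_parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(parts, pattern_parts)
        ):
            return True
    return False
```

Under these assumptions, `is_ignored("output/report.txt")` is true while `is_ignored("output/sub/report.txt")` is false, so extracted text in nested folders would still be tracked. Running `git check-ignore -v <path>` in the repository is the authoritative way to check a real path.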

README.md

Lines changed: 4 additions & 8 deletions
@@ -2,14 +2,6 @@
 Python Multiple PDF Documents Text Extraction - Python 3.7
 ![Logo](XPDF.jpg)

-## How to run
-sh main.sh
-
-after that, check 'folder/output'
-CTRL + H to replace
-1. \n with space
-2. code <0xsdx>
-3.


 ## Introduction
@@ -46,6 +38,10 @@ That's why, **PDFs-TextExtract** project developed to **extract text from multip
 - **Step 4:** Execute **..\PDFs-TextExtract-master\Scripts\extract_text.py** script.
 - **Step 5:** Open **..\PDFs-TextExtract-master\output** and you will find the result there.

+## With bash script
+Execute
+sh main.sh
+
 ## Resources
 - [Overview about PDF Processing with Python](https://towardsdatascience.com/pdf-preprocessing-with-python-19829752af9f)
 - **pdf2txt** tool forked from [pdfminer.six](https://github.com/pdfminer/pdfminer.six) project.

main.sh

Lines changed: 1 addition & 2 deletions
@@ -1,4 +1,3 @@
 python Scripts/merged.py
 python Scripts/spliter.py
-python Scripts/extract_text.py
-
+python Scripts/extract_text.py
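
As written, `main.sh` runs the three scripts one after another and keeps going even if an earlier one fails, since the script sets no `set -e`. A rough Python sketch of the same sequence that stops at the first failure is shown below. The function name `run_pipeline` and its `runner` parameter are hypothetical and not part of the repository; only the script paths and their order come from `main.sh`.

```python
import subprocess
import sys

# Script order taken from main.sh in this commit.
SCRIPTS = [
    "Scripts/merged.py",
    "Scripts/spliter.py",
    "Scripts/extract_text.py",
]


def run_pipeline(scripts=SCRIPTS, runner=subprocess.run):
    """Run each script in order with the current interpreter.

    Returns the list of scripts that completed. `runner` is injectable
    so the sequence can be exercised without launching real processes.
    """
    completed = []
    for script in scripts:
        # check=True makes subprocess.run raise CalledProcessError
        # when a script exits with a non-zero status.
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the repository root would mirror `sh main.sh`, assuming the scripts resolve their input and output folders relative to that directory. Whether stopping early is desirable depends on whether later steps can work with partial results from earlier ones.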
