Commit 2074a4b

Merge pull request #6 from RiansyahTohamba/master
ignoring some files and adding bash script
2 parents f1a344b + 484d5c6 commit 2074a4b

7 files changed: +12 -863 lines changed
.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# We don't want to publish other people's documents due to copyright issues.
+output/*.txt
+samples/*.pdf

README.md

Lines changed: 6 additions & 0 deletions
@@ -2,6 +2,8 @@
 Python Multiple and Large PDF Documents Text Extraction - Python 3.7
 ![Logo](XPDF.jpg)
 
+
+
 ## Introduction
 **As a Data Scientist , You may not stick to data format.**
 
@@ -41,6 +43,10 @@ That's why, **PDFs-TextExtract** project developed to **extract text from multip
 - **Step 4:** Execute **..\PDFs-TextExtract-master\Scripts\extract_text.py** script.
 - **Step 5:** Open **..\PDFs-TextExtract-master\output** and you will find the result there.
 
+## With bash script
+Execute
+sh main.sh
+
 ## Resources
 - [Overview about PDF Processing with Python](https://towardsdatascience.com/pdf-preprocessing-with-python-19829752af9f)
 - **pdf2txt** tool forked from [pdfminer.six](https://github.com/pdfminer/pdfminer.six) project.

main.sh

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+python Scripts/merged.py
+python Scripts/spliter.py
+python Scripts/extract_text.py
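
As added, main.sh has no `set -e`, so `sh main.sh` keeps going even if an earlier script fails. The sketch below is one possible alternative, not part of this commit: a small Python runner that executes the same three scripts in the same order and stops at the first non-zero exit status. The function name `run_pipeline` and its `run` parameter are hypothetical, introduced only for illustration. The script paths are copied from main.sh.

```python
import subprocess
import sys

# Script order copied from main.sh in this commit.
SCRIPTS = [
    "Scripts/merged.py",
    "Scripts/spliter.py",
    "Scripts/extract_text.py",
]


def run_pipeline(scripts=SCRIPTS, run=subprocess.run):
    """Run each script with the current interpreter and stop at the first failure.

    `run` is injectable (hypothetical parameter) so the function can be
    exercised without the real scripts. Returns the scripts that exited with 0.
    """
    completed = []
    for script in scripts:
        result = run([sys.executable, script])
        if result.returncode != 0:
            # Report the failing step and skip the remaining scripts.
            print(f"{script} exited with status {result.returncode}; stopping.",
                  file=sys.stderr)
            break
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the repository root would run the real scripts. Comparing the length of the returned list with `len(SCRIPTS)` tells the caller whether every step finished.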

output/Output.txt

Lines changed: 0 additions & 863 deletions
This file was deleted.
-10.1 MB
Binary file not shown.

samples/sample-pdf-file.pdf

-143 KB
Binary file not shown.

samples/sample.pdf

-53.6 KB
Binary file not shown.
