1. add explanation about 'why we ignore some files in output and samples folders'
2. add bash script

RiansyahTohamba · RiansyahTohamba · commit 484d5c6719ed · 2020-10-31T09:35:26.000+07:00
diff --git a/.gitignore b/.gitignore
@@ -1,2 +1,3 @@
+# We don't want to publish other people's documents due to copyright issues.
 output/*.txt
 samples/*.pdf
diff --git a/README.md b/README.md
@@ -2,14 +2,6 @@
 Python Multiple PDF Documents Text Extraction - Python 3.7
 ![Logo](XPDF.jpg)
 
-## How to run
-sh main.sh
-
-then check 'folder/output'
-CTRL + H to replace
-1. \n with space
-2. kode <0xsdx>
-3. 
 
 
 ## Introduction
@@ -46,6 +38,10 @@ That's why, **PDFs-TextExtract** project developed to **extract text from multip
 - **Step 4:** Execute **..\PDFs-TextExtract-master\Scripts\extract_text.py** script.
 - **Step 5:** Open **..\PDFs-TextExtract-master\output** and you will find the result there.
 
+## With bash script
+Execute 
+sh main.sh
+
 ## Resources 
 - [Overview about PDF Processing with Python](https://towardsdatascience.com/pdf-preprocessing-with-python-19829752af9f)
 - **pdf2txt** tool forked from [pdfminer.six](https://github.com/pdfminer/pdfminer.six) project.
diff --git a/main.sh b/main.sh
@@ -1,4 +1,3 @@
 python Scripts/merged.py
 python Scripts/spliter.py
-python Scripts/extract_text.py
-
+python Scripts/extract_text.py
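
As committed, main.sh runs the three scripts in sequence and keeps going even if an earlier one fails. A minimal sketch of a stricter variant is shown below. It is an illustration, not part of this commit: `run_steps` is a hypothetical helper name, and `PYTHON` is a hypothetical environment override that falls back to the `python` command the script already uses.

```shell
#!/bin/sh
# Sketch only: a stricter variant of main.sh that stops at the first failing step.

# run_steps is a hypothetical helper; PYTHON is a hypothetical override that
# defaults to the `python` command main.sh already uses.
run_steps() {
    # Same order as the current main.sh: merge, split, then extract text.
    for step in merged spliter extract_text; do
        echo "Running Scripts/${step}.py"
        # Abort the whole run as soon as one script exits with an error.
        "${PYTHON:-python}" "Scripts/${step}.py" || return 1
    done
}
```

Calling `run_steps` on the last line of the file would run the scripts in the same order as the current main.sh, and the results would still land in the output folder.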
