1 parent 059e658 commit 2c6cd00
lib/docsplit/text_extractor.rb
@@ -64,7 +64,7 @@ def extract_from_ocr(pdf, pages)
  tiff = "#{tempdir}/#{@pdf_name}_#{page}.tif"
  file = "#{base_path}_#{page}"
  run "MAGICK_TMPDIR=#{tempdir} OMP_NUM_THREADS=2 gm convert +adjoin #{MEMORY_ARGS} #{OCR_FLAGS} #{pdf}[#{page - 1}] #{tiff} 2>&1"
- run "tesseract #{tiff} #{file} 2>&1"
+ run "tesseract #{tiff} #{file} -l eng 2>&1"
  clean_text(file + '.txt') if @clean_ocr
  FileUtils.remove_entry_secure tiff
  end
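
The change above pins the OCR language by passing `-l eng` to tesseract. As a rough illustration only, the command string could be assembled by a small helper like the one below. `tesseract_command` and its `language` parameter are hypothetical names that do not appear in docsplit, and the `Shellwords` escaping is an added assumption rather than something the commit does.

```ruby
require 'shellwords'

# Hypothetical helper, not part of docsplit: builds the tesseract command
# string with an explicit language passed via -l (defaults to 'eng', the
# value hard-coded in the commit above).
def tesseract_command(tiff, output_base, language = 'eng')
  [
    'tesseract',
    Shellwords.escape(tiff),         # input image
    Shellwords.escape(output_base),  # tesseract appends .txt to this base
    '-l', Shellwords.escape(language)
  ].join(' ') + ' 2>&1'              # merge stderr into stdout, as in the diff
end

puts tesseract_command('/tmp/doc_1.tif', 'out/doc_1')
```

With the default argument this yields the same command shape as the `+` line in the diff; passing another language code would only change the value after `-l`.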