Commit 2c6cd00

default tesseract to english
1 parent 059e658 commit 2c6cd00

1 file changed: +1 -1 lines changed

lib/docsplit/text_extractor.rb

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ def extract_from_ocr(pdf, pages)
 tiff = "#{tempdir}/#{@pdf_name}_#{page}.tif"
 file = "#{base_path}_#{page}"
 run "MAGICK_TMPDIR=#{tempdir} OMP_NUM_THREADS=2 gm convert +adjoin #{MEMORY_ARGS} #{OCR_FLAGS} #{pdf}[#{page - 1}] #{tiff} 2>&1"
-run "tesseract #{tiff} #{file} 2>&1"
+run "tesseract #{tiff} #{file} -l eng 2>&1"
 clean_text(file + '.txt') if @clean_ocr
 FileUtils.remove_entry_secure tiff
 end
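
The change above hard-codes English by passing `-l eng` to tesseract. As a rough sketch of how the language could instead be a parameter that defaults to English, the helper below builds the same command string. The name `tesseract_command` and its arguments are hypothetical and not part of docsplit; only the `tesseract <image> <outputbase> -l <lang>` invocation comes from the diff.

```ruby
require 'shellwords'

# Hypothetical helper (not part of docsplit): builds the tesseract command
# line for one page image, defaulting the OCR language to English ('eng').
def tesseract_command(tiff, output_base, language = 'eng')
  # Shell-escape each argument so paths containing spaces stay intact.
  args = ['tesseract', tiff, output_base, '-l', language]
  # Redirect stderr to stdout, as the original command does.
  "#{args.shelljoin} 2>&1"
end

puts tesseract_command('/tmp/doc_1.tif', '/tmp/doc_1')
puts tesseract_command('/tmp/doc_1.tif', '/tmp/doc_1', 'deu')
```

Keeping `eng` as the default argument preserves the behavior this commit introduces, while a caller could still pass another installed tesseract language code.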
