Commit 0172b7e

update docs + re-add win bundle
1 parent 72f1a5c commit 0172b7e

76 files changed: 23828 additions and 678 deletions

.github/workflows/test-coverage.yaml

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ jobs:
         run: |
           sudo add-apt-repository -y ppa:alex-p/tesseract-ocr5
           sudo apt-get update
-          sudo apt-get install -y libtesseract-dev tesseract-ocr tesseract-ocr-eng libpoppler-cpp-dev libmagick++-dev
+          sudo apt-get install -y libtesseract-dev tesseract-ocr tesseract-ocr-eng libpoppler-cpp-dev
 
       - name: Query dependencies
         run: |

DESCRIPTION

Lines changed: 3 additions & 2 deletions
@@ -17,7 +17,9 @@ Authors@R: c(person("Mauricio", "Vargas Sepulveda",
         role = "fnd")
     )
 Description: Bindings to 'Tesseract':
-    a powerful optical character recognition (OCR) engine that supports over
+    Tesseract
+    (<https://github.com/tesseract-ocr/tesseract>)
+    is a powerful optical character recognition (OCR) engine that supports over
     100 languages. The engine is highly configurable in order to tune the
     detection algorithms and obtain the best possible results.
 License: Apache License (>= 2)
@@ -37,7 +39,6 @@ LinkingTo:
 RoxygenNote: 7.3.2
 Roxygen: list(markdown = TRUE)
 Suggests:
-    magick (>= 2.7),
     spelling,
     knitr,
     tibble,

R/images.R

Lines changed: 0 additions & 8 deletions
This file was deleted.

R/ocr.R

Lines changed: 4 additions & 22 deletions
@@ -26,27 +26,16 @@
 #' @references [Tesseract: Improving Quality](https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality)
 #' @examples
 #' # Simple example
-#' file <- system.file("examples", "testocr.png", package = "cpp11tesseract")
+#' file <- system.file("examples", "wilde.jpg", package = "cpp11tesseract")
 #' text <- ocr(file)
 #' cat(text)
 ocr <- function(file, engine = tesseract("eng"), HOCR = FALSE, opw = "", upw = "") {
   if (is.character(engine)) {
     engine <- tesseract(engine)
   }
   stopifnot(inherits(engine, "externalptr"))
-  if (isTRUE(inherits(file, "magick-image"))) {
-    vapply(file, function(x) {
-      tmp <- tempfile(fileext = ".png")
-      on.exit(unlink(tmp))
-      magick::image_write(x, tmp, format = "PNG", density = "300x300")
-      ocr(tmp, engine = engine, HOCR = HOCR)
-    }, character(1))
-  } else if (isTRUE(is.character(file))) {
-    if (isFALSE(is.tiff(file))) {
-      vapply(file, ocr_file, character(1), ptr = engine, HOCR = HOCR, USE.NAMES = FALSE)
-    } else {
-      ocr(tiff_convert(file), engine, HOCR = HOCR)
-    }
+  if (isTRUE(is.character(file))) {
+    vapply(file, ocr_file, character(1), ptr = engine, HOCR = HOCR, USE.NAMES = FALSE)
   } else if (isTRUE(is.raw(file))) {
     ocr_raw(file, engine, HOCR = HOCR)
   } else {
@@ -61,14 +50,7 @@ ocr_data <- function(file, engine = tesseract("eng")) {
     engine <- tesseract(engine)
   }
   stopifnot(inherits(engine, "externalptr"))
-  df_list <- if (inherits(file, "magick-image")) {
-    lapply(file, function(x) {
-      tmp <- tempfile(fileext = ".png")
-      on.exit(unlink(tmp))
-      magick::image_write(x, tmp, format = "PNG", density = "300x300")
-      ocr_data(tmp, engine = engine)
-    })
-  } else if (is.character(file)) {
+  df_list <- if (is.character(file)) {
     lapply(file, function(im) {
       ocr_file_data(im, ptr = engine)
     })
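
Review note: with the `magick-image` branch removed from `ocr()`, callers that hold an in-memory magick image would presumably need to write it to disk themselves before calling `ocr()`. The following is a minimal sketch of that idea, not code from this commit. The helper name `ocr_magick` and its `ocr_fun` and `write_fun` parameters are hypothetical; the `image_write()` arguments are copied from the removed branch, and the helper assumes the caller has installed magick separately, since it is no longer listed under Suggests.

```r
# Hypothetical helper (not part of cpp11tesseract): OCR each frame of a
# magick image by writing it to a temporary PNG and passing the path on.
# `ocr_fun` and `write_fun` are illustrative parameters; their defaults are
# evaluated lazily, so defining this function does not load either package.
ocr_magick <- function(image, engine = "eng", HOCR = FALSE,
                       ocr_fun = cpp11tesseract::ocr,
                       write_fun = magick::image_write) {
  vapply(seq_along(image), function(i) {
    tmp <- tempfile(fileext = ".png")
    # Remove the temporary file when this frame is done, even on error.
    on.exit(unlink(tmp), add = TRUE)
    # Same write arguments the removed branch used.
    write_fun(image[i], tmp, format = "PNG", density = "300x300")
    ocr_fun(tmp, engine = engine, HOCR = HOCR)
  }, character(1))
}
```

Under those assumptions, usage would look like `ocr_magick(magick::image_read("scan.png"))`, where `scan.png` is a placeholder path. Passing `engine` as a string relies on `ocr()` converting character engines with `tesseract(engine)`, as shown in the hunk above.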

README.Rmd

Lines changed: 4 additions & 4 deletions
@@ -13,7 +13,7 @@ knitr::opts_chunk$set(
 )
 ```
 
-# cpp11tesseract <img src="man/figures/logo.svg" align="right" height="139" alt="" />
+# cpp11tesseract
 
 [![R-CMD-check](https://github.com/pachadotdev/cpp11tesseract/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/pachadotdev/cpp11tesseract/actions/workflows/R-CMD-check.yaml)
 [![codecov](https://codecov.io/gh/pachadotdev/cpp11tesseract/graph/badge.svg?token=mWfiUCgfNu)](https://app.codecov.io/gh/pachadotdev/cpp11tesseract)
@@ -36,9 +36,9 @@ obtain the best possible results.
 
 How to extract text from an image:
 
-```r
-# Simple example
-text <- ocr("inst/examples/figures/testocr.png")
+```{r}
+library(cpp11tesseract)
+text <- ocr("inst/examples/wilde.jpg")
 cat(text)
 ```
 

README.md

Lines changed: 36 additions & 3 deletions
@@ -1,7 +1,7 @@
 
 <!-- README.md is generated from README.Rmd. Please edit that file -->
 
-# cpp11tesseract <img src="man/figures/logo.svg" align="right" height="139" alt="" />
+# cpp11tesseract
 
 [![R-CMD-check](https://github.com/pachadotdev/cpp11tesseract/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/pachadotdev/cpp11tesseract/actions/workflows/R-CMD-check.yaml)
 [![codecov](https://codecov.io/gh/pachadotdev/cpp11tesseract/graph/badge.svg?token=mWfiUCgfNu)](https://app.codecov.io/gh/pachadotdev/cpp11tesseract)
@@ -30,9 +30,42 @@ detection algorithms and obtain the best possible results.
 How to extract text from an image:
 
 ``` r
-# Simple example
-text <- ocr("inst/examples/figures/testocr.png")
+library(cpp11tesseract)
+text <- ocr("inst/examples/wilde.jpg")
 cat(text)
+#> Act One
+#> [The living room of Algernon Moncrieff's flat in Mayfair, London.
+#> Lane is arranging afternoon tea on a table. Algemion enters}
+#> Algernon: Lane, have you made the cucumber sandwiches for
+#> Lady Bracknell’s tea?
+#> Lane: Yes, sir. [Handing them to Algernon on a silver tray]
+#> Algernon: [Looking carefully at them, taking two and sitting down
+#> on the sofa] Oh, by the way’, Lane, I looked at your notebook. |
+#> noticed that when Lord Shoreman and Mr Worthing dined with
+#> me on Thursday night, eight bottles of champagne were drunk,
+#> Lane: Yes, sir; eight bottles.
+#> Algernon: Why is it that, in a bachelor’s home, the servants
+#> always drink the champagne? I just ask because | am interested,
+#> Lane.
+#> Lane: I think that it is because the champagne is better in a
+#> bachelor’s home. | have noticed that the champagne in married
+#> people's homes is rarely very good.
+#> Algernon: Good heavens*! Is marriage so depressing?
+#> Lane: | believe marriage is very pleasant, sir. | haven't had much
+#> experience of it myself. [ have only been married once, and that
+#> was because of a misunderstanding” between myself and a young
+#> person.
+#> Algernon: [Lazily, without interest] I am not very interested in
+#> your family life, Lane.
+#> Lane: No, sir; it is not a very interesting subject. I never think
+#> of it myself.
+#> Algernon: That is very understandable. Well, thank you, Lane.
+#> [Lane goes off]
+#> Algernon: [To himself] Lane’s views on marriage seem very casual.
+#> Really, if the servants don’t set us a good example, what on earth
+#> is the use of them? They seem to have no morals
+#> [Lane enters]
+#> Lane: Mr Ernest Worthing is here, sir.
 ```
 
 ## Differences with the original tesseract R package

docs/404.html

Lines changed: 87 additions & 0 deletions
Some generated files are not rendered by default.
