Commit 83d141d

Merge pull request #42 from poupeaua/dev
Prep release v0.3.0
2 parents 511ef84 + ea51c63 commit 83d141d

30 files changed, +1088 -295 lines changed

README.md

Lines changed: 8 additions & 7 deletions
@@ -51,7 +51,7 @@ pip install otary
 Let me illustrate the usage of Otary with a simple example. Imagine you need to:

 1. read an image from a pdf file
-2. draw an ellipse on it
+2. draw an rectangle on it, shift and rotate the rectangle
 3. crop a part of the image
 4. rotate the cropped image
 5. apply a threshold
@@ -60,20 +60,21 @@ Let me illustrate the usage of Otary with a simple example. Imagine you need to:
 In order to compare the use of Otary versus other libraries, I will use the same example but with different libraries. Try it yourself on your favorite LLM (like [ChatGPT](https://chatgpt.com/)) by copying the query:

 ```text
-Generate a python code to read an image from a pdf, draw an ellipse on it, crop a part of the image, rotate the cropped image, apply a threshold on the image.
+Generate a python code to read an image from a pdf, draw an rectangle on it, shift and rotate the rectangle, crop a part of the image, rotate the cropped image, apply a threshold on the image.
 ```

 Using Otary you can do it with few lines of code:

 ```python
 import otary as ot

-im = ot.Image.from_pdf("path/to/your/file.pdf", page_nb=0)
+im = ot.Image.from_pdf("path/to/you/file.pdf", page_nb=0)

-ellipse = ot.Ellipse(foci1=[100, 100], foci2=[400, 400], semi_major_axis=250)
+rectangle = ot.Rectangle([[1, 1], [4, 1], [4, 4], [1, 4]]) * 100
+rectangle.shift([50, 50]).rotate(angle=30, is_degree=True)

 im = (
-    im.draw_ellipses([ellipse])
+    im.draw_polygons([rectangle])
     .crop(x0=50, y0=50, x1=450, y1=450)
     .rotate(angle=90, is_degree=True)
     .threshold_simple(thresh=200)
@@ -84,7 +85,7 @@ im.show()

 Using Otary makes the code:

-- Much more **readable** and hence maintainable
+- Much more **readable** and hence **maintainable**
 - Much more **interactive**
 - Much simpler, simplifying **libraries management** by only using one library and not manipulating multiple libraries like Pillow, OpenCV, Scikit-Image, PyMuPDF etc.

@@ -94,7 +95,7 @@ In a Jupyter notebook, you can easily test and iterate on transformations by sim

 ```python
 im = (
-    im.draw_ellipses([ellipse])
+    im.draw_polygons([rectangle])
     # .crop(x0=50, y0=50, x1=450, y1=450)
     # .rotate(angle=90, is_degree=True)
     .threshold_simple(thresh=200)

docs/examples/index/sample.py

Lines changed: 3 additions & 2 deletions
@@ -2,10 +2,11 @@

 im = ot.Image.from_pdf("tests/data/test.pdf", page_nb=0)

-ellipse = ot.Ellipse(foci1=[100, 100], foci2=[400, 400], semi_major_axis=250)
+rectangle = ot.Rectangle([[1, 1], [4, 1], [4, 4], [1, 4]]) * 100
+rectangle.shift([50, 50]).rotate(angle=30, is_degree=True)

 im = (
-    im.draw_ellipses([ellipse])
+    im.draw_polygons([rectangle])
     .crop(x0=50, y0=50, x1=450, y1=450)
     .rotate(angle=90, is_degree=True)
     .threshold_simple(thresh=200)

docs/index.md

Lines changed: 200 additions & 74 deletions
@@ -43,7 +43,7 @@ The main features of Otary are:
 Let me illustrate the usage of Otary with a simple example. Imagine you need to:

 1. read an image from a pdf file
-2. draw an ellipse on it
+2. draw an rectangle on it, shift and rotate the rectangle
 3. crop a part of the image
 4. rotate the cropped image
 5. apply a threshold
@@ -52,7 +52,7 @@ Let me illustrate the usage of Otary with a simple example. Imagine you need to:
 In order to compare the use of Otary versus other libraries, I will use the same example but with different libraries. Try it yourself on your favorite LLM (like [ChatGPT](https://chatgpt.com/)) by copying the query:

 ```text
-Generate a python code to read an image from a pdf, draw an ellipse on it, crop a part of the image, rotate the cropped image, apply a threshold on the image.
+Generate a python code to read an image from a pdf, draw an rectangle on it, shift and rotate the rectangle, crop a part of the image, rotate the cropped image, apply a threshold on the image.
 ```

 Using Otary you can do it with few lines of code:
@@ -62,12 +62,13 @@ Using Otary you can do it with few lines of code:
 ```Python
 import otary as ot

-im = ot.Image.from_pdf("path/to/your/file.pdf", page_nb=0)
+im = ot.Image.from_pdf("path/to/you/file.pdf", page_nb=0)

-ellipse = ot.Ellipse(foci1=[100, 100], foci2=[400, 400], semi_major_axis=250)
+rectangle = ot.Rectangle([[1, 1], [4, 1], [4, 4], [1, 4]]) * 100
+rectangle.shift([50, 50]).rotate(angle=30, is_degree=True)

 im = (
-    im.draw_ellipses([ellipse])
+    im.draw_polygons([rectangle])
     .crop(x0=50, y0=50, x1=450, y1=450)
     .rotate(angle=90, is_degree=True)
     .threshold_simple(thresh=200)
@@ -76,95 +77,220 @@ Using Otary you can do it with few lines of code:
 im.show()
 ```
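
Note on the added `rectangle` lines: they scale a rectangle's vertices by 100, then shift and rotate the result. The geometry can be sketched in plain Python as follows. This is not Otary's implementation: the helper names `shift` and `rotate` are hypothetical, and the pivot (the vertex centroid) and the rotation direction are assumptions, since this diff does not state which conventions Otary uses.

```python
import math


def shift(points, vector):
    """Translate every vertex by the same (dx, dy) vector."""
    dx, dy = vector
    return [(x + dx, y + dy) for x, y in points]


def rotate(points, angle_deg, pivot=None):
    """Rotate vertices by angle_deg around pivot.

    Assumption: the default pivot is the centroid of the vertices.
    The formula is the standard 2D rotation; with a y-down image axis
    it appears clockwise on screen.
    """
    if pivot is None:
        pivot = (
            sum(x for x, _ in points) / len(points),
            sum(y for _, y in points) / len(points),
        )
    px, py = pivot
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        (
            px + (x - px) * cos_t - (y - py) * sin_t,
            py + (x - px) * sin_t + (y - py) * cos_t,
        )
        for x, y in points
    ]


# Same vertices as in the diff: a unit-grid rectangle scaled by 100.
rectangle = [(x * 100, y * 100) for x, y in [(1, 1), (4, 1), (4, 4), (1, 4)]]
moved = rotate(shift(rectangle, (50, 50)), angle_deg=30)
```

Under these assumptions the shift moves the centroid from (250, 250) to (300, 300), and the rotation leaves both the centroid and the side lengths unchanged.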

-=== "Other libraries"
+=== "ChatGPT using other libraries"

 ```Python
+#!/usr/bin/env python3
 """
-Providing the input to ChatGPT gives the following code
+Steps:
+- Load first page of a PDF as an image
+- Draw a rectangle
+- Shift & rotate that rectangle (visualized as a rotated box)
+- Crop a region of the image
+- Rotate the cropped image
+- Threshold the (rotated) crop
+
+Dependencies:
+pip install pdf2image Pillow opencv-python
+# If pdf2image isn't available, install: pip install PyMuPDF
+# Note: pdf2image requires Poppler on your system.
+
+Edit the CONFIG section below to suit your needs.
 """
-import fitz # PyMuPDF
+
+from pathlib import Path
+import math
+
+# Pillow & OpenCV
+from PIL import Image, ImageDraw
 import numpy as np
 import cv2

-def read_image_from_pdf(pdf_path, page_number=0, dpi=300):
-    """Extracts the specified page as an image from a PDF."""
-    doc = fitz.open(pdf_path)
-    page = doc[page_number]
-    mat = fitz.Matrix(dpi / 72, dpi / 72) # scale to DPI
-    pix = page.get_pixmap(matrix=mat)
-    img = np.frombuffer(pix.samples, dtype=np.uint8).reshape(pix.height, pix.width, pix.n)
-    if img.shape[2] == 4:
-        img = cv2.cvtColor(img, cv2.COLOR_BGRA2BGR)
-    return img
-
-def draw_ellipse(img, center, axes, angle=0, color=(0, 255, 0), thickness=2):
-    """Draws an ellipse on the image."""
-    return cv2.ellipse(img.copy(), center, axes, angle, 0, 360, color, thickness)
-
-def crop_image(img, top_left, bottom_right):
-    """Crops the image using top-left and bottom-right coordinates."""
-    x1, y1 = top_left
-    x2, y2 = bottom_right
-    return img[y1:y2, x1:x2]
-
-def rotate_image(img, angle):
-    """Rotates the image around its center by the given angle."""
-    (h, w) = img.shape[:2]
-    center = (w // 2, h // 2)
-    M = cv2.getRotationMatrix2D(center, angle, 1.0)
-    rotated = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)
+# Try to import a PDF rasterizer
+_loader = None
+try:
+    from pdf2image import convert_from_path
+    _loader = "pdf2image"
+except Exception:
+    try:
+        import fitz # PyMuPDF
+        _loader = "pymupdf"
+    except Exception:
+        _loader = None
+
+
+# --------------------------- CONFIG --------------------------- #
+PDF_PATH = "example.pdf" # <- put your PDF path here
+OUTPUT_DIR = Path("out_steps")
+OUTPUT_DIR.mkdir(exist_ok=True)
+
+# Rectangle (axis-aligned) you want to draw first:
+rect_x, rect_y, rect_w, rect_h = 200, 150, 400, 250 # pixels
+
+# Shift to apply to the rectangle center (dx, dy):
+shift_dx, shift_dy = 120, -40 # pixels
+
+# Rotation to apply to the rectangle (degrees, positive=CCW):
+rotate_deg = 25.0
+
+# Crop region from the original image (x, y, w, h):
+crop_x, crop_y, crop_w, crop_h = 100, 100, 600, 400
+
+# Rotation to apply to the cropped image (degrees):
+crop_rotate_deg = -15.0
+
+# Threshold (use None to use Otsu automatically)
+fixed_threshold_value = None # e.g., set to 128 to force a fixed threshold
+# -------------------------------------------------------------- #
+
+
+def load_pdf_first_page_as_image(pdf_path: str, dpi: int = 300) -> Image.Image:
+    """Return the first page of a PDF as a Pillow RGB image."""
+    if _loader == "pdf2image":
+        pil_pages = convert_from_path(pdf_path, dpi=dpi, first_page=1, last_page=1)
+        if not pil_pages:
+            raise RuntimeError("No pages found in PDF.")
+        return pil_pages[0].convert("RGB")
+    elif _loader == "pymupdf":
+        doc = fitz.open(pdf_path)
+        if doc.page_count == 0:
+            raise RuntimeError("No pages found in PDF.")
+        page = doc.load_page(0)
+        # 300 dpi equivalent scaling
+        zoom = dpi / 72.0
+        mat = fitz.Matrix(zoom, zoom)
+        pix = page.get_pixmap(matrix=mat, alpha=False)
+        img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
+        return img
+    else:
+        raise ImportError(
+            "No PDF rasterizer available. Install either `pdf2image` (plus Poppler) or `PyMuPDF`."
+        )
+
+
+def pil_to_cv(img_pil: Image.Image) -> np.ndarray:
+    """Pillow RGB -> OpenCV BGR"""
+    return cv2.cvtColor(np.array(img_pil), cv2.COLOR_RGB2BGR)
+
+
+def cv_to_pil(img_cv: np.ndarray) -> Image.Image:
+    """OpenCV BGR -> Pillow RGB"""
+    return Image.fromarray(cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB))
+
+
+def draw_axis_aligned_rectangle_pil(img_pil: Image.Image, x, y, w, h, width=4):
+    """Draw axis-aligned rectangle on a PIL image."""
+    draw = ImageDraw.Draw(img_pil)
+    draw.rectangle([x, y, x + w, y + h], outline=(255, 0, 0), width=width)
+    return img_pil
+
+
+def draw_rotated_rectangle_cv(img_cv: np.ndarray, center, size, angle_deg: float, thickness=3, color=(0, 255, 0)):
+    """
+    Draw a rotated rectangle using OpenCV. center=(cx,cy), size=(w,h), angle in degrees CCW.
+    """
+    rect = (center, size, angle_deg)
+    box = cv2.boxPoints(rect) # 4x2 float32 array of vertices
+    box = np.int32(box)
+    cv2.polylines(img_cv, [box], isClosed=True, color=color, thickness=thickness)
+    return img_cv
+
+
+def rotate_image_keep_bounds(img_cv: np.ndarray, angle_deg: float) -> np.ndarray:
+    """
+    Rotate an image about its center, expanding bounds so nothing is cropped.
+    """
+    (h, w) = img_cv.shape[:2]
+    c = (w // 2, h // 2)
+    M = cv2.getRotationMatrix2D(c, angle_deg, 1.0)
+    # compute new bounds
+    cos = abs(M[0, 0])
+    sin = abs(M[0, 1])
+    new_w = int((h * sin) + (w * cos))
+    new_h = int((h * cos) + (w * sin))
+    # adjust rotation matrix to account for translation
+    M[0, 2] += (new_w / 2) - c[0]
+    M[1, 2] += (new_h / 2) - c[1]
+    rotated = cv2.warpAffine(img_cv, M, (new_w, new_h), flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)
     return rotated

-def apply_threshold(img, thresh_value=127):
-    """Applies a binary threshold on the grayscale version of the image."""
-    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
-    _, thresh = cv2.threshold(gray, thresh_value, 255, cv2.THRESH_BINARY)
-    return thresh
-
-def main():
-    pdf_path = "your_file.pdf"
-
-    # Step 1: Read image from PDF
-    img = read_image_from_pdf(pdf_path)
-
-    # Step 2: Draw an ellipse on the image
-    h, w = img.shape[:2]
-    center = (w // 2, h // 2)
-    axes = (w // 4, h // 6)
-    img_with_ellipse = draw_ellipse(img, center, axes, angle=30, color=(0, 0, 255), thickness=3)

-    # Step 3: Crop a part of the image
-    cropped_img = crop_image(img_with_ellipse, (100, 100), (500, 500))
+def threshold_image(img_cv_gray: np.ndarray, fixed_thresh: int | None = None) -> np.ndarray:
+    """
+    Apply binary threshold. If fixed_thresh is None, use Otsu.
+    """
+    if fixed_thresh is None:
+        _, th = cv2.threshold(img_cv_gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
+    else:
+        _, th = cv2.threshold(img_cv_gray, int(fixed_thresh), 255, cv2.THRESH_BINARY)
+    return th

-    # Step 4: Rotate the cropped image
-    rotated_img = rotate_image(cropped_img, angle=45)

-    # Step 5: Apply threshold
-    thresholded_img = apply_threshold(rotated_img, thresh_value=150)
-
-    # Display results
-    cv2.imshow("Ellipse Image", img_with_ellipse)
-    cv2.imshow("Cropped Image", cropped_img)
-    cv2.imshow("Rotated Image", rotated_img)
-    cv2.imshow("Thresholded Image", thresholded_img)
-    cv2.waitKey(0)
-    cv2.destroyAllWindows()
+def main():
+    # 1) Load first page
+    pil_img = load_pdf_first_page_as_image(PDF_PATH, dpi=300)
+    pil_img.save(OUTPUT_DIR / "01_loaded_page.png")
+
+    # 2) Draw axis-aligned rectangle (Pillow)
+    pil_with_rect = pil_img.copy()
+    pil_with_rect = draw_axis_aligned_rectangle_pil(pil_with_rect, rect_x, rect_y, rect_w, rect_h, width=4)
+    pil_with_rect.save(OUTPUT_DIR / "02_axis_aligned_rect.png")
+
+    # Convert to OpenCV for further operations
+    cv_img = pil_to_cv(pil_with_rect)
+
+    # 3) Shift & rotate rectangle (OpenCV rotated box)
+    # Start from the original rectangle center:
+    cx = rect_x + rect_w / 2.0
+    cy = rect_y + rect_h / 2.0
+    # Apply shift
+    cx_shifted = cx + shift_dx
+    cy_shifted = cy + shift_dy
+    # Draw rotated rectangle (in green)
+    cv_img_rotrect = cv_img.copy()
+    cv_img_rotrect = draw_rotated_rectangle_cv(
+        cv_img_rotrect,
+        center=(cx_shifted, cy_shifted),
+        size=(rect_w, rect_h),
+        angle_deg=rotate_deg,
+        thickness=3,
+        color=(0, 255, 0),
+    )
+    cv2.imwrite(str(OUTPUT_DIR / "03_shifted_rotated_rect.png"), cv_img_rotrect)
+
+    # 4) Crop a region (axis-aligned box on the original image)
+    x1, y1 = int(crop_x), int(crop_y)
+    x2, y2 = int(crop_x + crop_w), int(crop_y + crop_h)
+    h, w = cv_img.shape[:2]
+    # clamp to image
+    x1 = max(0, min(w - 1, x1))
+    y1 = max(0, min(h - 1, y1))
+    x2 = max(0, min(w, x2))
+    y2 = max(0, min(h, y2))
+    crop = cv_img[y1:y2, x1:x2].copy()
+    cv2.imwrite(str(OUTPUT_DIR / "04_crop.png"), crop)
+
+    # 5) Rotate the cropped image (keeping bounds)
+    crop_rot = rotate_image_keep_bounds(crop, crop_rotate_deg)
+    cv2.imwrite(str(OUTPUT_DIR / "05_crop_rotated.png"), crop_rot)
+
+    # 6) Threshold the (rotated) crop
+    crop_gray = cv2.cvtColor(crop_rot, cv2.COLOR_BGR2GRAY)
+    crop_th = threshold_image(crop_gray, fixed_threshold_value)
+    cv2.imwrite(str(OUTPUT_DIR / "06_crop_threshold.png"), crop_th)
+
+    print("Done. See outputs in:", OUTPUT_DIR.resolve())

-    # Optionally save results
-    cv2.imwrite("ellipse_image.jpg", img_with_ellipse)
-    cv2.imwrite("cropped_image.jpg", cropped_img)
-    cv2.imwrite("rotated_image.jpg", rotated_img)
-    cv2.imwrite("thresholded_image.jpg", thresholded_img)

     if __name__ == "__main__":
     main()
 ```
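
Note on `rotate_image_keep_bounds` in the generated script: the enlarged canvas size comes from projecting the image's width and height onto the rotated axes. Below is a minimal standalone sketch of that size computation using only `math`. The function name `rotated_bounds` is hypothetical; it mirrors the `new_w` and `new_h` lines of the script but takes the angle directly instead of reading the cosine and sine from an OpenCV rotation matrix.

```python
import math


def rotated_bounds(width, height, angle_deg):
    """Size of the axis-aligned canvas that holds a width x height image
    after rotating it by angle_deg about its center (truncated to int,
    as in the script)."""
    theta = math.radians(angle_deg)
    cos_t = abs(math.cos(theta))
    sin_t = abs(math.sin(theta))
    new_w = int(height * sin_t + width * cos_t)
    new_h = int(height * cos_t + width * sin_t)
    return new_w, new_h


# The script's CONFIG values: a 600 x 400 crop rotated by -15 degrees.
print(rotated_bounds(600, 400, -15))
```

With the script's 600 x 400 crop and a -15 degree rotation this gives a 683 x 541 canvas, assuming the crop was not clamped by the page size.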

-ChatGPT proposes to re-invent the wheel.
+ChatGPT proposes to re-invent the wheel and over-complicates everything.

 Using Otary makes the code:

-- Much more **readable** and hence maintainable
+- Much more **readable** and hence **maintainable**
 - Much more **interactive**
 - Much simpler, simplifying **libraries management** by only using one library and not manipulating multiple libraries like Pillow, OpenCV, Scikit-Image, PyMuPDF etc.

@@ -176,7 +302,7 @@ Using Otary makes the code:

 ```python
 im = (
-    im.draw_ellipses([ellipse])
+    im.draw_polygons([rectangle])
     # .crop(x0=50, y0=50, x1=450, y1=450)
     # .rotate(angle=90, is_degree=True)
     .threshold_simple(thresh=200)
