Help: Parsing Product Nutrition Info from NFP Images #14614
Replies: 1 comment
-
To address your issue of parsing product nutrition information from Nutrition Facts Panel (NFP) images using PaddleOCR and OpenCV, here are some points and recommendations:
1. Image Format Requirement for PaddleOCR
PaddleOCR expects input images in RGB format (not BGR). Since OpenCV loads images in BGR format by default, you need to convert the image to RGB before passing it to PaddleOCR. You can do this using the following code:
import cv2
# Load the image using OpenCV
image = cv2.imread('path_to_your_image.jpg')
# Convert the image from BGR to RGB
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
This aligns with the discussion in the provided reference: "PaddleOCR expects images in RGB or BGR format?".
2. Image Pre-Processing with OpenCV
To improve OCR accuracy, pre-processing is crucial. Here are some steps you can follow:
# Grayscale conversion
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Adaptive thresholding (block size 11, constant 2)
thresh_image = cv2.adaptiveThreshold(gray_image, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
# Gaussian blur (5x5 kernel) to reduce noise
blurred_image = cv2.GaussianBlur(gray_image, (5, 5), 0)
3. OCR Configuration in PaddleOCR
When using PaddleOCR, ensure that:
Here’s a basic example of how to run PaddleOCR after pre-processing:
from paddleocr import PaddleOCR
# Initialize PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='en') # Set language as needed
# Run OCR on the pre-processed image
results = ocr.ocr(image_rgb)
# Print results
for line in results[0]:
    print(line)
4. Handling Nutrition Facts Panel Specific Challenges
NFP images can have structured layouts, which may require additional processing:
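One possible form of such processing, sketched here under assumptions rather than taken from PaddleOCR itself, is to group detections into visual rows and then pull a nutrient name, amount, and unit out of each row. The sketch assumes each detection has the shape `[box, (text, confidence)]` with `box` holding four `[x, y]` corner points, as PaddleOCR 2.x results typically do. The helper names `group_into_rows` and `parse_row`, the `y_tolerance` value, and the regular expression are all hypothetical and would need tuning for real panels.

```python
import re
from statistics import mean

# Hypothetical pattern: a nutrient name, a number, and an optional unit.
NUTRIENT_RE = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z .\-]*?)\s*"
    r"(?P<amount>\d+(?:\.\d+)?)"
    r"(?:\s*(?P<unit>mcg|mg|kcal|g)\b)?",
    re.IGNORECASE,
)


def group_into_rows(detections, y_tolerance=10):
    """Join detections whose vertical centers are within y_tolerance pixels.

    Assumes each detection is [box, (text, confidence)], where box is a
    list of four [x, y] points. Returns one string per row, top to bottom,
    with the cells of each row ordered left to right.
    """
    items = []
    for box, (text, _conf) in detections:
        y_center = mean(point[1] for point in box)
        x_left = min(point[0] for point in box)
        items.append((y_center, x_left, text))
    items.sort()  # top to bottom

    rows = []
    for y_center, x_left, text in items:
        # Compare against the first cell of the current row.
        if rows and abs(y_center - rows[-1]["y"]) <= y_tolerance:
            rows[-1]["cells"].append((x_left, text))
        else:
            rows.append({"y": y_center, "cells": [(x_left, text)]})

    return [" ".join(text for _x, text in sorted(row["cells"])) for row in rows]


def parse_row(row_text):
    """Return (name, amount, unit) for a row, or None if no amount is found."""
    match = NUTRIENT_RE.match(row_text)
    if match is None:
        return None
    unit = match.group("unit")
    return (
        match.group("name").strip(),
        float(match.group("amount")),
        unit.lower() if unit else None,
    )
```

Calling `group_into_rows(results[0])` and then `parse_row` on each returned string would give one tuple per recognizable nutrient line. Rows without a number, such as the panel title, come back as `None` and can be skipped.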
5. Common Issues and Debugging Tips
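As one debugging aid, it can help to list the detections the recognizer was least sure about, since these often point to regions where pre-processing hurt more than it helped. The sketch below assumes the same `[box, (text, confidence)]` detection shape as PaddleOCR 2.x results. The function name `low_confidence` and the 0.8 threshold are hypothetical choices, not PaddleOCR settings.

```python
def low_confidence(detections, threshold=0.8):
    """Return (text, confidence) pairs scoring below threshold.

    Assumes each detection is [box, (text, confidence)]. A None or empty
    input is treated as "no detections" so the caller does not crash when
    nothing was recognized on the page.
    """
    return [
        (text, confidence)
        for _box, (text, confidence) in (detections or [])
        if confidence < threshold
    ]
```

Running `low_confidence(results[0])` once on the raw image and once on the pre-processed one gives a quick way to compare which variant the recognizer handles better.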
Summary
Feel free to share specific issues or errors you encounter for further assistance!
Response generated by feifei-bot | chatgpt-4o-latest
-
I'm working on a school project that involves extracting product nutrition information from images of Nutrition Facts Panels (NFPs) using Optical Character Recognition (OCR) and Natural Language Processing (NLP) techniques. I've encountered some challenges in parsing the extracted text data and would greatly appreciate any guidance.
I tried using the paddleocr Python package for text extraction after some image pre-processing with cv2 (no GPU), but was not successful. Are there any specific parameters to set for this kind of use case?
A sample NFP image used as input:

TIA