
Granite Docling Recipe #244

Open

AishaDarga wants to merge 1 commit into ibm-granite-community:main from AishaDarga:granite-docling-recipe

Conversation

@AishaDarga (Contributor)

PR Checklist

Model Interaction

  • Flexible LLM platform support: The platform should be easily switchable. Use LangChain or LlamaIndex.
  • Use the prompt guide corresponding to the model: for example, for Granite 3.x Language Models.

Data

  • Example data: Follow the example data guidance.

Notebook requirements

  • Notebook outputs cleared: Ensure all notebook outputs are cleared.
  • Pre-commit hooks run: Ensure the pre-commit hooks for notebooks have been run.
  • Automated testing: Add the recipe to the automated tests as described here
  • Test in Google Colab:
    • Test that it works in Google Colab (Python 3.10.12).
    • Colab has its own package set and Python version, so ensure compatibility.
  • Test locally:
    • Ensure the code works in a fresh Python virtual environment (venv).
  • Standard access to secrets and variables: Include %pip install git+https://github.com/ibm-granite-community/utils in the first code cell so that get_env_var is available for accessing secrets and variables in the recipe.
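
The first-cell pattern described in that checklist item can be sketched as a notebook cell. This is a hypothetical layout, not a prescribed one: the import path ibm_granite_community.notebook_utils and the WATSONX_APIKEY variable name are assumptions based on other recipes in the repository and should be verified against an existing notebook.

```python
# Hypothetical first notebook cell: install the community utils package,
# then use get_env_var to read secrets (variable name is an assumption).
%pip install git+https://github.com/ibm-granite-community/utils

from ibm_granite_community.notebook_utils import get_env_var

api_key = get_env_var("WATSONX_APIKEY")
```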

Incoming References

  • README.md updates:
    • Add a link to the recipe in the Table of Contents (ToC).
    • Include a Colab button after that link if the notebook can be run in Colab.

GitHub

  • Commits signed: All commits must be GPG or SSH signed.
  • DCO Compliance: Developer Certificate of Origin (DCO) applies to the code, documentation, and any example data provided. Ensure commits are signed off.

@bjhargrave (Member) left a comment:

Some comments.

Quoted notebook text:

Before extracting and classifying contracts, we need to initialize our two main engines:

- **Granite Docling** – **ibm-granite/granite-docling-258M-mlx**, a multimodal Image-Text-to-Text model designed for converting complex documents (PDFs, scanned images, etc.) into structured, machine-readable formats like Markdown, HTML, or JSON.
@bjhargrave (Member):
This looks like the recipe will only work on macOS (MLX). It should be able to run on any system (linux, windows). Generally we like to test the notebooks in the CI build which is a linux system.

Comment on lines +135 to +137
api_key = os.getenv("WATSON_API_KEY")
project_id = os.getenv("WATSON_PROJECT_ID")
watsonx_url = os.getenv("WATSON_URL", "https://us-south.ml.cloud.ibm.com")
@bjhargrave (Member):
We want to use the get_env method from ibm-granite-community.utils. See how other notebooks get these secrets.
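A hedged sketch of the suggested change, assuming get_env_var lives in ibm_granite_community.notebook_utils as in other recipes; the exact environment-variable names and the two-argument default form are assumptions to verify against the repo's conventions:

```python
from ibm_granite_community.notebook_utils import get_env_var

# get_env_var reads the value from the environment (or prompts for it),
# replacing the bare os.getenv calls quoted above.
api_key = get_env_var("WATSONX_APIKEY")
project_id = get_env_var("WATSONX_PROJECT_ID")
watsonx_url = get_env_var("WATSONX_URL", "https://us-south.ml.cloud.ibm.com")
```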

    logger.error("WATSON_API_KEY or WATSON_PROJECT_ID environment variables not set")
    raise ValueError("Missing required environment variables")

llm = WatsonxLLM(
@bjhargrave (Member):
Any reason why you are not using ChatWatsonx? In general, I have been trying to move the notebooks to use chat completion API.
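One way the switch might look, as a sketch rather than the recipe's actual code; the parameter names follow langchain-ibm's ChatWatsonx constructor, the model id is a placeholder, and the credential variables are assumed to be set earlier in the notebook:

```python
from langchain_ibm import ChatWatsonx

chat_llm = ChatWatsonx(
    model_id="ibm/granite-3-3-8b-instruct",  # placeholder model id
    url=watsonx_url,
    project_id=project_id,
    apikey=api_key,
)

# The chat completion API takes messages; the server applies the model's
# chat template, so no local prompt formatting is needed.
response = chat_llm.invoke([("user", prompt)])
print(response.content)
```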

def extract_contract_text(file_path, max_chars=32000):
@bjhargrave (Member):
Please use type hints on function arguments and return value.

    try:
        logger.info(f"Classifying: {contract_name}")
        response = llm(prompt).strip()
@bjhargrave (Member):
The prompt text is not properly formatted using the appropriate Granite chat template. This is a reason to use ChatWatsonx, since the proper formatting will be done server-side. If you want to use WatsonxLLM (the old completion API), then you will need to format the prompt locally. See the use of TokenizerChatPromptTemplate in other notebooks.

Using ChatWatsonx will also allow you to use structured responses, which will provide more consistent results for the JSON schema.
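A minimal sketch of the structured-response idea, using a TypedDict schema with LangChain's with_structured_output; the schema fields are assumptions for illustration, not the recipe's actual classification schema:

```python
from typing import TypedDict

class ContractClassification(TypedDict):
    """Assumed output schema for the contract classifier (illustrative only)."""
    category: str      # e.g. "NDA", "MSA", "SOW"
    confidence: float  # model confidence in [0, 1]

# With a chat model such as ChatWatsonx, the schema can be enforced by the
# model API instead of parsing raw text:
# structured_llm = chat_llm.with_structured_output(ContractClassification)
# result = structured_llm.invoke(prompt)  # dict matching the schema
```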

    langchain_ibm \
    langchain_community \
    transformers \
    mlx-vlm
@bjhargrave (Member):
My guess is that installing mlx on a non-mac will be an error.
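One hedged way to avoid that failure is to gate the MLX dependency with a PEP 508 environment marker, so that non-macOS installs simply skip it:

```shell
# mlx-vlm is only installed on macOS; elsewhere pip ignores the requirement.
pip install 'mlx-vlm; sys_platform == "darwin"'
```

The notebook would then also need a non-MLX code path for the Linux CI run, for example loading the base ibm-granite/granite-docling-258M checkpoint via transformers instead of the MLX variant.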

