"\n",
"Before extracting and classifying contracts, we need to initialize our two main engines:\n",
"\n",
"- **Granite Docling** – **ibm-granite/granite-docling-258M-mlx**, a multimodal Image-Text-to-Text model designed for converting complex documents (PDFs, scanned images, etc.) into structured, machine-readable formats like Markdown, HTML, or JSON.\n",
This looks like the recipe will only work on macOS (MLX). It should be able to run on any system (Linux, Windows). Generally we test the notebooks in the CI build, which is a Linux system.
"api_key = os.getenv(\"WATSON_API_KEY\")\n",
"project_id = os.getenv(\"WATSON_PROJECT_ID\")\n",
"watsonx_url = os.getenv(\"WATSON_URL\", \"https://us-south.ml.cloud.ibm.com\")\n",
We want to use the `get_env_var` method from ibm-granite-community/utils. See how the other notebooks get these secrets.
" logger.error(\"WATSON_API_KEY or WATSON_PROJECT_ID environment variables not set\")\n",
" raise ValueError(\"Missing required environment variables\")\n",
"\n",
"llm = WatsonxLLM(\n",
Any reason why you are not using ChatWatsonx? In general, I have been trying to move the notebooks to use the chat completion API.
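A rough sketch of the suggested switch, assuming the `ChatWatsonx` class from `langchain_ibm` and the same credentials as the `WatsonxLLM` setup in the diff; the model id shown is a placeholder, not the recipe's actual model:

```python
from langchain_ibm import ChatWatsonx

# Placeholder model id plus the credentials read earlier in the notebook;
# adjust both to match the recipe's actual configuration.
chat = ChatWatsonx(
    model_id="ibm/granite-3-3-8b-instruct",
    url=watsonx_url,
    project_id=project_id,
    apikey=api_key,
)

# Chat completion call: the Granite chat template is applied server-side.
response = chat.invoke("Classify this contract ...")
```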
"metadata": {},
"outputs": [],
"source": [
"def extract_contract_text(file_path, max_chars=32000):\n",
Please use type hints on function arguments and the return value.
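For example, the signature could be annotated along these lines; the body here is an illustrative sketch, not the recipe's actual implementation:

```python
def extract_contract_text(file_path: str, max_chars: int = 32000) -> str:
    """Read a contract file and return at most max_chars characters."""
    with open(file_path, "r", encoding="utf-8") as f:
        return f.read()[:max_chars]
```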
"\n",
" try:\n",
" logger.info(f\"Classifying: {contract_name}\")\n",
" response = llm(prompt).strip()\n",
The prompt text is not properly formatted using the appropriate Granite chat template. This is a reason to use ChatWatsonx, since the proper formatting is done server-side. If you want to use WatsonxLLM (the old completion API), then you will need to format the prompt locally. See the use of TokenizerChatPromptTemplate in other notebooks.
Using ChatWatsonx will also allow you to use structured responses, which will provide more consistent results for the JSON schema.
" langchain_ibm \\\n",
" langchain_community \\\n",
" transformers \\\n",
" mlx-vlm \n",
My guess is that installing mlx-vlm on a non-Mac system will fail.
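One way to keep the install cell portable is to guard the MLX dependency behind a platform check, along these lines (a sketch; the commented `%pip` line would run inside the notebook):

```python
import platform

# mlx-vlm only builds and runs on Apple Silicon macOS.
IS_APPLE_SILICON = platform.system() == "Darwin" and platform.machine() == "arm64"

if IS_APPLE_SILICON:
    # %pip install mlx-vlm
    pass
else:
    print("Skipping mlx-vlm install on this platform.")
```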
PR Checklist
Model Interaction
Data
Notebook requirements
`%pip install git+https://github.com/ibm-granite-community/utils` in the first code cell in order to make `get_env_var` available for accessing secrets and variables in the recipe.