Blog: Granite Switch in Mellea by planetf1 · Pull Request #59 · generative-computing/mellea-website

planetf1 · 2026-06-01T10:46:25Z

Closes #58

Summary

Granite Switch bakes a curated set of Granite intrinsics into a single
vLLM-served checkpoint. This post leads with the end-user value
(adding validation, requirement checks, hallucination detection to a
Mellea program is a single function call), shows two intrinsics
running end-to-end against a live vLLM server, and explains the
architectural shift that makes it work — the routing between
behaviours is part of the model itself, not orchestration around it.

Content checklist

Hook — drop-in validation as the value proposition
How it works — routing baked into the model (LoRA-hot-swap and
LLM-as-judge framed as the alternatives the reader knows)
Setup — granite-switch plugin install + vllm serve + mellea install
Demo — answerability with real output ("answerable"/"unanswerable")
Demo — hallucination detection with sentence-level output
When this fits — light closer; links to intrinsics overview docs
Try it — links to repo, examples, docs, model card
Header image (hub-and-spoke SVG)

Technical validation

markdownlint: 0 errors
Live smoke test confirmed against
ibm-granite/granite-switch-4.1-3b-preview: both code snippets run
clean; output values ("answerable"/"unanswerable",
"faithful"/"unfaithful") match the post exactly.

Verification status

Platform	Runtime	Status
Linux (IBM LSF, via `bvllm`)	vLLM	Verified — both snippets executed end-to-end, output matches blog
macOS	?	Open — Switch doesn't support Ollama; investigating what the macOS path looks like.

To run vLLM for validation (internal tooling — bvllm):

bv run ibm-granite/granite-switch-4.1-3b-preview

Launches vLLM on an IBM LSF cluster and returns an OpenAI-compatible
endpoint.

Open questions for the reviewer

Each of these has a callout at the relevant point in the post; this
list is the consolidated todo. The post is honest about being preview
software, but several setup details are stricter than upstream docs
and need a re-test before publish.

granite-switch[vllm20] vs granite-switch[vllm] — upstream
granite-switch README leads with [vllm] (broad CUDA 12.x compat,
vLLM 0.19.x); we've only validated [vllm20] (CUDA 13+, vLLM 0.20+)
on LSF. Decide which to lead with, and consider showing only one to
keep install simple.
--enable-auto-tool-choice --tool-call-parser granite4 — these
flags were added after bvllm testing showed intrinsics didn't
dispatch without them. The upstream granite-switch README, the HF
model card, and docs/docs/integrations/openai.md all omit them.
Re-test against a vanilla vllm serve <model> invocation; if
dispatch works without them, drop to match upstream.
macOS path — Switch doesn't run under Ollama. A macOS option is
under investigation; nothing confirmed. Expand the setup section if
a macOS path lands before merge.

Claims avoided

Switch is not framed as a "better aLoRA" (IBM frames it as
coarse-grained expert switching).
No claim of multi-backend support — OpenAIBackend only on main
today.
Adapter selection is described as a chat-template control token, not
a runtime API call.
No production recommendation — Switch model IDs are -preview.

Notes for reviewer

All code snippets match the logic in
docs/examples/granite-switch/ in the main Mellea repository — they're
tested by the e2e suite and confirmed against a live vLLM instance.

The three reviewer callouts in the post (one per open question above)
are tagged **Reviewer note —** so they're easy to grep before publish.

FYI @ajbozarth — this is a candidate to publish soon; flagging early in
case you want to weigh in on framing or timing.

Introduces Granite Switch as a delivery mechanism for Mellea intrinsics — single vLLM-served checkpoint, no adapter-weight lifecycle management. Shows answerability checking and hallucination detection end-to-end. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Hub-and-spoke SVG showing one Granite Switch checkpoint serving five intrinsic capabilities (answerability, hallucination, citations, req. check, guardian), colour-coded by library family (RAG/Core/Safety). Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Flags vLLM/Linux as the validated path and notes omlx/vmlx as unvalidated macOS alternatives pending end-to-end smoke-test. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

- Add granite-switch plugin package install (registers GraniteSwitchForCausalLM architecture with vLLM; without it vLLM crashes immediately on model load) - Add required vLLM flags: --enable-auto-tool-choice and --tool-call-parser granite4 (without the latter, adapter dispatch silently fails) - Replace "What this post does" callout with prerequisites callout - Fix faithfulness output values: faithful/unfaithful (lowercase) — confirmed against live ibm-granite/granite-switch-4.1-3b-preview - Fix pip install quoting: pip install 'mellea[switch]' - Remove "(OpenAI-compatible)" qualifier from table Runtime cell - Strengthen hook: matrix framing; "zero adapter files to manage" - Strengthen trade-off section: eliminate lifecycle management entirely - Strengthen operational cost section: N×M matrix argument; adding a new intrinsic is a code change, not an infrastructure change - Strengthen vision section: "just a request parameter", clear path to broad runtime support, same code path in prod and on a laptop Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

planetf1

Code and content review — all runnable snippets validated against a live ibm-granite/granite-switch-4.1-3b-preview vLLM instance.

What's landing well: the intrinsics-vs-Switch distinction in the second section is sharp and exactly right. The demo flows cleanly from setup → answerability → hallucination detection on the same backend object — that's the core "same call, different intrinsic" point in action.

- Callout now flags the server-side plugin as a prerequisite - Setup section distinguishes server environment (granite-switch plugin) from application environment (mellea[switch]) - Inline comment explains api_key="EMPTY" for readers unfamiliar with vLLM's API key handling Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

- "twelve-plus intrinsics" → "a dozen" (exact count from model_ids.py is 12; future-proofed phrasing as the curated set evolves) - Adapter config download description now mentions JSON + YAML, matching the later paragraph that names adapter_index.json and per-adapter io.yaml - Linked granitelib-{rag,core,guardian} HF repos at first mention - Linked the granite-switch plugin package on PyPI at first mention - Try-it section now points at the top-level Mellea repo and the intrinsics overview docs in addition to the OpenAI-backend page Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The earlier draft singled out two specific Apple Silicon MLX servers (omlx, vmlx) as candidates. The investigation is broader than that — the open question is "what's the macOS story for Switch given Ollama doesn't support it", not a comparison of named MLX runtimes. Reword the reviewer note to match that framing. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The earlier draft led with "managing adapter weights is painful" — but that hook only resonates with readers already running intrinsics on PEFT. Most readers come in cold and want to know what they get. Reframed the post around the value users actually see: dropping validation and requirement checks into a Mellea program is one function call. The hook now shows the trio of calls (check_answerability, flag_hallucinated_content, requirement_check) and the rest of the post follows: what Switch is, how to run it, the demo, and where it fits. Cuts: - "What it costs to ship" — its core point ("a code change, not an infrastructure change") moved into the hook; the matrix-of-binaries detail was mellea-internal noise - "Where this is going" — issue/epic references and roadmap detail read as project status and made the post feel WIP-heavy - "When to reach for Switch vs PEFT" comparison table — replaced with a short "When this fits" closer that links to the intrinsics overview docs for readers who want the full picture - The intrinsics-vs-Switch terminology section is now a single tight "How it works" paragraph The reader still gets enough to run it end-to-end; everything beyond that lives behind the docs links in "Try it". Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Previous version said the validators are "baked into the model weights" but stopped there. The genuinely distinctive thing about Switch — the reason it can be drop-in where alternatives can't — is that the *routing* between validators is part of the model itself, not an orchestration layer wrapped around it. Spelled that out by contrasting with the two mechanisms a reader is most likely to know (LoRA hot-swap and LLM-as-judge), in one short paragraph each. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The previous single reviewer note covered macOS only. Cross-checking against the granite-switch repo README, the HF model card, and the mellea OpenAI backend doc surfaced two more environment questions that need verification before merge: 1. Whether to lead with `granite-switch[vllm20]` (CUDA 13+) or `granite-switch[vllm]` (broader CUDA 12.x compat). Upstream prefers `[vllm]`; we've only validated `[vllm20]` on LSF. 2. Whether `--enable-auto-tool-choice --tool-call-parser granite4` is actually required. Upstream sources all omit it, but internal testing on bvllm needed it to make intrinsics dispatch. Each callout now sits at the install/serve line it relates to, so a reviewer reading the post can resolve them in place. The macOS note keeps its existing position. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Retitle to "one checkpoint, every intrinsic" — drops "adapter wrangling" framing the rewritten hook no longer leans on, and matches the SVG tagline. Removes the trailing "the dispatch happens inside OpenAIBackend" aside which is mellea-internal detail readers don't need. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

ajbozarth

Overall looks good, only one note outside the items you've already called out, and making sure to update the date to the actual expected date when you're ready to schedule.

ajbozarth · 2026-06-02T22:07:41Z

+
+## Setting it up
+
+The [`granite-switch`](https://pypi.org/project/granite-switch/) plugin package


The macOS-not-supported note currently lives only inside the reviewer-note callout below, which gets deleted before merge. Worth surfacing it at the top of this section in some form so macOS readers can bail before pip install.

It's my intent to sort out the macOS limitation if I can as noted in the pr. I'll keep this conv. open as a reminder of that action item, since all the callouts need removing and the overall body needs to flow. I just need to try it out on macOS

planetf1 · 2026-06-03T10:09:31Z

Terminology / timing note

The post uses intrinsic and the current class/import names (Intrinsic, from mellea.stdlib.components.intrinsic import ...). generative-computing/mellea#929 (Epic: Fix Intrinsic Adapter Lifecycle & Consistency) will change these class names and import paths, but it's a multi-PR epic with no firm date.

Plan: build the example against the current code rather than waiting on the rename. For this pass we'll just get the positioning terminology right — lead with "adapter function" (the term already shipped in the public Granite Switch material), and define "intrinsic" as the earlier name. The example's class names stay as-is for now; we'll update the blog to the new symbols once the epic lands (the deprecation plan keeps the current code runnable in the meantime).

Separately, one dependency on the epic side: the epic's own naming work still needs to realign with the shipped Granite Switch vocabulary. Its current target (AdapterBasedComponent, treated as a placeholder pending an IBM decision) predates "adapter function" being public.

planetf1 · 2026-06-03T10:16:55Z

Blog updated: terminology + background

Reader-facing prose now leads with "adapter function" to match the public Granite Switch vocabulary; "intrinsic" is defined once as the earlier name. Code symbols are kept as the current code (mellea.stdlib.components.intrinsic, the load_embedded_adapters flag, the function names) — see the timing note above; we'll update those once the rename lands.

Also added:

A definition bridge for adapter function (task-specific capability with a defined I/O contract) and the intrinsic→adapter-function naming note.
Background on activated LoRA (aLoRA) — shared KV cache, control-token activation — and the Granite Libraries (Core / RAG / Guardian) mapping to the core/rag/guardian modules.
Links to the granite-switch source repo (plugin source + Try it section).

Open items parked for the next pass (previously inline reviewer notes, removed from the draft so they don't ship):

[vllm20] vs [vllm] install extra. Upstream granite-switch README recommends [vllm] as the broad-compat default and [vllm20] for newer-CUDA performance. [vllm20] is validated end-to-end in our internal environment; the [vllm] path hasn't been re-confirmed. Decide which to lead with before merge — and consider showing only one to keep the install simple.
--enable-auto-tool-choice --tool-call-parser granite4 flags. Added after internal testing where the adapter functions didn't dispatch without them. The upstream granite-switch README, the HF model card, and the Mellea OpenAI integration docs all omit the flags. Re-test against a vanilla vllm serve <model> before publish; if dispatch works without them, drop them to match upstream and simplify the snippet.
macOS path. vLLM on Linux is the validated path. Switch doesn't run under Ollama, so a macOS option is still being investigated — nothing confirmed yet. Expand the setup section if a macOS path lands before merge.

…search blog - Add LocalHFBackend macOS path (pip install mellea[hf], no server needed; MPS auto-selected on Apple Silicon) - Resolve reviewer note 1: show both [vllm] and [vllm20] with their version ranges stated explicitly - Resolve reviewer note 2: drop --enable-auto-tool-choice and --tool-call-parser granite4 — not needed, not in upstream docs, confirmed against live deployment - Add aLoRA KV cache advantage paragraph (aligns with IBM Research blog) - Add 51% → 84% IFEval accuracy number for requirement-check - Bridge adapter functions / intrinsics terminology for readers coming from the IBM Research blog - Add experimental qualifier in When this fits section - Update Try it and opening callout for both paths - Verified: macOS (LocalHFBackend + MPS) and vLLM (upstream 0.20.2 on BlueVela) both produce correct intrinsic outputs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Remove --enable-auto-tool-choice --tool-call-parser granite4 from the Try it install line — contradicted the setup section which correctly omits them; caught in Opus review - Add "16 GB unified memory recommended" to macOS setup note Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

planetf1 · 2026-06-08T16:39:03Z

FYI @ajbozarth — publish date updated to 2026-06-11 (this Wednesday) as you flagged.

ajbozarth

Re-reviewing as we move this out of draft with the latest updates. Three reviewer notes from the description are resolved cleanly, and the macOS path is in good shape end-to-end. Manual walkthrough surfaced two structural issues with the Linux-first ordering and a couple of cosmetic notes.

ajbozarth · 2026-06-08T19:01:59Z

+
+## Setting it up
+
+The [`granite-switch`](https://pypi.org/project/granite-switch/) plugin package


Consider leading the section with the macOS path instead of vLLM. Right now a macOS reader has to skim past the full server-side install + vllm serve block before reaching the LocalHFBackend instructions at line 104 — and most blog readers trying this out are on a laptop, not a Linux GPU server. Reordering so macOS comes first matches the more likely reader journey and frames vLLM as the "graduate to production" path (which is already how "When this fits" positions them). Same applies to the backend code section below.

ajbozarth · 2026-06-08T19:01:59Z

+```python
+from mellea.backends.model_ids import IBM_GRANITE_SWITCH_4_1_3B_PREVIEW
+from mellea.backends.openai import OpenAIBackend
+from mellea.formatters import TemplateFormatter
+from mellea.stdlib.components import Document, Message
+from mellea.stdlib.components.intrinsic import rag
+from mellea.stdlib.context import ChatContext
+
+MODEL = IBM_GRANITE_SWITCH_4_1_3B_PREVIEW.hf_model_name
+backend = OpenAIBackend(
+    model_id=MODEL,
+    formatter=TemplateFormatter(model_id=MODEL),
+    base_url="http://localhost:8000/v1",
+    api_key="EMPTY",  # vLLM doesn't validate API keys — any string works
+    load_embedded_adapters=True,
+)
+```
+
+The `load_embedded_adapters=True` flag tells Mellea to fetch the I/O configuration
+files for each intrinsic from the Hugging Face model repo — a few kilobytes of JSON
+and YAML, not adapter weights — and register the embedded adapters automatically.
+
+**On macOS**, replace the backend setup with two lines — everything after this is
+unchanged:
+
+```python
+from mellea.backends.huggingface import LocalHFBackend
+
+backend = LocalHFBackend(model_id="ibm-granite/granite-4.1-3b")
+```


Same lead-with-macOS reordering applies here, plus a correctness issue with the current structure. The macOS swap at lines 148-152 only includes the LocalHFBackend import and the backend = ... line — but the answerability and hallucination snippets below depend on Document, Message, rag, and ChatContext, which are only imported in the Linux block above (lines 124-129). A reader who takes "everything after this is unchanged" literally will hit NameError (confirmed in manual walkthrough).

Simplest fix: repeat the four shared imports in the macOS block so it's self-contained. Combined with leading macOS first, the section becomes "here's the macOS block, here's the Linux swap" — and either block runs standalone.

ajbozarth · 2026-06-08T19:01:59Z

+```python
+from mellea.backends.huggingface import LocalHFBackend
+
+backend = LocalHFBackend(model_id="ibm-granite/granite-4.1-3b")


Manual walkthrough note — not a blog issue, flagging for awareness: this line prints `torch_dtype` is deprecated! Use `dtype` instead! from transformers. Still present on main at mellea/backends/huggingface.py:334. Worth a follow-up issue against the mellea repo.

ajbozarth · 2026-06-08T19:01:59Z

+Install Mellea in your **application environment**:
+
+```bash
+pip install 'mellea[switch]'


Quote style is mixed across the install lines ('mellea[switch]' here vs "mellea[hf]" at line 109, and again at line 212). Cosmetic.

Suggested change

pip install 'mellea[switch]'

pip install "mellea[switch]"

ajbozarth · 2026-06-08T19:01:59Z

+matters when you're chaining several validators in a single request.
+The accuracy improvement is real too: on IFEval, prompting the base
+Granite 4.1 3B model for requirement checking achieves 51% balanced
+accuracy; the embedded requirement-check adapter reaches 84%.


Consider linking the number to its source (IBM Research blog or model card) — defensible if a reader asks where it came from.

ajbozarth · 2026-06-08T19:03:22Z

@planetf1 in addition to the review above don't forget to move this out of draft, also:

publish date updated to 2026-06-11 (this Wednesday)

the 11th is fine, but that's Thursday not Wednesday

planetf1 added 4 commits June 1, 2026 11:45

docs(blog): add reviewer note on runtime validation

666827b

Flags vLLM/Linux as the validated path and notes omlx/vmlx as unvalidated macOS alternatives pending end-to-end smoke-test. Assisted-by: Claude Code Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

planetf1 commented Jun 1, 2026

View reviewed changes

planetf1 added 7 commits June 1, 2026 13:38

ajbozarth reviewed Jun 2, 2026

View reviewed changes

planetf1 and others added 3 commits June 8, 2026 16:07

docs(blog): set publish date to 2026-06-11

b607661

ajbozarth reviewed Jun 8, 2026

View reviewed changes


		## Setting it up

		The [`granite-switch`](https://pypi.org/project/granite-switch/) plugin package

Conversation

planetf1 commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Content checklist

Technical validation

Verification status

Open questions for the reviewer

Claims avoided

Notes for reviewer

Uh oh!

planetf1 left a comment

Choose a reason for hiding this comment

Uh oh!

ajbozarth left a comment

Choose a reason for hiding this comment

Uh oh!

ajbozarth Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

planetf1 Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

planetf1 commented Jun 3, 2026

Uh oh!

planetf1 commented Jun 3, 2026

Uh oh!

planetf1 commented Jun 8, 2026

Uh oh!

ajbozarth left a comment

Choose a reason for hiding this comment

Uh oh!

ajbozarth Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

ajbozarth Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

ajbozarth Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

ajbozarth Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

ajbozarth Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

ajbozarth commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

planetf1 commented Jun 1, 2026 •

edited

Loading