
Commit b981f40

adding langchain library guide
1 parent d0b913b commit b981f40

File tree: 2 files changed (+392, -1 lines)

pages/spicedb/ops/_meta.json

Lines changed: 2 additions & 1 deletion
@@ -2,5 +2,6 @@
   "observability": "Observability Tooling",
   "deploying-spicedb-operator": "Deploying the SpiceDB Operator",
   "ai-agent-authorization": "Authorization for AI Agents",
-  "secure-rag-pipelines": "Secure Your RAG Pipelines with Fine Grained Authorization"
+  "secure-rag-pipelines": "Secure Your RAG Pipelines with Fine Grained Authorization",
+  "spicedb-langchain-langgraph-rag": "Fine-Grained Authorization for RAG using LangChain & LangGraph"
 }
Lines changed: 390 additions & 0 deletions
@@ -0,0 +1,390 @@
# Fine-Grained Authorization for RAG Applications using LangChain (or LangGraph)

This guide explains how to enforce **fine-grained, per-document authorization** in Retrieval-Augmented Generation (RAG) pipelines using **SpiceDB**, **LangChain**, and **LangGraph**.

It demonstrates how to plug authorization directly into an LLM workflow using a post-retrieval filter powered by SpiceDB, ensuring that **every document used by the LLM has been explicitly authorized** for the requesting user.

---

## Overview

Modern AI-assisted applications use RAG to retrieve documents and generate responses.
However, **standard RAG pipelines do not consider permissions**, meaning LLMs may hallucinate or leak information from unauthorized sources.

This guide shows how to solve that problem using:

- **SpiceDB** as the source of truth for authorization
- **spicedb-rag-authorization** (library) for fast post-retrieval filtering
- **LangChain** for LLM pipelines, or
- **LangGraph** for stateful, multi-step workflows and agents

The library implements **post-filter authorization**, meaning:

1. Retrieve the best semantic matches.
2. Filter them using SpiceDB permission checks.
3. Feed *only authorized documents* to the LLM.

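
Conceptually, the post-filter flow looks like the sketch below; `retriever`, `is_authorized`, and `generate_answer` are placeholder names used only for illustration, and the rest of this guide shows how the library packages this pattern as ready-made LangChain and LangGraph components.

```python
# Illustrative post-filter flow; `retriever`, `is_authorized`, and
# `generate_answer` are placeholders, not part of the library's API.
def answer_with_authz(question: str, subject_id: str) -> str:
    candidates = retriever.invoke(question)            # 1. best semantic matches
    authorized = [doc for doc in candidates
                  if is_authorized(subject_id, doc)]   # 2. SpiceDB permission check per document
    return generate_answer(question, authorized)       # 3. only authorized documents reach the LLM
```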
---
## 1. Installation

The package is not yet published on PyPI.
Install directly from GitHub:

```bash
pip install "git+https://github.com/sohanmaheshwar/spicedb-rag-authorization.git#egg=spicedb-rag-auth[all]"
```

Or clone the repository locally and add it to your Python path:

```python
import sys
sys.path.append("/path/to/spicedb-rag-authorization")
```

---
## 2. Prerequisites

This guide demonstrates fine-grained authorization with SpiceDB for a RAG pipeline running locally.
To run in production, run SpiceDB on [AuthZed Cloud](https://authzed.com/docs/spicedb/getting-started/protecting-a-blog#create-a-permissions-system-on-authzed-cloud).

### Run SpiceDB locally

```bash
docker run --rm -p 50051:50051 authzed/spicedb serve --grpc-preshared-key "sometoken" --grpc-no-tls
```

### Create a SpiceDB schema

```
definition user {}

definition article {
    relation viewer: user
    permission view = viewer
}
```

We use [zed](https://github.com/authzed/zed), the CLI for SpiceDB, to write the schema and relationships.
Typically, these would be gRPC/API calls in your application; a Python sketch of the equivalent calls follows the relationship commands below.

```bash
zed schema write <(cat << EOF
definition user {}
definition article {
    relation viewer: user
    permission view = viewer
}
EOF
) --insecure
```

### Add relationships

```bash
zed relationship create article:doc1 viewer user:alice --insecure
zed relationship create article:doc2 viewer user:bob --insecure
zed relationship create article:doc4 viewer user:alice --insecure
```
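
As a rough sketch of the same writes from application code, here is what they could look like with the official `authzed` Python client (`pip install authzed`); the endpoint and token match the local setup above, and you should confirm the exact API against the client docs for your version.

```python
from authzed.api.v1 import (
    Client, ObjectReference, Relationship, RelationshipUpdate,
    SubjectReference, WriteRelationshipsRequest, WriteSchemaRequest,
)
from grpcutil import insecure_bearer_token_credentials

# Connect to the local SpiceDB started above (no TLS, preshared-key auth).
client = Client("localhost:50051", insecure_bearer_token_credentials("sometoken"))

# Write the same schema the zed command writes.
client.WriteSchema(WriteSchemaRequest(schema="""
definition user {}
definition article {
    relation viewer: user
    permission view = viewer
}
"""))

# Grant alice the viewer relation on doc1 (equivalent to the first zed command).
client.WriteRelationships(WriteRelationshipsRequest(updates=[
    RelationshipUpdate(
        operation=RelationshipUpdate.Operation.OPERATION_TOUCH,
        relationship=Relationship(
            resource=ObjectReference(object_type="article", object_id="doc1"),
            relation="viewer",
            subject=SubjectReference(object=ObjectReference(object_type="user", object_id="alice")),
        ),
    ),
]))
```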
---
## 3. Document Metadata Requirements

Every document used in RAG **must include a resource ID** in metadata.
This is what enables SpiceDB to check which `user` has what permissions for each `doc`.

```python
Document(
    page_content="Example text",
    metadata={"article_id": "doc4"}
)
```

The metadata key must match the configured `resource_id_key`.
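
For example, a small vector store whose documents all carry `article_id` in metadata could be built as follows; FAISS and OpenAI embeddings are illustrative choices, and the later examples assume a `retriever` like this one is already defined.

```python
from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Each document carries the resource ID that the SpiceDB relationships refer to.
docs = [
    Document(page_content="Quarterly roadmap...", metadata={"article_id": "doc1"}),
    Document(page_content="Incident postmortem...", metadata={"article_id": "doc2"}),
    Document(page_content="Public changelog...", metadata={"article_id": "doc4"}),
]

vector_store = FAISS.from_documents(docs, OpenAIEmbeddings())
retriever = vector_store.as_retriever(search_kwargs={"k": 4})
```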
---
## 4. LangChain Integration

This is the simplest way to add authorization to a LangChain RAG pipeline.

[LangChain](https://www.langchain.com/langchain) is a framework for building LLM-powered applications by composing modular components such as retrievers, prompts, memory, tools, and models.
It provides a high-level abstraction called the LangChain Expression Language (LCEL), which lets you construct RAG pipelines as reusable, declarative graphs without manually orchestrating each step.

You would typically use LangChain when:

- You want a composable pipeline that chains together retrieval, prompting, model calls, and post-processing.
- You are building a RAG system where each step (retriever → filter → LLM → parser) should be easily testable and swappable.
- You need integrations with many LLM providers, vector stores, retrievers, and tools.
- You want built-in support for streaming, parallelism, or structured output.

LangChain is an excellent fit for straightforward RAG pipelines where the control flow is mostly linear.
For more complex, branching, stateful, or agent-style workflows, you would likely [choose LangGraph](#5-langgraph-integration) instead.

**Core component:** `SpiceDBAuthFilter` or `SpiceDBAuthLambda`.

### Example Pipeline

```python
auth = SpiceDBAuthFilter(
    spicedb_endpoint="localhost:50051",
    spicedb_token="sometoken",
    resource_type="article",
    resource_id_key="article_id",
)
```
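
The chain below also assumes a `retriever` (for example, the one sketched in section 3), a `prompt`, and an `llm` are already defined; one possible setup, with the model name and prompt wording as illustrative placeholders:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai import ChatOpenAI

# Prompt that answers only from the (already authorized) context.
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer based only on the provided context."),
    ("human", "Question: {question}\n\nContext:\n{context}"),
])

llm = ChatOpenAI(model="gpt-4o-mini")
```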
Build your chain once:

```python
chain = (
    RunnableParallel({
        "context": retriever | auth,  # Authorization happens here
        "question": RunnablePassthrough(),
    })
    | prompt
    | llm
    | StrOutputParser()
)
```

Invoke:

```python
# Pass user at runtime - reuse same chain for different users
answer = await chain.ainvoke(
    "Your question?",
    config={"configurable": {"subject_id": "alice"}}
)

# Different user, same chain
answer = await chain.ainvoke(
    "Another question?",
    config={"configurable": {"subject_id": "bob"}}
)
```
---
## 5. LangGraph Integration

[LangGraph](https://www.langchain.com/langgraph) is a framework for building stateful, multi-step, and branching LLM applications using a graph-based architecture.
Unlike LangChain's linear pipelines, LangGraph allows you to define explicit nodes, edges, loops, and conditional branches, enabling **deterministic**, reproducible, agent-like workflows.

You would choose LangGraph when:

- You are building multi-step RAG pipelines (retrieve → authorize → rerank → generate → reflect).
- Your application needs state management across steps (conversation history, retrieved docs, user preferences).
- You require a strong separation of responsibilities (e.g., retriever node, authorization node, generator node).

LangGraph is ideal for more advanced AI systems, such as conversational RAG assistants, agents with tool-use, or pipelines with complex authorization or business logic.

The library provides:

- `RAGAuthState`: a TypedDict defining the required state fields
- `create_auth_node()`: an auto-configured authorization node
- `AuthorizationNode`: a reusable class-based node

---

## 5.1 LangGraph Example

```python
from langgraph.graph import StateGraph, END
from spicedb_rag_auth import create_auth_node, RAGAuthState
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Use the provided RAGAuthState TypedDict
graph = StateGraph(RAGAuthState)

# Define your nodes
def retrieve_node(state):
    """Retrieve documents from vector store"""
    docs = retriever.invoke(state["question"])
    return {"retrieved_documents": docs}

def generate_node(state):
    """Generate answer from authorized documents"""
    # Create prompt
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer based only on the provided context."),
        ("human", "Question: {question}\n\nContext:\n{context}")
    ])

    # Format context from authorized documents
    context = "\n\n".join([doc.page_content for doc in state["authorized_documents"]])

    # Generate answer
    llm = ChatOpenAI(model="gpt-4o-mini")
    messages = prompt.format_messages(question=state["question"], context=context)
    answer = llm.invoke(messages)

    return {"answer": answer.content}

# Add nodes
graph.add_node("retrieve", retrieve_node)
graph.add_node("authorize", create_auth_node(
    spicedb_endpoint="localhost:50051",
    spicedb_token="sometoken",
    resource_type="article",
    resource_id_key="article_id",
))
graph.add_node("generate", generate_node)

# Wire it up
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "authorize")
graph.add_edge("authorize", "generate")
graph.add_edge("generate", END)

# Compile and run
app = graph.compile()
result = await app.ainvoke({
    "question": "What is SpiceDB?",
    "subject_id": "alice",
})

print(result["answer"])  # The actual answer to the question
```
---
## 5.2 Extending State with LangGraph

Add custom fields to track additional state like conversation history, user preferences, or metadata.

```python
class MyCustomState(RAGAuthState):
    user_preferences: dict
    conversation_history: list

graph = StateGraph(MyCustomState)
# ... add nodes and edges
```

**When to use:**

- Multi-turn conversations that need history
- Personalized responses based on user preferences
- Complex workflows requiring additional context

**Example use case:** A chatbot that remembers previous questions and tailors responses based on user role (engineer vs manager).
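
As a rough sketch of how a custom node might use those extra fields, the node below is hypothetical and not part of the library; it relies only on the `MyCustomState` fields defined above.

```python
def remember_question(state: MyCustomState):
    """Append the current question to the conversation history."""
    history = state.get("conversation_history") or []
    return {"conversation_history": history + [state["question"]]}

graph.add_node("remember", remember_question)
# Wire it between "authorize" and "generate" with add_edge, as in section 5.1.
```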
---
## 5.3 Reusable Class-Based Authorization Node

Create reusable authorization node instances that can be shared across multiple graphs or configured with custom state key mappings.

```python
from spicedb_rag_auth import AuthorizationNode

auth_node = AuthorizationNode(
    spicedb_endpoint="localhost:50051",
    spicedb_token="sometoken",
    resource_type="article",
    resource_id_key="article_id",
)

graph = StateGraph(RAGAuthState)
graph.add_node("authorize", auth_node)
```

You can define it once and reuse it everywhere:

```python
article_auth = AuthorizationNode(resource_type="article", ...)
video_auth = AuthorizationNode(resource_type="video", ...)

# Use in multiple graphs
blog_graph.add_node("auth", article_auth)
media_graph.add_node("auth", video_auth)
learning_graph.add_node("auth_articles", article_auth)
```

**When to use:**

- Multiple graphs need the same authorization logic
- Your state uses different key names than the defaults
- Building testable code (easy to swap prod/test instances)
- Team collaboration (security team provides authZ nodes)

**Example use case:** A multi-resource platform (articles, videos, code snippets) where each resource type has its own authorization node that's reused across different workflows.

For production applications, you'll often use a mix of the patterns from 5.2 and 5.3: a custom state for your workflow plus reusable authZ nodes for flexibility.
Here's an example:

```python
class CustomerSupportState(RAGAuthState):
    conversation_history: list
    customer_tier: str
    sentiment_score: float

docs_auth = AuthorizationNode(resource_type="support_doc", ...)
kb_auth = AuthorizationNode(resource_type="knowledge_base", ...)

graph = StateGraph(CustomerSupportState)
graph.add_node("auth_docs", docs_auth)
graph.add_node("auth_kb", kb_auth)
```
---
## 6. Metrics & Observability

The library exposes:

- number of retrieved documents
- number authorized
- denied resource IDs
- latency per SpiceDB check

### In LangChain

```python
auth = SpiceDBAuthFilter(..., subject_id="alice", return_metrics=True)
result = await auth.ainvoke(docs)

print(result.authorized_documents)
print(result.total_authorized)
print(result.check_latency_ms)
# ... all other metrics
```
### In LangGraph

Metrics appear in `auth_results` in the graph state.

```python
graph = StateGraph(RAGAuthState)
# ... add nodes including create_auth_node()

result = await app.ainvoke({"question": "...", "subject_id": "alice"})

# Access metrics from state
print(result["auth_results"]["total_retrieved"])
print(result["auth_results"]["total_authorized"])
print(result["auth_results"]["authorization_rate"])
print(result["auth_results"]["denied_resource_ids"])
print(result["auth_results"]["check_latency_ms"])
```
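
These fields are convenient hooks for your own logging or monitoring; here is a small sketch that uses only the state fields shown above, with the standard `logging` module standing in for whatever observability stack you use.

```python
import logging

# Surface any denials for auditing after a graph run.
auth_results = result["auth_results"]
if auth_results["denied_resource_ids"]:
    logging.warning(
        "RAG authorization denied %d of %d retrieved documents: %s (check latency: %s ms)",
        len(auth_results["denied_resource_ids"]),
        auth_results["total_retrieved"],
        auth_results["denied_resource_ids"],
        auth_results["check_latency_ms"],
    )
```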
---
## 7. Complete Example

See the full example in the [repo](https://github.com/sohanmaheshwar/spicedb-rag-authorization):

- `langchain_example.py`
- `README_langchain.md`

---

## 8. Next Steps

- Read [this guide](https://authzed.com/blog/building-a-multi-tenant-rag-with-fine-grain-authorization-using-motia-and-spicedb) on creating a production-grade RAG with SpiceDB & Motia.dev.
- Check out this [self-guided workshop](https://github.com/authzed/workshops/tree/main/secure-rag-pipelines) for a closer look at how fine-grained authorization with SpiceDB works in RAG.
  The workshop also covers the pre-filtration technique.
