Commit 2c82d5d

Merge pull request #304 from sohanmaheshwar/jupyter-component
Jupyter Notebook component & RAG pipeline guide
2 parents bc5b341 + f3452fd commit 2c82d5d

3 files changed (+102, -2 lines)

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
import React from "react";

interface JupyterNotebookViewerProps {
  fileUrl: string;
}

const JupyterNotebookViewer: React.FC<JupyterNotebookViewerProps> = ({ fileUrl }) => {
  const nbviewerUrl = `https://nbviewer.org/github/${encodeURIComponent(fileUrl)}`;

  return (
    <div className="p-4">
      <iframe
        src={nbviewerUrl}
        width="100%"
        height="800px"
        style={{ border: "none" }}
      />
    </div>
  );
};

export default JupyterNotebookViewer;
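
A note on the URL construction: `encodeURIComponent` escapes `/` as `%2F`, so the committed code turns the whole `fileUrl` into a single escaped path segment after `/github/`, which may not resolve on nbviewer as intended. The sketch below is not part of this commit; `buildNbviewerUrl` and `committedNbviewerUrl` are hypothetical helpers that contrast escaping each path segment separately (keeping the slashes as separators) with the committed approach.

```typescript
// Hypothetical helper, not part of this commit: escape each path segment
// separately so "/" stays a path separator in the nbviewer URL.
function buildNbviewerUrl(fileUrl: string): string {
  const path = fileUrl
    .replace(/^\/+|\/+$/g, "") // trim leading/trailing slashes
    .split("/")
    .map((segment) => encodeURIComponent(segment)) // escapes spaces, "#", "?" within a segment
    .join("/");
  return `https://nbviewer.org/github/${path}`;
}

// The committed approach, reproduced for comparison: escapes the slashes too.
function committedNbviewerUrl(fileUrl: string): string {
  return `https://nbviewer.org/github/${encodeURIComponent(fileUrl)}`;
}

const example = "authzed/workshops/blob/main/secure-rag-pipelines/01-rag.ipynb";
console.log(buildNbviewerUrl(example));
// https://nbviewer.org/github/authzed/workshops/blob/main/secure-rag-pipelines/01-rag.ipynb
console.log(committedNbviewerUrl(example));
// https://nbviewer.org/github/authzed%2Fworkshops%2Fblob%2Fmain%2Fsecure-rag-pipelines%2F01-rag.ipynb
```

Inside the component, `nbviewerUrl` would then be `buildNbviewerUrl(fileUrl)`. Adding a `title` attribute to the `<iframe>` (for example, `title="Jupyter notebook"`) would also give screen readers a label for the embedded frame.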

pages/spicedb/ops/_meta.json

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,7 @@
 {
   "observability": "Observability Tooling",
   "deploying-spicedb-operator": "Deploying the SpiceDB Operator",
-  "deploying-spicedb-on-eks": "Deploying SpiceDB on AWS EKS",
-  "bulk-operations": "Bulk Importing Relationships"
+  "deploying-spicedb-on-eks": "Deploying SpiceDB on Amazon EKS",
+  "bulk-operations": "Bulk Importing Relationships",
+  "secure-rag-pipelines": "Secure Your RAG Pipelines with Fine Grained Authorization"
 }

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
import JupyterNotebookViewer from "@/components/JupyterNotebookViewer";

# Secure Your RAG Pipelines With Fine Grained Authorization

Here's how you can use SpiceDB to safeguard sensitive data in RAG pipelines.
You will learn how to pre-filter and post-filter vector database queries with a list of authorized object IDs to improve security and efficiency.
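
A minimal in-memory sketch of the two strategies follows. It assumes the list of authorized object IDs has already been fetched from SpiceDB, and it substitutes a toy dot-product ranking for the Pinecone query; the `Chunk` type, the `articleId` field, and the `search`, `preFilter`, and `postFilter` functions are hypothetical names used only for illustration.

```typescript
// Hypothetical chunk shape: each embedded chunk records the ID of the object
// SpiceDB authorizes (here called articleId) in its metadata.
interface Chunk {
  id: string;
  articleId: string;
  embedding: number[];
}

const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);

// Toy similarity search standing in for a vector database query.
function search(chunks: Chunk[], query: number[], topK: number): Chunk[] {
  return [...chunks]
    .sort((a, b) => dot(b.embedding, query) - dot(a.embedding, query))
    .slice(0, topK);
}

// Pre-filter: restrict candidates to authorized objects before ranking,
// so every returned chunk is one the user may see.
function preFilter(chunks: Chunk[], query: number[], topK: number, authorized: Set<string>): Chunk[] {
  return search(chunks.filter((c) => authorized.has(c.articleId)), query, topK);
}

// Post-filter: rank everything, then drop unauthorized hits. If unauthorized
// chunks rank highly, fewer than topK results survive.
function postFilter(chunks: Chunk[], query: number[], topK: number, authorized: Set<string>): Chunk[] {
  return search(chunks, query, topK).filter((c) => authorized.has(c.articleId));
}

const chunks: Chunk[] = [
  { id: "c1", articleId: "secret", embedding: [1, 0] },
  { id: "c2", articleId: "public", embedding: [0.9, 0.1] },
  { id: "c3", articleId: "public", embedding: [0.1, 0.9] },
];
const authorized = new Set(["public"]);
console.log(preFilter(chunks, [1, 0], 2, authorized).map((c) => c.id)); // [ 'c2', 'c3' ]
console.log(postFilter(chunks, [1, 0], 2, authorized).map((c) => c.id)); // [ 'c2' ]
```

The trade-off shows in the output: post-filtering returns one chunk instead of two because the highest-ranked chunk was unauthorized. With Pinecone, a pre-filter would typically be expressed as a metadata filter on the query, such as `{ article_id: { $in: [...] } }`, where the field name `article_id` is an assumption.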

This guide uses OpenAI, Pinecone, LangChain, Jupyter Notebook, and SpiceDB.

## Why is this important?

Building enterprise-ready AI poses challenges around data security, accuracy, scalability, and integration, especially in compliance-regulated industries like healthcare and finance.
Firms are increasing their efforts to mitigate the risks associated with LLMs, particularly the exfiltration of personally identifiable information (PII) and sensitive company data.
A primary mitigation strategy is to build guardrails around Retrieval-Augmented Generation (RAG) that safeguard data while also optimizing query response quality and efficiency.

Precise guardrails require a permissions system with fine-grained authorization capabilities, such as listing the subjects authorized to access a resource and the resources a subject can access.
Such a system ensures timely access to authorized data while preventing the exfiltration of sensitive information, making RAG pipelines more efficient and improving performance at scale.
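
The two capabilities mentioned above correspond to SpiceDB's LookupResources and LookupSubjects APIs. As a rough illustration only, the sketch below answers both questions over a hypothetical in-memory list of direct `viewer` relationships; real SpiceDB also computes permissions through the schema (unions, intersections, nested relations), which this toy ignores. The `Relationship` type and both functions are hypothetical.

```typescript
// Hypothetical relationship tuple, loosely modeled on SpiceDB's
// resource#relation@subject notation. Only direct relations are modeled.
interface Relationship {
  resource: string; // e.g. "article:1"
  relation: string; // e.g. "viewer"
  subject: string; // e.g. "user:tim"
}

// Resources that `subject` reaches through `relation` (cf. LookupResources).
function lookupResources(rels: Relationship[], relation: string, subject: string): string[] {
  return rels
    .filter((r) => r.relation === relation && r.subject === subject)
    .map((r) => r.resource);
}

// Subjects that reach `resource` through `relation` (cf. LookupSubjects).
function lookupSubjects(rels: Relationship[], relation: string, resource: string): string[] {
  return rels
    .filter((r) => r.relation === relation && r.resource === resource)
    .map((r) => r.subject);
}

const rels: Relationship[] = [
  { resource: "article:1", relation: "viewer", subject: "user:tim" },
  { resource: "article:2", relation: "viewer", subject: "user:tim" },
  { resource: "article:2", relation: "viewer", subject: "user:ann" },
];
console.log(lookupResources(rels, "viewer", "user:tim")); // [ 'article:1', 'article:2' ]
console.log(lookupSubjects(rels, "viewer", "article:2")); // [ 'user:tim', 'user:ann' ]
```

A resource list like the first result is the kind of "list of authorized object IDs" that pre-filtering and post-filtering consume.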

## Setup and Prerequisites

- Access to a [SpiceDB](https://authzed.com/spicedb) instance.
  See the [SpiceDB installation instructions](https://authzed.com/docs/spicedb/getting-started/install/macos).
- A [Pinecone account](https://www.pinecone.io/) and API key
- An [OpenAI Platform account](https://platform.openai.com/docs/overview) and API key
- [Jupyter Notebook](https://jupyter.org/) running locally

### Running SpiceDB

Once you've installed SpiceDB, run a local instance with this command in your terminal:

`spicedb serve --grpc-preshared-key rag-rebac-walkthrough`

Clients authenticate to this instance by sending the preshared key (`rag-rebac-walkthrough`) as a bearer token.
You should see output similar to the following, which indicates that a SpiceDB instance is running locally:

```
8:28PM INF configured logging async=false format=auto log_level=info provider=zerolog
8:28PM INF GOMEMLIMIT is updated GOMEMLIMIT=25769803776 package=github.com/KimMachineGun/automemlimit/memlimit
8:28PM INF configured opentelemetry tracing endpoint= insecure=false provider=none sampleRatio=0.01 service=spicedb v=0
8:28PM WRN this version of SpiceDB is out of date. See: https://github.com/authzed/spicedb/releases/tag/v1.39.1 latest-released-version=v1.39.1 this-version=v1.37.2
8:28PM INF configuration ClusterDispatchCacheConfig.CacheKindForTesting=(empty) ClusterDispatchCacheConfig.Enabled=true ClusterDispatc
8:28PM INF using memory datastore engine
8:28PM WRN in-memory datastore is not persistent and not feasible t
8:28PM INF configured namespace cache defaultTTL=0 maxCost="32 MiB"
8:28PM INF schema watch explicitly disabled
8:28PM INF configured dispatch cache defaultTTL=20600 maxCost="164
8:28PM INF configured dispatcher balancerconfig={"loadBalancingConfig":[{"consistent-hashring":{"replicationFactor":100,"spread":1}}]} concurrency-limit-check-permission=50 concurrency-limit-lookup-resources=50 concurrency-limit-lookup-subjects=50 concurrency-limit-reachable-resources=50
8:28PM INF grpc server started serving addr=:50051 insecure=true network=tcp service=grpc workers=0
8:28PM INF running server datastore=*schemacaching.definitionCachingProxy
8:28PM INF http server started serving addr=:9090 insecure=true service=metrics
8:28PM INF telemetry reporter scheduled endpoint=https://telemetry.authzed.com interval=1h0m0s next=5m14s
```

### Download the Jupyter Notebook

Clone the `workshops` [repository](https://github.com/authzed/workshops/) to your system and run `cd workshops/secure-rag-pipelines` to enter the working directory.

Start the `01-rag.ipynb` notebook locally by running `jupyter notebook 01-rag.ipynb` (or `python3 -m notebook`) in your terminal.

## Add Fine Grained Authorization

Here's the Jupyter notebook with step-by-step instructions:

<JupyterNotebookViewer fileUrl="authzed/workshops/blob/main/secure-rag-pipelines/01-rag.ipynb" />

## Using DeepSeek or Google Colab

To replace the OpenAI LLM with DeepSeek (or any other LLM), check out the [`deepseek` branch](https://github.com/authzed/workshops/tree/deepseek).
It follows steps similar to the guide above but uses the DeepSeek LLM via [OpenRouter](https://openrouter.ai/).
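
The branch itself is a Python notebook. As an illustration in TypeScript (not the branch's code) of why such a swap can be small, the sketch below builds a request for OpenRouter's OpenAI-compatible chat completions endpoint: compared with calling OpenAI directly, the base URL, API key, and model name change while the message format stays the same. `buildChatRequest` is a hypothetical helper, and the model ID `deepseek/deepseek-chat` is an assumption; check OpenRouter's model list for current IDs.

```typescript
// Hypothetical helper: builds (but does not send) a chat completion request
// for OpenRouter's OpenAI-compatible API. The default model ID is an assumption.
interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(apiKey: string, prompt: string, model = "deepseek/deepseek-chat"): ChatRequest {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // Same message format as OpenAI's chat completions API.
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  };
}

const req = buildChatRequest("sk-or-example", "Summarize the articles I can view.");
console.log(req.url);
```

To send it, pass the fields to `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`; global `fetch` is available in Node.js 18 and later.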

To run through this workshop in a cloud notebook, use the [`google-colab` branch](https://github.com/authzed/workshops/tree/google-colab), which runs on Google Colab.
Note that this version requires a SpiceDB instance running on [AuthZed Serverless](https://app.authzed.com/), where you can create a free account.
