README.md: 47 additions & 9 deletions
@@ -52,6 +52,32 @@ RAG Studio can utilize the local file system or an S3 bucket for storing documen
S3 will also require providing the AWS credentials for the bucket.

### Vector Database Options

RAG Studio supports Qdrant (default), OpenSearch (Cloudera Semantic Search), and ChromaDB.

- To choose the vector DB, set `VECTOR_DB_PROVIDER` to one of `QDRANT`, `OPENSEARCH`, or `CHROMADB` in your `.env`.
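
As a minimal sketch, a `.env` that selects ChromaDB might contain the line below. The value shown is just one of the three options listed above; that leaving the variable unset falls back to Qdrant is an assumption based on Qdrant being described as the default.

```shell
# Select the vector database backend: QDRANT (default), OPENSEARCH, or CHROMADB.
VECTOR_DB_PROVIDER=CHROMADB
```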

#### ChromaDB Setup

If you select ChromaDB, configure the following environment variables in `.env`:

- `CHROMADB_HOST` - Hostname or URL for ChromaDB. Use `localhost` for local Docker.
- `CHROMADB_PORT` - Port for ChromaDB (default `8000`). Not required if `CHROMADB_HOST` starts with `https://` and the server infers the port.
- `CHROMADB_TENANT` - Optional. Defaults to the Chroma default tenant.
- `CHROMADB_DATABASE` - Optional. Defaults to the Chroma default database.
- `CHROMADB_TOKEN` - Optional. Include if your Chroma server requires an auth token.
- `CHROMADB_SERVER_SSL_CERT_PATH` - Optional. Path to PEM bundle for TLS verification when using HTTPS with a private CA.
- `CHROMADB_ENABLE_ANONYMIZED_TELEMETRY` - Optional. Enables anonymized telemetry in the ChromaDB client; defaults to `false`.
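
Put together, a `.env` for a local ChromaDB container might look like the following sketch. The values are illustrative and only use the variables described above; the optional settings are left commented out on the assumption that the Chroma defaults are acceptable.

```shell
# Illustrative .env for a local ChromaDB container; values are examples, not requirements.
VECTOR_DB_PROVIDER=CHROMADB
CHROMADB_HOST=localhost
CHROMADB_PORT=8000
# Optional settings, left unset here so the Chroma defaults apply:
# CHROMADB_TENANT=
# CHROMADB_DATABASE=
# CHROMADB_TOKEN=
# Telemetry stays off unless explicitly enabled.
CHROMADB_ENABLE_ANONYMIZED_TELEMETRY=false
```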

Notes:

- The local-dev script automatically starts a ChromaDB Docker container when `VECTOR_DB_PROVIDER=CHROMADB` and `CHROMADB_HOST=localhost`, using `CHROMADB_PORT=8000`.
- ChromaDB collections are automatically namespaced using the tenant and database values to avoid conflicts between different RAG Studio instances.
- For production deployments, consider using a dedicated ChromaDB server with authentication enabled via `CHROMADB_TOKEN`.
- When using HTTPS endpoints, ensure your certificate chain is properly configured or provide the CA bundle path via `CHROMADB_SERVER_SSL_CERT_PATH`.
- Anonymized telemetry is disabled by default. You can enable it by setting `CHROMADB_ENABLE_ANONYMIZED_TELEMETRY=true`.
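
For a dedicated server reached over HTTPS, the production-oriented notes could translate into a `.env` along these lines. This is a sketch only: the hostname, token value, and certificate path are hypothetical placeholders, and omitting `CHROMADB_PORT` assumes the port is inferred from the `https://` URL as described above.

```shell
# Illustrative .env for a remote ChromaDB server over HTTPS.
# The hostname, token, and certificate path below are hypothetical placeholders.
VECTOR_DB_PROVIDER=CHROMADB
CHROMADB_HOST=https://chroma.example.com
# CHROMADB_PORT is omitted on the assumption that it is inferred from the https:// URL.
CHROMADB_TOKEN="replace-with-your-token"
# Only needed when the server certificate is signed by a private CA.
CHROMADB_SERVER_SSL_CERT_PATH=/path/to/private-ca-bundle.pem
```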
### Enhanced Parsing Options:
RAG Studio can optionally enable enhanced parsing by providing the `USE_ENHANCED_PDF_PROCESSING` environment variable. Enabling this will allow RAG Studio to parse images and tables from PDFs. When enabling this feature, we strongly recommend using this with a GPU and at least 16GB of memory.
@@ -82,7 +108,7 @@ This variable can be set from the project settings for the AMP in CML.
## Air-gapped Environments
If you are using an air-gapped environment, you will need to whitelist, at a minimum, the following domains in order to use the AMP.
There may be other domains that need to be whitelisted depending on your environment and the model service provider you select.

- `https://github.com`
- `https://raw.githubusercontent.com`
@@ -150,17 +176,29 @@ the Node service locally, you can do so by following these steps:
docker run -p 6333:6333 -p 6334:6334 -v $(pwd)/databases/qdrant_storage:/qdrant/storage:z qdrant/qdrant