Commit 2f3e3b8

fenos, CoolAssPuppy, and MildTomato authored
blog: vector buckets (supabase#40911)
* blog: vector buckets
* Adjusted formatting and cleaned up opening lede. Still need og image
* Fixed the lede to lead with the announcement.
* Update vector buckets blog images and formatting

  Replaced the blog post thumbnail with a new image, updated the og.png image, and improved formatting and indentation in the 2025-12-01-vector-buckets.mdx file for clarity and consistency.

---------

Co-authored-by: Prashant Sridharan <[email protected]>
Co-authored-by: Jonathan Summers-Muir <[email protected]>
1 parent 866185d commit 2f3e3b8

4 files changed, +319 -1 lines changed

Lines changed: 311 additions & 0 deletions
@@ -0,0 +1,311 @@
---
title: 'Introducing Vector Buckets'
description: 'Introducing vector storage in Supabase: a durable storage layer with similarity search built-in.'
author: fabrizio
image: 2025-12-01-vector-buckets/og.png?v=3
thumb: 2025-12-01-vector-buckets/thumb.png?v=3
categories:
  - product
date: '2025-12-01'
toc_depth: 2
---

Today, we're introducing [Vector Buckets](/docs/guides/storage/vector/introduction), a new storage option that gives you the durability and cost efficiency of Amazon S3 with built-in similarity search.

Vector search is becoming a core primitive for modern apps: semantic search, recommendations, RAG, image and audio similarity, and more.

Supabase already gives you powerful tools for vectors, such as `pgvector` in Postgres. With Vector Buckets, you now have more options for how you store vectors:

- Use `pgvector` for smaller, latency-sensitive datasets that belong in your database, next to your relational data.
- Use Vector Buckets when you need to store a large number of vectors (up to tens of millions) on a durable storage layer with similarity search built in.

## What are Vector Buckets?

**Vector Buckets** are a new bucket type in Supabase Storage.

Conceptually:

- A **Vector Bucket** is where your vector indexes live.
- Inside each bucket, you define one or more **vector indexes** (for example: `documents-openai`).
- Each index stores high-dimensional vectors plus optional metadata.
- You query those indexes using Supabase clients or directly from Postgres via a foreign data wrapper.

## What do Vector Buckets bring to the table?

### Scalable vector storage for large datasets

Embeddings add up quickly: thousands of floats per vector, multiplied by millions of items.

Instead of putting everything in Postgres, Vector Buckets store your embeddings in S3-backed object storage, which gives you:

- Capacity for tens of millions of vectors per index
- A storage layer designed for large, durable datasets
- Room to keep full archives of vectors without over-optimising your Postgres schema or worrying about table bloat

Your vectors live in a storage layer built for large datasets, while you still query them through Postgres.

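To make "embeddings add up quickly" concrete, here is a rough sizing sketch. It assumes vectors are stored as float32 (4 bytes per value) and counts only the raw vector data; metadata, keys, and any index overhead are left out, so treat the result as a lower bound rather than a billing estimate. The function names are illustrative.

```typescript
// Back-of-the-envelope sizing for raw float32 embeddings.
// Assumption: 4 bytes per dimension; metadata and index overhead are ignored.
const BYTES_PER_FLOAT32 = 4

function rawEmbeddingBytes(vectorCount: number, dimension: number): number {
  return vectorCount * dimension * BYTES_PER_FLOAT32
}

function toGiB(byteCount: number): number {
  return byteCount / 1024 ** 3
}

// Example: 10 million vectors at 1536 dimensions.
const exampleBytes = rawEmbeddingBytes(10_000_000, 1536)
console.log(`${toGiB(exampleBytes).toFixed(1)} GiB`) // prints "57.2 GiB"
```

At that size, a single embedding column can easily outweigh the rest of a typical application database, which is the case for moving the long tail out of Postgres.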
### Built-in similarity search

Vector Buckets are not just blobs of float arrays. Each index supports similarity search out of the box.

Similarity search lets you find items that are conceptually related based on their vector representations, not just exact keyword matches. That’s what powers:

- Semantic document search (“find content about this topic, even if the keywords differ”)
- Product and content recommendations (“find items similar to this one”)
- Image, audio, or video similarity (“find assets that look or sound like this”)
- De-duplication and near-duplicate detection across large media libraries

With Vector Buckets, you can:

- Insert vectors with a key, a float32 vector, and metadata
- Run k-NN queries (for example, “return the 20 closest vectors to this embedding”)
- Use a familiar distance metric such as cosine similarity
- Ask for distances and metadata along with the results

No extra vector database to run, no new query language. Just vector indexes with search, available from the same Supabase SDKs you already use or directly via Postgres.

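For readers new to k-NN, the sketch below shows what a cosine k-NN query computes conceptually. It is an in-memory brute-force version for illustration only; it is not how Vector Buckets implement search, and the type and function names are made up for this example.

```typescript
// Conceptual brute-force k-NN with cosine distance (illustration only).
type StoredVector = { key: string; values: number[] }
type ScoredKey = { key: string; distance: number }

// Cosine distance = 1 - cosine similarity; 0 means "same direction".
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) throw new Error('cosine distance is undefined for zero vectors')
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Score every stored vector against the query and keep the k closest.
function kNearest(query: number[], stored: StoredVector[], k: number): ScoredKey[] {
  return stored
    .map((v) => ({ key: v.key, distance: cosineDistance(query, v.values) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k)
}

const sampleVectors: StoredVector[] = [
  { key: 'a', values: [1, 0] },
  { key: 'b', values: [0, 1] },
  { key: 'c', values: [1, 1] },
]
console.log(kNearest([1, 0], sampleVectors, 2).map((hit) => hit.key)) // [ 'a', 'c' ]
```

A linear scan like this costs one pass over every vector per query, which is why a managed index matters once you reach millions of items.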
### Performance that fits most app workflows

Vector Buckets are designed to provide sub-second similarity search over large datasets, which is more than enough for:

- Backend workflows and batch processing
- AI agents and background jobs
- Dashboards and internal tools
- Many user-facing features where “fast” means hundreds of milliseconds, not single-digit milliseconds

If you’re chasing ultra-low latency at very high QPS, `pgvector` in a tuned Postgres cluster (or a dedicated vector database) remains the best place to push performance. Vector Buckets focus on simple similarity search over large datasets, not on being the absolute fastest option.

### Metadata filtering

Each vector can include an arbitrary metadata object, for example:

```tsx
metadata: {
  title: 'Getting started with Vector Buckets',
  type: 'doc',
  language: 'en',
  project_id: '1234',
}
```

You can:

- Filter by metadata during similarity search (e.g. `type = 'doc' AND language = 'en'`)
- Query through Postgres and join the results with your relational tables
- Build multi-tenant or multi-project search just by encoding tenant/project IDs into metadata

This makes it easy to build domain-aware, tenant-aware semantic search.

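The filter in the first bullet is a conjunction of equality checks on metadata fields. The sketch below spells out those semantics in plain TypeScript; it is a client-side illustration of what `type = 'doc' AND language = 'en'` means, not the filter syntax the service accepts, and `matchesAll` is a hypothetical helper.

```typescript
// Illustrative semantics of an equality-only AND filter over metadata.
type VectorMetadata = Record<string, string | number | boolean>

// True when every field in `criteria` is present in `metadata` with the same value.
function matchesAll(metadata: VectorMetadata, criteria: VectorMetadata): boolean {
  return Object.entries(criteria).every(([field, expected]) => metadata[field] === expected)
}

const docMetadata: VectorMetadata = {
  title: 'Getting started with Vector Buckets',
  type: 'doc',
  language: 'en',
  project_id: '1234',
}

console.log(matchesAll(docMetadata, { type: 'doc', language: 'en' })) // true
console.log(matchesAll(docMetadata, { type: 'doc', project_id: '9999' })) // false
```

Multi-tenant search follows the same pattern: add the tenant or project ID to the criteria of every query so results never cross tenant boundaries.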
## When should you use Vector Buckets vs `pgvector`?

Vector Buckets and `pgvector` are complementary. They serve different roles and work best together.

### Use `pgvector` when…

- You’re optimizing for **lowest possible latency** on user-facing queries
- Vectors are **part of your core relational model** (for example, a column on `documents` or `products`)
- You want **transactional guarantees** (data and embeddings written together)
- Your vector dataset is **small to medium** and you’re comfortable scaling Postgres specifically for vector workloads

### Use Vector Buckets when…

- You want **S3-style durability and scale** for embeddings
- You’re dealing with a **large number of vectors** (up to tens of millions) that you don’t want sitting in Postgres
- You’re building **AI-heavy Supabase apps** (semantic search, recommendations, RAG, media similarity) and want a managed vector storage tier
- You prefer a clear split between:
  - **Hot vectors** in `pgvector` for the highest-traffic / most latency-sensitive queries
  - **Warm or cold vectors** in Vector Buckets for everything else

In practice, many apps will use both:

- Keep your most frequently queried vectors (for example, current content, top products) in `pgvector`.
- Store the full archive (older content, long tail SKUs, historical embeddings, large media corpora) in Vector Buckets.

## How do Vector Buckets work?

At a high level, here’s what happens under the hood:

**1. Vector Bucket in Supabase Storage**

You create a bucket of type Vector Bucket in the Dashboard or via API.

```jsx
import { createClient } from '@supabase/supabase-js'

// Placeholder URL and key: replace with your project's values.
// Keep the service key in trusted server-side code only.
const supabase = createClient('https://your-project.supabase.co', 'your-service-key')

// Create the Vector Bucket that will hold your indexes
await supabase.storage.vectors.createBucket('embeddings')
```

**2. Create Vector indexes inside the bucket**

Inside the Vector Bucket, you create one or more indexes.

```jsx
// Create an index in that bucket
await supabase.storage.vectors.from('embeddings').createIndex('documents-openai', {
  dimension: 1536,
  distanceMetric: 'cosine',
})
```

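Because an index is created with a fixed `dimension`, every vector written to it needs exactly that many values. A small guard run before writing can catch a mismatched embedding model early. This is a sketch; `assertValidEmbedding` is a hypothetical helper, not part of the Supabase SDK.

```typescript
// Hypothetical pre-write guard: checks length and rejects NaN/Infinity values.
function assertValidEmbedding(values: number[], dimension: number): void {
  if (values.length !== dimension) {
    throw new RangeError(`expected ${dimension} dimensions, got ${values.length}`)
  }
  if (!values.every((value) => Number.isFinite(value))) {
    throw new RangeError('embedding contains a non-finite value')
  }
}

// Example with a toy 3-dimensional index; a real index might use 1536.
assertValidEmbedding([0.1, 0.2, 0.3], 3)
```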
**3. Store vectors**

You can store vectors directly from the SDK, an Edge Function, or Postgres.

```sql
-- Postgres
-- The embedding literals below are truncated placeholders; supply the full vector.
INSERT INTO s3_vectors.documents_openai (key, data, metadata)
VALUES
  (
    'doc-1',
    '[0.1, 0.2, 0.3, /* ... rest of embedding ... */]'::embd,
    '{"title": "Getting Started with Vector Buckets", "source": "documentation"}'::jsonb
  ),
  (
    'doc-2',
    '[0.4, 0.5, 0.6, /* ... rest of embedding ... */]'::embd,
    '{"title": "Advanced Vector Search", "source": "blog"}'::jsonb
  );
```

```jsx
// JS-SDK (server only)
const index = supabase.storage.vectors
  .from('embeddings')
  .index('documents-openai')

const { error } = await index.putVectors({
  vectors: [
    {
      key: 'doc-1',
      data: {
        float32: [0.1, 0.2, 0.3 /* ... */],
      },
      metadata: {
        title: 'Getting started with Vector Buckets',
        type: 'doc',
        language: 'en',
      },
    },
  ],
})

// Surface write failures instead of silently ignoring them
if (error) throw error
```

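When loading a large backlog of embeddings, it usually helps to write in batches rather than one request per vector or one giant request. The sketch below is generic: `putBatch` is a hypothetical callback that you would implement by wrapping a call such as `index.putVectors({ vectors: batch })`, and the batch size is an assumption to tune against whatever per-request limit applies to your project.

```typescript
// Split a list into fixed-size batches; the last batch may be smaller.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('batch size must be a positive integer')
  }
  const batches: T[][] = []
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size))
  }
  return batches
}

// Write batches sequentially and return how many items were written.
// `putBatch` is a caller-supplied function (hypothetical here) that performs the write
// and rejects on failure, which stops the loop at the failing batch.
async function putInBatches<T>(
  items: T[],
  batchSize: number,
  putBatch: (batch: T[]) => Promise<void>
): Promise<number> {
  let written = 0
  for (const batch of chunk(items, batchSize)) {
    await putBatch(batch)
    written += batch.length
  }
  return written
}
```

Sequential writes keep the example simple and make failures easy to resume from; running a few batches concurrently is a reasonable next step if throughput matters.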
**4. Query vectors**

You can run similarity search queries against your indexes, either via the SDK or Postgres.

```sql
-- Postgres
-- The embedding literal below is a truncated placeholder; supply the full vector.
SELECT
  key,
  metadata->>'title' AS title,
  embd_distance(data) AS distance
FROM s3_vectors.documents_openai
WHERE data <==> '[0.1, 0.2, 0.3, /* ... embedding ... */]'::embd
ORDER BY embd_distance(data) ASC
LIMIT 5;
```

```jsx
// JS-SDK (server only)
const index = supabase.storage.vectors
  .from('embeddings')
  .index('documents-openai')

// Query with a vector embedding
const { data, error } = await index.queryVectors({
  queryVector: {
    float32: [0.1, 0.2, 0.3 /* ... embedding of 1536 dimensions ... */],
  },
  topK: 5,
  returnDistance: true,
  returnMetadata: true,
})
```

## Designed for workloads up to tens of millions of vectors

Vector Buckets can currently handle large-but-not-infinite workloads:

- Each vector index supports up to **tens of millions of vectors** (50M per index today).
- You can create multiple indexes per bucket (for tenants, models, or domains).

That makes Vector Buckets a great fit for:

- Multi-tenant SaaS apps
- Documentation and content libraries
- Product catalogues and recommendation systems
- Media libraries and image/video/audio similarity search
- AI builders who want semantic search without running their own vector infrastructure

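If a corpus might outgrow a single index, the per-index limit gives a quick planning number. The sketch below uses the 50M figure quoted above as a default; since limits may change during the alpha, it is passed as a parameter rather than treated as a fixed constant. `indexesNeeded` is an illustrative helper.

```typescript
// Planning arithmetic: how many indexes does a corpus need at a given per-index limit?
// The default reflects the "50M per index today" figure and may change.
const VECTORS_PER_INDEX_TODAY = 50_000_000

function indexesNeeded(totalVectors: number, perIndexLimit: number = VECTORS_PER_INDEX_TODAY): number {
  if (totalVectors < 0 || perIndexLimit <= 0) {
    throw new RangeError('totalVectors must be >= 0 and perIndexLimit must be > 0')
  }
  // Even an empty corpus needs one index to write into.
  return Math.max(1, Math.ceil(totalVectors / perIndexLimit))
}

console.log(indexesNeeded(120_000_000)) // 3
```

In practice the split usually follows a natural boundary such as tenant, embedding model, or domain rather than an arbitrary row count.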
## Example scenarios

A few concrete ways to put Vector Buckets to work:

### 1. AI documentation search

- Store all your documentation (including old versions, drafts, and translations) as embeddings in a Vector Bucket.
- Keep the most recent / highest-traffic docs in `pgvector` for instant in-app search.
- Implement a search endpoint that queries `pgvector` first and falls back to Vector Buckets when needed.

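One way to structure the "hot tier first, archive as fallback" endpoint is sketched below. The two search functions are injected, so the sketch makes no claims about how you query either store; `searchHot` and `searchArchive` are hypothetical wrappers you would write around `pgvector` and Vector Buckets. It also assumes both tiers use the same embedding model and distance metric, otherwise their distances are not comparable.

```typescript
type SearchHit = { key: string; distance: number }
type SearchFn = (embedding: number[], limit: number) => Promise<SearchHit[]>

// Merge two result lists, keep the smaller distance per key, return the closest `limit` hits.
function mergeHits(hot: SearchHit[], archive: SearchHit[], limit: number): SearchHit[] {
  const bestByKey = new Map<string, SearchHit>()
  for (const hit of [...hot, ...archive]) {
    const existing = bestByKey.get(hit.key)
    if (existing === undefined || hit.distance < existing.distance) {
      bestByKey.set(hit.key, hit)
    }
  }
  return [...bestByKey.values()].sort((x, y) => x.distance - y.distance).slice(0, limit)
}

// Query the hot tier first; only touch the archive when the hot tier comes up short.
async function searchWithFallback(
  embedding: number[],
  limit: number,
  searchHot: SearchFn,
  searchArchive: SearchFn
): Promise<SearchHit[]> {
  const hotHits = await searchHot(embedding, limit)
  if (hotHits.length >= limit) return hotHits
  const archiveHits = await searchArchive(embedding, limit)
  return mergeHits(hotHits, archiveHits, limit)
}
```

"Comes up short" here means fewer than `limit` results; a distance threshold on the hot results is another reasonable trigger for the fallback.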
### 2. Long-tail product search and recommendations

- Vectorise your entire catalogue and store it in a Vector Bucket.
- Include metadata for category, brand, stock status, and region.
- Use metadata filters to refine search (e.g. “in stock, in this region, same category”).
- Let recommendation jobs and AI agents work against the full set of products without bloating Postgres.

### 3. Media similarity and de-duplication

- Store embeddings for images, audio or video frames in a Vector Bucket.
- Use similarity search to:
  - Find visually similar assets for content discovery or recommendations
  - Detect possible copyright issues by finding near-duplicate content
  - Clean up your library by removing duplicate or near-duplicate media

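Near-duplicate detection typically comes down to a distance threshold applied to the neighbours of each asset. The sketch below assumes you already have a list of neighbours with distances (for example from a similarity query that returns distances). The threshold is an assumption you would calibrate on your own media, and `nearDuplicates` is an illustrative helper.

```typescript
type Neighbour = { key: string; distance: number }

// Return keys of neighbours close enough to count as near-duplicates of `sourceKey`.
// The source asset itself is excluded, since it is always its own nearest neighbour.
function nearDuplicates(sourceKey: string, neighbours: Neighbour[], maxDistance: number): string[] {
  return neighbours
    .filter((n) => n.key !== sourceKey && n.distance <= maxDistance)
    .map((n) => n.key)
}

const neighboursOfImage1: Neighbour[] = [
  { key: 'img-1', distance: 0 },
  { key: 'img-1-resized', distance: 0.01 },
  { key: 'img-7', distance: 0.2 },
]
// 0.05 is an arbitrary example threshold, not a recommended value.
console.log(nearDuplicates('img-1', neighboursOfImage1, 0.05)) // [ 'img-1-resized' ]
```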
## Availability

Vector Buckets are currently available in **Public Alpha** for Pro projects and above.

Currently supported in the following regions:

- us-east-1
- us-east-2
- us-west-2
- eu-central-1
- ap-southeast-2

More regions will be added in the near future.

We’re using this phase to refine the APIs, scaling behaviour, and search experience based on real workloads. Limits may evolve as we learn from how you use the feature in production.

Vector Buckets are **free to use (fair use policy applies)** during Public Alpha. Egress costs still apply.

## Get started

You can try Vector Buckets in your project today:

1. **Create a Vector Bucket**

   Dashboard → **Storage → Create bucket → Vector Bucket**.

2. **Create an index**

   Pick a dimension that matches your embedding model and choose a distance metric.

3. **Store vectors**

   Use Supabase clients to upsert vectors with metadata.

4. **Query vectors**

   Build endpoints for semantic search, recommendations, or retrieval-augmented generation.

5. **Layer with `pgvector`**

   Keep your hottest, most latency-sensitive vectors in `pgvector`, and store large archives and media-heavy datasets in Vector Buckets.

We’re excited to see what you build with this new vector storage tier.

As you try Vector Buckets during the Public Alpha, please send feedback: what works, what’s confusing, and what you’d like to see next will directly shape where we take this feature.

apps/www/public/rss.xml

Lines changed: 8 additions & 1 deletion
@@ -4,9 +4,16 @@
 <link>https://supabase.com</link>
 <description>Latest news from Supabase</description>
 <language>en</language>
-<lastBuildDate>Thu, 16 Oct 2025 00:00:00 -0700</lastBuildDate>
+<lastBuildDate>Mon, 01 Dec 2025 00:00:00 -0700</lastBuildDate>
 <atom:link href="https://supabase.com/rss.xml" rel="self" type="application/rss+xml"/>
 <item>
+<guid>https://supabase.com/blog/vector-buckets</guid>
+<title>Introducing Vector Buckets</title>
+<link>https://supabase.com/blog/vector-buckets</link>
+<description>Introducing vector storage in Supabase: a durable storage layer with similarity search built-in.</description>
+<pubDate>Mon, 01 Dec 2025 00:00:00 -0700</pubDate>
+</item>
+<item>
 <guid>https://supabase.com/blog/snap-launches-snap-cloud</guid>
 <title>Snap, Inc. Launches Snap Cloud, Powered by Supabase</title>
 <link>https://supabase.com/blog/snap-launches-snap-cloud</link>
