Commit 68cb1a1

feat(content api): add management api references to semantic search (supabase#36289)
* docs: add cursor rule for embedding generation process

  Add documentation for cursor IDE about how docs embeddings are generated, including the workflow for creating and uploading semantic search content.

* feat: improve API reference metadata upload with descriptive content

  - Add preembeddings script to run codegen before embedding generation
  - Enhance OpenApiReferenceSource to generate more descriptive content including parameters, responses, path information, and better structured documentation

* feat: add Management API references to searchDocs GraphQL query

  - Add ManagementApiReference GraphQL type and model for API endpoint search results
  - Integrate Management API references into global search results
  - Update test snapshots and add comprehensive test coverage for Management API search

* style: format
1 parent 97d80a7 commit 68cb1a1

11 files changed: +315 -31 lines changed

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+# Documentation Embeddings Generation System
+
+## Overview
+
+The documentation embeddings generation system processes various documentation sources and uploads their metadata to a database for semantic search functionality. The system is located in `apps/docs/scripts/search/` and works by:
+
+1. **Discovering content sources** from multiple types of documentation
+2. **Processing content** into structured sections with checksums
+3. **Generating embeddings** using OpenAI's text-embedding-ada-002 model
+4. **Storing in database** with vector embeddings for semantic search
+
+## Architecture
+
+### Main Entry Point
+- `generate-embeddings.ts` - Main script that orchestrates the entire process
+- Supports `--refresh` flag to force regeneration of all content
+
+### Content Sources (`sources/` directory)
+
+#### Base Classes
+- `BaseLoader` - Abstract class for loading content from different sources
+- `BaseSource` - Abstract class for processing and formatting content
+
+#### Source Types
+1. **Markdown Sources** (`markdown.ts`)
+   - Processes `.mdx` files from guides and documentation
+   - Extracts frontmatter metadata and content sections
+
+2. **Reference Documentation** (`reference-doc.ts`)
+   - **OpenAPI References** - Management API documentation from OpenAPI specs
+   - **Client Library References** - JavaScript, Dart, Python, C#, Swift, Kotlin SDKs
+   - **CLI References** - Command-line interface documentation
+   - Processes YAML/JSON specs and matches with common sections
+
+3. **GitHub Discussions** (`github-discussion.ts`)
+   - Fetches troubleshooting discussions from GitHub using GraphQL API
+   - Uses GitHub App authentication for access
+
+4. **Partner Integrations** (`partner-integrations.ts`)
+   - Fetches approved partner integration documentation from Supabase database
+   - Technology integrations only (excludes agencies)
+
+### Processing Flow
+
+1. **Content Discovery**: Each source loader discovers and loads content files/data
+2. **Content Processing**: Each source processes content into:
+   - Checksum for change detection
+   - Metadata (title, subtitle, etc.)
+   - Sections with headings and content
+3. **Change Detection**: Compares checksums against existing database records
+4. **Embedding Generation**: Uses OpenAI to generate embeddings for new/changed content
+5. **Database Storage**: Stores in `page` and `page_section` tables with embeddings
+6. **Cleanup**: Removes outdated pages using version tracking
+
+### Database Schema
+
+- **`page`** table: Stores page metadata, content, checksum, version
+- **`page_section`** table: Stores individual sections with embeddings, token counts
+
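
The change-detection step described in the rule file above (compute a checksum per page, compare it against the stored one, and only re-embed what differs) could look roughly like the sketch below. This is not the code in `apps/docs/scripts/search/`: the `ProcessedPage` shape, the function names, and the choice of SHA-256 are all assumptions made for illustration, since the rule file only says "checksum".

```typescript
import { createHash } from 'node:crypto'

// Hypothetical shape of a processed page; the real types may differ.
interface ProcessedPage {
  path: string
  content: string
}

// SHA-256 is an assumption; the source only mentions a "checksum".
function computeChecksum(content: string): string {
  return createHash('sha256').update(content).digest('base64')
}

// Returns the pages whose current checksum differs from the stored one,
// or that have no stored checksum yet. Only these would need new embeddings.
function findChangedPages(
  pages: ProcessedPage[],
  storedChecksums: Map<string, string>
): ProcessedPage[] {
  return pages.filter(
    (page) => storedChecksums.get(page.path) !== computeChecksum(page.content)
  )
}
```

Under these assumptions, a `--refresh` style run would amount to passing an empty `Map`, so that every page is treated as changed.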

apps/docs/app/api/graphql/__snapshots__/route.test.ts.snap

Lines changed: 14 additions & 0 deletions
@@ -84,6 +84,20 @@ type CLICommandReference implements SearchResult {
   content: String
 }
 
+"""
+A reference document containing a description of a Supabase Management API endpoint
+"""
+type ManagementApiReference implements SearchResult {
+  """The title of the document"""
+  title: String
+
+  """The URL of the document"""
+  href: String
+
+  """The content of the reference document, as text"""
+  content: String
+}
+
 """
 A reference document containing a description of a function from a Supabase client library
 """

apps/docs/app/api/graphql/tests/searchDocs.smoke.test.ts

Lines changed: 36 additions & 0 deletions
@@ -204,4 +204,40 @@ describe('prod smoke test: graphql: searchDocs', () => {
     expect(guideNode).toHaveProperty('href')
     expect(guideNode).toHaveProperty('content')
   })
+
+  it('searchDocs query includes Management API references', async () => {
+    const query = `
+      query SearchDocsQuery($query: String!) {
+        searchDocs(query: $query) {
+          nodes {
+            ...on ManagementApiReference {
+              title
+              href
+              content
+            }
+          }
+        }
+      }
+    `
+    const result = await fetch(GRAPHQL_URL, {
+      method: 'POST',
+      body: JSON.stringify({ query, variables: { query: 'create SSO provider' } }),
+    })
+
+    expect(result.status).toBe(200)
+    const { data, errors } = await result.json()
+    expect(errors).toBeUndefined()
+
+    const {
+      searchDocs: { nodes },
+    } = data
+    expect(Array.isArray(nodes)).toBe(true)
+    expect(nodes.length).toBeGreaterThan(0)
+
+    const managementApiNode = nodes.find((node: any) => !!node.title)
+    expect(managementApiNode).toBeDefined()
+    expect(managementApiNode).toHaveProperty('title')
+    expect(managementApiNode).toHaveProperty('href')
+    expect(managementApiNode).toHaveProperty('content')
+  })
 })
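
The smoke test above picks out the Management API node by looking for a truthy `title`, which appears to work because `title` is only selected inside the `ManagementApiReference` fragment. A more explicit alternative is to also select `__typename`, the standard GraphQL meta-field, and filter on it. The sketch below is not part of the commit: `SearchNode`, `buildSearchDocsBody`, and `pickManagementApiNodes` are hypothetical names, and only the query shape and the type name come from the diff.

```typescript
// Hypothetical node shape for this sketch: only the fields selected below.
interface SearchNode {
  __typename?: string
  title?: string
  href?: string
  content?: string
}

// Same query as the smoke test, plus the __typename meta-field.
const SEARCH_DOCS_QUERY = `
  query SearchDocsQuery($query: String!) {
    searchDocs(query: $query) {
      nodes {
        __typename
        ... on ManagementApiReference {
          title
          href
          content
        }
      }
    }
  }
`

// Builds the JSON body for a POST to the GraphQL endpoint.
function buildSearchDocsBody(searchTerm: string): string {
  return JSON.stringify({ query: SEARCH_DOCS_QUERY, variables: { query: searchTerm } })
}

// Keeps only the nodes the server resolved as ManagementApiReference.
function pickManagementApiNodes(nodes: SearchNode[]): SearchNode[] {
  return nodes.filter((node) => node.__typename === 'ManagementApiReference')
}
```

Filtering on `__typename` keeps working if a later query also selects `title` on other result types, where the truthy-title check would start matching those as well.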

apps/docs/app/api/graphql/tests/searchDocs.test.ts

Lines changed: 46 additions & 0 deletions
@@ -33,6 +33,16 @@ const rpcSpy = vi.fn().mockImplementation((funcName, params) => {
         content: params?.include_full_content ? 'Another content' : null,
         subsections: [{ title: 'Getting Started', content: 'Getting Started content' }],
       },
+      {
+        type: 'reference',
+        page_title: 'Create a SSO provider',
+        href: 'https://supabase.com/docs/reference/api/v1-create-a-sso-provider',
+        content: params?.include_full_content ? 'Creates a new SSO provider for a project' : null,
+        metadata: {
+          title: 'Create a SSO provider',
+          subtitle: 'Management API Reference: Create a SSO provider',
+        },
+      },
     ]
     return Promise.resolve({ data: mockResults.slice(0, limit), error: null })
   }
@@ -190,4 +200,40 @@ describe('/api/graphql searchDocs', () => {
     expect(json.errors).toBeDefined()
     expect(json.errors[0].message).toContain('required')
   })
+
+  it('should return Management API references with proper fields', async () => {
+    const searchQuery = `
+      query {
+        searchDocs(query: "SSO provider", limit: 3) {
+          nodes {
+            ... on ManagementApiReference {
+              title
+              href
+              content
+            }
+          }
+        }
+      }
+    `
+    const request = new Request('http://localhost/api/graphql', {
+      method: 'POST',
+      body: JSON.stringify({ query: searchQuery }),
+    })
+
+    const response = await POST(request)
+    const json = await response.json()
+
+    expect(json.errors).toBeUndefined()
+    expect(json.data).toBeDefined()
+    expect(json.data.searchDocs).toBeDefined()
+    expect(json.data.searchDocs.nodes).toBeInstanceOf(Array)
+    expect(json.data.searchDocs.nodes).toHaveLength(3)
+
+    const managementApiNode = json.data.searchDocs.nodes[2]
+    expect(managementApiNode).toMatchObject({
+      title: 'Create a SSO provider',
+      href: 'https://supabase.com/docs/reference/api/v1-create-a-sso-provider',
+      content: 'Creates a new SSO provider for a project',
+    })
+  })
 })

apps/docs/lib/supabase.ts

Lines changed: 6 additions & 1 deletion
@@ -18,7 +18,12 @@ type Database = {
           DatabaseGenerated['public']['Functions']['search_content']['Returns'][number],
           'subsections' | 'metadata'
         > & {
-          metadata: { language?: string; methodName?: string; platform?: string }
+          metadata: {
+            subtitle?: string
+            language?: string
+            methodName?: string
+            platform?: string
+          }
           subsections: Array<{ title?: string; href?: string; content?: string }>
         }
       >

apps/docs/package.json

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@
     "postbuild": "pnpm run build:sitemap && pnpm run build:llms && ./../../scripts/upload-static-assets.sh",
     "prebuild": "pnpm run codegen:graphql && pnpm run codegen:references && pnpm run codegen:examples",
     "predev": "pnpm run codegen:graphql && pnpm run codegen:references && pnpm run codegen:examples",
+    "preembeddings": "pnpm run codegen:references",
     "preinstall": "npx only-allow pnpm",
     "presync": "pnpm run codegen:graphql",
     "pretest": "pnpm run codegen:examples",

apps/docs/resources/globalSearch/globalSearchModel.ts

Lines changed: 8 additions & 0 deletions
@@ -8,6 +8,7 @@ import {
   DB_METADATA_TAG_PLATFORM_CLI,
   ReferenceCLICommandModel,
 } from '../reference/referenceCLIModel'
+import { ReferenceManagementApiModel } from '../reference/referenceManagementApiModel'
 import { ReferenceSDKFunctionModel, SDKLanguageValues } from '../reference/referenceSDKModel'
 import { TroubleshootingModel } from '../troubleshooting/troubleshootingModel'
 import { SearchResultInterface } from './globalSearchInterface'
@@ -74,6 +75,13 @@ function createModelFromMatch({
         content,
         subsections,
       })
+      // TODO [Charis 2025-06-09] replace with less hacky check
+    } else if (metadata.subtitle?.startsWith('Management API Reference')) {
+      return new ReferenceManagementApiModel({
+        title: page_title,
+        href,
+        content,
+      })
     } else {
       return null
     }
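
The new `else if` branch above recognizes Management API results by a subtitle prefix, which the TODO itself flags as hacky. One small way to keep that check contained until it is replaced is to isolate it behind a named predicate. The sketch below is an assumption-laden illustration, not the commit's code: `SearchMatch`, `MANAGEMENT_API_SUBTITLE_PREFIX`, `isManagementApiMatch`, and `toManagementApiResult` are hypothetical names, and only the prefix string and the field mapping come from the diff.

```typescript
// Hypothetical, trimmed-down shape of one search_content match.
interface SearchMatch {
  page_title?: string
  href?: string
  content?: string
  metadata: { subtitle?: string }
}

// Prefix string taken from the diff; the constant name is an assumption.
const MANAGEMENT_API_SUBTITLE_PREFIX = 'Management API Reference'

// True when the match carries the Management API subtitle prefix.
function isManagementApiMatch(match: SearchMatch): boolean {
  return match.metadata.subtitle?.startsWith(MANAGEMENT_API_SUBTITLE_PREFIX) ?? false
}

// Maps a match to the three fields the model exposes, or null if it
// is not a Management API reference (mirroring the fallthrough branch).
function toManagementApiResult(match: SearchMatch) {
  if (!isManagementApiMatch(match)) return null
  return { title: match.page_title, href: match.href, content: match.content }
}
```

With the check in one place, swapping the prefix test for a dedicated metadata field later would only touch `isManagementApiMatch`.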
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+import { type SearchResultInterface } from '../globalSearch/globalSearchInterface'
+
+export class ReferenceManagementApiModel implements SearchResultInterface {
+  public title?: string
+  public href?: string
+  public content?: string
+
+  constructor({ title, href, content }: { title?: string; href?: string; content?: string }) {
+    this.title = title
+    this.href = href
+    this.content = content
+  }
+}
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+import { GraphQLObjectType, GraphQLString } from 'graphql'
+import { GraphQLInterfaceTypeSearchResult } from '../globalSearch/globalSearchSchema'
+import { ReferenceManagementApiModel } from './referenceManagementApiModel'
+
+export const GraphQLObjectTypeReferenceManagementApi = new GraphQLObjectType({
+  name: 'ManagementApiReference',
+  interfaces: [GraphQLInterfaceTypeSearchResult],
+  isTypeOf: (value: unknown) => value instanceof ReferenceManagementApiModel,
+  description:
+    'A reference document containing a description of a Supabase Management API endpoint',
+  fields: {
+    title: {
+      type: GraphQLString,
+      description: 'The title of the document',
+    },
+    href: {
+      type: GraphQLString,
+      description: 'The URL of the document',
+    },
+    content: {
+      type: GraphQLString,
+      description: 'The content of the reference document, as text',
+    },
+  },
+})
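
The `isTypeOf` predicate in the schema file above is what lets a value returned through the `SearchResult` interface be mapped to the concrete `ManagementApiReference` type. The idea can be illustrated without the `graphql` package as "try each candidate type's predicate and take the first match". The sketch below is a simplified stand-in, not graphql-js's actual resolution code: `CandidateType`, `resolveTypeName`, and `GuideModel` are hypothetical, and the model class is redeclared locally so the block is self-contained.

```typescript
// Local stand-in for the model class added in this commit.
class ReferenceManagementApiModel {
  title?: string
  href?: string
  content?: string
}

// Hypothetical second model, only here to show a non-matching candidate.
class GuideModel {
  title?: string
}

// Minimal stand-in for an object type that exposes an isTypeOf predicate.
interface CandidateType {
  name: string
  isTypeOf: (value: unknown) => boolean
}

const candidateTypes: CandidateType[] = [
  { name: 'Guide', isTypeOf: (value) => value instanceof GuideModel },
  {
    name: 'ManagementApiReference',
    isTypeOf: (value) => value instanceof ReferenceManagementApiModel,
  },
]

// Returns the name of the first candidate whose predicate accepts the value.
function resolveTypeName(value: unknown, types: CandidateType[]): string | undefined {
  return types.find((type) => type.isTypeOf(value))?.name
}
```

Because the predicates rely on `instanceof`, a plain object with the same fields would not be recognized, which is presumably why `createModelFromMatch` returns a `ReferenceManagementApiModel` instance rather than an object literal.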

apps/docs/resources/rootSchema.ts

Lines changed: 2 additions & 0 deletions
@@ -10,6 +10,7 @@ import { errorRoot, errorsRoot } from './error/errorResolver'
 import { searchRoot } from './globalSearch/globalSearchResolver'
 import { GraphQLObjectTypeGuide } from './guide/guideSchema'
 import { GraphQLObjectTypeReferenceCLICommand } from './reference/referenceCLISchema'
+import { GraphQLObjectTypeReferenceManagementApi } from './reference/referenceManagementApiSchema'
 import { GraphQLObjectTypeReferenceSDKFunction } from './reference/referenceSDKSchema'
 import { GraphQLObjectTypeTroubleshooting } from './troubleshooting/troubleshootingSchema'
 
@@ -43,6 +44,7 @@ export const rootGraphQLSchema = new GraphQLSchema({
   types: [
     GraphQLObjectTypeGuide,
     GraphQLObjectTypeReferenceCLICommand,
+    GraphQLObjectTypeReferenceManagementApi,
     GraphQLObjectTypeReferenceSDKFunction,
     GraphQLObjectTypeTroubleshooting,
   ],
