Commit 36af0d1

[DOC]: add doc page for web sync onboarding (#5821)
1 parent 53fa376 commit 36af0d1

7 files changed: +31 -0 lines changed

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+---
+id: web
+name: Web Sync
+---
+
+Web Sync allows you to easily sync content from any publicly accessible website into your Chroma Cloud database. Given a starting URL, Sync will crawl the website and its links up to a specified depth, extracting the content as Markdown, chunking it, and inserting it into your Chroma database with embeddings.
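As an illustration of the crawl and chunking steps described above, here is a minimal TypeScript sketch. It is not Chroma's implementation: the `LinkGraph` type, the `crawl` and `chunkText` functions, and the example.com URLs are all hypothetical, and the site is an in-memory map rather than pages fetched over HTTP. Markdown extraction and embedding are left out.

```typescript
// Hypothetical in-memory site: each URL maps to the URLs it links to.
// A real crawler would fetch and parse pages instead.
type LinkGraph = Record<string, string[]>;

// Breadth-first crawl from `start`, following links at most `maxDepth` hops away.
function crawl(start: string, links: LinkGraph, maxDepth: number): string[] {
  const visited = new Set<string>([start]);
  const order: string[] = [];
  let frontier: string[] = [start];

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      order.push(url);
      for (const target of links[url] ?? []) {
        if (!visited.has(target)) {
          visited.add(target);
          next.push(target);
        }
      }
    }
    frontier = next;
  }
  return order;
}

// Split text into chunks of at most `size` characters (illustrative; no overlap).
function chunkText(text: string, size: number): string[] {
  if (size <= 0) {
    throw new RangeError("size must be positive");
  }
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Made-up site used only for this example.
const site: LinkGraph = {
  "https://example.com/": ["https://example.com/a", "https://example.com/b"],
  "https://example.com/a": ["https://example.com/a/deep"],
  "https://example.com/b": ["https://example.com/"],
};

// With a depth of 1, the page two hops away is not reached:
// ["https://example.com/", "https://example.com/a", "https://example.com/b"]
const reached = crawl("https://example.com/", site, 1);

// ["abcd", "efgh", "ij"]
const pieces = chunkText("abcdefghij", 4);
```

The depth limit bounds how far the crawl wanders from the starting URL, while the visited set keeps pages that link back to each other from being processed twice.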
+
+# Walkthrough
+
+If you do not already have a Chroma Cloud account, you will need to create one at [trychroma.com](https://www.trychroma.com). After creating an account, you can create a database by specifying a name:
+
+{% MarkdocImage lightSrc="/sync/sync_web_new_db.png" darkSrc="/sync/sync_web_new_db.png" alt="Create database screen" /%}
+
+Then, select the Web source during onboarding:
+
+{% MarkdocImage lightSrc="/sync/sync_web_onboarding.png" darkSrc="/sync/sync_web_onboarding.png" alt="Onboarding screen" /%}
+
+Next, configure the Web source by providing a starting URL:
+
+{% MarkdocImage lightSrc="/sync/sync_web_url_config.png" darkSrc="/sync/sync_web_url_config.png" alt="Web source config" /%}
+
+Optionally, you can configure other parameters like the page limit and include path regexes. Here, we're scraping a maximum of 50 pages under `https://docs.trychroma.com/cloud` (all our cloud docs):
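To make the effect of a page limit and include path regexes concrete, here is a hedged TypeScript sketch. It only illustrates the idea and says nothing about how Web Sync evaluates these settings internally: `pathOf` and `selectPages` are hypothetical helpers, matching against the URL path (rather than the full URL) is an assumption, and the candidate URLs are made up for the example.

```typescript
// Hypothetical helper: extract the path portion of an absolute URL,
// ignoring any query string or fragment. Returns "/" when there is no path.
function pathOf(url: string): string {
  const match = /^[a-z][a-z0-9+.-]*:\/\/[^/?#]*([^?#]*)/i.exec(url);
  const path = match ? match[1] : "";
  return path ? path : "/";
}

// Hypothetical helper: keep URLs whose path matches at least one include
// pattern, stopping once `pageLimit` URLs have been selected. An empty
// pattern list is treated as "include everything" (an assumption).
// Patterns should not use the `g` flag, because `test` is stateful with it.
function selectPages(
  urls: string[],
  includePaths: RegExp[],
  pageLimit: number,
): string[] {
  const selected: string[] = [];
  for (const url of urls) {
    if (selected.length >= pageLimit) {
      break;
    }
    const path = pathOf(url);
    if (includePaths.length === 0 || includePaths.some((re) => re.test(path))) {
      selected.push(url);
    }
  }
  return selected;
}

// Illustrative candidate URLs (not a real crawl result).
const candidates: string[] = [
  "https://docs.trychroma.com/cloud/sync/web",
  "https://docs.trychroma.com/docs/overview",
  "https://docs.trychroma.com/cloud",
  "https://docs.trychroma.com/cloud/getting-started",
];

// Up to 50 pages whose path is /cloud or sits under /cloud/.
// Selects the three /cloud URLs and skips /docs/overview.
const cloudPages = selectPages(candidates, [/^\/cloud(\/|$)/], 50);
```

Anchoring the pattern with `^` and ending it with `(\/|$)` keeps a path such as `/cloudy` from matching by accident.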
+
+{% MarkdocImage lightSrc="/sync/sync_web_advanced_config.png" darkSrc="/sync/sync_web_advanced_config.png" alt="Web source config" /%}
+
+You can also change the default collection name if you want. After clicking "Create Sync Source", an initial sync will start:
+
+{% MarkdocImage lightSrc="/sync/sync_web_progress.png" darkSrc="/sync/sync_web_progress.png" alt="Web sync in progress" /%}
+
+After it finishes, you'll be redirected to the created collection.

docs/docs.trychroma.com/markdoc/content/sidebar-config.ts

Lines changed: 1 addition & 0 deletions
@@ -167,6 +167,7 @@ const sidebarConfig: AppSection[] = [
   pages: [
     { id: "overview", name: "Overview" },
     { id: "github", name: "GitHub" },
+    { id: "web", name: "Web" },
   ],
 },
 {
5 image files (87.7 KB, 114 KB, 55.8 KB, 45.9 KB, 52.7 KB); previews not loaded.
