Add a tutorial for GraphVectorStore #3

cbornet · 2024-07-24T22:23:21Z

No description provided.

bjchambers · 2024-07-25T19:55:44Z

docs/docs/tutorials/index.mdx

 ## Working with external knowledge
 - [Build a Retrieval Augmented Generation (RAG) Application](/docs/tutorials/rag)
 - [Build a Conversational RAG Application](/docs/tutorials/qa_chat_history)
+- [Build a Tech Support Bot from an existing Knowledge Base](/docs/tutorials/graph_vectorstore)


A possible title that gets both RAG and links in:

Build a Tech Support RAG Application with Content Links (or "Hyperlinks", or something along those lines)?

kerinin · 2024-07-25T20:05:21Z

The example feels very DataStax-specific. I think we should make sure it isn't written as if the reader has heard of us or any of our products. For example, rather than "Load the Astra Documentation", use something like "Load the Documentation pages".

I also think we need to cut the length down a lot - this spends too much time doing environment setup. We could do several things to simplify (off the top of my head):

- Hard-code URLs rather than using the sitemap
- Rely on env vars rather than using getpass
- Create links for all URLs regardless of prefix
- Load documents in a single call rather than batching them

...basically look for anything that isn't explaining graph RAG directly and try to simplify it away.

bjchambers · 2024-07-25T20:15:02Z

Not really.

> Hard-code URLs rather than using the sitemap

There are 4000 pages or something like that. Hard-coding them would take a little more than using the sitemap. It would be better if we could get the sitemap logic into LangChain. We could (perhaps) pickle the list and load that. Or we could break the logic out into a function so it doesn't show up in the notebook. But it is part of showing a "real" example, and I think it's actually useful to show how to use the sitemap to crawl your own knowledge base (very reusable).
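For reference, the sitemap step can be kept quite small. The following is only a minimal sketch of pulling page URLs out of a sitemap with the standard library, not the code in the notebook: `urls_from_sitemap`, the `prefix` parameter, and the `example.com` sample are all made up for illustration, and the sample XML is inlined so the snippet runs without network access.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str, prefix: str = "") -> list[str]:
    """Return every <loc> URL in a sitemap, optionally filtered by prefix.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall(".//sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if url.startswith(prefix):
            urls.append(url)
    return urls


# Inlined sample standing in for a downloaded sitemap.xml (assumed content).
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/docs/setup</loc></url>
  <url><loc>https://example.com/blog/news</loc></url>
</urlset>"""

doc_urls = urls_from_sitemap(SAMPLE_SITEMAP, prefix="https://example.com/docs/")
```

In a real notebook the XML text would come from fetching the site's sitemap URL; that fetch is left out here on purpose.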

> Rely on env vars rather than using getpass

Sure. I don't think it will save much, though. It's also not typical for reusable notebooks (env vars are harder to set for someone who just wants to run it).
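One possible middle ground, sketched here as an assumption rather than what the notebook does, is to read the environment variable first and only prompt when it is missing. `get_secret` and the `DEMO_TOKEN` variable name are hypothetical; the `prompt` parameter exists only so the fallback can be swapped out.

```python
import os
from getpass import getpass


def get_secret(name: str, prompt=getpass) -> str:
    """Return the value of env var `name`, prompting only if it is unset.

    Hypothetical helper: `prompt` defaults to getpass so the value is
    not echoed when typed into a notebook.
    """
    value = os.environ.get(name)
    if not value:
        value = prompt(f"{name}: ")
        # Cache it so later cells (and libraries) can read it from the env.
        os.environ[name] = value
    return value


# Example: the variable is already set, so nothing is prompted.
os.environ["DEMO_TOKEN"] = "abc123"  # made-up name and value
token = get_secret("DEMO_TOKEN")
```

This keeps the notebook runnable for someone who has not exported anything, while readers who do set env vars never see a prompt.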

> Create links for all URLs regardless of prefix

It already does this. If you're referring to not using the CSS selectors based on the prefix, those are important to keep the header/footer/navigation out of the content so they don't flood the links. This is important to show (although we could simplify with an HTML-to-Markdown-plus-HTML-extractors document transformer) since it is part of real examples.
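To illustrate why scoping matters, here is a rough standard-library sketch that collects links only from inside a page's content element. It is a stand-in, not the tutorial's approach: it matches a single tag name (`main`, an assumption) instead of real CSS selectors, and `ContentLinkParser`, `content_links`, and the sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ContentLinkParser(HTMLParser):
    """Collect <a href> targets that appear inside `content_tag` only."""

    def __init__(self, base_url: str, content_tag: str = "main"):
        super().__init__()
        self.base_url = base_url
        self.content_tag = content_tag
        self.depth = 0  # > 0 while we are inside the content element
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.content_tag:
            self.depth += 1
        elif self.depth and tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == self.content_tag and self.depth:
            self.depth -= 1


def content_links(html: str, base_url: str, content_tag: str = "main") -> list[str]:
    parser = ContentLinkParser(base_url, content_tag)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page: navigation and footer links should be ignored.
PAGE = """
<html><body>
<nav><a href="/home">Home</a><a href="/pricing">Pricing</a></nav>
<main><p>See <a href="setup.html">setup</a> and <a href="/docs/api">API</a>.</p></main>
<footer><a href="/legal">Legal</a></footer>
</body></html>
"""

links = content_links(PAGE, "https://example.com/docs/intro.html")
```

Without the scoping, every page would link to the same navigation and footer targets, which is the flooding described above.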

> Load documents in a single call rather than batching them

Too many documents to load in a single call.
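The batching itself does not have to be long. A minimal sketch, assuming the documents are available as any iterable; `in_batches` and the batch size are illustrative, and the store call in the trailing comment is a placeholder, not a confirmed API.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls the next `size` items; an empty list means we are done.
    while batch := list(islice(iterator, size)):
        yield batch


# Stand-in for a long list of documents.
batches = list(in_batches(range(5), 2))

# In the notebook this would wrap whatever call writes documents to the
# store, e.g. (placeholder names):
#     for batch in in_batches(docs, 50):
#         store.add_documents(batch)
```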

Add a tutorial for GraphVectorStore

a4ba044

cbornet marked this pull request as draft July 24, 2024 22:23

bjchambers reviewed Jul 25, 2024

cbornet added 2 commits July 29, 2024 15:58

Update tutorial

8d73b21

Update tutorial

485853d


3 participants
