
Remove adaptor docs rag #397

Open
hanna-paasivirta wants to merge 3 commits into main from remove-adaptor-rag

Conversation

@hanna-paasivirta
Contributor

Short Description

Removes adaptor documentation embedding similarity search, as this returns many irrelevant docs and we have already added key information about relevant adaptors to the prompt.

  • Removed adaptor_docs as a RAG search option, so that queries now only search general OpenFn platform docs
  • Updated search_docs_system_prompt examples to reflect general docs (not adaptor-specific)
  • Updated job_chat system prompt to clarify these are general docs only and may not be relevant to the user's adaptor/situation
  • Updated tests accordingly, removing the doc_type assertion from test_generate_queries_returns_valid_structure

Fixes #396

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

@josephjclark
Collaborator

@hanna-paasivirta is everything committed to this branch? I can't see any functional change beyond some wording in prompts 🤔

@hanna-paasivirta
Contributor Author

@josephjclark Sorry you're right – I just removed the docs type filter when I should have specified the docs type to narrow it down. I've added a test as well to verify it only searches general docs.
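
As a rough illustration of the difference between dropping the docs type filter and pinning it to general docs, here is a minimal, self-contained sketch. It is not the code in this branch: the `Doc` class, the `search_docs` function, and the scores are all hypothetical; only the `docs_type` values `general_docs` and `adaptor_docs` come from this PR.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for one retrieved chunk and its metadata."""
    doc_title: str
    docs_type: str
    score: float  # similarity score; values below are made up


def search_docs(candidates, docs_type="general_docs", top_k=5):
    """Keep only chunks of the requested docs_type, then take the top_k by score."""
    matching = [d for d in candidates if d.docs_type == docs_type]
    return sorted(matching, key=lambda d: d.score, reverse=True)[:top_k]


# Hypothetical candidate pool mixing both docs types.
candidates = [
    Doc("dhis2", "adaptor_docs", 0.90),
    Doc("cli-usage", "general_docs", 0.83),
    Doc("javascript", "general_docs", 0.85),
]

results = search_docs(candidates)

# The kind of check a test could make: nothing but general docs comes back.
assert all(d.docs_type == "general_docs" for d in results)
```

Without the `docs_type` argument being applied, the highest-scoring adaptor chunk would still be eligible, which matches the behaviour questioned in the previous comment.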

@josephjclark
Collaborator

So I've run a local test on this.

On main, when including the adaptor docs, if I ask a question about dhis2, I get 5 hits from this dhis2 docs page, all scoring 0.82+. Only one of those looks relevant to my question, but OK.

On this branch, the same question gives me 5 basically random results from general docs (again, all scoring above 0.82):

"metadata": { "doc_title": "standards", "docs_type": "general_docs" }
"metadata": { "doc_title": "javascript", "docs_type": "general_docs" }
"metadata": { "doc_title": "home", "docs_type": "general_docs" }
"metadata": { "doc_title": "build-compliant-apps", "docs_type": "general_docs" }
"metadata": { "doc_title": "cli-usage", "docs_type": "general_docs" }

So OK, this PR is working and adaptor docs have been stripped out.

But we just return the top 5 RAG results, right? So is the net result of this that we just pass different irrelevant information from the RAG?

For what it's worth the answer to my question - what functions does dhis2 provide? - is basically identical in both cases.

I think we should talk about this a little more before taking action. I forgot that we were just returning the top 5 hits.
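
The concern about "just returning the top 5" can be shown with a small sketch. This is an illustration under stated assumptions, not the project's retrieval code: the `top_k` helper and every score are hypothetical, and only the doc titles and `docs_type` values are taken from the results above.

```python
def top_k(scored_docs, k=5):
    """Return the k highest-scoring docs, regardless of how relevant they are."""
    return sorted(scored_docs, key=lambda d: d["score"], reverse=True)[:k]


# Hypothetical pool: five dhis2 adaptor chunks plus the five general docs
# listed above. All scores are invented for illustration.
pool = [
    {"doc_title": "dhis2", "docs_type": "adaptor_docs", "score": s}
    for s in (0.90, 0.89, 0.88, 0.87, 0.86)
] + [
    {"doc_title": "standards", "docs_type": "general_docs", "score": 0.85},
    {"doc_title": "javascript", "docs_type": "general_docs", "score": 0.84},
    {"doc_title": "home", "docs_type": "general_docs", "score": 0.84},
    {"doc_title": "build-compliant-apps", "docs_type": "general_docs", "score": 0.83},
    {"doc_title": "cli-usage", "docs_type": "general_docs", "score": 0.83},
]

# With adaptor docs included, the adaptor chunks fill all five slots.
with_adaptor = top_k(pool)

# With adaptor docs filtered out, five slots are still filled,
# now by whatever general docs score highest.
general_only = top_k([d for d in pool if d["docs_type"] == "general_docs"])

assert len(with_adaptor) == len(general_only) == 5
```

In both cases the prompt receives five chunks; filtering changes which chunks are passed, not whether they are relevant to the question.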


Development

Successfully merging this pull request may close these issues.

Embedding similarity-based search returns the wrong adaptor docs
