Context Retrieval Persona #13
Tested the PR and the code works well.
- With a notebook in context, the persona collects context from the Python Data Science Handbook. The retrieved context appears to be relevant, but we still need to check whether the persona misses anything relevant.
- Markdown file created explaining what is added to the RAG db extracted from the Python Data Science Handbook repo.
- RAG db created as well.
Items to check:
- If additional notebooks are used for context retrieval, does it overwrite or add to the existing vector store?
- The markdown file is currently overwritten on each run, but we may want to retain it. It would be better to create a new markdown file for each notebook that is processed, with the title of the notebook included.
Will review the code and leave comments as well.
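To illustrate the per-notebook markdown suggestion, here is a minimal sketch. The helper names `report_path_for` and `write_report`, the `reports` output directory, and the `_context.md` suffix are all hypothetical and are not taken from the PR.

```python
import re
from pathlib import Path


def report_path_for(notebook_path: str, output_dir: str = "reports") -> Path:
    """Build a per-notebook report path such as reports/<notebook>_context.md."""
    stem = Path(notebook_path).stem
    # Replace characters that are awkward in file names with underscores.
    safe_stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem).strip("_") or "notebook"
    return Path(output_dir) / f"{safe_stem}_context.md"


def write_report(notebook_path: str, markdown: str, output_dir: str = "reports") -> Path:
    """Write one report per notebook instead of reusing a single shared file."""
    path = report_path_for(notebook_path, output_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Include the notebook's file name as the report title.
    title = Path(notebook_path).name
    path.write_text(f"# Context report: {title}\n\n{markdown}", encoding="utf-8")
    return path
```

With this naming, processing the same notebook twice would still replace that notebook's own report. Adding a timestamp to the file name would be one way to keep every run, if that is wanted.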
Needs some revisions.
Modify parameters in `rag_core.py`:
```python
rag = PythonDSHandbookRAG(
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
```
Can this be updated to take the chosen embedding model from Jupyter AI? The embedding model would then need to be called through Jupyter AI's own functions.
if not os.path.exists(notebook_path):
    return f"Error: Notebook file not found at {notebook_path}"

if not notebook_path.endswith('.ipynb'):
    return f"Error: File must be a .ipynb notebook file, got {notebook_path}"
- This should be sent to the chat panel, not just printed in the logs.
- When I tried this with a `.py` file instead of a notebook `.ipynb` file, it still ran the context retrieval. Not sure why.
- When I gave it a non-existent file, it still ran the RAG, pulling up various pandas notebooks from the PDSH.
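One possible cause, not confirmed from the diff shown here, is that the error string is produced but the calling code continues on to retrieval anyway. The sketch below shows a validate-then-stop pattern. `validate_notebook_path`, `handle_notebook_request`, and the `run_retrieval` callback are hypothetical names, not the PR's actual functions.

```python
from pathlib import Path
from typing import Callable, Optional


def validate_notebook_path(notebook_path: str) -> Optional[str]:
    """Return an error message for an unusable path, or None if it looks valid."""
    path = Path(notebook_path).expanduser()
    if path.suffix.lower() != ".ipynb":
        return f"Error: File must be a .ipynb notebook file, got {notebook_path}"
    if not path.is_file():
        return f"Error: Notebook file not found at {notebook_path}"
    return None


def handle_notebook_request(notebook_path: str, run_retrieval: Callable[[str], str]) -> str:
    """Return the text to show in chat, skipping retrieval entirely on bad input."""
    error = validate_notebook_path(notebook_path)
    if error is not None:
        # Stop here: the caller sends this message to the chat panel.
        return error
    return run_retrieval(notebook_path)
```

Returning the message from the handler, rather than only logging it, would let the persona send it as its chat reply.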
repo_url: str = "https://github.com/jakevdp/PythonDataScienceHandbook.git",
local_repo_path: str = None,
vector_store_path: str = None,
embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2",
Do we have to use this model? Can we take the chosen embedding model from Jupyter AI's config file?
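One way this could work is for the constructor to accept an already-built embeddings object and fall back to the hard-coded model only when none is passed. The sketch below assumes the object follows the LangChain-style `embed_documents` / `embed_query` interface. `HandbookRAGSketch` and `EmbeddingsLike` are hypothetical stand-ins, and how the persona would obtain the object from Jupyter AI's configuration is deliberately left out.

```python
from typing import List, Optional, Protocol


class EmbeddingsLike(Protocol):
    """The two methods a LangChain-style embeddings object exposes."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]: ...

    def embed_query(self, text: str) -> List[float]: ...


DEFAULT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


class HandbookRAGSketch:
    """Hypothetical stand-in for PythonDSHandbookRAG, showing only embedding setup."""

    def __init__(
        self,
        embedding_model: str = DEFAULT_MODEL,
        embeddings: Optional[EmbeddingsLike] = None,
    ) -> None:
        self.embedding_model = embedding_model
        # Injected by the persona when the user has chosen a model elsewhere.
        self._embeddings = embeddings

    def get_embeddings(self) -> EmbeddingsLike:
        if self._embeddings is None:
            # Fallback only: assumes the langchain-huggingface package is installed.
            from langchain_huggingface import HuggingFaceEmbeddings

            self._embeddings = HuggingFaceEmbeddings(model_name=self.embedding_model)
        return self._embeddings
```

Keeping the fallback lazy means the sentence-transformers model is only loaded when no other embeddings object has been supplied.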
Problem
Professionals working in Jupyter Notebooks often need to find relevant examples, documentation, and best practices while coding, but existing solutions require manually searching through documentation or switching between multiple tools. Users need a context-aware assistant that can analyze their current notebook work and understand their technical context, search through comprehensive data science resources using semantic search, and generate structured reports with relevant code examples and next steps.
Solution
This is a persona that combines multi-agent intelligence with RAG to provide intelligent context-aware assistance. The solution uses a 3-agent team architecture: NotebookAnalyzer extracts libraries and analysis stage from notebooks, KnowledgeSearcher performs semantic search through the Python Data Science Handbook using RAG, and MarkdownGenerator creates comprehensive actionable reports.
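As a rough illustration of the NotebookAnalyzer step, the sketch below collects the top-level libraries imported in a notebook's code cells. It is not the PR's implementation: the function name `extract_imported_libraries` is hypothetical, and it only assumes the standard `.ipynb` JSON layout (a `cells` list whose entries have `cell_type` and `source`).

```python
import ast
import json
from typing import Set


def extract_imported_libraries(notebook_json: str) -> Set[str]:
    """Collect top-level package names imported in a notebook's code cells."""
    notebook = json.loads(notebook_json)
    libraries: Set[str] = set()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines or one string
            source = "".join(source)
        # Drop IPython magics and shell escapes, which are not valid Python.
        code = "\n".join(
            line for line in source.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        )
        try:
            tree = ast.parse(code)
        except SyntaxError:
            continue  # skip cells that still fail to parse
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                libraries.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                libraries.add(node.module.split(".")[0])
    return libraries
```

A set like this could then seed the KnowledgeSearcher query, for example by searching the handbook for sections that mention the same libraries.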
Changes
Testing Instructions
- Test Notebook: Open `test_context_retrieval.ipynb` and follow the test cases.
- RAG Integration: Run the `test_rag_integration()` function to verify the RAG system, which auto-clones the handbook and builds the vector store.
- Jupyter-AI Chat: Test with `@ContextRetriever help me with pandas operations` or `@ContextRetriever notebook: /path/to/test_context_retrieval.ipynb`.
Future Work
I may explore integrating Pocketflow into this persona to provide a simpler and more efficient core graph abstraction.