jonahjung22
Contributor

Problem

Professionals working in Jupyter Notebooks often need to find relevant examples, documentation, and best practices while coding, but existing solutions require manually searching through documentation or switching between multiple tools. Users need a context-aware assistant that can analyze their current notebook work and understand their technical context, search through comprehensive data science resources using semantic search, and generate structured reports with relevant code examples and next steps.

Solution

This PR adds a persona that combines a multi-agent team with RAG to provide context-aware assistance. The solution uses a three-agent architecture: NotebookAnalyzer extracts the libraries and analysis stage from notebooks, KnowledgeSearcher performs semantic search through the Python Data Science Handbook using RAG, and MarkdownGenerator creates comprehensive, actionable reports.
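
To make the NotebookAnalyzer step more concrete, here is a minimal sketch of how imported libraries could be pulled out of a notebook's code cells. It is not the code from this PR: the function name `extract_imported_libraries` is hypothetical, and it assumes the notebook is available as standard nbformat JSON.

```python
import ast
import json


def extract_imported_libraries(notebook_json: str) -> list[str]:
    """Return the sorted top-level packages imported in a notebook's code cells.

    Hypothetical helper for illustration; not part of the PR.
    """
    notebook = json.loads(notebook_json)
    libraries = set()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores cell source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        try:
            tree = ast.parse(source)
        except SyntaxError:
            # Cells with IPython magics (%matplotlib, !pip) are not valid Python; skip them.
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                libraries.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                libraries.add(node.module.split(".")[0])
    return sorted(libraries)
```

Skipping cells that fail to parse keeps the sketch short, at the cost of missing imports that share a cell with a magic command.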

Changes

  • Code: Implemented RAG and file reading tools as well as the main persona
  • Tests: Added test_rag_integration.py, which helps users set up the RAG system. You can experiment with the persona using the test_context_retrieval.ipynb notebook.
  • Docs: See README.md for setup instructions and further details.

Testing Instructions

  • Test Notebook: Open test_context_retrieval.ipynb and follow the test cases.
  • RAG Integration: Run the test_rag_integration() function to verify the RAG system; it auto-clones the handbook and builds the vector store.
  • Jupyter-AI Chat: Test with `@ContextRetriever help me with pandas operations` or `@ContextRetriever notebook: /path/to/test_context_retrieval.ipynb`.

Future Work

I may explore integrating Pocketflow into this persona to get a simpler and more efficient core graph abstraction.

Collaborator

@srdas srdas left a comment

Tested the PR and the code works well.

  1. With a notebook in context, the persona collects relevant context from the Python Data Science Handbook. The retrieved context appears relevant, but we still need to check whether the persona misses anything relevant.
  2. A markdown file is created explaining what was added to the RAG db from the Python Data Science Handbook repo.
  3. The RAG db is created as well.

Items to check:

  1. If additional notebooks are used for context retrieval, does it overwrite or add to the existing vector store?
  2. The markdown file is currently overwritten on each run, but we may want to retain it. It would be better to create a new markdown file for each processed notebook, with the notebook's title included.
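
One way to address item 2 is to derive the report path from the notebook's file name, so each notebook gets its own markdown file. This is only a sketch: the helper `report_path_for`, the `reports` directory, and the `_context_report.md` suffix are all assumptions, not names from the PR.

```python
from pathlib import Path


def report_path_for(notebook_path: str, output_dir: str = "reports") -> Path:
    """Build a per-notebook report path, e.g. reports/<notebook>_context_report.md.

    Hypothetical helper; the directory and suffix are illustrative choices.
    """
    # Path.stem drops the directory and the .ipynb extension.
    stem = Path(notebook_path).stem
    return Path(output_dir) / f"{stem}_context_report.md"
```

Two notebooks with the same file name in different directories would still collide under this scheme, so a real implementation might also fold in part of the parent path.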

Will review the code and leave comments as well.

Collaborator

@srdas srdas left a comment

Needs some revisions.

Modify parameters in `rag_core.py`:
```python
rag = PythonDSHandbookRAG(
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
```
Collaborator

Can this be updated to take the embedding model chosen in Jupyter-AI? The embedding model would then need to be called through the functions in Jupyter AI.

Comment on lines +25 to +30
if not os.path.exists(notebook_path):
    return f"Error: Notebook file not found at {notebook_path}"

if not notebook_path.endswith('.ipynb'):
    return f"Error: File must be a .ipynb notebook file, got {notebook_path}"

Collaborator

  1. This error should be sent to the chat panel, not just printed in the logs.
  2. When I tried this with a .py file instead of a .ipynb notebook, it still ran the context retrieval. Not sure why.
  3. When I gave it a non-existent file, it still ran the RAG, pulling up various pandas notebooks from the PDSH.
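
A possible way to handle all three points is to raise an exception on an invalid path and have the caller turn it into the chat reply, so that retrieval cannot proceed after a failed check. The sketch below is an assumption about how this could look, not the PR's code: `NotebookPathError`, `validate_notebook_path`, and `handle_request` are hypothetical names, and the returned string stands in for whatever the persona sends to the chat panel.

```python
from pathlib import Path


class NotebookPathError(ValueError):
    """Raised when the requested file cannot be used for context retrieval."""


def validate_notebook_path(notebook_path: str) -> Path:
    """Return the path if it is an existing .ipynb file, otherwise raise."""
    path = Path(notebook_path)
    if path.suffix != ".ipynb":
        raise NotebookPathError(f"File must be a .ipynb notebook file, got {notebook_path}")
    if not path.is_file():
        raise NotebookPathError(f"Notebook file not found at {notebook_path}")
    return path


def handle_request(notebook_path: str) -> str:
    """Hypothetical entry point: the return value is the message shown to the user."""
    try:
        path = validate_notebook_path(notebook_path)
    except NotebookPathError as err:
        # Stop here so the RAG search never runs on a bad path.
        return f"Error: {err}"
    return f"Analyzing {path.name}"
```

Because the error path returns before any analysis starts, a .py file or a missing file produces only the error message.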

repo_url: str = "https://github.com/jakevdp/PythonDataScienceHandbook.git",
local_repo_path: str = None,
vector_store_path: str = None,
embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2",
Collaborator

Do we have to use this model? Can we take the chosen embedding model from Jupyter AI's config file?
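
One option is to pass an embedding object into the RAG class instead of a hard-coded model name. The sketch below assumes the configured model exposes `embed_documents` and `embed_query` in the LangChain style; how Jupyter AI hands over that object is not shown. `HandbookIndex` and `KeywordEmbedder` are hypothetical stand-ins, and the toy embedder exists only to make the example runnable.

```python
from typing import Protocol


class Embedder(Protocol):
    """Assumed interface: LangChain-style embedding methods."""

    def embed_documents(self, texts: list[str]) -> list[list[float]]: ...

    def embed_query(self, text: str) -> list[float]: ...


class HandbookIndex:
    """Hypothetical stand-in for the PR's RAG class, with the embedder injected."""

    def __init__(self, embedder: Embedder) -> None:
        self.embedder = embedder
        self.texts: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, texts: list[str]) -> None:
        self.texts.extend(texts)
        self.vectors.extend(self.embedder.embed_documents(texts))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = self.embedder.embed_query(query)
        # Rank stored texts by dot-product similarity to the query vector.
        scored = sorted(
            zip(self.texts, self.vectors),
            key=lambda pair: -sum(a * b for a, b in zip(q, pair[1])),
        )
        return [text for text, _ in scored[:k]]


class KeywordEmbedder:
    """Toy embedder for the demo only: counts a few fixed keywords."""

    keywords = ("pandas", "numpy", "plot")

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_query(text) for text in texts]

    def embed_query(self, text: str) -> list[float]:
        lowered = text.lower()
        return [float(lowered.count(word)) for word in self.keywords]
```

With this shape, swapping the embedding model means constructing `HandbookIndex` with a different embedder, and the index code itself does not change.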
