Empty file added .gitattributes
Empty file.
18 changes: 18 additions & 0 deletions .gitignore
@@ -0,0 +1,18 @@
# Python artifacts
__pycache__/
*.py[cod]
*.egg-info/
# Virtual envs
venv/
.env

# Data
chroma/
cache/
*.apkg

# study PDFs kept local
Dev/data/

# Other
.idea/
7 changes: 7 additions & 0 deletions Dev/CONTRIBUTING.md
@@ -0,0 +1,7 @@
# Contributing

1. Install dependencies using Poetry or `requirements.txt`.
2. Follow the existing module structure under `src/study_tools`.
3. Add tests for new functionality in `Dev/tests`.
4. Run `ruff`, `black`, and `pytest` before submitting a PR.
5. Document changes in `docs/changelog.md` and update `TODO.md` if needed.
22 changes: 22 additions & 0 deletions Dev/README.md
@@ -0,0 +1,22 @@
# Study Tools Dev Package

This `Dev/` directory houses the refactored implementation of the **Universal Study Tutor**. The old prototype remains in `messy_start/` for reference.
Course PDFs should be placed in `Dev/data/`, which is ignored by Git.

## Features
- Configurable PDF ingestion and chunking
- Async summarisation using local Mistral and OpenAI GPT‑4o
- CLI tools for building the index, chat, flashcards and maintenance
- Learning Unit JSON schema with status counters and categories
- Externalised configuration via `config.yaml`
- Course PDFs stored locally in `Dev/data/` (see `docs/MIGRATE_LARGE_FILES.md`)

## Quickstart
```bash
python -m pip install -r requirements.txt
python -m study_tools.build_index
python -m study_tools.summarize
python -m study_tools.cli_chat
```

See `docs/overview.md` for more details.
13 changes: 13 additions & 0 deletions Dev/agents.yaml
@@ -0,0 +1,13 @@
agents:
  - name: Ingestor
    role: Split PDFs into sentence-aware chunks and store them in Qdrant.
  - name: Summariser
    role: Summarise chunks using GPT-4o and cache results.
  - name: Tagger
    role: Classify chunks into categories with local Mistral.
  - name: LUManager
    role: Persist Learning Units with status counters and relations.
  - name: Chat
    role: Interactive Q&A and tutoring over the stored materials.
  - name: FlashcardBuilder
    role: Generate Anki-compatible decks from summaries.
23 changes: 23 additions & 0 deletions Dev/config.yaml
@@ -0,0 +1,23 @@
paths:
  docs_dir: data
  chroma_dir: chroma
  cache_dir: cache
chunking:
  chunk_size: 1024
  chunk_overlap: 128
  pages_per_group: 2
  page_overlap: 1
  chunk_group_limit: 6000
models:
  default: gpt-4o
  tagging: mistral-7b-instruct
  summarizer: gpt-4o
context_windows:
  gpt-4o: 128000
  gpt-4-turbo: 128000
  gpt-4: 8192
  gpt-3.5-turbo: 16385
  mistral-7b-instruct: 32768
limits:
  tokens_per_minute: 40000
  token_margin: 512
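The `pages_per_group` and `page_overlap` settings drive the page-window loop in `build_index.extract_pages`. A small stdlib-only sketch of the same stepping arithmetic (the helper name and dummy page count are illustrative, not part of the package):

```python
def page_windows(num_pages: int, pages_per_group: int, overlap: int) -> list[tuple[int, int]]:
    """1-based (start, end) page windows mirroring extract_pages' stepping."""
    windows = []
    # Step by (group size - overlap) so consecutive windows share `overlap` pages.
    for i in range(0, num_pages, pages_per_group - overlap):
        end = min(i + pages_per_group, num_pages)
        windows.append((i + 1, end))
    return windows

# With the defaults pages_per_group=2, page_overlap=1, a 4-page PDF yields
# overlapping windows (1,2), (2,3), (3,4) plus a degenerate final (4,4).
print(page_windows(4, 2, 1))
```

Note the trailing single-page window: the real generator behaves the same way when the step lands on the last page.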
11 changes: 11 additions & 0 deletions Dev/docs/MIGRATE_LARGE_FILES.md
@@ -0,0 +1,11 @@
# Handling Large PDF Files

Place course PDFs inside `Dev/data/`, which is ignored by Git, so they are not versioned by default.

If repository limits become a problem later, you can retroactively move PDFs into Git LFS with:

```bash
git lfs migrate import '*.pdf'
```

Otherwise keep the files locally and back them up to Google Drive or GCS as needed.
23 changes: 23 additions & 0 deletions Dev/docs/TODO.md
@@ -0,0 +1,23 @@
# TODO Backlog

## P0
- Centralised configuration loader (`utils.load_config`).
- Remove hard-coded paths; read them from `config.yaml`.
- Store PDFs in `Dev/data/` (optionally migrate to Git LFS later).
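The planned `utils.load_config` is not shown in this diff; one plausible shape is a recursive overlay of `config.yaml` values onto built-in defaults (PyYAML assumed for parsing; the merge itself sketched here with plain dicts, names hypothetical):

```python
def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay user config onto defaults without mutating either."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"chunking": {"chunk_size": 1024, "chunk_overlap": 128}, "models": {"default": "gpt-4o"}}
user = {"chunking": {"chunk_size": 512}}
print(deep_merge(defaults, user))
```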

## P1
- OCR fallback and duplicate detection during ingestion.
- Implement KnowledgeNode graph with status counters.
- Tagging pipeline using local Mistral model.
- CLI commands via `python -m study_tools <command>`.

## P2
- Evaluation harness (ROUGE-L, entity overlap, manual rubric).
- Streamlit MVP for progress view.

## P3
- Difficulty-graded exam question generator (IRT).
- Anki `*.apkg` exporter with AnkiConnect.

## P4
- Visual progress dashboard and Obsidian vault export.
7 changes: 7 additions & 0 deletions Dev/docs/changelog.md
@@ -0,0 +1,7 @@
# Changelog

## 2025-07-03
- Initial refactor: new `Dev/` package created.
- Configuration moved to `config.yaml`.
- PDFs now stored in `Dev/data/`; Git LFS usage is optional.
- Migrated documentation and created skeleton tests.
File renamed without changes.
12 changes: 12 additions & 0 deletions Dev/docs/overview.md
@@ -0,0 +1,12 @@
# Overview

The Dev package implements the second iteration of the study bot based on the **Hybrid‑Edge** architecture:

- **Local tagging** with Mistral‑7B‑Instruct classifies text chunks into categories.
- **GPT‑4o/4.1** performs heavy summarisation and tutoring logic.
- **SQLite** stores metadata and Learning Units. **Qdrant** provides vector search.
- Outputs are plain JSON documents that are rendered to Markdown files.

Course PDFs belong in `Dev/data/` and are not tracked in Git.

Scripts read defaults from `config.yaml` so chunk sizes and model names are easily changed.
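The JSON-to-Markdown rendering step mentioned above can be sketched with a hypothetical Learning Unit shape (field names are assumed for illustration, not the actual schema):

```python
import json

def render_unit(unit: dict) -> str:
    """Render a Learning-Unit-style dict to a Markdown snippet (hypothetical fields)."""
    lines = [f"## {unit['title']}", "", f"Status: {unit['status']}", ""]
    lines += [f"- {point}" for point in unit["summary_points"]]
    return "\n".join(lines)

raw = '{"title": "Overlap chunking", "status": "new", "summary_points": ["Splits pages", "Keeps context"]}'
print(render_unit(json.loads(raw)))
```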
23 changes: 23 additions & 0 deletions Dev/pyproject.toml
@@ -0,0 +1,23 @@
[tool.poetry]
name = "study-tools"
version = "0.2.0"
description = "Universal Study Tutor"
authors = ["Study Bot Team"]
packages = [{include = "study_tools", from = "src"}]

[tool.poetry.dependencies]
python = "^3.12"
llama-index-core = "*"
llama-index-llms-openai = "*"
chromadb = "*"
tiktoken = "*"
tenacity = "*"
qdrant-client = "*"
genanki = "*"
tqdm = "*"
pyyaml = "*"

[tool.poetry.group.dev.dependencies]
pytest = "*"
ruff = "*"
black = "*"
9 changes: 9 additions & 0 deletions Dev/requirements.txt
@@ -0,0 +1,9 @@
llama-index-core
llama-index-llms-openai
chromadb
tiktoken
tenacity
qdrant-client
genanki
tqdm
pyyaml
File renamed without changes.
11 changes: 11 additions & 0 deletions Dev/src/study_tools/__init__.py
@@ -0,0 +1,11 @@
"""Study Tools package."""

__all__ = [
    "build_index",
    "summarize",
    "cli_chat",
    "flashcards",
    "ingest",
    "reset",
    "utils",
]
67 changes: 67 additions & 0 deletions Dev/src/study_tools/build_index.py
@@ -0,0 +1,67 @@
"""PDF ingestion and vector index creation."""

from pathlib import Path
import shutil

# Heavy imports are done inside functions to allow importing this module without
# optional dependencies.

from .utils import load_config


def extract_pages(pdf_path: Path, pages_per_group: int, overlap: int):
    import fitz  # PyMuPDF
    from llama_index.core import Document

    doc = fitz.open(pdf_path)
    for i in range(0, len(doc), pages_per_group - overlap):
        end = min(i + pages_per_group, len(doc))
        text = "\n\n".join(doc[pg].get_text() for pg in range(i, end))
        meta = {
            "file_path": str(pdf_path),
            "file_name": pdf_path.name,
            "page_start": i + 1,
            "page_end": end,
        }
        yield Document(text=text, metadata=meta)


def main():
    from llama_index.core import VectorStoreIndex, StorageContext, Document
⚠️ Potential issue

Remove the unused import. `Document` is already imported within the `extract_pages` function where it's used (flagged by Flake8 and Ruff as F401).

-    from llama_index.core import VectorStoreIndex, StorageContext, Document
+    from llama_index.core import VectorStoreIndex, StorageContext

    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.vector_stores.qdrant import QdrantVectorStore
    from qdrant_client import QdrantClient

    cfg = load_config()
    paths = cfg["paths"]
    docs_dir = Path(paths["docs_dir"])
    chroma_dir = Path(paths["chroma_dir"])
    chunk = cfg["chunking"]

    if chroma_dir.exists():
        shutil.rmtree(chroma_dir)

Comment on lines +40 to +42

🛠️ Refactor suggestion

Add safety measures for destructive directory removal. The unconditional removal of the vector store directory could lead to accidental data loss. Consider adding a confirmation prompt:

+    import sys
+
     if chroma_dir.exists():
-        shutil.rmtree(chroma_dir)
+        response = input(f"Warning: {chroma_dir} exists. Remove it? [y/N]: ")
+        if response.lower() == 'y':
+            shutil.rmtree(chroma_dir)
+        else:
+            print("Aborting to preserve existing index.")
+            sys.exit(0)

Alternatively, make this behavior configurable via command-line arguments:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--force", action="store_true", help="Force rebuild by removing existing index")
args = parser.parse_args()

if chroma_dir.exists():
    if args.force:
        shutil.rmtree(chroma_dir)
    else:
        print(f"Error: {chroma_dir} already exists. Use --force to rebuild.")
        sys.exit(1)

    docs = []
    for pdf in docs_dir.rglob("*.pdf"):
        docs.extend(
            extract_pages(
                pdf,
                chunk["pages_per_group"],
                chunk["page_overlap"],
            )
        )

    splitter = SentenceSplitter(
        chunk_size=chunk["chunk_size"],
        chunk_overlap=chunk["chunk_overlap"],
    )
    nodes = splitter.get_nodes_from_documents(docs)

    client = QdrantClient(path=str(chroma_dir))
    store = QdrantVectorStore(client, collection_name="study")
    storage = StorageContext.from_defaults(vector_store=store)
    VectorStoreIndex(nodes, storage_context=storage)
    storage.persist(persist_dir=str(chroma_dir))


if __name__ == "__main__":
    main()
43 changes: 43 additions & 0 deletions Dev/src/study_tools/cli_chat.py
@@ -0,0 +1,43 @@
"""CLI chat interface."""

import argparse
from pathlib import Path

# heavy imports done in main()

from .utils import load_config


def main():
    from llama_index.core import StorageContext, load_index_from_storage
    from llama_index.llms.openai import OpenAI
    from llama_index.vector_stores.qdrant import QdrantVectorStore
    from qdrant_client import QdrantClient

    cfg = load_config()
    llm = OpenAI(model=cfg["models"]["summarizer"])
    chroma_path = cfg["paths"]["chroma_dir"]
    client = QdrantClient(path=chroma_path)
    store = QdrantVectorStore(client, collection_name="study")
    storage = StorageContext.from_defaults(persist_dir=chroma_path, vector_store=store)
    index = load_index_from_storage(storage)
    engine = index.as_chat_engine(chat_mode="condense_question", llm=llm, verbose=True)
Comment on lines +11 to +24

🛠️ Refactor suggestion

Add error handling for configuration and API operations. The code lacks error handling for configuration loading, vector store initialization, and index loading; these operations could fail and crash the application ungracefully. Consider wrapping the critical operations:

 def main():
     from llama_index.core import StorageContext, load_index_from_storage
     from llama_index.llms.openai import OpenAI
     from llama_index.vector_stores.qdrant import QdrantVectorStore
     from qdrant_client import QdrantClient

+    try:
         cfg = load_config()
         llm = OpenAI(model=cfg["models"]["summarizer"])
         chroma_path = cfg["paths"]["chroma_dir"]
         client = QdrantClient(path=chroma_path)
         store = QdrantVectorStore(client, collection_name="study")
         storage = StorageContext.from_defaults(persist_dir=chroma_path, vector_store=store)
         index = load_index_from_storage(storage)
         engine = index.as_chat_engine(chat_mode="condense_question", llm=llm, verbose=True)
+    except Exception as e:
+        print(f"Error initializing chat engine: {e}")
+        return


    parser = argparse.ArgumentParser()
    parser.add_argument("question", nargs="*")
    args = parser.parse_args()

    if args.question:
        q = " ".join(args.question)
        print(engine.chat(q).response)
    else:
        print("Ask questions (blank to exit)")
        while True:
            q = input("? ")
            if not q.strip():
                break
            print(engine.chat(q).response)


if __name__ == "__main__":
    main()
39 changes: 39 additions & 0 deletions Dev/src/study_tools/flashcards.py
@@ -0,0 +1,39 @@
"""Generate Anki deck from summaries."""

import uuid
from pathlib import Path

# heavy imports in main()

from .utils import load_config


def main():
    import genanki
    from llama_index.core import StorageContext, load_index_from_storage
    from llama_index.vector_stores.qdrant import QdrantVectorStore
    from qdrant_client import QdrantClient

    cfg = load_config()
    chroma_path = cfg["paths"]["chroma_dir"]
    client = QdrantClient(path=chroma_path)
    store = QdrantVectorStore(client, collection_name="study")
    storage = StorageContext.from_defaults(persist_dir=chroma_path, vector_store=store)
    index = load_index_from_storage(storage)
    retriever = index.as_retriever(similarity_top_k=50)

    deck = genanki.Deck(uuid.uuid4().int >> 64, "Study-Bot Deck")
    for node in index.docstore.docs.values():
        qa = retriever.query(f"Turn this into Q&A flashcards:\n\n{node.text}").response
💡 Verification agent

Verify retriever API usage: the `retriever.query()` call is likely incorrect, since retrievers expose `retrieve()` and do not return objects with a `response` attribute. Inspecting the llama_index retriever API confirms this:

Use the correct Retriever API (`retrieve`) or switch to a QueryEngine.

The object returned by `index.as_retriever()` is a `BaseRetriever` that only implements

retrieve(str_or_query_bundle: QueryType) -> List[NodeWithScore]

It does not have a `query()` method or a `.response` attribute. You have two options:

  • If you want raw nodes and will drive your own Q&A logic, call `retrieve()`:
    - qa = retriever.query(f"Turn this into Q&A flashcards:\n\n{node.text}").response
    + nodes = retriever.retrieve(f"Turn this into Q&A flashcards:\n\n{node.text}")
    + # TODO: convert `nodes` into Q&A lines yourself
  • If you really want the built-in "query → Response" pattern, use a QueryEngine:
    query_engine = index.as_query_engine(similarity_top_k=50)
    qa = query_engine.query(f"Turn this into Q&A flashcards:\n\n{node.text}").response

Location to update: Dev/src/study_tools/flashcards.py, line 27 (replace the retriever.query(...).response call).

        for line in qa.splitlines():
            if "?" in line:
                q, a = line.split("?", 1)
                note = genanki.Note(model=genanki.BASIC_MODEL, fields=[q.strip() + "?", a.strip()])
                deck.add_note(note)
Comment on lines +28 to +32

🛠️ Refactor suggestion

Improve Q&A parsing logic. The current parsing is fragile: it splits on any "?" character, which can fail on text containing multiple question marks or questions without clear answer separation. A more robust approach looks for a Q:/A: pattern:

-        for line in qa.splitlines():
-            if "?" in line:
-                q, a = line.split("?", 1)
-                note = genanki.Note(model=genanki.BASIC_MODEL, fields=[q.strip()+"?", a.strip()])
-                deck.add_note(note)
+        # Look for Q: A: pattern or numbered questions
+        import re
+        qa_pattern = r'Q:\s*(.+?)\s*A:\s*(.+?)(?=Q:|$)'
+        matches = re.findall(qa_pattern, qa, re.DOTALL)
+        for question, answer in matches:
+            note = genanki.Note(model=genanki.BASIC_MODEL, fields=[question.strip(), answer.strip()])
+            deck.add_note(note)
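The suggested Q:/A: regex can be exercised in isolation; a stdlib-only sketch (helper name and sample text are illustrative):

```python
import re

def parse_qa_pairs(text: str) -> list[tuple[str, str]]:
    """Extract (question, answer) pairs from 'Q: ... A: ...' formatted text."""
    pattern = r"Q:\s*(.+?)\s*A:\s*(.+?)(?=Q:|$)"
    # DOTALL lets answers span line breaks; the lookahead stops each answer
    # at the next question or at the end of the text.
    return [(q.strip(), a.strip()) for q, a in re.findall(pattern, text, re.DOTALL)]

sample = "Q: What is chunk overlap? A: Shared tokens between chunks. Q: Why? A: Context continuity."
print(parse_qa_pairs(sample))
```

Unlike the split-on-"?" approach, embedded question marks inside a question or answer no longer break the pairing.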


    genanki.Package(deck).write_to_file("study.apkg")
    print("study.apkg ready – import into Anki")
Comment on lines +11 to +35

🛠️ Refactor suggestion

Add error handling for file operations and API calls. The code lacks error handling for critical operations like loading the index, querying the retriever, and writing the Anki package file:

 def main():
     import genanki
     from llama_index.core import StorageContext, load_index_from_storage
     from llama_index.vector_stores.qdrant import QdrantVectorStore
     from qdrant_client import QdrantClient

+    try:
         cfg = load_config()
         chroma_path = cfg["paths"]["chroma_dir"]
         client = QdrantClient(path=chroma_path)
         store = QdrantVectorStore(client, collection_name="study")
         storage = StorageContext.from_defaults(persist_dir=chroma_path, vector_store=store)
         index = load_index_from_storage(storage)
         retriever = index.as_retriever(similarity_top_k=50)
+    except Exception as e:
+        print(f"Error initializing components: {e}")
+        return

     deck = genanki.Deck(uuid.uuid4().int >> 64, "Study-Bot Deck")
+    try:
         for node in index.docstore.docs.values():
             qa = retriever.query(f"Turn this into Q&A flashcards:\n\n{node.text}").response
             # ... parsing logic ...

         genanki.Package(deck).write_to_file("study.apkg")
         print("study.apkg ready – import into Anki")
+    except Exception as e:
+        print(f"Error generating flashcards: {e}")



if __name__ == "__main__":
    main()
17 changes: 17 additions & 0 deletions Dev/src/study_tools/ingest.py
@@ -0,0 +1,17 @@
"""Simple document count utility."""

from pathlib import Path

from .utils import load_config


def main():
    from llama_index.core import SimpleDirectoryReader

    cfg = load_config()
    docs_dir = Path(cfg["paths"]["docs_dir"])
    docs = SimpleDirectoryReader(str(docs_dir)).load_data()
    print(f"Loaded {len(docs)} docs")


if __name__ == "__main__":
    main()
25 changes: 25 additions & 0 deletions Dev/src/study_tools/reset.py
@@ -0,0 +1,25 @@
"""Remove generated data."""

import shutil
from pathlib import Path

from .utils import load_config


def main():
    cfg = load_config()
    paths = cfg["paths"]
    for key in ("chroma_dir", "cache_dir"):
        p = Path(paths[key])
        if p.exists():
            shutil.rmtree(p)
            print(f"Deleted {p}")
Comment on lines +12 to +16

🛠️ Refactor suggestion

Add error handling for directory deletion. Deletion should gracefully handle permission issues or other filesystem errors:

     for key in ("chroma_dir", "cache_dir"):
         p = Path(paths[key])
         if p.exists():
-            shutil.rmtree(p)
-            print(f"Deleted {p}")
+            try:
+                shutil.rmtree(p)
+                print(f"Deleted {p}")
+            except OSError as e:
+                print(f"Failed to delete {p}: {e}")

    for f in ("summary.md", "summary.pdf", "study.apkg"):
        fp = Path(f)
        if fp.exists():
            fp.unlink()
            print(f"Deleted {fp}")


if __name__ == "__main__":
    main()