Empty file added .gitattributes
Empty file.
18 changes: 18 additions & 0 deletions .gitignore
@@ -0,0 +1,18 @@
# Python artifacts
__pycache__/
*.py[cod]
*.egg-info/
# Virtual envs
venv/
.env

# Data
chroma/
cache/
*.apkg

# study PDFs kept local
Dev/data/

# Other
.idea/
7 changes: 7 additions & 0 deletions Dev/CONTRIBUTING.md
@@ -0,0 +1,7 @@
# Contributing

1. Install dependencies using Poetry or `requirements.txt`.
2. Follow the existing module structure under `src/study_tools`.
3. Add tests for new functionality in `Dev/tests`.
4. Run `ruff`, `black`, and `pytest` before submitting a PR (see the sketch below).
5. Document changes in `docs/changelog.md` and update `TODO.md` if needed.
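
A typical pre-PR check run, as a minimal sketch (the paths and invocation order are assumptions about this layout, not project policy):

```bash
# format, lint, then test — stops at the first failure
black Dev/src Dev/tests \
  && ruff check Dev/src Dev/tests \
  && pytest Dev/tests
```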
22 changes: 22 additions & 0 deletions Dev/README.md
@@ -0,0 +1,22 @@
# Study Tools Dev Package

This `Dev/` directory houses the refactored implementation of the **Universal Study Tutor**. The old prototype remains in `messy_start/` for reference.
Course PDFs should be placed in `Dev/data/`, which is ignored by Git.

## Features
- Configurable PDF ingestion and chunking
- Async summarisation using local Mistral and OpenAI GPT‑4o
- CLI tools for building the index, chat, flashcards and maintenance
- Learning Unit JSON schema with status counters and categories
- Externalised configuration via `config.yaml`
- Course PDFs stored locally in `Dev/data/` (see `docs/MIGRATE_LARGE_FILES.md`)

## Quickstart
```bash
python -m pip install -r requirements.txt
python -m study_tools.build_index
python -m study_tools.summarize
python -m study_tools.cli_chat
```

See `docs/overview.md` for more details.
13 changes: 13 additions & 0 deletions Dev/agents.yaml
@@ -0,0 +1,13 @@
agents:
  - name: Ingestor
    role: Split PDFs into sentence-aware chunks and store them in Qdrant.
  - name: Summariser
    role: Summarise chunks using GPT-4o and cache results.
  - name: Tagger
    role: Classify chunks into categories with local Mistral.
  - name: LUManager
    role: Persist Learning Units with status counters and relations.
  - name: Chat
    role: Interactive Q&A and tutoring over the stored materials.
  - name: FlashcardBuilder
    role: Generate Anki-compatible decks from summaries.
23 changes: 23 additions & 0 deletions Dev/config.yaml
@@ -0,0 +1,23 @@
paths:
  docs_dir: data
  chroma_dir: chroma
  cache_dir: cache
chunking:
  chunk_size: 1024
  chunk_overlap: 128
  pages_per_group: 2
  page_overlap: 1
  chunk_group_limit: 6000
models:
  default: gpt-4o
  tagging: mistral-7b-instruct
  summarizer: gpt-4o
context_windows:
  gpt-4o: 128000
  gpt-4-turbo: 128000
  gpt-4: 8192
  gpt-3.5-turbo: 16385
  mistral-7b-instruct: 32768
limits:
  tokens_per_minute: 40000
  token_margin: 512
11 changes: 11 additions & 0 deletions Dev/docs/MIGRATE_LARGE_FILES.md
@@ -0,0 +1,11 @@
# Handling Large PDF Files

Place course PDFs inside `Dev/data/`; that directory is ignored by Git, so the files are not versioned by default.

If repository limits become a problem later, you can retroactively move PDFs into Git LFS with:

```bash
git lfs migrate import '*.pdf'
```

Otherwise keep the files locally and back them up to Google Drive or GCS as needed.
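
For the GCS option, a one-way mirror is enough — for example (the bucket name is a placeholder):

```bash
# mirror the local PDFs into a Cloud Storage bucket (placeholder name)
gsutil -m rsync -r Dev/data gs://my-study-bucket/pdfs
```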
23 changes: 23 additions & 0 deletions Dev/docs/TODO.md
@@ -0,0 +1,23 @@
# TODO Backlog

## P0
- Centralised configuration loader (`utils.load_config`).
- Remove hard-coded paths; read from `config.yaml`.
- Store PDFs in `Dev/data/` (optionally migrate to Git LFS later).

## P1
- OCR fallback and duplicate detection during ingestion.
- Implement KnowledgeNode graph with status counters.
- Tagging pipeline using local Mistral model.
- CLI commands via `python -m study_tools <command>`.

## P2
- Evaluation harness (ROUGE-L, entity overlap, manual rubric).
- Streamlit MVP for progress view.

## P3
- Difficulty-graded exam question generator (IRT).
- Anki `*.apkg` exporter with AnkiConnect.

## P4
- Visual progress dashboard and Obsidian vault export.
7 changes: 7 additions & 0 deletions Dev/docs/changelog.md
@@ -0,0 +1,7 @@
# Changelog

## 2025-07-03
- Initial refactor: new `Dev/` package created.
- Configuration moved to `config.yaml`.
- PDFs now stored in `Dev/data/`; Git LFS usage is optional.
- Migrated documentation and created skeleton tests.
File renamed without changes.
12 changes: 12 additions & 0 deletions Dev/docs/overview.md
@@ -0,0 +1,12 @@
# Overview

The Dev package implements the second iteration of the study bot based on the **Hybrid‑Edge** architecture:

- **Local tagging** with Mistral‑7B‑Instruct classifies text chunks into categories.
- **GPT‑4o/4.1** performs heavy summarisation and tutoring logic.
- **SQLite** stores metadata and Learning Units. **Qdrant** provides vector search.
- Outputs are plain JSON, which is rendered to Markdown files.

Course PDFs belong in `Dev/data/` and are not tracked in Git.

Scripts read defaults from `config.yaml` so chunk sizes and model names are easily changed.
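
A minimal sketch of what `utils.load_config` could look like (illustrative only; the real helper may resolve the path relative to the package rather than the working directory):

```python
# hypothetical loader: parses config.yaml into a nested dict with PyYAML
from pathlib import Path
import yaml


def load_config(path: str | Path = "config.yaml") -> dict:
    """Return the parsed config.yaml as a nested dict."""
    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh)


cfg = load_config()
print(cfg["chunking"]["chunk_size"])  # -> 1024 with the defaults above
print(cfg["models"]["summarizer"])    # -> "gpt-4o"
```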
23 changes: 23 additions & 0 deletions Dev/pyproject.toml
@@ -0,0 +1,23 @@
[tool.poetry]
name = "study-tools"
version = "0.2.0"
description = "Universal Study Tutor"
authors = ["Study Bot Team"]
packages = [{include = "study_tools", from = "src"}]

[tool.poetry.dependencies]
python = "^3.12"
💡 Verification agent

🧩 Analysis chain

Verify Python 3.12 requirement necessity.

Python 3.12 is quite recent and may limit adoption. Verify if this strict requirement is necessary or if Python 3.10+ would suffice.


🏁 Script executed:

#!/bin/bash
# Check if the codebase uses Python 3.12+ specific features
rg -n "match|case|\\|" --type py
rg -n "Self|TypeAlias|Literal" --type py
rg -n "from __future__ import annotations" --type py

Length of output: 784


Lower Python requirement to 3.10 (no 3.12-only features detected)

Our codebase uses PEP 604 union types (Path | str) which require Python 3.10+, but no features exclusive to 3.11 or 3.12 were found. Please relax the constraint unless there’s another justification for requiring 3.12.
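
For reference, the 3.10-only syntax in question (illustrative snippet, not taken from the codebase):

```python
# PEP 604 union syntax: `X | Y` in annotations requires Python >= 3.10
from pathlib import Path


def read_doc(path: Path | str) -> str:
    return Path(path).read_text()
```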

• File needing update:

  • Dev/pyproject.toml

Suggested change:

- python = "^3.12"
+ python = "^3.10"

llama-index-core = "*"
llama-index-llms-openai = "*"
chromadb = "*"
tiktoken = "*"
tenacity = "*"
qdrant-client = "*"
genanki = "*"
tqdm = "*"
pyyaml = "*"
Comment on lines +10 to +18

🛠️ Refactor suggestion

Pin dependency versions for reproducible builds.

Using "*" for all dependencies makes builds non-reproducible and can lead to dependency conflicts. Consider pinning to specific version ranges.

-llama-index-core = "*"
-llama-index-llms-openai = "*"
-chromadb = "*"
-tiktoken = "*"
-tenacity = "*"
-qdrant-client = "*"
-genanki = "*"
-tqdm = "*"
-pyyaml = "*"
+llama-index-core = "^0.10.0"
+llama-index-llms-openai = "^0.1.0"
+chromadb = "^0.4.0"
+tiktoken = "^0.5.0"
+tenacity = "^8.0.0"
+qdrant-client = "^1.7.0"
+genanki = "^2.1.0"
+tqdm = "^4.65.0"
+pyyaml = "^6.0.0"


[tool.poetry.group.dev.dependencies]
pytest = "*"
ruff = "*"
black = "*"
9 changes: 9 additions & 0 deletions Dev/requirements.txt
@@ -0,0 +1,9 @@
llama-index-core
llama-index-llms-openai
chromadb
tiktoken
tenacity
qdrant-client
genanki
tqdm
pyyaml
File renamed without changes.
11 changes: 11 additions & 0 deletions Dev/src/study_tools/__init__.py
@@ -0,0 +1,11 @@
"""Study Tools package."""

__all__ = [
    "build_index",
    "summarize",
    "cli_chat",
    "flashcards",
    "ingest",
    "reset",
    "utils",
]
67 changes: 67 additions & 0 deletions Dev/src/study_tools/build_index.py
@@ -0,0 +1,67 @@
"""PDF ingestion and vector index creation."""

from pathlib import Path
import shutil

# Heavy imports are done inside functions to allow importing this module without
# optional dependencies.

from .utils import load_config


def extract_pages(pdf_path: Path, pages_per_group: int, overlap: int):
    import fitz  # PyMuPDF
    from llama_index.core import Document

    doc = fitz.open(pdf_path)
    for i in range(0, len(doc), pages_per_group - overlap):
        end = min(i + pages_per_group, len(doc))
Comment on lines +16 to +17

⚠️ Potential issue

Invalid overlap configuration breaks the page loop.

The step in range(0, len(doc), pages_per_group - overlap) becomes zero or negative when overlap >= pages_per_group: a zero step makes range() raise ValueError, and a negative step yields an empty range that silently skips every page.

Add input validation to prevent this edge case:

def extract_pages(pdf_path: Path, pages_per_group: int, overlap: int):
+    if overlap >= pages_per_group:
+        raise ValueError(f"Overlap ({overlap}) must be less than pages_per_group ({pages_per_group})")
+    if pages_per_group <= 0 or overlap < 0:
+        raise ValueError("pages_per_group must be positive and overlap must be non-negative")
    import fitz  # PyMuPDF

        text = "\n\n".join(doc[pg].get_text() for pg in range(i, end))
        meta = {
            "file_path": str(pdf_path),
            "file_name": pdf_path.name,
            "page_start": i + 1,
            "page_end": end,
        }
        yield Document(text=text, metadata=meta)


def main():
    from llama_index.core import VectorStoreIndex, StorageContext
    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.vector_stores.qdrant import QdrantVectorStore
    from qdrant_client import QdrantClient

    cfg = load_config()
    paths = cfg["paths"]
    docs_dir = Path(paths["docs_dir"])
    chroma_dir = Path(paths["chroma_dir"])

🛠️ Refactor suggestion

Misleading variable name: using Qdrant but named chroma_dir.

The variable is named chroma_dir but the code uses Qdrant for vector storage, not Chroma. This creates confusion and suggests a copy-paste error from a previous Chroma implementation.

-    chroma_dir = Path(paths["chroma_dir"])
+    vector_store_dir = Path(paths["chroma_dir"])  # Consider renaming config key to "vector_store_dir"

And update all references:

-    if chroma_dir.exists():
-        shutil.rmtree(chroma_dir)
+    if vector_store_dir.exists():
+        shutil.rmtree(vector_store_dir)
-    client = QdrantClient(path=str(chroma_dir))
+    client = QdrantClient(path=str(vector_store_dir))
-    storage.persist(persist_dir=str(chroma_dir))
+    storage.persist(persist_dir=str(vector_store_dir))

    chunk = cfg["chunking"]

    if chroma_dir.exists():
        shutil.rmtree(chroma_dir)

    docs = []
    for pdf in docs_dir.rglob("*.pdf"):
        docs.extend(
            extract_pages(
                pdf,
                chunk["pages_per_group"],
                chunk["page_overlap"],
            )
        )

    splitter = SentenceSplitter(
        chunk_size=chunk["chunk_size"],
        chunk_overlap=chunk["chunk_overlap"],
    )
    nodes = splitter.get_nodes_from_documents(docs)

    client = QdrantClient(path=str(chroma_dir))
    store = QdrantVectorStore(client, collection_name="study")
    storage = StorageContext.from_defaults(vector_store=store)
    VectorStoreIndex(nodes, storage_context=storage)
    storage.persist(persist_dir=str(chroma_dir))


if __name__ == "__main__":
    main()
43 changes: 43 additions & 0 deletions Dev/src/study_tools/cli_chat.py
@@ -0,0 +1,43 @@
"""CLI chat interface."""

import argparse
from pathlib import Path

# heavy imports done in main()

from .utils import load_config


def main():
    from llama_index.core import StorageContext, load_index_from_storage
    from llama_index.llms.openai import OpenAI
    from llama_index.vector_stores.qdrant import QdrantVectorStore
    from qdrant_client import QdrantClient

    cfg = load_config()
    llm = OpenAI(model=cfg["models"]["summarizer"])
    chroma_path = cfg["paths"]["chroma_dir"]
    client = QdrantClient(path=chroma_path)
    store = QdrantVectorStore(client, collection_name="study")
    storage = StorageContext.from_defaults(persist_dir=chroma_path, vector_store=store)
    index = load_index_from_storage(storage)
    engine = index.as_chat_engine(chat_mode="condense_question", llm=llm, verbose=True)

    parser = argparse.ArgumentParser()
    parser.add_argument("question", nargs="*")
    args = parser.parse_args()

    if args.question:
        q = " ".join(args.question)
        print(engine.chat(q).response)
    else:
        print("Ask questions (blank to exit)")
        while True:
            q = input("? ")
            if not q.strip():
                break
            print(engine.chat(q).response)


if __name__ == "__main__":
    main()
39 changes: 39 additions & 0 deletions Dev/src/study_tools/flashcards.py
@@ -0,0 +1,39 @@
"""Generate Anki deck from summaries."""

import uuid
from pathlib import Path

# heavy imports in main()

from .utils import load_config


def main():
    import genanki
    from llama_index.core import StorageContext, load_index_from_storage
    from llama_index.vector_stores.qdrant import QdrantVectorStore
    from qdrant_client import QdrantClient

    cfg = load_config()
    chroma_path = cfg["paths"]["chroma_dir"]

🛠️ Refactor suggestion

Fix naming inconsistency: "chroma_dir" should be "qdrant_dir".

The code uses Qdrant vector store but references a "chroma_dir" configuration key, which is inconsistent.

-    chroma_path = cfg["paths"]["chroma_dir"]
+    chroma_path = cfg["paths"]["qdrant_dir"]

    client = QdrantClient(path=chroma_path)
    store = QdrantVectorStore(client, collection_name="study")
    storage = StorageContext.from_defaults(persist_dir=chroma_path, vector_store=store)
    index = load_index_from_storage(storage)
    retriever = index.as_retriever(similarity_top_k=50)

    deck = genanki.Deck(uuid.uuid4().int >> 64, "Study-Bot Deck")
    for node in index.docstore.docs.values():
        qa = retriever.query(f"Turn this into Q&A flashcards:\n\n{node.text}").response
        for line in qa.splitlines():
            if "?" in line:
                q, a = line.split("?", 1)
                note = genanki.Note(model=genanki.BASIC_MODEL, fields=[q.strip() + "?", a.strip()])
                deck.add_note(note)
Comment on lines +27 to +32

⚠️ Potential issue

Fix incorrect usage of retriever for Q&A generation.

Retrievers perform similarity search, not text generation — llama-index retrievers expose retrieve(), not query(), so this call is wrong and will not produce flashcard text. The line-splitting parse is also fragile and breaks on malformed responses.

-        qa = retriever.query(f"Turn this into Q&A flashcards:\n\n{node.text}").response
-        for line in qa.splitlines():
-            if "?" in line:
-                q, a = line.split("?", 1)
-                note = genanki.Note(model=genanki.BASIC_MODEL, fields=[q.strip()+"?", a.strip()])
-                deck.add_note(note)
+        # Use LLM directly for Q&A generation instead of retriever
+        llm = index.service_context.llm
+        qa_prompt = f"Generate 3-5 question-answer pairs from this text. Format each as 'Q: question? A: answer':\n\n{node.text}"
+        qa_response = llm.complete(qa_prompt).text
+        
+        for line in qa_response.splitlines():
+            if line.startswith("Q:") and "A:" in line:
+                try:
+                    q_part, a_part = line.split("A:", 1)
+                    question = q_part.replace("Q:", "").strip()
+                    answer = a_part.strip()
+                    if question and answer:
+                        note = genanki.Note(model=genanki.BASIC_MODEL, fields=[question, answer])
+                        deck.add_note(note)
+                except ValueError:
+                    continue  # Skip malformed lines


    genanki.Package(deck).write_to_file("study.apkg")
    print("study.apkg ready – import into Anki")


if __name__ == "__main__":
    main()
17 changes: 17 additions & 0 deletions Dev/src/study_tools/ingest.py
@@ -0,0 +1,17 @@
"""Simple document count utility."""

from pathlib import Path

from .utils import load_config


def main():
    from llama_index.core import SimpleDirectoryReader

    cfg = load_config()
    docs_dir = Path(cfg["paths"]["docs_dir"])
    docs = SimpleDirectoryReader(str(docs_dir)).load_data()
    print(f"Loaded {len(docs)} docs")

Comment on lines +8 to +14

🛠️ Refactor suggestion

Add error handling and improve code structure.

The loader works, but a missing documents directory or absent config key currently surfaces as an unhandled traceback.

Apply this diff to improve error handling and code structure:

 def main():
-    from llama_index.core import SimpleDirectoryReader
-    cfg = load_config()
-    docs_dir = Path(cfg["paths"]["docs_dir"])
-    docs = SimpleDirectoryReader(str(docs_dir)).load_data()
-    print(f"Loaded {len(docs)} docs")
+    try:
+        from llama_index.core import SimpleDirectoryReader
+        cfg = load_config()
+        docs_dir = Path(cfg["paths"]["docs_dir"])
+        
+        if not docs_dir.exists():
+            print(f"Error: Documents directory does not exist: {docs_dir}")
+            return
+            
+        docs = SimpleDirectoryReader(str(docs_dir)).load_data()
+        print(f"Loaded {len(docs)} docs")
+    except KeyError as e:
+        print(f"Error: Missing configuration key: {e}")
+    except Exception as e:
+        print(f"Error loading documents: {e}")

Consider moving the import to the top of the file for better visibility:

+from llama_index.core import SimpleDirectoryReader
 from pathlib import Path

 from .utils import load_config


if __name__ == "__main__":
    main()
25 changes: 25 additions & 0 deletions Dev/src/study_tools/reset.py
@@ -0,0 +1,25 @@
"""Remove generated data."""

import shutil
from pathlib import Path

from .utils import load_config


def main():
    cfg = load_config()
    paths = cfg["paths"]
    for key in ("chroma_dir", "cache_dir"):
        p = Path(paths[key])
        if p.exists():
            shutil.rmtree(p)
            print(f"Deleted {p}")
    for f in ("summary.md", "summary.pdf", "study.apkg"):
Comment on lines +12 to +17

⚠️ Potential issue

KeyError & safety guard around path lookup

cfg["paths"][key] will raise if the key is absent. A safer pattern:

-    for key in ("chroma_dir", "cache_dir"):
-        p = Path(paths[key])
+    for key in ("chroma_dir", "cache_dir"):
+        if key not in paths:
+            continue
+        p = Path(paths[key])

Also consider shutil.rmtree(p, ignore_errors=True) to cope with permission issues.


        fp = Path(f)
        if fp.exists():
            fp.unlink()
            print(f"Deleted {fp}")
Comment on lines +17 to +21

🛠️ Refactor suggestion

Hard-coded filenames tie the script to the CWD

summary.md, summary.pdf, and study.apkg are deleted relative to wherever the user runs the script, which may not be the project root. Read the root from cfg (e.g. project_root) or resolve relative to Path(__file__).parent.parent.
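
A minimal sketch of that fix (assumes the intended root is the `Dev/` directory; `parents[2]` is a guess at the layout and should be verified against the real tree):

```python
# sketch: anchor deletions to the package root instead of the CWD
from pathlib import Path

# Dev/src/study_tools/reset.py -> parents[2] == Dev/  (adjust if the layout differs)
PROJECT_ROOT = Path(__file__).resolve().parents[2]

for name in ("summary.md", "summary.pdf", "study.apkg"):
    fp = PROJECT_ROOT / name
    if fp.exists():
        fp.unlink()
        print(f"Deleted {fp}")
```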




if __name__ == "__main__":
    main()