
Data Pipeline Enhancements #13

Open
danelkay93 wants to merge 3 commits into master from feature/llm-enrichment

Conversation

@danelkay93
Collaborator

This commit includes the initial work on enhancing the data pipeline. Due to environmental constraints (timeouts), I was unable to make significant progress.

- I created `scripts/load_data_to_sqlite.py` to load processed JSON data into a SQLite database.
- I attempted to run the scraper and processing scripts, but they timed out.
- The data pipeline work is incomplete and will require further investigation in a more stable environment.
Copilot AI review requested due to automatic review settings July 13, 2025 03:00
@coderabbitai
Contributor

coderabbitai bot commented Jul 13, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Introduced a FastAPI backend with REST API endpoints for retrieving and searching game card rulings.
    • Added persistent storage using SQLite for rulings data.
    • Released an enriched JSON dataset containing detailed clarifications for game cards.
    • Provided a script to load ruling data into the database.
    • Implemented data models for rulings and provenance information.
  • Tests

    • Added a test to verify the rulings API endpoint returns data as expected.
  • Chores

    • Added Pyright type checking to pre-commit hooks and development dependencies.


Walkthrough

The update replaces the previous Flet-based GUI application with a new FastAPI backend that serves game card rulings from a SQLite database via REST endpoints. Supporting scripts for AI enrichment and data loading are updated or added, and a basic test verifies the new API. A new enriched rulings dataset is also introduced.

Changes

| File(s) | Change Summary |
| --- | --- |
| assets/processed_rulings_v3_ai_enriched.json | Added a new JSON file with AI-enriched rulings data, including provenance, tags, and clarification text. |
| scripts/enrich_rulings_ai.py | Replaced placeholder AI logic with real OpenAI API calls; added helper and improved error handling/logging. |
| scripts/load_data_to_sqlite.py | New script to validate and load JSON rulings data into SQLite using Pydantic and SQLAlchemy models. |
| src/abyssal_tome/app.py | Replaced GUI app with FastAPI backend; defines ORM and Pydantic models, and implements REST API endpoints. |
| src/abyssal_tome/models.py | Added SQLAlchemy ORM and Pydantic models for rulings and provenance data. |
| tests/test_app.py | Added a test for the /rulings API endpoint, checking status and response format. |
| .pre-commit-config.yaml | Added a new pre-commit hook for Pyright static type checking on Python files. |
| pyproject.toml | Added pyright==1.1.365 as a pinned dev dependency. |
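The model pair in `src/abyssal_tome/models.py` is summarized but not shown here. As a rough, hypothetical approximation of its shape (stdlib dataclasses only; the real project uses Pydantic and SQLAlchemy, and the field names below are inferred from the dataset later in this PR):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Provenance:
    """Where a ruling came from (shape inferred from the sample JSON)."""
    source_type: str
    retrieval_date: str
    source_url: str
    source_name: str | None = None
    source_date: str | None = None


@dataclass
class Ruling:
    """A single ruling entry (shape inferred from the sample JSON)."""
    id: str
    source_card_code: str
    ruling_type: str
    text: str
    related_card_codes: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    question: str | None = None
    answer: str | None = None


# Build model instances from one JSON-like entry
raw = {
    "id": "r1",
    "source_card_code": "00000",
    "ruling_type": "CLARIFICATION",
    "text": "Example clarification.",
    "provenance": {
        "source_type": "forum",
        "retrieval_date": "2024-03-16T10:00:00Z",
        "source_url": "https://example.invalid/thread/1",
    },
}
ruling = Ruling(
    id=raw["id"],
    source_card_code=raw["source_card_code"],
    ruling_type=raw["ruling_type"],
    text=raw["text"],
)
prov = Provenance(**raw["provenance"])
print(ruling.ruling_type, prov.source_type)  # → CLARIFICATION forum
```

In the real code the Pydantic model additionally validates types and the SQLAlchemy model maps these fields to the `rulings` table; the dataclasses above only mirror the field layout.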

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant FastAPI_App
    participant SQLite_DB

    Client->>FastAPI_App: GET /rulings
    FastAPI_App->>SQLite_DB: Query rulings (with pagination)
    SQLite_DB-->>FastAPI_App: Return list of rulings
    FastAPI_App-->>Client: Return JSON list of rulings

    Client->>FastAPI_App: GET /rulings/{ruling_id}
    FastAPI_App->>SQLite_DB: Query ruling by ID
    SQLite_DB-->>FastAPI_App: Return ruling or not found
    FastAPI_App-->>Client: Return ruling JSON or 404

    Client->>FastAPI_App: GET /search?query=...
    FastAPI_App->>SQLite_DB: Search rulings (LIKE on fields)
    SQLite_DB-->>FastAPI_App: Return matching rulings
    FastAPI_App-->>Client: Return JSON list of matches
```
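The `/search` leg above (a `LIKE` match over ruling fields) can be sketched with the stdlib `sqlite3` module. The table layout and helper name here are illustrative only, not necessarily those of the real app:

```python
import sqlite3

# In-memory database with a minimal, hypothetical rulings table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rulings (id TEXT PRIMARY KEY, text TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO rulings VALUES (?, ?, ?)",
    [
        ("r1", "Lucky! makes the test successful instead.", "skill-test"),
        ("r2", "Machete bonus damage vs Massive enemies.", "combat"),
    ],
)


def search_rulings(conn: sqlite3.Connection, query: str, limit: int = 20):
    """Parameterized LIKE search over two fields, as the diagram describes."""
    pattern = f"%{query}%"
    return conn.execute(
        "SELECT id, text FROM rulings WHERE text LIKE ? OR tags LIKE ? LIMIT ?",
        (pattern, pattern, limit),
    ).fetchall()


print(search_rulings(conn, "Lucky"))
```

The parameter binding matters: interpolating `query` directly into the SQL string would open the endpoint to injection, which is one thing to check in the real `app.py` implementation.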

Possibly related PRs

  • Refactor AI enrichment script to use LLM #12: both PRs modify `scripts/enrich_rulings_ai.py`, replacing placeholder AI logic with real OpenAI API calls and adding the `call_openai_api` helper, so their changes are directly related.

Poem

A backend reborn, the GUI set free,
FastAPI now rules the data sea.
With SQLite as anchor, and endpoints so bright,
Rulings and clarifications are served day and night.
Tests hop in, JSONs abound—
In this new warren, robust data is found!
🐇✨


@danelkay93
Collaborator Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Jul 13, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copilot AI left a comment

Pull Request Overview

This PR lays the groundwork for the new data pipeline by adding a SQLite loading script, converting the Flet app to a SQLAlchemy‐backed FastAPI service, and introducing a basic endpoint test.

  • Adds a basic test for the /rulings endpoint.
  • Refactors app.py to use SQLAlchemy models and FastAPI endpoints (/rulings, /rulings/{id}, /search).
  • Introduces scripts/load_data_to_sqlite.py to populate the SQLite database from enriched JSON.

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/test_app.py | Added TestClient and a test for GET /rulings. |
| src/abyssal_tome/app.py | Replaced Flet UI logic with FastAPI, SQLAlchemy setup, and endpoints. |
| scripts/load_data_to_sqlite.py | New script to read JSON and insert records into SQLite. |
| scripts/enrich_rulings_ai.py | Updated AI helper functions to call OpenAI API. |
| assets/processed_rulings_v3_ai_enriched.json | Sample data for the enrichment pipeline. |
Comments suppressed due to low confidence (3)

tests/test_app.py:7

  • [nitpick] This test only covers the GET /rulings endpoint. Consider adding tests for pagination parameters (skip, limit), GET /rulings/{ruling_id}, and the /search endpoint to improve coverage.

    `response = client.get("/rulings")`

scripts/load_data_to_sqlite.py:73

  • The session is committed but not explicitly closed; consider calling session.close() in a finally block to ensure database connections are properly released.

    `rulings_data = json.load(f)`
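The close-in-a-finally pattern that the second suppressed comment recommends looks like this with the stdlib `sqlite3` module (a sketch only; the actual script uses a SQLAlchemy session, whose `close()` plays the same role):

```python
import sqlite3


def load_rows(db_path: str, rows: list[tuple[str, str]]) -> int:
    """Insert rows into a rulings table, always releasing the connection."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS rulings (id TEXT PRIMARY KEY, text TEXT)"
        )
        conn.executemany("INSERT INTO rulings VALUES (?, ?)", rows)
        conn.commit()
        return len(rows)
    except Exception:
        conn.rollback()  # undo the partial batch on failure
        raise
    finally:
        conn.close()  # runs on both success and failure


count = load_rows(":memory:", [("r1", "first"), ("r2", "second")])
print(count)  # → 2
```

Without the `finally`, an exception between `connect()` and `commit()` would leak the connection; with SQLAlchemy the equivalent is `session.close()` (or a `with session_factory() as session:` context manager).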

Comment thread src/abyssal_tome/app.py

```python
# --- Database Setup ---
DATABASE_URL = "sqlite:///abyssal_tome.db"
engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
```

Copilot AI Jul 13, 2025

The database tables are never created before handling requests. Consider calling Base.metadata.create_all(engine) at application startup to ensure the 'rulings' table exists.

Suggested change

```diff
 engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
+# Ensure all tables are created at application startup
+Base.metadata.create_all(engine)
```
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 12

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32174e5 and 45ac0db.

⛔ Files ignored due to path filters (1)
  • abyssal_tome.db is excluded by !**/*.db
📒 Files selected for processing (5)
  • assets/processed_rulings_v3_ai_enriched.json (1 hunks)
  • scripts/enrich_rulings_ai.py (2 hunks)
  • scripts/load_data_to_sqlite.py (1 hunks)
  • src/abyssal_tome/app.py (1 hunks)
  • tests/test_app.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
scripts/load_data_to_sqlite.py (2)
src/abyssal_tome/app.py (3)
  • Provenance (33-38)
  • Ruling (40-53)
  • RulingDB (18-30)
scripts/enrich_rulings_ai.py (1)
  • main (308-380)
src/abyssal_tome/app.py (2)
scripts/load_data_to_sqlite.py (3)
  • RulingDB (38-50)
  • Provenance (16-21)
  • Ruling (23-33)
scripts/process_new_format.py (2)
  • Provenance (65-70)
  • Ruling (73-88)
🪛 Ruff (0.11.9)

scripts/load_data_to_sqlite.py
  • 64: error() call on root logger (LOG015)
  • 69: Variable Session in function should be lowercase (N806)
  • 76: open() should be replaced by Path.open() (PTH123)

scripts/enrich_rulings_ai.py
  • 29: Line too long (102 > 100) (E501)
  • 36: error() call on root logger (LOG015)
  • 48-50: info() call on root logger (LOG015)
  • 49: Line too long (104 > 100) (E501)
  • 52: Line too long (133 > 100) (E501)
  • 54: Line too long (111 > 100) (E501)
  • 78: info() call on root logger (LOG015)
  • 105: info() call on root logger (LOG015)
  • 130: info() call on root logger (LOG015)
  • 133: Line too long (146 > 100) (E501)

src/abyssal_tome/app.py
  • 63: Missing return type annotation for public function get_db (ANN201)
  • 73: Missing return type annotation for public function get_rulings (ANN201)
  • 73: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable (B008)
  • 78: Missing return type annotation for public function get_ruling (ANN201)
  • 78: Do not perform function call Depends in argument defaults (B008)
  • 86: Missing return type annotation for public function search_rulings (ANN201)
  • 86: Do not perform function call Depends in argument defaults (B008)

Comment thread tests/test_app.py
Comment on lines +6 to +9

```python
def test_get_rulings():
    response = client.get("/rulings")
    assert response.status_code == 200
    assert isinstance(response.json(), list)
```

@coderabbitai coderabbitai bot Jul 13, 2025

🧹 Nitpick (assertive)

🛠️ Refactor suggestion

Expand test coverage for comprehensive API validation.

The current test only verifies basic endpoint availability. Consider adding tests for:

  • Pagination parameters (skip, limit)
  • Individual ruling retrieval (/rulings/{ruling_id})
  • Search functionality (/search)
  • Data structure validation
  • Error cases (404, invalid parameters)

Would you like me to generate comprehensive test cases covering all endpoints and edge cases?
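One of the suggested additions, data structure validation, can be exercised even without a running server. A hypothetical helper that checks each ruling payload for its required keys (key set inferred from the sample dataset in this PR):

```python
# Required keys inferred from assets/processed_rulings_v3_ai_enriched.json
REQUIRED_KEYS = {"id", "source_card_code", "ruling_type", "text", "provenance"}


def validate_ruling_payload(item) -> list[str]:
    """Return the sorted list of missing required keys; empty means valid."""
    if not isinstance(item, dict):
        return sorted(REQUIRED_KEYS)
    return sorted(REQUIRED_KEYS - item.keys())


good = {
    "id": "r1",
    "source_card_code": "00000",
    "ruling_type": "CLARIFICATION",
    "text": "Example clarification.",
    "provenance": {},
}
bad = {"id": "r2", "text": "Missing most fields."}

print(validate_ruling_payload(good))  # → []
print(validate_ruling_payload(bad))
```

In an expanded test suite, a check like this could run over every item returned by `/rulings` to assert the response schema, alongside the status-code assertions the existing test already makes.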

🤖 Prompt for AI Agents
In tests/test_app.py around lines 6 to 9, the current test only checks basic
availability of the /rulings endpoint. Expand test coverage by adding tests for
pagination parameters like skip and limit on /rulings, tests for retrieving
individual rulings via /rulings/{ruling_id}, tests for the /search endpoint
functionality, validation of the data structure returned by these endpoints, and
tests for error cases such as 404 responses and invalid parameters. Implement
these additional test cases to ensure comprehensive API validation.

@danelkay93
Collaborator Author

Yes, please do.

@coderabbitai
Contributor

@danelkay93 I'll generate comprehensive test cases covering all endpoints and edge cases for your FastAPI application.

Comprehensive Test Suite Expansion

  • Add tests for pagination parameters on /rulings endpoint
  • Add tests for individual ruling retrieval via /rulings/{ruling_id}
  • Add tests for search functionality via /search endpoint
  • Add data structure validation tests
  • Add error case testing (404, invalid parameters)
  • Add fixtures and test data setup

I'll implement a comprehensive test suite that covers all your FastAPI endpoints with proper mocking, data validation, and error handling. Another agent will handle implementing these changes to your test file.


@coderabbitai
Contributor
A summary of the changes CodeRabbit can apply:

  • Add a comprehensive pytest test file at tests/test_app.py that expands coverage by including fixtures, endpoint availability, pagination, validation, individual retrieval, search functionality, data structure validation, error handling, integration workflows, performance edge cases, content type checks, and database error handling to thoroughly test the FastAPI app’s ruling-related endpoints.

Comment on lines +1 to +56

```json
[
  {
    "id": "f787ed89-a728-410b-be0e-dcb3d6ed118c",
    "source_card_code": "00000",
    "related_card_codes": [],
    "provenance": {
      "source_type": "discord_mythosbusters_rules",
      "source_name": null,
      "source_date": null,
      "retrieval_date": "2024-03-16T10:00:00Z",
      "source_url": "https://discord.com/channels/mythosbusters/1234567890"
    },
    "original_html_snippet": "UserX: Quick question about [Lucky!]. If I fail by 2, can I play two copies to pass?\nMJNewman_FFG: No, Lucky! (0) says 'If this skill test is successful instead'. Once the first Lucky resolves and makes the test successful, the trigger condition for the second Lucky is no longer met. (Approx Dec 2022)",
    "tags": [],
    "ruling_type": "CLARIFICATION",
    "question": null,
    "answer": null,
    "text": "UserX: Quick question about [Lucky!]. If I fail by 2, can I play two copies to pass?\nMJNewman_FFG: No, Lucky! (0) says 'If this skill test is successful instead'. Once the first Lucky resolves and makes the test successful, the trigger condition for the second Lucky is no longer met. (Approx Dec 2022)"
  },
  {
    "id": "07f645bc-da8e-45db-9c3a-3330d8585f87",
    "source_card_code": "00000",
    "related_card_codes": [],
    "provenance": {
      "source_type": "bgg_card_comment",
      "source_name": null,
      "source_date": null,
      "retrieval_date": "2024-03-16T10:05:00Z",
      "source_url": "https://boardgamegeek.com/arkham-horror-the-card-game/card/some-card/forums/0"
    },
    "original_html_snippet": "I saw a ruling somewhere that said Machete's bonus damage applies against Massive enemies if you're engaged with them. Was that official? Seems to be from an old email.",
    "tags": [],
    "ruling_type": "CLARIFICATION",
    "question": null,
    "answer": null,
    "text": "I saw a ruling somewhere that said Machete's bonus damage applies against Massive enemies if you're engaged with them. Was that official? Seems to be from an old email."
  },
  {
    "id": "67ff66c3-a24e-46a2-9d28-08298b3e6b9a",
    "source_card_code": "00000",
    "related_card_codes": [],
    "provenance": {
      "source_type": "generic_forum_post",
      "source_name": null,
      "source_date": null,
      "retrieval_date": "2024-03-16T10:10:00Z",
      "source_url": "someforum.com/thread/123"
    },
    "original_html_snippet": "Q: Can [Working a Hunch] be played if there are no clues at your location? A: The card says 'Discover 1 clue at your location'. If there are no clues to discover, you cannot play the card, as you cannot resolve its effect. This is a general game principle for 'discover' effects. (Source: FFG Email ruling, early 2018)",
    "tags": [],
    "ruling_type": "CLARIFICATION",
    "question": null,
    "answer": null,
    "text": "Q: Can [Working a Hunch] be played if there are no clues at your location? A: The card says 'Discover 1 clue at your location'. If there are no clues to discover, you cannot play the card, as you cannot resolve its effect. This is a general game principle for 'discover' effects. (Source: FFG Email ruling, early 2018)"
  }
]
```
@coderabbitai
Contributor

💡 Verification agent

🧩 Analysis chain

AI enrichment appears incomplete - verify the enrichment pipeline.

The JSON data shows signs of incomplete AI enrichment:

  • All entries use placeholder source_card_code: "00000"
  • Empty related_card_codes despite card references like "[Lucky!]" and "[Machete]" in text
  • Empty tags arrays
  • Null question/answer fields despite Q&A format in the text

This suggests the AI enrichment in scripts/enrich_rulings_ai.py may not have executed properly.

Run the following script to verify the AI enrichment process:


🏁 Script executed:

```bash
#!/bin/bash
# Description: Check if the OpenAI API key is set and test the enrichment script

# Test 1: Check if OPENAI_API_KEY environment variable is set
if [ -z "$OPENAI_API_KEY" ]; then
    echo "ERROR: OPENAI_API_KEY environment variable is not set"
else
    echo "OPENAI_API_KEY is set"
fi

# Test 2: Check if the enrichment script exists and has proper imports
echo -e "\n--- Checking enrichment script ---"
if [ -f "scripts/enrich_rulings_ai.py" ]; then
    # Check for OpenAI import and usage
    rg -A 2 "from openai import OpenAI|client = OpenAI" scripts/enrich_rulings_ai.py

    # Check if the AI functions are being called
    echo -e "\n--- AI function calls ---"
    rg "ai_get_related_cards|ai_generate_tags|ai_extract_q_and_a" scripts/enrich_rulings_ai.py
fi

# Test 3: Look for any error logs from previous runs
echo -e "\n--- Checking for error patterns in logs ---"
fd -e log -e txt | xargs -I {} rg -i "error.*openai|api.*key|enrichment.*failed" {} 2>/dev/null || echo "No log files found"
```

Length of output: 1459


Set and export OPENAI_API_KEY, then rerun AI enrichment

The enrichment script never saw a valid API key (we saw “ERROR: OPENAI_API_KEY environment variable is not set”), so it skipped all OpenAI calls. As a result, fields like source_card_code, related_card_codes, tags, and question/answer remain unset.

• Export your key before running the script, in your shell or CI:
  `export OPENAI_API_KEY=<your_key_here>`
• Rerun the enrichment:
  `python scripts/enrich_rulings_ai.py assets/processed_rulings_v3.json assets/processed_rulings_v3_ai_enriched.json`
• Verify the output no longer contains placeholders:
  `grep -R '"source_card_code": "00000"' assets/processed_rulings_v3_ai_enriched.json` should return no matches
• Inspect a sample entry to ensure related_card_codes, tags, question, and answer are populated

🤖 Prompt for AI Agents
In assets/processed_rulings_v3_ai_enriched.json lines 1 to 56, the AI enrichment
script did not run properly because the OPENAI_API_KEY environment variable was
not set, resulting in missing data fields like source_card_code,
related_card_codes, tags, question, and answer. To fix this, export your OpenAI
API key in your shell or CI environment using 'export
OPENAI_API_KEY=<your_key_here>', then rerun the enrichment script with the
correct input and output file paths. After rerunning, verify that the output
file no longer contains placeholder values such as "00000" for source_card_code
and that the relevant fields are properly populated.

Comment thread scripts/load_data_to_sqlite.py Outdated
Comment thread scripts/load_data_to_sqlite.py Outdated
Comment on lines +53 to +100

```python
def main():
    # Define paths
    json_path = Path("assets/processed_rulings_v3_ai_enriched.json")
    db_path = Path("abyssal_tome.db")

    # Check if JSON file exists
    if not json_path.exists():
        logging.error(f"JSON file not found at {json_path}")
        return

    # Create engine and session
    engine = create_engine(f"sqlite:///{db_path}")
    Session = sessionmaker(bind=engine)
    session = Session()

    # Create tables
    Base.metadata.create_all(engine)

    # Load data from JSON
    with open(json_path, 'r', encoding='utf-8') as f:
        rulings_data = json.load(f)

    # Process and add data to the database
    for ruling_dict in rulings_data:
        # Validate with Pydantic
        ruling = Ruling(**ruling_dict)

        # Create a RulingDB instance
        ruling_db = RulingDB(
            id=ruling.id,
            source_card_code=ruling.source_card_code,
            related_card_codes=ruling.related_card_codes,
            ruling_type=ruling.ruling_type,
            question=ruling.question,
            answer=ruling.answer,
            text=ruling.text,
            provenance=ruling.provenance.dict(),
            original_html_snippet=ruling.original_html_snippet,
            tags=ruling.tags,
        )
        session.add(ruling_db)

    # Commit the session
    session.commit()
    logging.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")


if __name__ == "__main__":
    main()
```
@coderabbitai
Contributor

🛠️ Refactor suggestion

⚠️ Potential issue

Add error handling and use proper conventions.

The script lacks error handling for database operations and has several issues flagged by static analysis.

Apply these improvements:

```diff
-def main():
+def main() -> None:
     # Define paths
     json_path = Path("assets/processed_rulings_v3_ai_enriched.json")
     db_path = Path("abyssal_tome.db")
+    logger = logging.getLogger(__name__)

     # Check if JSON file exists
     if not json_path.exists():
-        logging.error(f"JSON file not found at {json_path}")
+        logger.error(f"JSON file not found at {json_path}")
         return

     # Create engine and session
     engine = create_engine(f"sqlite:///{db_path}")
-    Session = sessionmaker(bind=engine)
-    session = Session()
+    session_factory = sessionmaker(bind=engine)
+    session = session_factory()

     # Create tables
     Base.metadata.create_all(engine)

     # Load data from JSON
-    with open(json_path, 'r', encoding='utf-8') as f:
+    with json_path.open('r', encoding='utf-8') as f:
         rulings_data = json.load(f)

-    # Process and add data to the database
-    for ruling_dict in rulings_data:
-        # Validate with Pydantic
-        ruling = Ruling(**ruling_dict)
-
-        # Create a RulingDB instance
-        ruling_db = RulingDB(
-            id=ruling.id,
-            source_card_code=ruling.source_card_code,
-            related_card_codes=ruling.related_card_codes,
-            ruling_type=ruling.ruling_type,
-            question=ruling.question,
-            answer=ruling.answer,
-            text=ruling.text,
-            provenance=ruling.provenance.dict(),
-            original_html_snippet=ruling.original_html_snippet,
-            tags=ruling.tags,
-        )
-        session.add(ruling_db)
-
-    # Commit the session
-    session.commit()
-    logging.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")
+    try:
+        # Process and add data to the database
+        for ruling_dict in rulings_data:
+            # Validate with Pydantic
+            ruling = Ruling(**ruling_dict)
+
+            # Check for existing ruling to avoid duplicates
+            existing = session.query(RulingDB).filter_by(id=ruling.id).first()
+            if existing:
+                logger.warning(f"Ruling {ruling.id} already exists, skipping")
+                continue
+
+            # Create a RulingDB instance
+            ruling_db = RulingDB(
+                id=ruling.id,
+                source_card_code=ruling.source_card_code,
+                related_card_codes=ruling.related_card_codes,
+                ruling_type=ruling.ruling_type,
+                question=ruling.question,
+                answer=ruling.answer,
+                text=ruling.text,
+                provenance=ruling.provenance.dict(),
+                original_html_snippet=ruling.original_html_snippet,
+                tags=ruling.tags,
+            )
+            session.add(ruling_db)
+
+        # Commit the session
+        session.commit()
+        logger.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")
+    except Exception as e:
+        session.rollback()
+        logger.error(f"Error loading rulings: {e}")
+        raise
+    finally:
+        session.close()
```

🤖 Prompt for AI Agents
In scripts/load_data_to_sqlite.py around lines 53 to 100, add try-except blocks
around the database session operations to catch and log exceptions during data
insertion and commit. Ensure the session is properly closed in a finally block
to avoid resource leaks. Also, use logging instead of print statements and
follow PEP8 conventions for imports and function definitions. This will improve
robustness and maintainability of the script.
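The commit/rollback/close pattern the prompt describes can be illustrated without SQLAlchemy; a minimal sketch using the stdlib sqlite3 module (the table and column names here are hypothetical, not the project's schema):

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)

def load_rows(db_path: str, rows: list) -> None:
    """Insert rows, rolling back on error and always closing the connection."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS rulings (id TEXT PRIMARY KEY, text TEXT)"
        )
        for row in rows:
            # INSERT OR IGNORE mirrors the duplicate-skipping check in the script
            conn.execute(
                "INSERT OR IGNORE INTO rulings (id, text) VALUES (?, ?)", row
            )
        conn.commit()
        logger.info("Loaded %d rows", len(rows))
    except Exception:
        conn.rollback()
        logger.exception("Error loading rows")
        raise
    finally:
        conn.close()
```

The same shape (try for commit, except for rollback, finally for close) carries over directly to a SQLAlchemy Session.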

Comment thread scripts/enrich_rulings_ai.py
Comment thread src/abyssal_tome/app.py Outdated
Comment thread src/abyssal_tome/app.py
Comment thread src/abyssal_tome/app.py
Comment thread src/abyssal_tome/app.py
Comment on lines +67 to +70
@app.get("/rulings", response_model=List[Ruling])
def get_rulings(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
rulings = db.query(RulingDB).offset(skip).limit(limit).all()
return rulings

🧹 Nitpick (assertive)

Add parameter validation and adjust default limit.

The default limit of 100 might be too high for performance. Also, negative values should be prevented.

 @app.get("/rulings", response_model=List[Ruling])
-def get_rulings(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
+def get_rulings(skip: int = 0, limit: int = 20, db: Session = Depends(get_db)) -> List[Ruling]:
+    if skip < 0:
+        raise HTTPException(status_code=400, detail="Skip must be non-negative")
+    if limit < 1 or limit > 100:
+        raise HTTPException(status_code=400, detail="Limit must be between 1 and 100")
     rulings = db.query(RulingDB).offset(skip).limit(limit).all()
     return rulings
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@app.get("/rulings", response_model=List[Ruling])
def get_rulings(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
rulings = db.query(RulingDB).offset(skip).limit(limit).all()
return rulings
@app.get("/rulings", response_model=List[Ruling])
def get_rulings(skip: int = 0, limit: int = 20, db: Session = Depends(get_db)) -> List[Ruling]:
if skip < 0:
raise HTTPException(status_code=400, detail="Skip must be non-negative")
if limit < 1 or limit > 100:
raise HTTPException(status_code=400, detail="Limit must be between 1 and 100")
rulings = db.query(RulingDB).offset(skip).limit(limit).all()
return rulings
🤖 Prompt for AI Agents
In src/abyssal_tome/app.py around lines 67 to 70, the get_rulings endpoint lacks
validation for the skip and limit parameters, allowing negative values and has a
high default limit of 100. Add validation to ensure skip and limit are
non-negative integers and reduce the default limit to a lower value such as 20
to improve performance. Use parameter validation techniques like FastAPI's Query
with ge=0 for non-negative enforcement and set the default limit accordingly.

Comment thread src/abyssal_tome/app.py
Comment on lines +79 to +88
@app.get("/search", response_model=List[Ruling])
def search_rulings(query: str, db: Session = Depends(get_db)):
# A simple search implementation
search_query = f"%{query}%"
rulings = db.query(RulingDB).filter(
(RulingDB.question.like(search_query)) |
(RulingDB.answer.like(search_query)) |
(RulingDB.text.like(search_query))
).all()
return rulings

🧹 Nitpick (assertive)

Enhance search to include more fields and add pagination.

The current search only covers question, answer, and text fields, missing original_html_snippet. Also, search results should be paginated.

 @app.get("/search", response_model=List[Ruling])
-def search_rulings(query: str, db: Session = Depends(get_db)):
+def search_rulings(
+    query: str,
+    skip: int = 0,
+    limit: int = 20,
+    db: Session = Depends(get_db)
+) -> List[Ruling]:
+    if not query or len(query) < 2:
+        raise HTTPException(status_code=400, detail="Query must be at least 2 characters")
+    if skip < 0:
+        raise HTTPException(status_code=400, detail="Skip must be non-negative")
+    if limit < 1 or limit > 100:
+        raise HTTPException(status_code=400, detail="Limit must be between 1 and 100")
+        
     # A simple search implementation
     search_query = f"%{query}%"
     rulings = db.query(RulingDB).filter(
         (RulingDB.question.like(search_query)) |
         (RulingDB.answer.like(search_query)) |
-        (RulingDB.text.like(search_query))
-    ).all()
+        (RulingDB.text.like(search_query)) |
+        (RulingDB.original_html_snippet.like(search_query))
+    ).offset(skip).limit(limit).all()
     return rulings
📝 Committable suggestion


Suggested change
@app.get("/search", response_model=List[Ruling])
def search_rulings(query: str, db: Session = Depends(get_db)):
# A simple search implementation
search_query = f"%{query}%"
rulings = db.query(RulingDB).filter(
(RulingDB.question.like(search_query)) |
(RulingDB.answer.like(search_query)) |
(RulingDB.text.like(search_query))
).all()
return rulings
@app.get("/search", response_model=List[Ruling])
def search_rulings(
query: str,
skip: int = 0,
limit: int = 20,
db: Session = Depends(get_db)
) -> List[Ruling]:
if not query or len(query) < 2:
raise HTTPException(status_code=400, detail="Query must be at least 2 characters")
if skip < 0:
raise HTTPException(status_code=400, detail="Skip must be non-negative")
if limit < 1 or limit > 100:
raise HTTPException(status_code=400, detail="Limit must be between 1 and 100")
# A simple search implementation
search_query = f"%{query}%"
rulings = (
db.query(RulingDB)
.filter(
(RulingDB.question.like(search_query)) |
(RulingDB.answer.like(search_query)) |
(RulingDB.text.like(search_query)) |
(RulingDB.original_html_snippet.like(search_query))
)
.offset(skip)
.limit(limit)
.all()
)
return rulings
🧰 Tools
🪛 Ruff (0.11.9)

86-86: Missing return type annotation for public function search_rulings

(ANN201)


86-86: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🤖 Prompt for AI Agents
In src/abyssal_tome/app.py around lines 79 to 88, the search function only
filters by question, answer, and text fields and returns all results without
pagination. Update the filter to also include the original_html_snippet field in
the search criteria using a LIKE condition. Add pagination parameters such as
limit and offset to the function signature, apply them to the query using
limit() and offset() methods, and return the paginated results.

This commit addresses a number of refactoring suggestions to improve the overall quality of the codebase.

- Extracted duplicated Pydantic and SQLAlchemy models into a shared `src/abyssal_tome/models.py` module.
- Improved `scripts/load_data_to_sqlite.py` with better error handling, logging, and session management.
- Refactored `scripts/enrich_rulings_ai.py` to add error handling for the OpenAI API key, use a dedicated logger, and make the OpenAI model configurable.
- Enhanced `src/abyssal_tome/app.py` with type annotations, a configurable database URL, endpoint validation, and improved search functionality.
- Resolved a merge conflict in `scripts/enrich_rulings_ai.py`.
- Added Pyright to the pre-commit configuration for static type checking.
@danelkay93 danelkay93 closed this Jul 13, 2025
@danelkay93 danelkay93 reopened this Jul 13, 2025
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🔭 Outside diff range comments (1)
scripts/enrich_rulings_ai.py (1)

248-248: Consider using deep copy for nested dictionaries.

The current shallow copy may not properly isolate nested dictionaries like provenance if they're modified.

+        import copy
         # Make a copy to avoid modifying the original list of dicts in-place if it's reused
-        enriched_ruling = ruling_dict.copy()  # Shallow copy for dicts of primitives/nested dicts
+        enriched_ruling = copy.deepcopy(ruling_dict)  # Deep copy for nested structures
♻️ Duplicate comments (2)
scripts/load_data_to_sqlite.py (1)

20-33: Fix code style issues flagged by static analysis.

Apply these improvements:

+logger = logging.getLogger(__name__)
+
 if not json_path.exists():
-    logging.error(f"JSON file not found at {json_path}, aborting.")
+    logger.error(f"JSON file not found at {json_path}, aborting.")
     return

 engine = create_engine(f"sqlite:///{db_path}")
-Session = sessionmaker(bind=engine)
-session = Session()
+session_factory = sessionmaker(bind=engine)
+session = session_factory()

 try:
     Base.metadata.create_all(engine)
-    logging.info("Database tables created.")
+    logger.info("Database tables created.")

-    with open(json_path, 'r', encoding='utf-8') as f:
+    with json_path.open('r', encoding='utf-8') as f:
         rulings_data = json.load(f)
src/abyssal_tome/app.py (1)

14-16: Create database tables at application startup.

The application will fail if the database tables don't exist.

Add table creation after the engine setup:

 engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
 SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
+
+# Create tables at startup
+Base.metadata.create_all(engine)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45ac0db and cf66170.

📒 Files selected for processing (6)
  • .pre-commit-config.yaml (1 hunks)
  • pyproject.toml (1 hunks)
  • scripts/enrich_rulings_ai.py (7 hunks)
  • scripts/load_data_to_sqlite.py (1 hunks)
  • src/abyssal_tome/app.py (1 hunks)
  • src/abyssal_tome/models.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
scripts/load_data_to_sqlite.py (1)
src/abyssal_tome/models.py (3)
  • Provenance (22-27)
  • Ruling (29-42)
  • RulingDB (8-20)
🪛 YAMLlint (1.37.1)
.pre-commit-config.yaml

[error] 28-28: too many spaces after hyphen

(hyphens)

🪛 Ruff (0.11.9)
scripts/load_data_to_sqlite.py

22-22: error() call on root logger

(LOG015)


26-26: Variable Session in function should be lowercase

(N806)


31-31: info() call on root logger

(LOG015)


33-33: open() should be replaced by Path.open()

(PTH123)


53-53: error() call on root logger

(LOG015)


56-56: info() call on root logger

(LOG015)


59-59: error() call on root logger

(LOG015)


63-63: info() call on root logger

(LOG015)

src/abyssal_tome/app.py

30-30: Module level import not at top of file

(E402)


35-35: Missing return type annotation for public function get_rulings

(ANN201)


35-35: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


35-35: Line too long (105 > 100)

(E501)


40-40: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


48-48: Missing return type annotation for public function search_rulings

(ANN201)


48-48: Line too long (120 > 100)

(E501)


48-48: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

scripts/enrich_rulings_ai.py

22-22: Line too long (106 > 100)

(E501)


27-27: Line too long (103 > 100)

(E501)


35-35: Line too long (102 > 100)

(E501)


58-58: Line too long (104 > 100)

(E501)


61-61: Line too long (133 > 100)

(E501)


63-63: Line too long (111 > 100)

(E501)


142-142: Line too long (146 > 100)

(E501)


352-352: Line too long (105 > 100)

(E501)


387-387: Line too long (103 > 100)

(E501)

🔇 Additional comments (4)
pyproject.toml (1)

47-48: Good addition of Pyright for type checking.

The version pinning ensures consistent type checking across environments.

scripts/enrich_rulings_ai.py (3)

20-24: Past review comment has been properly addressed.

The error handling for missing API key has been correctly implemented with proper logging and exit behavior.


17-17: Past review comment has been properly addressed.

The logger instance is now properly configured using logging.getLogger(__name__) instead of the root logger.


26-26: Past review comment has been properly addressed.

The model parameter is now configurable via the OPENAI_MODEL environment variable with a sensible default.

Comment thread .pre-commit-config.yaml
Comment on lines +28 to +32
- id: pyright
name: Pyright
entry: .venv/bin/pyright
language: system
types: [python]

🧹 Nitpick (assertive)

Fix YAML formatting - too many spaces after hyphen.

Apply this diff to fix the YAML formatting:

-    -   id: pyright
+    - id: pyright

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 28-28: too many spaces after hyphen

(hyphens)

🤖 Prompt for AI Agents
In .pre-commit-config.yaml around lines 28 to 32, the YAML formatting is
incorrect due to too many spaces after the hyphen in the list item for the
pyright hook. Remove the extra spaces after the hyphen so that the hyphen is
directly followed by the id key without additional spaces, ensuring proper YAML
list syntax.

Comment on lines +41 to +42
class Config:
orm_mode = True

🧹 Nitpick (assertive)

Update deprecated Pydantic configuration.

The orm_mode configuration is deprecated in Pydantic v2.

Apply this diff to use the current configuration:

    class Config:
-        orm_mode = True
+        from_attributes = True
📝 Committable suggestion


Suggested change
class Config:
orm_mode = True
class Config:
from_attributes = True
🤖 Prompt for AI Agents
In src/abyssal_tome/models.py at lines 41 to 42, the Pydantic Config class uses
the deprecated orm_mode setting. Replace the inner Config class with the Pydantic
v2 style by defining a model_config attribute using ConfigDict with
from_attributes set to True. This updates the model to the current Pydantic
configuration standard.

Comment on lines +52 to +63
logging.error(f"Error processing ruling: {ruling_dict.get('id')}, error: {e}")

session.commit()
logging.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")

except Exception as e:
logging.error(f"An error occurred during database operations: {e}")
session.rollback()
finally:
session.close()
logging.info("Database session closed.")


🧹 Nitpick (assertive)

Use module logger instead of root logger.

Update the remaining logging calls:

-                logging.error(f"Error processing ruling: {ruling_dict.get('id')}, error: {e}")
+                logger.error(f"Error processing ruling: {ruling_dict.get('id')}, error: {e}")

         session.commit()
-        logging.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")
+        logger.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")

     except Exception as e:
-        logging.error(f"An error occurred during database operations: {e}")
+        logger.error(f"An error occurred during database operations: {e}")
         session.rollback()
     finally:
         session.close()
-        logging.info("Database session closed.")
+        logger.info("Database session closed.")
📝 Committable suggestion


Suggested change
logging.error(f"Error processing ruling: {ruling_dict.get('id')}, error: {e}")
session.commit()
logging.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")
except Exception as e:
logging.error(f"An error occurred during database operations: {e}")
session.rollback()
finally:
session.close()
logging.info("Database session closed.")
logger.error(f"Error processing ruling: {ruling_dict.get('id')}, error: {e}")
session.commit()
logger.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")
except Exception as e:
logger.error(f"An error occurred during database operations: {e}")
session.rollback()
finally:
session.close()
logger.info("Database session closed.")
🧰 Tools
🪛 Ruff (0.11.9)

53-53: error() call on root logger

(LOG015)


56-56: info() call on root logger

(LOG015)


59-59: error() call on root logger

(LOG015)


63-63: info() call on root logger

(LOG015)

🤖 Prompt for AI Agents
In scripts/load_data_to_sqlite.py around lines 52 to 63, replace all calls to
the root logger (logging.error, logging.info) with calls to the module-specific
logger instance. Identify the logger created for the module (e.g., logger =
logging.getLogger(__name__)) and use it for all logging calls in this block to
ensure consistent and configurable logging.

Comment thread src/abyssal_tome/app.py
Comment on lines +3 to +7
from fastapi import Depends, FastAPI, HTTPException
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, Session
from src.abyssal_tome.models import Provenance, Ruling, RulingDB, Base
from typing import List, Generator

🧹 Nitpick (assertive)

Move all imports to the top of the file.

Move the Query import to the top with other FastAPI imports:

-from fastapi import Depends, FastAPI, HTTPException
+from fastapi import Depends, FastAPI, HTTPException, Query
 from sqlalchemy import create_engine
 from sqlalchemy.orm import sessionmaker, Session
 from src.abyssal_tome.models import Provenance, Ruling, RulingDB, Base
 from typing import List, Generator

And remove line 30:

-from fastapi import Query

Also applies to: 30-30

🤖 Prompt for AI Agents
In src/abyssal_tome/app.py around lines 3 to 7 and line 30, move the import of
Query from its current position at line 30 to join the other FastAPI imports at
the top of the file. Remove the separate import statement at line 30 after
relocating it. This consolidates all imports at the top for better organization.

Comment thread src/abyssal_tome/app.py
Comment on lines +32 to +35
def get_rulings(skip: int = Query(0, ge=0), limit: int = Query(20, ge=0), db: Session = Depends(get_db)):
rulings = db.query(RulingDB).offset(skip).limit(limit).all()
return rulings


🧹 Nitpick (assertive)

Add return type annotation and fix line length.

 @app.get("/rulings", response_model=List[Ruling])
-def get_rulings(skip: int = Query(0, ge=0), limit: int = Query(20, ge=0), db: Session = Depends(get_db)):
+def get_rulings(
+    skip: int = Query(0, ge=0),
+    limit: int = Query(20, ge=0),
+    db: Session = Depends(get_db)
+) -> List[Ruling]:
     rulings = db.query(RulingDB).offset(skip).limit(limit).all()
     return rulings
📝 Committable suggestion


Suggested change
def get_rulings(skip: int = Query(0, ge=0), limit: int = Query(20, ge=0), db: Session = Depends(get_db)):
rulings = db.query(RulingDB).offset(skip).limit(limit).all()
return rulings
@app.get("/rulings", response_model=List[Ruling])
def get_rulings(
skip: int = Query(0, ge=0),
limit: int = Query(20, ge=0),
db: Session = Depends(get_db)
) -> List[Ruling]:
rulings = db.query(RulingDB).offset(skip).limit(limit).all()
return rulings
🧰 Tools
🪛 Ruff (0.11.9)

35-35: Missing return type annotation for public function get_rulings

(ANN201)


35-35: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


35-35: Line too long (105 > 100)

(E501)

🤖 Prompt for AI Agents
In src/abyssal_tome/app.py around lines 32 to 35, the function get_rulings lacks
a return type annotation and the line length may exceed style guidelines. Add an
appropriate return type annotation to the function signature, such as
List[RulingDB], and reformat the code to ensure no line exceeds the maximum
allowed length, possibly by breaking long lines or adjusting spacing.
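The bounds the review suggests (non-negative skip, limit capped at 100, default 20) can be expressed independently of FastAPI; a sketch of the validation logic that `Query(0, ge=0)` and an explicit upper check encode:

```python
def validate_pagination(skip: int = 0, limit: int = 20,
                        max_limit: int = 100) -> tuple:
    """Reject out-of-range paging parameters, mirroring Query(ge=0) plus an upper cap."""
    if skip < 0:
        raise ValueError("skip must be non-negative")
    if limit < 1 or limit > max_limit:
        raise ValueError(f"limit must be between 1 and {max_limit}")
    return skip, limit
```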

Comment thread src/abyssal_tome/app.py
Comment on lines +44 to +53
def search_rulings(query: str, skip: int = Query(0, ge=0), limit: int = Query(20, ge=0), db: Session = Depends(get_db)):
# A simple search implementation
search_query = f"%{query}%"
rulings = db.query(RulingDB).filter(
(RulingDB.question.like(search_query)) |
(RulingDB.answer.like(search_query)) |
(RulingDB.text.like(search_query)) |
(RulingDB.original_html_snippet.like(search_query))
).offset(skip).limit(limit).all()
return rulings

🧹 Nitpick (assertive)

Add return type annotation and fix line length.

 @app.get("/search", response_model=List[Ruling])
-def search_rulings(query: str, skip: int = Query(0, ge=0), limit: int = Query(20, ge=0), db: Session = Depends(get_db)):
+def search_rulings(
+    query: str,
+    skip: int = Query(0, ge=0),
+    limit: int = Query(20, ge=0),
+    db: Session = Depends(get_db)
+) -> List[Ruling]:
     # A simple search implementation
     search_query = f"%{query}%"
📝 Committable suggestion


Suggested change
def search_rulings(query: str, skip: int = Query(0, ge=0), limit: int = Query(20, ge=0), db: Session = Depends(get_db)):
# A simple search implementation
search_query = f"%{query}%"
rulings = db.query(RulingDB).filter(
(RulingDB.question.like(search_query)) |
(RulingDB.answer.like(search_query)) |
(RulingDB.text.like(search_query)) |
(RulingDB.original_html_snippet.like(search_query))
).offset(skip).limit(limit).all()
return rulings
@app.get("/search", response_model=List[Ruling])
def search_rulings(
query: str,
skip: int = Query(0, ge=0),
limit: int = Query(20, ge=0),
db: Session = Depends(get_db)
) -> List[Ruling]:
# A simple search implementation
search_query = f"%{query}%"
rulings = db.query(RulingDB).filter(
(RulingDB.question.like(search_query)) |
(RulingDB.answer.like(search_query)) |
(RulingDB.text.like(search_query)) |
(RulingDB.original_html_snippet.like(search_query))
).offset(skip).limit(limit).all()
return rulings
🧰 Tools
🪛 Ruff (0.11.9)

48-48: Missing return type annotation for public function search_rulings

(ANN201)


48-48: Line too long (120 > 100)

(E501)


48-48: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🤖 Prompt for AI Agents
In src/abyssal_tome/app.py around lines 44 to 53, the search_rulings function
lacks a return type annotation and some lines exceed recommended length limits.
Add an appropriate return type annotation to the function signature reflecting
the expected return type, such as a list of RulingDB objects. Also, refactor the
filter conditions to break long lines into shorter ones, improving readability
and adhering to line length standards.


api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
logger.error("OPENAI_API_KEY environment variable not set. Please set it before running this script.")

🧹 Nitpick (assertive)

Fix line length violations.

Multiple lines exceed the 100-character limit as flagged by static analysis.

Consider breaking these long lines for better readability:

-    logger.error("OPENAI_API_KEY environment variable not set. Please set it before running this script.")
+    logger.error(
+        "OPENAI_API_KEY environment variable not set. Please set it before running this script."
+    )

-def call_openai_api(prompt: str, model: str = os.environ.get("OPENAI_MODEL", "gpt-4o")) -> dict | None:
+def call_openai_api(
+    prompt: str, model: str = os.environ.get("OPENAI_MODEL", "gpt-4o")
+) -> dict | None:

-                {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
+                {
+                    "role": "system", 
+                    "content": "You are a helpful assistant designed to output JSON."
+                },

Apply similar formatting to other long lines for consistency.

Also applies to: 27-27, 35-35, 58-58, 61-61, 63-63, 142-142, 352-352, 387-387

🧰 Tools
🪛 Ruff (0.11.9)

22-22: Line too long (106 > 100)

(E501)

🤖 Prompt for AI Agents
In scripts/enrich_rulings_ai.py at line 22 and other specified lines (27, 35,
58, 61, 63, 142, 352, 387), some lines exceed the 100-character limit. To fix
this, break these long lines into multiple shorter lines using line continuation
or string concatenation to improve readability and comply with style guidelines.
Apply consistent formatting across all these lines.

Comment on lines +59 to +69
prompt = f"""
Given the following ruling text for the card with code '{source_card_code}', identify any other card codes mentioned in the text.
Card codes are 5-digit strings.
Return a JSON object with a single key "related_card_codes" containing a list of the identified card codes.
Do not include the source card code '{source_card_code}' in the list.

Ruling text:
---
{ruling_text}
---
"""

🧹 Nitpick (assertive)

Consider improving prompt structure for better AI responses.

The prompt could be more structured to ensure consistent JSON responses and handle edge cases better.

     prompt = f"""
-    Given the following ruling text for the card with code '{source_card_code}', identify any other card codes mentioned in the text.
-    Card codes are 5-digit strings.
-    Return a JSON object with a single key "related_card_codes" containing a list of the identified card codes.
-    Do not include the source card code '{source_card_code}' in the list.
+    Analyze the following ruling text for the card '{source_card_code}' and identify any other card codes mentioned.
+    
+    Rules:
+    - Card codes are exactly 5-digit strings (e.g., "01001", "02034")
+    - Look for patterns like [01001], card 01001, or just 01001
+    - Exclude the source card code '{source_card_code}'
+    - Return empty list if no other cards are found
+    
+    Return JSON format: {{"related_card_codes": ["01001", "02034"]}}

     Ruling text:
     ---
     {ruling_text}
     ---
     """
📝 Committable suggestion


Suggested change
prompt = f"""
Given the following ruling text for the card with code '{source_card_code}', identify any other card codes mentioned in the text.
Card codes are 5-digit strings.
Return a JSON object with a single key "related_card_codes" containing a list of the identified card codes.
Do not include the source card code '{source_card_code}' in the list.
Ruling text:
---
{ruling_text}
---
"""
prompt = f"""
Analyze the following ruling text for the card '{source_card_code}' and identify any other card codes mentioned.
Rules:
- Card codes are exactly 5-digit strings (e.g., "01001", "02034")
- Look for patterns like [01001], card 01001, or just 01001
- Exclude the source card code '{source_card_code}'
- Return empty list if no other cards are found
Return JSON format: {{"related_card_codes": ["01001", "02034"]}}
Ruling text:
---
{ruling_text}
---
"""
🧰 Tools
🪛 Ruff (0.11.9)

61-61: Line too long (133 > 100)

(E501)


63-63: Line too long (111 > 100)

(E501)

🤖 Prompt for AI Agents
In scripts/enrich_rulings_ai.py around lines 59 to 69, the prompt for the AI
should be restructured to improve clarity and consistency of JSON responses.
Refine the prompt by explicitly instructing the AI to always return a valid JSON
object, even if no related card codes are found, and clarify the format and
constraints. Add examples or specify how to handle cases with no matches to
ensure robust parsing of the AI output.
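A deterministic regex pass could complement (or sanity-check) the LLM extraction the prompt describes; a sketch, assuming card codes are exactly five digits as the prompt states (the function name is ours, not the script's):

```python
import re

# Five digits with word boundaries, so [01001], "card 01001", and bare 01001 all match.
CARD_CODE = re.compile(r"\b(\d{5})\b")

def extract_related_codes(text: str, source_code: str) -> list:
    """Find 5-digit card codes in a ruling, excluding the source card itself."""
    seen = []
    for code in CARD_CODE.findall(text):
        if code != source_code and code not in seen:
            seen.append(code)  # preserve first-seen order, deduplicated
    return seen
```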

Comment on lines +40 to +42
if isinstance(content, dict):
return content
return json.loads(content)

⚠️ Potential issue

Fix potential JSON parsing logic error.

The content from OpenAI API is always a string that needs JSON parsing. The isinstance check for dict will never be true.

-        content = response.choices[0].message.content
-        if isinstance(content, dict):
-            return content
-        return json.loads(content)
+        content = response.choices[0].message.content
+        if content is None:
+            logger.error("Received null content from OpenAI API")
+            return None
+        return json.loads(content)
📝 Committable suggestion


Suggested change
if isinstance(content, dict):
return content
return json.loads(content)
content = response.choices[0].message.content
if content is None:
logger.error("Received null content from OpenAI API")
return None
return json.loads(content)
🤖 Prompt for AI Agents
In scripts/enrich_rulings_ai.py around lines 40 to 42, remove the isinstance
check for dict, since the content from the OpenAI API is always a string. Guard
against a None content value, then parse the string with json.loads instead of
conditionally returning it as a dict.
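The None-guard plus json.loads flow from the suggested fix can be isolated in a small helper; a sketch (the helper name is ours, not the script's):

```python
import json
import logging
from typing import Optional

logger = logging.getLogger(__name__)

def parse_api_content(content: Optional[str]) -> Optional[dict]:
    """Parse a JSON string from a chat completion; None or invalid JSON yields None."""
    if content is None:
        logger.error("Received null content from API")
        return None
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        logger.error("Content was not valid JSON: %.80s", content)
        return None
```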

…ped)

This commit addresses a number of refactoring suggestions to improve the overall quality of the codebase. The rebase onto master was skipped due to timeouts in the environment.

- Extracted duplicated Pydantic and SQLAlchemy models into a shared `src/abyssal_tome/models.py` module.
- Improved `scripts/load_data_to_sqlite.py` with better error handling, logging, and session management.
- Refactored `scripts/enrich_rulings_ai.py` to add error handling for the OpenAI API key, use a dedicated logger, and make the OpenAI model configurable.
- Enhanced `src/abyssal_tome/app.py` with type annotations, a configurable database URL, endpoint validation, and improved search functionality.
- Resolved a merge conflict in `scripts/enrich_rulings_ai.py`.
- Added Pyright to the pre-commit configuration for static type checking.