Conversation
This commit includes the initial work on enhancing the data pipeline. Due to environmental constraints (timeouts), I was unable to make significant progress.
- I created `scripts/load_data_to_sqlite.py` to load processed JSON data into a SQLite database.
- I attempted to run the scraper and processing scripts, but they timed out.
- The data pipeline work is incomplete and will require further investigation in a more stable environment.
Summary by CodeRabbit
📝 Walkthrough
The update replaces the previous Flet-based GUI application with a new FastAPI backend that serves game card rulings from a SQLite database via REST endpoints. Supporting scripts for AI enrichment and data loading are updated or added, and a basic test verifies the new API. A new enriched rulings dataset is also introduced.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant FastAPI_App
participant SQLite_DB
Client->>FastAPI_App: GET /rulings
FastAPI_App->>SQLite_DB: Query rulings (with pagination)
SQLite_DB-->>FastAPI_App: Return list of rulings
FastAPI_App-->>Client: Return JSON list of rulings
Client->>FastAPI_App: GET /rulings/{ruling_id}
FastAPI_App->>SQLite_DB: Query ruling by ID
SQLite_DB-->>FastAPI_App: Return ruling or not found
FastAPI_App-->>Client: Return ruling JSON or 404
Client->>FastAPI_App: GET /search?query=...
FastAPI_App->>SQLite_DB: Search rulings (LIKE on fields)
SQLite_DB-->>FastAPI_App: Return matching rulings
FastAPI_App-->>Client: Return JSON list of matches
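The LIKE-based search path in the diagram can be sketched against a throwaway SQLite database. This is a stdlib-only sketch: the table name, columns, and sample rows are illustrative guesses based on the fields named in this review, not the app's real schema.

```python
import sqlite3

# In-memory stand-in for abyssal_tome.db (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rulings (id TEXT PRIMARY KEY, question TEXT, answer TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO rulings VALUES (?, ?, ?, ?)",
    [
        ("r1", "Can I play two copies of Lucky!?", "No.", "Lucky! ruling text"),
        ("r2", None, None, "Machete bonus damage clarification"),
    ],
)

def search_rulings(query: str, skip: int = 0, limit: int = 20) -> list:
    """LIKE search across the text fields, with OFFSET/LIMIT pagination."""
    pattern = f"%{query}%"
    cur = conn.execute(
        "SELECT id, text FROM rulings"
        " WHERE question LIKE ? OR answer LIKE ? OR text LIKE ?"
        " LIMIT ? OFFSET ?",
        (pattern, pattern, pattern, limit, skip),
    )
    return cur.fetchall()

print(search_rulings("Lucky"))  # [('r1', 'Lucky! ruling text')]
```

Note that `LIKE` against a `NULL` column yields `NULL`, which the `OR` chain simply treats as not matching, so rows with empty question/answer fields are still found via their `text` field.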
@coderabbitai review
✅ Actions performed
Review triggered.
Pull Request Overview
This PR lays the groundwork for the new data pipeline by adding a SQLite loading script, converting the Flet app to a SQLAlchemy‐backed FastAPI service, and introducing a basic endpoint test.
- Adds a basic test for the /rulings endpoint.
- Refactors app.py to use SQLAlchemy models and FastAPI endpoints (/rulings, /rulings/{id}, /search).
- Introduces scripts/load_data_to_sqlite.py to populate the SQLite database from enriched JSON.
Reviewed Changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/test_app.py | Added TestClient and a test for GET /rulings. |
| src/abyssal_tome/app.py | Replaced Flet UI logic with FastAPI, SQLAlchemy setup, and endpoints. |
| scripts/load_data_to_sqlite.py | New script to read JSON and insert records into SQLite. |
| scripts/enrich_rulings_ai.py | Updated AI helper functions to call OpenAI API. |
| assets/processed_rulings_v3_ai_enriched.json | Sample data for the enrichment pipeline. |
Comments suppressed due to low confidence (3)
tests/test_app.py:7
- [nitpick] This test only covers the GET /rulings endpoint. Consider adding tests for pagination parameters (skip, limit), GET /rulings/{ruling_id}, and the /search endpoint to improve coverage.
response = client.get("/rulings")
scripts/load_data_to_sqlite.py:73
- The session is committed but not explicitly closed; consider calling session.close() in a finally block to ensure database connections are properly released.
rulings_data = json.load(f)
# --- Database Setup ---
DATABASE_URL = "sqlite:///abyssal_tome.db"
engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
The database tables are never created before handling requests. Consider calling Base.metadata.create_all(engine) at application startup to ensure the 'rulings' table exists.
 engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
+# Ensure all tables are created at application startup
+Base.metadata.create_all(engine)
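For context on why calling this at every startup is safe: `create_all` only creates tables that are missing, the same idempotence SQLite exposes directly via `CREATE TABLE IF NOT EXISTS`. A stdlib-only sketch of that property (the table name here is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Running the DDL twice is harmless: the second call is a no-op.
for _ in range(2):
    conn.execute("CREATE TABLE IF NOT EXISTS rulings (id TEXT PRIMARY KEY, text TEXT)")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
print(tables)  # ['rulings']
```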
Actionable comments posted: 12
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
abyssal_tome.db is excluded by !**/*.db
📒 Files selected for processing (5)
assets/processed_rulings_v3_ai_enriched.json (1 hunks)
scripts/enrich_rulings_ai.py (2 hunks)
scripts/load_data_to_sqlite.py (1 hunks)
src/abyssal_tome/app.py (1 hunks)
tests/test_app.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
scripts/load_data_to_sqlite.py (2)
src/abyssal_tome/app.py (3)
Provenance (33-38)
Ruling (40-53)
RulingDB (18-30)
scripts/enrich_rulings_ai.py (1)
main (308-380)
src/abyssal_tome/app.py (2)
scripts/load_data_to_sqlite.py (3)
RulingDB (38-50)
Provenance (16-21)
Ruling (23-33)
scripts/process_new_format.py (2)
Provenance (65-70)
Ruling (73-88)
🪛 Ruff (0.11.9)
scripts/load_data_to_sqlite.py
64-64: error() call on root logger
(LOG015)
69-69: Variable Session in function should be lowercase
(N806)
76-76: open() should be replaced by Path.open()
(PTH123)
scripts/enrich_rulings_ai.py
29-29: Line too long (102 > 100)
(E501)
36-36: error() call on root logger
(LOG015)
48-50: info() call on root logger
(LOG015)
49-49: Line too long (104 > 100)
(E501)
52-52: Line too long (133 > 100)
(E501)
54-54: Line too long (111 > 100)
(E501)
78-78: info() call on root logger
(LOG015)
105-105: info() call on root logger
(LOG015)
130-130: info() call on root logger
(LOG015)
133-133: Line too long (146 > 100)
(E501)
src/abyssal_tome/app.py
63-63: Missing return type annotation for public function get_db
(ANN201)
73-73: Missing return type annotation for public function get_rulings
(ANN201)
73-73: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
78-78: Missing return type annotation for public function get_ruling
(ANN201)
78-78: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
86-86: Missing return type annotation for public function search_rulings
(ANN201)
86-86: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
def test_get_rulings():
    response = client.get("/rulings")
    assert response.status_code == 200
    assert isinstance(response.json(), list)
🧹 Nitpick (assertive)
🛠️ Refactor suggestion
Expand test coverage for comprehensive API validation.
The current test only verifies basic endpoint availability. Consider adding tests for:
- Pagination parameters (skip, limit)
- Individual ruling retrieval (/rulings/{ruling_id})
- Search functionality (/search)
- Data structure validation
- Error cases (404, invalid parameters)
Would you like me to generate comprehensive test cases covering all endpoints and edge cases?
🤖 Prompt for AI Agents
In tests/test_app.py around lines 6 to 9, the current test only checks basic
availability of the /rulings endpoint. Expand test coverage by adding tests for
pagination parameters like skip and limit on /rulings, tests for retrieving
individual rulings via /rulings/{ruling_id}, tests for the /search endpoint
functionality, validation of the data structure returned by these endpoints, and
tests for error cases such as 404 responses and invalid parameters. Implement
these additional test cases to ensure comprehensive API validation.
Yes, please do.
@danelkay93 I'll generate comprehensive test cases covering all endpoints and edge cases for your FastAPI application.
Comprehensive Test Suite Expansion
- Add tests for pagination parameters on /rulings endpoint
- Add tests for individual ruling retrieval via /rulings/{ruling_id}
- Add tests for search functionality via /search endpoint
- Add data structure validation tests
- Add error case testing (404, invalid parameters)
- Add fixtures and test data setup
I'll implement a comprehensive test suite that covers all your FastAPI endpoints with proper mocking, data validation, and error handling. Another agent will handle implementing these changes to your test file.
(\ /)
( .. )
o(")(") excited to help expand test coverage
A summary of the changes CodeRabbit can apply:
- Add a comprehensive pytest test file at tests/test_app.py that expands coverage by including fixtures, endpoint availability, pagination, validation, individual retrieval, search functionality, data structure validation, error handling, integration workflows, performance edge cases, content type checks, and database error handling to thoroughly test the FastAPI app’s ruling-related endpoints.
[
  {
    "id": "f787ed89-a728-410b-be0e-dcb3d6ed118c",
    "source_card_code": "00000",
    "related_card_codes": [],
    "provenance": {
      "source_type": "discord_mythosbusters_rules",
      "source_name": null,
      "source_date": null,
      "retrieval_date": "2024-03-16T10:00:00Z",
      "source_url": "https://discord.com/channels/mythosbusters/1234567890"
    },
    "original_html_snippet": "UserX: Quick question about [Lucky!]. If I fail by 2, can I play two copies to pass?\nMJNewman_FFG: No, Lucky! (0) says 'If this skill test is successful instead'. Once the first Lucky resolves and makes the test successful, the trigger condition for the second Lucky is no longer met. (Approx Dec 2022)",
    "tags": [],
    "ruling_type": "CLARIFICATION",
    "question": null,
    "answer": null,
    "text": "UserX: Quick question about [Lucky!]. If I fail by 2, can I play two copies to pass?\nMJNewman_FFG: No, Lucky! (0) says 'If this skill test is successful instead'. Once the first Lucky resolves and makes the test successful, the trigger condition for the second Lucky is no longer met. (Approx Dec 2022)"
  },
  {
    "id": "07f645bc-da8e-45db-9c3a-3330d8585f87",
    "source_card_code": "00000",
    "related_card_codes": [],
    "provenance": {
      "source_type": "bgg_card_comment",
      "source_name": null,
      "source_date": null,
      "retrieval_date": "2024-03-16T10:05:00Z",
      "source_url": "https://boardgamegeek.com/arkham-horror-the-card-game/card/some-card/forums/0"
    },
    "original_html_snippet": "I saw a ruling somewhere that said Machete's bonus damage applies against Massive enemies if you're engaged with them. Was that official? Seems to be from an old email.",
    "tags": [],
    "ruling_type": "CLARIFICATION",
    "question": null,
    "answer": null,
    "text": "I saw a ruling somewhere that said Machete's bonus damage applies against Massive enemies if you're engaged with them. Was that official? Seems to be from an old email."
  },
  {
    "id": "67ff66c3-a24e-46a2-9d28-08298b3e6b9a",
    "source_card_code": "00000",
    "related_card_codes": [],
    "provenance": {
      "source_type": "generic_forum_post",
      "source_name": null,
      "source_date": null,
      "retrieval_date": "2024-03-16T10:10:00Z",
      "source_url": "someforum.com/thread/123"
    },
    "original_html_snippet": "Q: Can [Working a Hunch] be played if there are no clues at your location? A: The card says 'Discover 1 clue at your location'. If there are no clues to discover, you cannot play the card, as you cannot resolve its effect. This is a general game principle for 'discover' effects. (Source: FFG Email ruling, early 2018)",
    "tags": [],
    "ruling_type": "CLARIFICATION",
    "question": null,
    "answer": null,
    "text": "Q: Can [Working a Hunch] be played if there are no clues at your location? A: The card says 'Discover 1 clue at your location'. If there are no clues to discover, you cannot play the card, as you cannot resolve its effect. This is a general game principle for 'discover' effects. (Source: FFG Email ruling, early 2018)"
  }
]
💡 Verification agent
🧩 Analysis chain
AI enrichment appears incomplete - verify the enrichment pipeline.
The JSON data shows signs of incomplete AI enrichment:
- All entries use placeholder source_card_code: "00000"
- Empty related_card_codes despite card references like "[Lucky!]" and "[Machete]" in the text
- Empty tags arrays
- Null question/answer fields despite Q&A format in the text
This suggests the AI enrichment in scripts/enrich_rulings_ai.py may not have executed properly.
Run the following script to verify the AI enrichment process:
🏁 Script executed:
#!/bin/bash
# Description: Check if the OpenAI API key is set and test the enrichment script
# Test 1: Check if OPENAI_API_KEY environment variable is set
if [ -z "$OPENAI_API_KEY" ]; then
echo "ERROR: OPENAI_API_KEY environment variable is not set"
else
echo "OPENAI_API_KEY is set"
fi
# Test 2: Check if the enrichment script exists and has proper imports
echo -e "\n--- Checking enrichment script ---"
if [ -f "scripts/enrich_rulings_ai.py" ]; then
# Check for OpenAI import and usage
rg -A 2 "from openai import OpenAI|client = OpenAI" scripts/enrich_rulings_ai.py
# Check if the AI functions are being called
echo -e "\n--- AI function calls ---"
rg "ai_get_related_cards|ai_generate_tags|ai_extract_q_and_a" scripts/enrich_rulings_ai.py
fi
# Test 3: Look for any error logs from previous runs
echo -e "\n--- Checking for error patterns in logs ---"
fd -e log -e txt | xargs -I {} rg -i "error.*openai|api.*key|enrichment.*failed" {} 2>/dev/null || echo "No log files found"

Length of output: 1459
Set and export OPENAI_API_KEY, then rerun AI enrichment
The enrichment script never saw a valid API key (we saw “ERROR: OPENAI_API_KEY environment variable is not set”), so it skipped all OpenAI calls. As a result, fields like source_card_code, related_card_codes, tags, and question/answer remain unset.
• Export your key before running the script:
• In your shell or CI:
export OPENAI_API_KEY=<your_key_here>
• Rerun:
python scripts/enrich_rulings_ai.py assets/processed_rulings_v3.json assets/processed_rulings_v3_ai_enriched.json
• Verify the output no longer contains placeholders:
• grep -R '"source_card_code": "00000"' assets/processed_rulings_v3_ai_enriched.json should return no matches
• Inspect a sample entry to ensure related_card_codes, tags, question, and answer are populated
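A quick way to re-check the output file after rerunning is a small placeholder scan. The field names follow the JSON shown in this PR; the inline sample and the helper name are illustrative stand-ins for the real file and script.

```python
import json

# Inline sample standing in for assets/processed_rulings_v3_ai_enriched.json.
raw = (
    '[{"id": "r1", "source_card_code": "00000", "related_card_codes": [],'
    ' "tags": [], "question": null, "answer": null}]'
)
rulings = json.loads(raw)

def unenriched_ids(entries: list) -> list:
    """IDs of entries still carrying placeholder or empty enrichment fields."""
    return [
        r["id"]
        for r in entries
        if r.get("source_card_code") == "00000"
        or not r.get("related_card_codes")
        or not r.get("tags")
        or r.get("question") is None
    ]

print(unenriched_ids(rulings))  # ['r1']
```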
🤖 Prompt for AI Agents
In assets/processed_rulings_v3_ai_enriched.json lines 1 to 56, the AI enrichment
script did not run properly because the OPENAI_API_KEY environment variable was
not set, resulting in missing data fields like source_card_code,
related_card_codes, tags, question, and answer. To fix this, export your OpenAI
API key in your shell or CI environment using 'export
OPENAI_API_KEY=<your_key_here>', then rerun the enrichment script with the
correct input and output file paths. After rerunning, verify that the output
file no longer contains placeholder values such as "00000" for source_card_code
and that the relevant fields are properly populated.
def main():
    # Define paths
    json_path = Path("assets/processed_rulings_v3_ai_enriched.json")
    db_path = Path("abyssal_tome.db")

    # Check if JSON file exists
    if not json_path.exists():
        logging.error(f"JSON file not found at {json_path}")
        return

    # Create engine and session
    engine = create_engine(f"sqlite:///{db_path}")
    Session = sessionmaker(bind=engine)
    session = Session()

    # Create tables
    Base.metadata.create_all(engine)

    # Load data from JSON
    with open(json_path, 'r', encoding='utf-8') as f:
        rulings_data = json.load(f)

    # Process and add data to the database
    for ruling_dict in rulings_data:
        # Validate with Pydantic
        ruling = Ruling(**ruling_dict)

        # Create a RulingDB instance
        ruling_db = RulingDB(
            id=ruling.id,
            source_card_code=ruling.source_card_code,
            related_card_codes=ruling.related_card_codes,
            ruling_type=ruling.ruling_type,
            question=ruling.question,
            answer=ruling.answer,
            text=ruling.text,
            provenance=ruling.provenance.dict(),
            original_html_snippet=ruling.original_html_snippet,
            tags=ruling.tags,
        )
        session.add(ruling_db)

    # Commit the session
    session.commit()
    logging.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")


if __name__ == "__main__":
    main()
🛠️ Refactor suggestion
Add error handling and use proper conventions.
The script lacks error handling for database operations and has several issues flagged by static analysis.
Apply these improvements:
-def main():
+def main() -> None:
     # Define paths
     json_path = Path("assets/processed_rulings_v3_ai_enriched.json")
     db_path = Path("abyssal_tome.db")
+    logger = logging.getLogger(__name__)

     # Check if JSON file exists
     if not json_path.exists():
-        logging.error(f"JSON file not found at {json_path}")
+        logger.error(f"JSON file not found at {json_path}")
         return

     # Create engine and session
     engine = create_engine(f"sqlite:///{db_path}")
-    Session = sessionmaker(bind=engine)
-    session = Session()
+    session_factory = sessionmaker(bind=engine)
+    session = session_factory()

     # Create tables
     Base.metadata.create_all(engine)

     # Load data from JSON
-    with open(json_path, 'r', encoding='utf-8') as f:
+    with json_path.open('r', encoding='utf-8') as f:
         rulings_data = json.load(f)

-    # Process and add data to the database
-    for ruling_dict in rulings_data:
-        # Validate with Pydantic
-        ruling = Ruling(**ruling_dict)
-
-        # Create a RulingDB instance
-        ruling_db = RulingDB(
-            id=ruling.id,
-            source_card_code=ruling.source_card_code,
-            related_card_codes=ruling.related_card_codes,
-            ruling_type=ruling.ruling_type,
-            question=ruling.question,
-            answer=ruling.answer,
-            text=ruling.text,
-            provenance=ruling.provenance.dict(),
-            original_html_snippet=ruling.original_html_snippet,
-            tags=ruling.tags,
-        )
-        session.add(ruling_db)
-
-    # Commit the session
-    session.commit()
-    logging.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")
+    try:
+        # Process and add data to the database
+        for ruling_dict in rulings_data:
+            # Validate with Pydantic
+            ruling = Ruling(**ruling_dict)
+
+            # Check for existing ruling to avoid duplicates
+            existing = session.query(RulingDB).filter_by(id=ruling.id).first()
+            if existing:
+                logger.warning(f"Ruling {ruling.id} already exists, skipping")
+                continue
+
+            # Create a RulingDB instance
+            ruling_db = RulingDB(
+                id=ruling.id,
+                source_card_code=ruling.source_card_code,
+                related_card_codes=ruling.related_card_codes,
+                ruling_type=ruling.ruling_type,
+                question=ruling.question,
+                answer=ruling.answer,
+                text=ruling.text,
+                provenance=ruling.provenance.dict(),
+                original_html_snippet=ruling.original_html_snippet,
+                tags=ruling.tags,
+            )
+            session.add(ruling_db)
+
+        # Commit the session
+        session.commit()
+        logger.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")
+    except Exception as e:
+        session.rollback()
+        logger.error(f"Error loading rulings: {e}")
+        raise
+    finally:
+        session.close()
🧰 Tools
🪛 Ruff (0.11.9)
64-64: error() call on root logger
(LOG015)
69-69: Variable Session in function should be lowercase
(N806)
76-76: open() should be replaced by Path.open()
(PTH123)
🤖 Prompt for AI Agents
In scripts/load_data_to_sqlite.py around lines 53 to 100, add try-except blocks
around the database session operations to catch and log exceptions during data
insertion and commit. Ensure the session is properly closed in a finally block
to avoid resource leaks. Also, use logging instead of print statements and
follow PEP8 conventions for imports and function definitions. This will improve
robustness and maintainability of the script.
@app.get("/rulings", response_model=List[Ruling])
def get_rulings(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
    rulings = db.query(RulingDB).offset(skip).limit(limit).all()
    return rulings
🧹 Nitpick (assertive)
Add parameter validation and adjust default limit.
The default limit of 100 might be too high for performance. Also, negative values should be prevented.
 @app.get("/rulings", response_model=List[Ruling])
-def get_rulings(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
+def get_rulings(skip: int = 0, limit: int = 20, db: Session = Depends(get_db)) -> List[Ruling]:
+    if skip < 0:
+        raise HTTPException(status_code=400, detail="Skip must be non-negative")
+    if limit < 1 or limit > 100:
+        raise HTTPException(status_code=400, detail="Limit must be between 1 and 100")
     rulings = db.query(RulingDB).offset(skip).limit(limit).all()
     return rulings
🤖 Prompt for AI Agents
In src/abyssal_tome/app.py around lines 67 to 70, the get_rulings endpoint lacks
validation for the skip and limit parameters, allowing negative values and has a
high default limit of 100. Add validation to ensure skip and limit are
non-negative integers and reduce the default limit to a lower value such as 20
to improve performance. Use parameter validation techniques like FastAPI's Query
with ge=0 for non-negative enforcement and set the default limit accordingly.
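As the prompt notes, FastAPI can enforce these bounds declaratively with `Query(ge=..., le=...)`. The equivalent checks written as a plain, framework-free helper (names are illustrative, and `ValueError` stands in for the suggested HTTP 400):

```python
def validate_pagination(skip: int, limit: int, max_limit: int = 100) -> None:
    """Raise ValueError on out-of-range pagination params."""
    if skip < 0:
        raise ValueError("Skip must be non-negative")
    if limit < 1 or limit > max_limit:
        raise ValueError(f"Limit must be between 1 and {max_limit}")

def check(skip: int, limit: int) -> str:
    """Return 'ok' or the validation error message."""
    try:
        validate_pagination(skip, limit)
        return "ok"
    except ValueError as exc:
        return str(exc)

print(check(0, 20))   # ok
print(check(-1, 20))  # Skip must be non-negative
print(check(0, 101))  # Limit must be between 1 and 100
```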
@app.get("/search", response_model=List[Ruling])
def search_rulings(query: str, db: Session = Depends(get_db)):
    # A simple search implementation
    search_query = f"%{query}%"
    rulings = db.query(RulingDB).filter(
        (RulingDB.question.like(search_query)) |
        (RulingDB.answer.like(search_query)) |
        (RulingDB.text.like(search_query))
    ).all()
    return rulings
🧹 Nitpick (assertive)
Enhance search to include more fields and add pagination.
The current search only covers question, answer, and text fields, missing original_html_snippet. Also, search results should be paginated.
 @app.get("/search", response_model=List[Ruling])
-def search_rulings(query: str, db: Session = Depends(get_db)):
+def search_rulings(
+    query: str,
+    skip: int = 0,
+    limit: int = 20,
+    db: Session = Depends(get_db)
+) -> List[Ruling]:
+    if not query or len(query) < 2:
+        raise HTTPException(status_code=400, detail="Query must be at least 2 characters")
+    if skip < 0:
+        raise HTTPException(status_code=400, detail="Skip must be non-negative")
+    if limit < 1 or limit > 100:
+        raise HTTPException(status_code=400, detail="Limit must be between 1 and 100")
+
     # A simple search implementation
     search_query = f"%{query}%"
     rulings = db.query(RulingDB).filter(
         (RulingDB.question.like(search_query)) |
         (RulingDB.answer.like(search_query)) |
-        (RulingDB.text.like(search_query))
-    ).all()
+        (RulingDB.text.like(search_query)) |
+        (RulingDB.original_html_snippet.like(search_query))
+    ).offset(skip).limit(limit).all()
     return rulings
🧰 Tools
🪛 Ruff (0.11.9)
86-86: Missing return type annotation for public function search_rulings
(ANN201)
86-86: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
🤖 Prompt for AI Agents
In src/abyssal_tome/app.py around lines 79 to 88, the search function only
filters by question, answer, and text fields and returns all results without
pagination. Update the filter to also include the original_html_snippet field in
the search criteria using a LIKE condition. Add pagination parameters such as
limit and offset to the function signature, apply them to the query using
limit() and offset() methods, and return the paginated results.
This commit addresses a number of refactoring suggestions to improve the overall quality of the codebase.
- Extracted duplicated Pydantic and SQLAlchemy models into a shared `src/abyssal_tome/models.py` module.
- Improved `scripts/load_data_to_sqlite.py` with better error handling, logging, and session management.
- Refactored `scripts/enrich_rulings_ai.py` to add error handling for the OpenAI API key, use a dedicated logger, and make the OpenAI model configurable.
- Enhanced `src/abyssal_tome/app.py` with type annotations, a configurable database URL, endpoint validation, and improved search functionality.
- Resolved a merge conflict in `scripts/enrich_rulings_ai.py`.
- Added Pyright to the pre-commit configuration for static type checking.
Actionable comments posted: 9
🔭 Outside diff range comments (1)
scripts/enrich_rulings_ai.py (1)
248-248: Consider using deep copy for nested dictionaries.
The current shallow copy may not properly isolate nested dictionaries like provenance if they're modified.

+import copy
 # Make a copy to avoid modifying the original list of dicts in-place if it's reused
-enriched_ruling = ruling_dict.copy()  # Shallow copy for dicts of primitives/nested dicts
+enriched_ruling = copy.deepcopy(ruling_dict)  # Deep copy for nested structures
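The shallow-vs-deep distinction this comment is about, shown on a minimal stand-in for a ruling dict (the keys are illustrative):

```python
import copy

ruling = {"id": "r1", "provenance": {"source_type": "forum"}}

shallow = ruling.copy()       # new outer dict, but shares the nested "provenance" dict
deep = copy.deepcopy(ruling)  # fully independent nested structure

# Mutating through the shallow copy leaks into the original...
shallow["provenance"]["source_type"] = "email"
print(ruling["provenance"]["source_type"])  # email

# ...while mutating the deep copy leaves the original untouched.
deep["provenance"]["source_type"] = "errata"
print(ruling["provenance"]["source_type"])  # email (unchanged)
```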
♻️ Duplicate comments (2)
scripts/load_data_to_sqlite.py (1)
20-33: Fix code style issues flagged by static analysis.

Apply these improvements:

+logger = logging.getLogger(__name__)
+
 if not json_path.exists():
-    logging.error(f"JSON file not found at {json_path}, aborting.")
+    logger.error(f"JSON file not found at {json_path}, aborting.")
     return

 engine = create_engine(f"sqlite:///{db_path}")
-Session = sessionmaker(bind=engine)
-session = Session()
+session_factory = sessionmaker(bind=engine)
+session = session_factory()

 try:
     Base.metadata.create_all(engine)
-    logging.info("Database tables created.")
+    logger.info("Database tables created.")

-    with open(json_path, 'r', encoding='utf-8') as f:
+    with json_path.open('r', encoding='utf-8') as f:
         rulings_data = json.load(f)

src/abyssal_tome/app.py (1)
14-16: Create database tables at application startup.

The application will fail if the database tables don't exist. Add table creation after the engine setup:

 engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
 SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
+
+# Create tables at startup
+Base.metadata.create_all(engine)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
- .pre-commit-config.yaml (1 hunks)
- pyproject.toml (1 hunks)
- scripts/enrich_rulings_ai.py (7 hunks)
- scripts/load_data_to_sqlite.py (1 hunks)
- src/abyssal_tome/app.py (1 hunks)
- src/abyssal_tome/models.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
scripts/load_data_to_sqlite.py (1)
src/abyssal_tome/models.py (3)
Provenance (22-27), Ruling (29-42), RulingDB (8-20)
🪛 YAMLlint (1.37.1)
.pre-commit-config.yaml
[error] 28-28: too many spaces after hyphen
(hyphens)
🪛 Ruff (0.11.9)
scripts/load_data_to_sqlite.py
22-22: error() call on root logger
(LOG015)
26-26: Variable Session in function should be lowercase
(N806)
31-31: info() call on root logger
(LOG015)
33-33: open() should be replaced by Path.open()
(PTH123)
53-53: error() call on root logger
(LOG015)
56-56: info() call on root logger
(LOG015)
59-59: error() call on root logger
(LOG015)
63-63: info() call on root logger
(LOG015)
src/abyssal_tome/app.py
30-30: Module level import not at top of file
(E402)
35-35: Missing return type annotation for public function get_rulings
(ANN201)
35-35: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
35-35: Line too long (105 > 100)
(E501)
40-40: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
48-48: Missing return type annotation for public function search_rulings
(ANN201)
48-48: Line too long (120 > 100)
(E501)
48-48: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
scripts/enrich_rulings_ai.py
22-22: Line too long (106 > 100)
(E501)
27-27: Line too long (103 > 100)
(E501)
35-35: Line too long (102 > 100)
(E501)
58-58: Line too long (104 > 100)
(E501)
61-61: Line too long (133 > 100)
(E501)
63-63: Line too long (111 > 100)
(E501)
142-142: Line too long (146 > 100)
(E501)
352-352: Line too long (105 > 100)
(E501)
387-387: Line too long (103 > 100)
(E501)
🔇 Additional comments (4)
pyproject.toml (1)
47-48: Good addition of Pyright for type checking.

The version pinning ensures consistent type checking across environments.
scripts/enrich_rulings_ai.py (3)
20-24: Past review comment has been properly addressed.

The error handling for missing API key has been correctly implemented with proper logging and exit behavior.
17-17: Past review comment has been properly addressed.

The logger instance is now properly configured using `logging.getLogger(__name__)` instead of the root logger.
26-26: Past review comment has been properly addressed.

The model parameter is now configurable via the `OPENAI_MODEL` environment variable with a sensible default.
- id: pyright
  name: Pyright
  entry: .venv/bin/pyright
  language: system
  types: [python]
🧹 Nitpick (assertive)
Fix YAML formatting - too many spaces after hyphen.
Apply this diff to fix the YAML formatting:
- - id: pyright
+ - id: pyright

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 28-28: too many spaces after hyphen
(hyphens)
🤖 Prompt for AI Agents
In .pre-commit-config.yaml around lines 28 to 32, the YAML formatting is
incorrect due to too many spaces after the hyphen in the list item for the
pyright hook. Remove the extra spaces after the hyphen so that the hyphen is
directly followed by the id key without additional spaces, ensuring proper YAML
list syntax.
class Config:
    orm_mode = True
🧹 Nitpick (assertive)
Update deprecated Pydantic configuration.
The orm_mode configuration is deprecated in Pydantic v2.
Apply this diff to use the current configuration:
class Config:
- orm_mode = True
+ from_attributes = True

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
class Config:
    from_attributes = True
🤖 Prompt for AI Agents
In src/abyssal_tome/models.py at lines 41 to 42, the Pydantic Config class uses
the deprecated orm_mode setting. Replace the Config class with the new Pydantic
v2 style by defining a model_config attribute using ConfigDict with
from_attributes set to True. This updates the model to the current Pydantic configuration
standard.
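For reference, the Pydantic v2 `ConfigDict` form described above looks roughly like this — a minimal sketch assuming Pydantic v2 is installed; `RulingOut` and `FakeRow` are hypothetical stand-ins, not the project's actual classes:

```python
from pydantic import BaseModel, ConfigDict


class RulingOut(BaseModel):
    # from_attributes replaces the deprecated orm_mode; it lets the model
    # be populated from ORM objects via attribute access.
    model_config = ConfigDict(from_attributes=True)

    id: int
    question: str


class FakeRow:
    """Stand-in for a SQLAlchemy row object."""

    def __init__(self) -> None:
        self.id = 1
        self.question = "Does the reaction trigger twice?"


ruling = RulingOut.model_validate(FakeRow())
print(ruling.question)
```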
        logging.error(f"Error processing ruling: {ruling_dict.get('id')}, error: {e}")

    session.commit()
    logging.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")

except Exception as e:
    logging.error(f"An error occurred during database operations: {e}")
    session.rollback()
finally:
    session.close()
    logging.info("Database session closed.")
🧹 Nitpick (assertive)
Use module logger instead of root logger.
Update the remaining logging calls:
- logging.error(f"Error processing ruling: {ruling_dict.get('id')}, error: {e}")
+ logger.error(f"Error processing ruling: {ruling_dict.get('id')}, error: {e}")
session.commit()
- logging.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")
+ logger.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")
except Exception as e:
- logging.error(f"An error occurred during database operations: {e}")
+ logger.error(f"An error occurred during database operations: {e}")
session.rollback()
finally:
session.close()
- logging.info("Database session closed.")
+ logger.info("Database session closed.")

📝 Committable suggestion
        logger.error(f"Error processing ruling: {ruling_dict.get('id')}, error: {e}")

    session.commit()
    logger.info(f"Successfully loaded {len(rulings_data)} rulings into {db_path}")

except Exception as e:
    logger.error(f"An error occurred during database operations: {e}")
    session.rollback()
finally:
    session.close()
    logger.info("Database session closed.")
🧰 Tools
🪛 Ruff (0.11.9)
53-53: error() call on root logger
(LOG015)
56-56: info() call on root logger
(LOG015)
59-59: error() call on root logger
(LOG015)
63-63: info() call on root logger
(LOG015)
🤖 Prompt for AI Agents
In scripts/load_data_to_sqlite.py around lines 52 to 63, replace all calls to
the root logger (logging.error, logging.info) with calls to the module-specific
logger instance. Identify the logger created for the module (e.g., logger =
logging.getLogger(__name__)) and use it for all logging calls in this block to
ensure consistent and configurable logging.
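The LOG015 fix boils down to one line near the top of the module; a minimal sketch (the logger name and `load` function below are illustrative, not the script's actual code):

```python
import logging

# Module-level logger: addressable through the logging.getLogger hierarchy,
# unlike direct logging.error(...) calls, which always hit the root logger.
logger = logging.getLogger("scripts.load_data_to_sqlite")


def load(db_path: str) -> None:
    # Lazy %-formatting defers string building until the record is emitted.
    logger.info("Loading rulings into %s", db_path)


if __name__ == "__main__":
    # Configuration stays at the entry point, not inside library code.
    logging.basicConfig(level=logging.INFO)
    load("abyssal_tome.db")
```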
from fastapi import Depends, FastAPI, HTTPException
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, Session
from src.abyssal_tome.models import Provenance, Ruling, RulingDB, Base
from typing import List, Generator
🧹 Nitpick (assertive)
Move all imports to the top of the file.
Move the Query import to the top with other FastAPI imports:
-from fastapi import Depends, FastAPI, HTTPException
+from fastapi import Depends, FastAPI, HTTPException, Query
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, Session
from src.abyssal_tome.models import Provenance, Ruling, RulingDB, Base
from typing import List, Generator

And remove line 30:
-from fastapi import Query

Also applies to: 30-30
🤖 Prompt for AI Agents
In src/abyssal_tome/app.py around lines 3 to 7 and line 30, move the import of
Query from its current position at line 30 to join the other FastAPI imports at
the top of the file. Remove the separate import statement at line 30 after
relocating it. This consolidates all imports at the top for better organization.
def get_rulings(skip: int = Query(0, ge=0), limit: int = Query(20, ge=0), db: Session = Depends(get_db)):
    rulings = db.query(RulingDB).offset(skip).limit(limit).all()
    return rulings
🧹 Nitpick (assertive)
Add return type annotation and fix line length.
@app.get("/rulings", response_model=List[Ruling])
-def get_rulings(skip: int = Query(0, ge=0), limit: int = Query(20, ge=0), db: Session = Depends(get_db)):
+def get_rulings(
+ skip: int = Query(0, ge=0),
+ limit: int = Query(20, ge=0),
+ db: Session = Depends(get_db)
+) -> List[Ruling]:
rulings = db.query(RulingDB).offset(skip).limit(limit).all()
    return rulings

📝 Committable suggestion
@app.get("/rulings", response_model=List[Ruling])
def get_rulings(
    skip: int = Query(0, ge=0),
    limit: int = Query(20, ge=0),
    db: Session = Depends(get_db)
) -> List[Ruling]:
    rulings = db.query(RulingDB).offset(skip).limit(limit).all()
    return rulings
🧰 Tools
🪛 Ruff (0.11.9)
35-35: Missing return type annotation for public function get_rulings
(ANN201)
35-35: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
35-35: Line too long (105 > 100)
(E501)
🤖 Prompt for AI Agents
In src/abyssal_tome/app.py around lines 32 to 35, the function get_rulings lacks
a return type annotation and the line length may exceed style guidelines. Add an
appropriate return type annotation to the function signature, such as
List[RulingDB], and reformat the code to ensure no line exceeds the maximum
allowed length, possibly by breaking long lines or adjusting spacing.
def search_rulings(query: str, skip: int = Query(0, ge=0), limit: int = Query(20, ge=0), db: Session = Depends(get_db)):
    # A simple search implementation
    search_query = f"%{query}%"
    rulings = db.query(RulingDB).filter(
        (RulingDB.question.like(search_query)) |
        (RulingDB.answer.like(search_query)) |
        (RulingDB.text.like(search_query)) |
        (RulingDB.original_html_snippet.like(search_query))
    ).offset(skip).limit(limit).all()
    return rulings
🧹 Nitpick (assertive)
Add return type annotation and fix line length.
@app.get("/search", response_model=List[Ruling])
-def search_rulings(query: str, skip: int = Query(0, ge=0), limit: int = Query(20, ge=0), db: Session = Depends(get_db)):
+def search_rulings(
+ query: str,
+ skip: int = Query(0, ge=0),
+ limit: int = Query(20, ge=0),
+ db: Session = Depends(get_db)
+) -> List[Ruling]:
# A simple search implementation
    search_query = f"%{query}%"

📝 Committable suggestion
@app.get("/search", response_model=List[Ruling])
def search_rulings(
    query: str,
    skip: int = Query(0, ge=0),
    limit: int = Query(20, ge=0),
    db: Session = Depends(get_db)
) -> List[Ruling]:
    # A simple search implementation
    search_query = f"%{query}%"
    rulings = db.query(RulingDB).filter(
        (RulingDB.question.like(search_query)) |
        (RulingDB.answer.like(search_query)) |
        (RulingDB.text.like(search_query)) |
        (RulingDB.original_html_snippet.like(search_query))
    ).offset(skip).limit(limit).all()
    return rulings
🧰 Tools
🪛 Ruff (0.11.9)
48-48: Missing return type annotation for public function search_rulings
(ANN201)
48-48: Line too long (120 > 100)
(E501)
48-48: Do not perform function call Depends in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
🤖 Prompt for AI Agents
In src/abyssal_tome/app.py around lines 44 to 53, the search_rulings function
lacks a return type annotation and some lines exceed recommended length limits.
Add an appropriate return type annotation to the function signature reflecting
the expected return type, such as a list of RulingDB objects. Also, refactor the
filter conditions to break long lines into shorter ones, improving readability
and adhering to line length standards.
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    logger.error("OPENAI_API_KEY environment variable not set. Please set it before running this script.")
🧹 Nitpick (assertive)
Fix line length violations.
Multiple lines exceed the 100-character limit as flagged by static analysis.
Consider breaking these long lines for better readability:
- logger.error("OPENAI_API_KEY environment variable not set. Please set it before running this script.")
+ logger.error(
+ "OPENAI_API_KEY environment variable not set. Please set it before running this script."
+ )
-def call_openai_api(prompt: str, model: str = os.environ.get("OPENAI_MODEL", "gpt-4o")) -> dict | None:
+def call_openai_api(
+ prompt: str, model: str = os.environ.get("OPENAI_MODEL", "gpt-4o")
+) -> dict | None:
- {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
+ {
+ "role": "system",
+ "content": "You are a helpful assistant designed to output JSON."
+ },

Apply similar formatting to other long lines for consistency.
Also applies to: 27-27, 35-35, 58-58, 61-61, 63-63, 142-142, 352-352, 387-387
🧰 Tools
🪛 Ruff (0.11.9)
22-22: Line too long (106 > 100)
(E501)
🤖 Prompt for AI Agents
In scripts/enrich_rulings_ai.py at line 22 and other specified lines (27, 35,
58, 61, 63, 142, 352, 387), some lines exceed the 100-character limit. To fix
this, break these long lines into multiple shorter lines using line continuation
or string concatenation to improve readability and comply with style guidelines.
Apply consistent formatting across all these lines.
prompt = f"""
Given the following ruling text for the card with code '{source_card_code}', identify any other card codes mentioned in the text.
Card codes are 5-digit strings.
Return a JSON object with a single key "related_card_codes" containing a list of the identified card codes.
Do not include the source card code '{source_card_code}' in the list.

Ruling text:
---
{ruling_text}
---
"""
🧹 Nitpick (assertive)
Consider improving prompt structure for better AI responses.
The prompt could be more structured to ensure consistent JSON responses and handle edge cases better.
prompt = f"""
- Given the following ruling text for the card with code '{source_card_code}', identify any other card codes mentioned in the text.
- Card codes are 5-digit strings.
- Return a JSON object with a single key "related_card_codes" containing a list of the identified card codes.
- Do not include the source card code '{source_card_code}' in the list.
+ Analyze the following ruling text for the card '{source_card_code}' and identify any other card codes mentioned.
+
+ Rules:
+ - Card codes are exactly 5-digit strings (e.g., "01001", "02034")
+ - Look for patterns like [01001], card 01001, or just 01001
+ - Exclude the source card code '{source_card_code}'
+ - Return empty list if no other cards are found
+
+ Return JSON format: {{"related_card_codes": ["01001", "02034"]}}
Ruling text:
---
{ruling_text}
---
"""

📝 Committable suggestion
prompt = f"""
Analyze the following ruling text for the card '{source_card_code}' and identify any other card codes mentioned.

Rules:
- Card codes are exactly 5-digit strings (e.g., "01001", "02034")
- Look for patterns like [01001], card 01001, or just 01001
- Exclude the source card code '{source_card_code}'
- Return empty list if no other cards are found

Return JSON format: {{"related_card_codes": ["01001", "02034"]}}

Ruling text:
---
{ruling_text}
---
"""
🧰 Tools
🪛 Ruff (0.11.9)
61-61: Line too long (133 > 100)
(E501)
63-63: Line too long (111 > 100)
(E501)
🤖 Prompt for AI Agents
In scripts/enrich_rulings_ai.py around lines 59 to 69, the prompt for the AI
should be restructured to improve clarity and consistency of JSON responses.
Refine the prompt by explicitly instructing the AI to always return a valid JSON
object, even if no related card codes are found, and clarify the format and
constraints. Add examples or specify how to handle cases with no matches to
ensure robust parsing of the AI output.
if isinstance(content, dict):
    return content
return json.loads(content)
Fix potential JSON parsing logic error.
The content from OpenAI API is always a string that needs JSON parsing. The isinstance check for dict will never be true.
- content = response.choices[0].message.content
- if isinstance(content, dict):
- return content
- return json.loads(content)
+ content = response.choices[0].message.content
+ if content is None:
+ logger.error("Received null content from OpenAI API")
+ return None
+ return json.loads(content)

📝 Committable suggestion
content = response.choices[0].message.content
if content is None:
    logger.error("Received null content from OpenAI API")
    return None
return json.loads(content)
🤖 Prompt for AI Agents
In scripts/enrich_rulings_ai.py around lines 40 to 42, remove the isinstance
check for dict since the content from the OpenAI API is always a string. Always
parse the content using json.loads without conditionally returning it as a dict.
…ped) This commit addresses a number of refactoring suggestions to improve the overall quality of the codebase. The rebase onto master was skipped due to timeouts in the environment.

- Extracted duplicated Pydantic and SQLAlchemy models into a shared `src/abyssal_tome/models.py` module.
- Improved `scripts/load_data_to_sqlite.py` with better error handling, logging, and session management.
- Refactored `scripts/enrich_rulings_ai.py` to add error handling for the OpenAI API key, use a dedicated logger, and make the OpenAI model configurable.
- Enhanced `src/abyssal_tome/app.py` with type annotations, a configurable database URL, endpoint validation, and improved search functionality.
- Resolved a merge conflict in `scripts/enrich_rulings_ai.py`.
- Added Pyright to the pre-commit configuration for static type checking.
This commit includes the initial work on enhancing the data pipeline. Due to environmental constraints (timeouts), I was unable to make significant progress.
- Created `scripts/load_data_to_sqlite.py` to load processed JSON data into a SQLite database.