Fix crawler results pagination in web UI#1294
kwei-zhang wants to merge 2 commits into OWASP:master
Conversation
Summary by CodeRabbit
Walkthrough: Adjusted pagination handling and response parsing; the database offset changed to a zero-based calculation.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Pull request overview
This pull request fixes two critical bugs in the crawler results pagination feature of the web UI:
- Wrong page count: The JavaScript code was calculating the total number of pages from the string length of the JSON response instead of the array length, causing it to show hundreds of phantom pages (e.g., "Page 1 of 500" when there were only 5 results); see the sketch after this list.
- Duplicate results: The database pagination offset calculation was incorrect for 0-based page indexing, causing pages 1 and 2 to show the same results.
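To make the first bug concrete, here is a minimal Python illustration of the string-length-versus-array-length mistake; the literal response body and the ceiling math are invented for this sketch and are not Nettacker's actual code:

```python
import json

body = '[{"host": "a"}, {"host": "b"}, {"host": "c"}]'  # 3 results, 45 characters

# Wrong: paginating on the character count of the raw response text.
phantom_pages = -(-len(body) // 10)   # ceil(45 / 10) -> 5 phantom pages

# Right: parse the JSON first, then count array items.
items = json.loads(body)
total_pages = -(-len(items) // 10)    # ceil(3 / 10) -> 1 page

print(phantom_pages, total_pages)     # 5 1
```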
Changes:
- Fixed the JavaScript pagination logic to parse the JSON response before counting items, and to calculate the total page count with a heuristic based on the number of items returned
- Corrected the database offset calculation from `(page * 10) - 10` to `page * 10` to properly handle 0-based page indexing from the API; see the arithmetic sketch after this list
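To spell out the off-by-one, a small sketch in plain Python (not the project's code) of the two page conventions and why the old formula served the first page twice:

```python
PAGE_SIZE = 10

# Both conventions produce the same offsets; the bug was mixing them.
def offset_zero_based(page: int) -> int:   # page = 0, 1, 2, ...
    return page * PAGE_SIZE

def offset_one_based(page: int) -> int:    # page = 1, 2, 3, ...
    return (page - 1) * PAGE_SIZE

assert offset_zero_based(0) == offset_one_based(1) == 0
assert offset_zero_based(1) == offset_one_based(2) == 10

# The old code applied the 1-based formula to the API's 0-based page:
assert (0 * PAGE_SIZE) - 10 == -10  # page 0 -> offset -10, which SQLite treats as 0
assert (1 * PAGE_SIZE) - 10 == 0    # page 1 -> offset 0: the same rows twice
```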
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| nettacker/web/static/js/main.js | Fixed pagination by parsing JSON response correctly and counting array items instead of string length; implemented heuristic for total pages calculation |
| nettacker/database/db.py | Corrected offset calculation for 0-based page indexing in both SQLite and SQLAlchemy query paths |
```js
const totalPages =
  itemCount < 10
    ? crawler_page
    : Math.max(crawler_page, crawler_page + 1);
```
The pagination logic uses a heuristic approach: if fewer than 10 items are returned, it assumes this is the last page. However, this will show an unnecessary extra page when there are exactly N*10 items (e.g., if there are exactly 20 results, page 2 will show 10 items and indicate "Page 2 of 3", and clicking page 3 will show an empty page).
This is a common limitation when the API doesn't return a total count. Consider one of these improvements: (1) Have the API return the total count alongside results, or (2) Fetch one extra item (limit 11) and only display 10, using the 11th item's presence to determine if there's a next page. Option 2 would avoid the "empty last page" issue without requiring API changes.
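As a rough illustration of option 2, a minimal, self-contained Python sketch; `fetch_page` and `fetch` are hypothetical stand-ins, not Nettacker's actual `search_logs`:

```python
PAGE_SIZE = 10

def fetch_page(fetch, page):
    """Over-fetch by one row so the caller can tell whether a next page exists.

    `fetch(offset, limit)` stands in for the real database query; `page` is 1-based.
    """
    rows = fetch((page - 1) * PAGE_SIZE, PAGE_SIZE + 1)   # ask for 11 rows
    return rows[:PAGE_SIZE], len(rows) > PAGE_SIZE        # show 10, flag the 11th

# Usage against a fake 20-row result set:
data = list(range(20))
fake_fetch = lambda offset, limit: data[offset:offset + limit]
assert fetch_page(fake_fetch, 2) == (data[10:20], False)  # exactly 20 rows: no empty page 3
```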
nettacker/database/db.py (Outdated)
| f"%{query}%", | ||
| f"%{query}%", | ||
| (page * 10) - 10, | ||
| page * 10, |
The offset calculation has been changed from (page * 10) - 10 to page * 10. This is correct for the actual usage because the API endpoint at nettacker/api/engine.py:546-547 converts 1-based page numbers to 0-based before calling this function.
However, this breaks the existing unit tests in tests/database/test_db.py which call search_logs() directly with page=1 (1-based), expecting it to return the first page of results. After this change, page=1 will return the second page (offset=10). The tests at lines 1193, 1225, 1249, 1261, and 1271 will need to be updated to pass page=0 instead of page=1, or the test setup needs to account for the API's conversion logic.
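A hedged sketch of what that test update might look like; the real fixtures, assertions, and the exact `search_logs` signature in `tests/database/test_db.py` may differ:

```python
from nettacker.database.db import search_logs

def test_search_logs_first_page():
    # Hypothetical shape: under the 0-based convention described above,
    # the first page of results is requested with page=0, not page=1.
    results = search_logs(page=0, query="owasp")  # keyword names are assumed
    assert len(results) <= 10  # one page holds at most 10 rows
```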
Actionable comments posted: 1
🧹 Nitpick comments (1)
nettacker/web/static/js/main.js (1)
978-981: `Math.max(crawler_page, crawler_page + 1)` is always `crawler_page + 1`; simplify.
`Math.max(x, x + 1)` unconditionally returns `x + 1`, so the `Math.max` call is a no-op that adds noise. The whole expression reduces to a straightforward ternary:
♻️ Suggested simplification
```diff
- const totalPages =
-   itemCount < 10
-     ? crawler_page
-     : Math.max(crawler_page, crawler_page + 1);
+ const totalPages = itemCount < 10 ? crawler_page : crawler_page + 1;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `nettacker/web/static/js/main.js` around lines 978-981: the ternary computing totalPages uses Math.max(crawler_page, crawler_page + 1), which is always crawler_page + 1; simplify the expression by replacing the Math.max call so totalPages is set to crawler_page when itemCount < 10 and to crawler_page + 1 otherwise (look for the const totalPages declaration and the variables itemCount and crawler_page).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `nettacker/database/db.py`:
- Line 1046: search_logs currently computes the SQL offset as page * 10 which
treats page as 0-based and causes the API route go_for_search_logs (which passes
1-based page values) to skip the first results; update the offset calculation in
search_logs to use the same 1-based convention as select_reports and
last_host_logs by changing the offset expression to (page - 1) * 10 (ensure this
uses the existing page parameter and preserves current limit of 10).
---
Nitpick comments:
In `nettacker/web/static/js/main.js`:
- Around lines 978-981: the ternary computing totalPages uses
Math.max(crawler_page, crawler_page + 1) which is always crawler_page + 1;
simplify the expression by replacing the Math.max call so totalPages is set to
crawler_page when itemCount < 10 and to crawler_page + 1 otherwise (look for the
const totalPages declaration and the variables itemCount and crawler_page).
🧹 Nitpick comments (1)
nettacker/api/engine.py (1)
544-554: Pagination fix is correct; consider aligning the same pattern in sibling endpoints. The three-part fix (catching both `TypeError`/`ValueError`, defaulting to `1`, and clamping `page < 1`) correctly ensures a 1-based page value is always forwarded to `search_logs`, which now computes the offset as `(page - 1) * 10`.
The analogous handlers `get_results()` (line 367) and `get_last_host_logs()` (line 461) use a weaker guard pattern (`if not page: page = 1` then a bare `int(page)`) that would raise an unhandled `ValueError` on a non-numeric `page` parameter and has no negative-value clamp. Consider harmonising them:
♻️ Suggested alignment for sibling endpoints
```diff
 @app.route("/results/get_list", methods=["GET"])
 def get_results():
     api_key_is_valid(app, flask_request)
-    page = get_value(flask_request, "page")
-    if not page:
+    try:
+        page = int(get_value(flask_request, "page"))
+    except (TypeError, ValueError):
         page = 1
-    return jsonify(select_reports(int(page))), 200
+    if page < 1:
+        page = 1
+    return jsonify(select_reports(page)), 200

 @app.route("/logs/get_list", methods=["GET"])
 def get_last_host_logs():
     api_key_is_valid(app, flask_request)
-    page = get_value(flask_request, "page")
-    if not page:
+    try:
+        page = int(get_value(flask_request, "page"))
+    except (TypeError, ValueError):
         page = 1
-    return jsonify(last_host_logs(int(page))), 200
+    if page < 1:
+        page = 1
+    return jsonify(last_host_logs(page)), 200
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `nettacker/api/engine.py` around lines 544-554: the pagination parsing in search_logs was hardened; apply the same pattern to get_results and get_last_host_logs by replacing their current fragile page handling (where they do if not page: page = 1 and then int(page)) with the robust sequence: fetch page via get_value(flask_request, "page"), wrap int(...) in a try/except catching TypeError and ValueError to default page to 1, then clamp any page < 1 to 1 before passing it on (refer to the page parsing logic used around search_logs and the get_value call to locate the exact lines to change).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `nettacker/api/engine.py`:
- Around lines 544-554: the pagination parsing in search_logs was hardened; apply
the same pattern to get_results and get_last_host_logs by replacing their
current fragile page handling (where they do if not page: page = 1 and then
int(page)) with the robust sequence: fetch page via get_value(flask_request,
"page"), wrap int(...) in a try/except catching TypeError and ValueError to
default page to 1, then clamp any page < 1 to 1 before passing it on (refer to
the page parsing logic used around search_logs and the get_value call to locate
the exact lines to change).
LGTM! You might have to resolve the test_db.py case as Copilot suggested.
Proposed change
Problem:
Crawler results pagination in the web UI was broken: the page count was inflated with phantom pages, and consecutive pages showed duplicate rows.
Type of change
Solve:
`search_logs()` in db.py used offset `(page * 10) - 10` (1-based), while the API sends a 0-based page: the API's page 0 yields offset -10 (treated as 0 by the database) and page 1 yields offset 0, so UI pages 1 and 2 returned the same rows.

Checklist
- `make pre-commit`, it didn't generate any changes
- `make test`, all tests passed locally