Fix crawler results pagination in web UI#1294

Open
kwei-zhang wants to merge 2 commits into OWASP:master from kwei-zhang:master

Conversation

@kwei-zhang

Proposed change

Problem:

  1. Wrong number of pages: Crawler results showed multiple pages (e.g. 6) even when everything fit on one page.
  2. Same results on every page: Page 1 and page 2 showed the same results.

Type of change

  • Bugfix (non-breaking change which fixes an issue)

Solution:

  1. The pagination code used res.length on the raw response string, so the "length" was the character count of the JSON text, not the number of results.
  2. search_logs() in db.py computed the offset as (page * 10) - 10 (1-based), while the API sends a 0-based page, so pages 1 and 2 both used offset 0 and returned the same rows.
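The first bug is in JavaScript (main.js), but it is language-agnostic; here is a minimal Python sketch of taking the length of the raw response string instead of the parsed result list (illustrative data, not the actual project code):

```python
import json

# Simulated endpoint response: the crawler results arrive as a JSON string.
raw = json.dumps([{"host": "a.example"}, {"host": "b.example"}])

# Bug: len() on the raw string counts characters of the JSON text, not results.
assert len(raw) != 2

# Fix: parse first (guarding for either a string or an already-parsed list),
# then count the actual items.
results = json.loads(raw) if isinstance(raw, str) else raw
assert len(results) == 2
```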

Checklist

  • I've followed the contributing guidelines
  • I've run make pre-commit, it didn't generate any changes
  • I've run make test, all tests passed locally

Copilot AI review requested due to automatic review settings February 21, 2026 11:43
@coderabbitai
Contributor

coderabbitai bot commented Feb 21, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Corrected backend pagination offset calculation to consistently use zero-based offsets for pages.
    • Enforced 1-based, non-negative page parsing in the API to prevent invalid page values.
    • Made client-side response handling more robust (accepts string or JSON) and adjusted pagination to use actual returned item counts.

Walkthrough

Adjusted pagination handling and response parsing: database offset changed to zero-based calculation (page - 1) * 10; API page parsing now defaults to and enforces 1-based pages (no decrement); frontend added string-response guards, normalizes data before pagination, and updated totalPages/itemCount logic.

Changes

Database Pagination (nettacker/database/db.py)
  Changed the pagination offset calculation in search_logs from (page * 10) - 10 to (page - 1) * 10 in both the SQL and ORM paths.

API Page Parsing (nettacker/api/engine.py)
  Changed go_for_search_logs page handling: parsing errors now default to 1, and pages < 1 are clamped to 1 (the prior decrement behavior was removed).

Frontend response & pagination (nettacker/web/static/js/main.js)
  Added type guards to parse string responses as JSON, normalized data before use, switched pagination to compute itemCount and totalPages from the normalized data, and updated calls to show_crawler(data).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 50.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: the title directly relates to the main objective, fixing pagination issues in the crawler results web UI.
  • Description check ✅ Passed: the description clearly explains the pagination problems, root causes, and solutions implemented across three files.



Copilot AI left a comment


Pull request overview

This pull request fixes two critical bugs in the crawler results pagination feature of the web UI:

  1. Wrong page count: The JavaScript code was calculating the total number of pages using the string length of the JSON response instead of the array length, causing it to show hundreds of phantom pages (e.g., "Page 1 of 500" when there were only 5 results).
  2. Duplicate results: The database pagination offset calculation was incorrect for 0-based page indexing, causing pages 1 and 2 to show the same results.

Changes:

  • Fixed JavaScript pagination logic to parse JSON before counting items and calculate total pages using a heuristic approach based on the number of items returned
  • Corrected database offset calculation from (page * 10) - 10 to page * 10 to properly handle 0-based page indexing from the API

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

nettacker/web/static/js/main.js
  Fixed pagination by parsing the JSON response correctly and counting array items instead of string length; implemented a heuristic for the total-pages calculation.
nettacker/database/db.py
  Corrected the offset calculation for 0-based page indexing in both the SQLite and SQLAlchemy query paths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +978 to +981
const totalPages =
itemCount < 10
? crawler_page
: Math.max(crawler_page, crawler_page + 1);

Copilot AI Feb 21, 2026


The pagination logic uses a heuristic approach: if fewer than 10 items are returned, it assumes this is the last page. However, this will show an unnecessary extra page when there are exactly N*10 items (e.g., if there are exactly 20 results, page 2 will show 10 items and indicate "Page 2 of 3", and clicking page 3 will show an empty page).

This is a common limitation when the API doesn't return a total count. Consider one of these improvements: (1) Have the API return the total count alongside results, or (2) Fetch one extra item (limit 11) and only display 10, using the 11th item's presence to determine if there's a next page. Option 2 would avoid the "empty last page" issue without requiring API changes.
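Option 2, the "fetch one extra row" pattern, can be sketched as follows; the function and variable names here are illustrative, not the project's actual query code:

```python
def fetch_page(rows, page, per_page=10):
    """Slice out one page, fetching per_page + 1 rows so that the extra
    row's presence signals whether a next page exists."""
    start = (page - 1) * per_page
    chunk = rows[start:start + per_page + 1]
    has_next = len(chunk) > per_page
    return chunk[:per_page], has_next

# Exactly 20 rows: page 2 is full but correctly reports no page 3,
# avoiding the "empty last page" issue described above.
items, has_next = fetch_page(list(range(20)), page=2)
assert len(items) == 10 and has_next is False
```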

     f"%{query}%",
     f"%{query}%",
-    (page * 10) - 10,
+    page * 10,

Copilot AI Feb 21, 2026


The offset calculation has been changed from (page * 10) - 10 to page * 10. This is correct for the actual usage because the API endpoint at nettacker/api/engine.py:546-547 converts 1-based page numbers to 0-based before calling this function.

However, this breaks the existing unit tests in tests/database/test_db.py which call search_logs() directly with page=1 (1-based), expecting it to return the first page of results. After this change, page=1 will return the second page (offset=10). The tests at lines 1193, 1225, 1249, 1261, and 1271 will need to be updated to pass page=0 instead of page=1, or the test setup needs to account for the API's conversion logic.
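To make the mismatch concrete, the two offset conventions contrasted in this comment diverge for a direct call with page=1; this is a standalone illustration, not the repository's code:

```python
def offset_one_based(page: int) -> int:
    return (page * 10) - 10   # old formula: page 1 -> offset 0

def offset_zero_based(page: int) -> int:
    return page * 10          # new formula: page 1 -> offset 10

# A test calling search_logs(page=1) expecting the first ten rows
# would now be handed rows 10-19 instead.
assert offset_one_based(1) == 0
assert offset_zero_based(1) == 10
```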

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
nettacker/web/static/js/main.js (1)

978-981: Math.max(crawler_page, crawler_page + 1) is always crawler_page + 1 — simplify.

Math.max(x, x + 1) unconditionally returns x + 1, so the Math.max call is a no-op that adds noise. The whole expression reduces to a straightforward ternary:

♻️ Suggested simplification
-      const totalPages =
-        itemCount < 10
-          ? crawler_page
-          : Math.max(crawler_page, crawler_page + 1);
+      const totalPages = itemCount < 10 ? crawler_page : crawler_page + 1;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nettacker/web/static/js/main.js` around lines 978 - 981, The ternary
computing totalPages uses Math.max(crawler_page, crawler_page + 1) which is
always crawler_page + 1; simplify the expression by replacing the Math.max call
so totalPages is set to crawler_page when itemCount < 10 and to crawler_page + 1
otherwise (look for the const totalPages declaration and the variables itemCount
and crawler_page).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nettacker/database/db.py`:
- Line 1046: search_logs currently computes the SQL offset as page * 10 which
treats page as 0-based and causes the API route go_for_search_logs (which passes
1-based page values) to skip the first results; update the offset calculation in
search_logs to use the same 1-based convention as select_reports and
last_host_logs by changing the offset expression to (page - 1) * 10 (ensure this
uses the existing page parameter and preserves current limit of 10).

---

Nitpick comments:
In `@nettacker/web/static/js/main.js`:
- Around line 978-981: The ternary computing totalPages uses
Math.max(crawler_page, crawler_page + 1) which is always crawler_page + 1;
simplify the expression by replacing the Math.max call so totalPages is set to
crawler_page when itemCount < 10 and to crawler_page + 1 otherwise (look for the
const totalPages declaration and the variables itemCount and crawler_page).

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
nettacker/api/engine.py (1)

544-554: Pagination fix is correct; consider aligning the same pattern in sibling endpoints.

The three-part fix—catching both TypeError/ValueError, defaulting to 1, and clamping page < 1—correctly ensures a 1-based page value is always forwarded to search_logs, which now computes offset as (page - 1) * 10.

The analogous handlers get_results() (Line 367) and get_last_host_logs() (Line 461) use a weaker guard pattern (if not page: page = 1 then bare int(page)) that would raise an unhandled ValueError on a non-numeric page parameter and has no negative-value clamp. Consider harmonising them:

♻️ Suggested alignment for sibling endpoints
 @app.route("/results/get_list", methods=["GET"])
 def get_results():
     api_key_is_valid(app, flask_request)
-    page = get_value(flask_request, "page")
-    if not page:
+    try:
+        page = int(get_value(flask_request, "page"))
+    except (TypeError, ValueError):
         page = 1
-    return jsonify(select_reports(int(page))), 200
+    if page < 1:
+        page = 1
+    return jsonify(select_reports(page)), 200

 @app.route("/logs/get_list", methods=["GET"])
 def get_last_host_logs():
     api_key_is_valid(app, flask_request)
-    page = get_value(flask_request, "page")
-    if not page:
+    try:
+        page = int(get_value(flask_request, "page"))
+    except (TypeError, ValueError):
         page = 1
-    return jsonify(last_host_logs(int(page))), 200
+    if page < 1:
+        page = 1
+    return jsonify(last_host_logs(page)), 200
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nettacker/api/engine.py` around lines 544 - 554, The pagination parsing in
search_logs was hardened; apply the same pattern to get_results and
get_last_host_logs by replacing their current fragile page handling (where they
do if not page: page = 1 and then int(page)) with the robust sequence: fetch
page via get_value(flask_request, "page"), wrap int(...) in a try/except
catching TypeError and ValueError to default page to 1, then clamp any page < 1
to 1 before passing it on (refer to the page parsing logic used around
search_logs and the get_value call to locate the exact lines to change).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@nettacker/api/engine.py`:
- Around line 544-554: The pagination parsing in search_logs was hardened; apply
the same pattern to get_results and get_last_host_logs by replacing their
current fragile page handling (where they do if not page: page = 1 and then
int(page)) with the robust sequence: fetch page via get_value(flask_request,
"page"), wrap int(...) in a try/except catching TypeError and ValueError to
default page to 1, then clamp any page < 1 to 1 before passing it on (refer to
the page parsing logic used around search_logs and the get_value call to locate
the exact lines to change).

@pUrGe12
Contributor

pUrGe12 commented Feb 21, 2026

LGTM! You might have to resolve the test_db.py case as Copilot suggested.
