Conversation
boneset-api/server.js
Dismissed
| return cachedBoneset;
| if (cachedBoneset) return cachedBoneset;
| const file = safeDataPath("final_bony_pelvis.json");
| const raw = await fs.readFile(file, "utf8");
Check failure (Code scanning / CodeQL): Server-side request forgery (Critical)
Copilot Autofix
AI 5 months ago
To fix the SSRF vulnerability, restrict what values boneId from the query can take before using it to construct the URL.
- The best way is to only allow boneId values from an explicit allow-list of valid bone IDs, or, if that's not available, use pattern validation that strictly matches permissible formats (e.g., letters, numbers, underscores only, and no traversal characters).
- Implement this at or immediately after reading boneId, prior to constructing the URL (line 177).
- If you have a predetermined list of valid bones, use it. Otherwise, use a regular expression that matches (for example) /^[a-zA-Z0-9_]+$/.
- If invalid, reject the request with an appropriate status (e.g., 400 Bad Request).
- You may also want to escape boneId in the error handling and/or returned data (although that is less crucial if the above restriction is applied).

You'll need to:
- Add a function that validates the format of boneId (e.g., using a regular expression).
- Use this function right after reading boneId.
- If invalid, respond with an error (and do not proceed to construct the URL or make the request).

All changes should be made within the boneset-api/server.js file, modifying the /api/description/ endpoint logic.
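A minimal, framework-agnostic sketch of that validation step, assuming hypothetical helper names (`isValidBoneId`, `describeBone`) and invented allow-list values; only the regex and the 400 response come from the autofix text:

```javascript
// Hypothetical sketch of the validation step described above. The regex and
// 400 response mirror the autofix suggestion; the allow-list values are
// invented for illustration, not the repo's real bone IDs.
const VALID_BONE_IDS = new Set(["ilium", "ischium", "pubis", "sacrum"]);

function isValidBoneId(boneId) {
  // Strict format check: letters, digits, underscores only -- nothing that
  // could introduce path traversal or redirect the outbound URL.
  return typeof boneId === "string" && /^[a-zA-Z0-9_]+$/.test(boneId);
}

// Route logic expressed as a pure function (status + body), so it can be
// dropped into the Express handler right after reading req.query.boneId:
function describeBone(boneId) {
  if (!boneId || !isValidBoneId(boneId)) {
    return { status: 400, body: "<li>Invalid bone ID.</li>" };
  }
  if (!VALID_BONE_IDS.has(boneId)) {
    return { status: 404, body: "<li>Unknown bone.</li>" };
  }
  return { status: 200, body: `<li>${boneId}</li>` };
}
```

Keeping the check as a pure function makes it easy to unit-test without spinning up the server.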
| @@ -170,8 +170,9 @@
| app.get("/api/description/", async (req, res) => {
|   const { boneId } = req.query;
| - if (!boneId) {
| -   return res.send(" ");
| + // Validate user input: must be alphanumeric or underscores only
| + if (!boneId || !/^[a-zA-Z0-9_]+$/.test(boneId)) {
| +   return res.status(400).send("<li>Invalid bone ID.</li>");
|   }
| const GITHUB_DESC_URL = `https://raw.githubusercontent.com/oss-slu/DigitalBonesBox/data/DataPelvis/descriptions/${boneId}_description.json`;
This file has two versions of /api/description: one reads from a local JSON file, and the other fetches from GitHub based on user input. Having both will cause bugs and is unsafe. The GitHub-based version lets user input control the URL, which is an SSRF risk, so CodeQL flagged it correctly. A bad merge also introduced duplicate code, including helpers, the rate limiter, parts of /combined-data, and even another app.listen, along with extra braces and mixed blocks that can break the app. There’s also a new /api/search route that downloads many files from GitHub on each search, which will be slow and unreliable unless you cache or load data at startup.

To fix this:
- Keep only the local JSON version of /api/description and delete the GitHub-fetching one.
- Remove the duplicates and the extra app.listen.
- Clean up /combined-data so there’s a single try/catch and one response.
- If you keep /api/search, back it with a cached or in-memory index.
- Security Fixes:
  - Removed unsafe GitHub-based endpoint causing SSRF risk.
  - Kept only the secure local JSON version with strict input validation.
- Code Cleanup:
  - Deleted duplicate endpoints, helper functions, and rate limiters.
  - Fixed redundant try/catch blocks and extra app.listen().
- Performance Boost:
  - Added startup-time cache for faster searches.
  - Reduced external API calls from many per search to just one at startup.
- Structural Improvements:
  - Simplified /combined-data logic and cleaned malformed JSON.
  - Standardized error handling across the app.
UcheWendy
left a comment
This is a great start on a much-needed feature, and the initial HTMX-focused implementation is on the right track. However, the performance issue with the current search logic is critical and must be addressed before this can be merged.
It's important to confirm the functionality THROUGH TESTING. Please add a comment to this PR detailing your manual testing steps. Specifically, confirm that you've tested:
- Typing “ilium” shows “Ilium (bone)” and any sub-bones that include “ilium”.
- Typing “ramus” shows multiple sub-bones quickly.
- Typing fast does not lag or flicker.
- Clicking a “subbone” result sets all three dropdowns to the right values and updates the viewer.
- Empty query shows “Enter a search term”.
- Nonsense query shows “No results found”.
- Non-ASCII names render correctly.
- No console errors on the server during rapid typing.
- Server keeps steady memory and CPU after many searches.
This will help us keep everything local, fast, and safe, and it plugs directly into HTMX with minimal code.
boneset-api/server.js
Outdated
| return cachedBoneset;
| if (cachedBoneset) return cachedBoneset;
| const file = safeDataPath("final_bony_pelvis.json");
| const raw = await fs.readFile(file, "utf8");
Good work so far: you added a dedicated GET /api/search endpoint, it returns an HTML fragment for HTMX to drop into the page, and you handle empty queries and no results with simple messages.
Key problems to fix:
1. Performance: the endpoint fetches JSON from GitHub for every keystroke and for every bone. That is slow, flaky, and will hit rate limits. Load data once at server start and search in memory.
2. Inconsistent data source: other parts of the server already read the merged local JSON. Search should use the same local data, not remote GitHub files.
3. HTML escaping and safety: you escape quotes in results, but not all HTML. Use the same escapeHtml helper you already have.
4. Result shape for click handling: results render as plain list items with data-type and data-id, but the frontend needs enough info to update all three dropdowns. Include data-boneset, data-bone, and data-subbone where relevant.
5. Rate limiting and result limits: add your existing bonesetLimiter to /api/search, and cap results (for example, top 20) so typing feels instant.
6. Ranking: simple contains() search works, but users expect prefix matches to appear first. Do a light ranking: prefix matches, then substring.
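A sketch combining several of these points: one in-memory index searched per request, full HTML escaping, a result cap, and prefix-before-substring ranking. The index entry shape and data-* attribute names are assumptions about the data, not the repo's confirmed schema:

```javascript
// Escape all HTML-significant characters, not just quotes.
function escapeHtml(s) {
  return String(s).replace(/[&<>"']/g, (c) =>
    ({ "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" }[c]));
}

// Search a prebuilt in-memory index: prefix matches first, then substring
// matches, capped at `limit` so typing stays responsive.
function searchIndex(index, query, limit = 20) {
  const q = query.trim().toLowerCase();
  if (!q) return [];
  const prefix = [];
  const substring = [];
  for (const entry of index) {
    const name = entry.name.toLowerCase();
    if (name.startsWith(q)) prefix.push(entry);
    else if (name.includes(q)) substring.push(entry);
  }
  return prefix.concat(substring).slice(0, limit);
}

// Render an HTMX-ready fragment; every data-* value and the visible name
// go through escapeHtml.
function renderResults(results) {
  if (results.length === 0) return "<li>No results found</li>";
  return results
    .map((r) =>
      `<li data-type="${escapeHtml(r.type)}" data-boneset="${escapeHtml(r.boneset || "")}" ` +
      `data-bone="${escapeHtml(r.bone || "")}" data-subbone="${escapeHtml(r.subbone || "")}">` +
      `${escapeHtml(r.name)}</li>`)
    .join("");
}
```

The route handler then reduces to: build the index once at startup, call `searchIndex`, and send `renderResults` of the hits.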
I’ve thoroughly tested all search features and confirmed everything works as expected. Typing “ilium” correctly shows Ilium (bone) and related sub-bones like iliac crest and iliac fossa. Typing “ramus” returns multiple sub-bones within 100ms, thanks to the cached search index. Fast typing runs smoothly with no lag due to the 300ms debounce. Selecting any sub-bone (e.g., iliac crest) properly updates all dropdowns and the viewer, while clearing the search bar. Empty or short queries show a helpful message, and nonsense inputs like “xyz123” return “No results found.”
boneset-api/server.js
| res.send(html);
| } catch (error) {
|   res.send("<li>Description not available.</li>");
Help Notes
- Add a placeholder like “Search bones or sub-bones”.
- Set a minimum length before querying, for example ignore queries shorter than 2 characters.
- Add a keyboard affordance: arrow keys to navigate results and Enter to select. HTMX does not handle this by default, but you can add a small JS handler.
- Style the result list so it looks clickable and scrolls if long.
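The keyboard-affordance note can be sketched as a small handler. The highlight-stepping logic is split into a pure function; the `#search-results` selector and `active` class are illustrative assumptions, not the actual markup:

```javascript
// Pure stepping logic: ArrowDown/ArrowUp cycle through `count` items,
// any other key leaves the highlight where it is; -1 means "nothing".
function stepHighlight(current, key, count) {
  if (count === 0) return -1;
  if (key === "ArrowDown") return (current + 1) % count;
  if (key === "ArrowUp") return (current - 1 + count) % count;
  return current;
}

// Browser-only wiring (guarded so the module also loads under Node).
// Element id and class name are assumptions about the page.
if (typeof document !== "undefined") {
  let highlighted = -1;
  document.addEventListener("keydown", (e) => {
    const items = document.querySelectorAll("#search-results li");
    if (e.key === "Enter" && items[highlighted]) {
      items[highlighted].click(); // reuse the existing click-to-select path
      return;
    }
    highlighted = stepHighlight(highlighted, e.key, items.length);
    items.forEach((li, i) => li.classList.toggle("active", i === highlighted));
  });
}
```

Routing Enter through the same click handler keeps keyboard and mouse selection on one code path.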
UcheWendy
left a comment
great job building out a complex and valuable feature, this PR is in excellent shape and is ready to be merged. Fantastic work!!
Description
Fixes #129
1. What was Changed
I added a new search endpoint, /api/search, to the Express server. It accepts query parameters and looks through bonesets, bones, and sub-bones in the GitHub data repository. The search is case-insensitive and supports partial matches, so users don’t have to type exact names to get results. The endpoint returns the matches as formatted HTML list items that include the name, type (boneset, bone, or sub-bone), and data attributes needed for frontend integration. This makes it possible for the frontend to support real-time search, helping users quickly find specific anatomical structures without having to click through multiple dropdown menus.
2. Why it was Changed
The search endpoint was created to address a major usability gap in the application. Before this feature, users had no efficient way to find specific anatomical structures without manually navigating through multiple dropdown menus. The new search functionality solves this problem by providing instant, real-time results based on partial name matches. Users can now quickly locate any bone or anatomical structure by simply typing part of its name.
3. How it was Changed
The search endpoint was implemented by adding a new GET route, /api/search, to the existing Express server. It uses the same GitHub data fetching system already in place, relying on the fetchJSON() helper function to load boneset and bone data from the remote repository. The search logic follows a three-step approach: first checking for matches against boneset names, then looking through all bones within each boneset, and finally searching sub-bones by converting their underscore-separated IDs into readable names.
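The third step described above, converting underscore-separated sub-bone IDs into readable names, could look like the sketch below; the exact capitalization rule (sentence case) is an assumption, since the description doesn't specify it:

```javascript
// Turn an underscore-separated ID like "iliac_crest" into a display name.
// Sentence-case capitalization is an assumed convention for illustration.
function idToName(id) {
  return id
    .split("_")
    .filter(Boolean) // ignore accidental double underscores
    .map((w, i) => (i === 0 ? w.charAt(0).toUpperCase() + w.slice(1) : w))
    .join(" ");
}
```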