
issue127-skull-boneset#133

Closed
Taktar wants to merge 5 commits into main from issue127-skull-boneset

Conversation


@Taktar Taktar commented Sep 29, 2025

Description

Partially addresses #127 – Content Pipeline: Process and Integrate a New Boneset (Skull).
This PR sets up the Skull extraction pipeline and checks in the raw, per-slide outputs we’ll use to build the final skull.json. It adds the folder scaffolding, unzips the PPTX into XML, and runs our extraction scripts to produce images plus JSON annotations/descriptions.

Dependencies: Python 3 stdlib only (xml.etree, json, os). No new npm deps.


1. What was Changed

  • Created data_extraction/skull/{ppt,annotations,images}/.
  • Added the Skull deck locally and unzipped to data_extraction/skull/ppt/unzipped/….
  • Ran/parameterized:
    • data_extraction/extract_ppt_annotations.py → per-slide *_annotations.json + extracted slide images.
    • data_extraction/Extract_Bone_Descriptions.py → per-slide *_Descriptions.json.
  • Outputs committed:
    • 32 *_annotations.json files (slide 1 has no image/labels).
    • 33 *_Descriptions.json files (one per slide).
    • Extracted slide images under data_extraction/skull/images/slide*/….
  • Kept the PPTX and unzipped XML out of git history (local only).

2. Why it was Changed

Kick off the content pipeline for the Skull boneset and produce clean, machine-readable labels and descriptions. This unblocks creation of a single skull.json and later API/UI integration alongside Bony Pelvis.


3. How it was Changed

  • Staging & unzip
    • Copied Bone Box (Skull).pptx → data_extraction/skull/ppt/skull.pptx.
    • Unzipped to data_extraction/skull/ppt/unzipped/ to parse ppt/slides/*.xml, relationships, and media.
  • Image/annotation extraction (extract_ppt_annotations.py)
    • Configured paths:
      • slides: …/ppt/unzipped/ppt/slides
      • rels: …/ppt/unzipped/ppt/slides/_rels
      • media: …/ppt/unzipped/ppt/media
      • out: data_extraction/skull/images & data_extraction/skull/annotations
    • For each slide, resolved r:embed → media, wrote images to images/slideN/…, saved slideN_annotations.json.
    • Slide 1 has no image/labels — script logs No images found… Skipping.
  • Description extraction (Extract_Bone_Descriptions.py)
    • Parses each slide’s shapes, locates the right-panel text box by x/y/width/height, collects bullet text.
    • First token becomes bone name; remaining tokens become the description list.
    • Emits { id, name, description: [...] } to slideN_Descriptions.json.
  • Verification
    • Counts:
      • *_annotations.json: 32
      • *_Descriptions.json: 33
    • Spot-checked files (e.g., slide3_annotations.json) for sensible label coordinates/text.
  • Notes / challenges
    • Slide 1 is a title slide; handled gracefully.
    • Position heuristics tuned to this deck; can be adjusted if later slides differ.
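For orientation, a per-slide description file has roughly the following shape. The field names follow the { id, name, description: [...] } structure the script emits per the notes above; the values here are invented for illustration:

```json
{
  "id": "slide3",
  "name": "Frontal Bone",
  "description": [
    "Forms the forehead",
    "Contributes to the roof of the orbits"
  ]
}
```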

4. Screenshots (if applicable)

N/A (no UI/API changes in this PR). Folder tree and terminal logs available upon request.


Next steps (tracked in #127)

  • Merge per-slide outputs into a single skull.json with full hierarchy (bones, sub-bones, descriptions, annotations, image paths).
  • Add a GitHub-hosted data directory for Skull; update backend server.js to load from the new raw URL.
  • Update UI dropdown to include “Skull” alongside “Bony Pelvis.”

@Taktar Taktar requested a review from UcheWendy as a code owner September 29, 2025 05:07

Taktar commented Sep 29, 2025

(Screenshots attached: 2025-09-29 at 12:48:11 AM, 12:48:22 AM, and 12:49:02 AM.)

@Taktar Taktar self-assigned this Sep 29, 2025

Taktar commented Sep 29, 2025

(Screenshot attached: 2025-09-29 at 1:36:40 AM.) Got the descriptions to pop up for skull.


Taktar commented Oct 1, 2025

(Screenshot attached: 2025-09-30 at 11:02:00 PM.)

@UcheWendy UcheWendy left a comment

The issue asks for a new GitHub directory for Skull (e.g., DataSkull) with skull.json and images accessible via raw GitHub URL. In this PR, Skull data lives under boneset-api/data/final_skull.json. I don’t see a DataSkull/ folder or a raw-URL fetch path added. This misses the “store in GitHub for raw access” requirement and makes prod dependent on bundling data in the server image.
Let's maintain consistent image paths. Because Skull is served from a local JSON file, the image paths inside final_skull.json must be valid from the app’s point of view (either local static paths you serve, or raw GitHub URLs). The PR doesn’t show that wiring (a static Express folder or new GitHub paths). Verify these resolve in the UI.

Tips to note
To fix this, clean /combined-data by replacing the handler with a single flow: load the pelvis from the existing GitHub JSONs, push its boneset/bones/subbones, load the skull with await loadLocalBoneset("skull"), push its boneset/bones/subbones, and return { bonesets, bones, subbones } once; this removes the duplicate bonesets declaration and the nested “fetch pelvis inside a pelvis loop” bug.
Next, unify the cache layer by keeping bonesetCache = new Map() with loadLocalBoneset(id) and deleting the old cachedBoneset and loadBoneset() so all local reads go through the new function. Fix the /api/description duplication by removing the second for (const line of lines) so descriptions render once, and keep validation on boneId and ALLOWED_BONESETS.has(bonesetId).

To meet the “DataSkull” acceptance criteria, create DataSkull/ in the repo (parallel to DataPelvis/), commit boneset/skull.json and all images there, and add Skull GitHub base URLs like:

const SKULL_REPO = "https://raw.githubusercontent.com/oss-slu/DigitalBonesBox/data/DataSkull/";
const SKULL_JSON_URL = `${SKULL_REPO}boneset/skull.json`;
const SKULL_BONES_DIR = `${SKULL_REPO}bones/`;

For consistency, have /combined-data pull both pelvis and skull from raw GitHub URLs. Finally, verify the dropdown by confirming /combined-data returns `bonesets` containing both `{ id: "bony_pelvis" }` and `{ id: "skull" }`, so the UI shows both options.


// Unchanged: used by the dropdowns in the current UI
// Unchanged pelvis aggregation + add Skull from local final_skull.json
app.get("/combined-data", async (_req, res) => {

The route re-declares bonesets and mixes old pelvis logic with the new flow. You initialize bonesets twice and start a loop on bonesetData.bones while also fetching pelvis again inside that loop. This will either throw or produce wrong results. Clean this up to one clear flow: build pelvis, then skull, then return once.
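The one-flow handler this comment asks for could be sketched as follows. `buildCombinedData` is a hypothetical helper introduced here so the merge step is easy to test, and `loadPelvisFromGitHub` / `loadLocalBoneset` stand in for the repo's real loaders:

```javascript
// Merge any number of boneset payloads into the single response shape.
function buildCombinedData(...sets) {
  const bonesets = [];
  const bones = [];
  const subbones = [];
  for (const set of sets) {
    bonesets.push(set.boneset);             // one entry per boneset
    bones.push(...(set.bones || []));       // flatten bones
    subbones.push(...(set.subbones || [])); // flatten sub-bones
  }
  return { bonesets, bones, subbones };
}

// Sketch of the cleaned route (helper names are assumptions):
// app.get("/combined-data", async (_req, res) => {
//   try {
//     const pelvis = await loadPelvisFromGitHub();   // existing GitHub fetch
//     const skull = await loadLocalBoneset("skull"); // new local loader
//     res.json(buildCombinedData(pelvis, skull));    // return exactly once
//   } catch (err) {
//     res.status(500).json({ error: "Failed to build combined data" });
//   }
// });
```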


try {
const set = await loadBoneset();
const set = await loadLocalBoneset(bonesetId); // <- IMPORTANT
You introduce a bonesetCache and a new loadLocalBoneset(id) while also keeping the old single-file cache fields (cachedBoneset, old loadBoneset() pattern). Pick one approach and remove the leftovers so readers aren’t confused.

// Safe path + allowlist + rate limit to fetch the full local JSON
app.get("/api/boneset/:bonesetId", bonesetLimiter, async (req, res) => {
const { bonesetId } = req.params;

@UcheWendy UcheWendy commented Oct 1, 2025

/api/description renders lines twice: the code appends the description lines in a loop, then immediately appends them again in a second loop, so you’ll get duplicated <li> items.
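The fix is simply to append each line in a single pass. A minimal sketch, with `renderDescription` as a hypothetical helper mirroring the handler's list-building step:

```javascript
// Render description lines as a <ul>, appending each line exactly once.
function renderDescription(lines) {
  const items = [];
  for (const line of lines) {
    items.push(`<li>${line}</li>`); // single loop: no duplicate <li> items
  }
  return `<ul>${items.join("")}</ul>`;
}
```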


Taktar commented Oct 1, 2025

(Screenshots attached: 2025-10-01 at 12:15:04 AM and 12:17:43 AM.)


Taktar commented Oct 3, 2025

(Screenshot attached: 2025-10-02 at 11:31:24 PM.) Images still not loading.


Taktar commented Oct 3, 2025

Update for @UcheWendy:

Done (per acceptance criteria & review):

  • Created DataSkull/ in the repo with boneset/skull.json and all images checked in.
  • Updated skull.json image_url fields to Raw GitHub URLs (currently pointing to the feature branch: https://raw.githubusercontent.com/oss-slu/DigitalBonesBox/issue127-skull-boneset/DataSkull/...).
  • Server updated:
    /combined-data now merges Pelvis (DataPelvis) + Skull (DataSkull) from raw GitHub.
  • Branch-aware skull source via SKULL_BRANCH/SKULL_REPO env vars (so this PR works before the data lands on the data branch).
  • /api/description serves skull from GitHub and pelvis from local file (as discussed).
  • UI: dropdown now shows Bony Pelvis and Skull options (the dropdown disappeared at one point during the process but is back).
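The branch-aware source above could be wired roughly like this. The env var names follow this comment; the default branch and fallback URL construction are assumptions:

```javascript
// Resolve the Skull raw-GitHub base URL, letting env vars override the
// defaults so a feature branch can be tested before data lands on `data`.
function skullBaseUrl(env = process.env) {
  const branch = env.SKULL_BRANCH || "data"; // assumed default branch
  return (
    env.SKULL_REPO || // full override wins if set
    `https://raw.githubusercontent.com/oss-slu/DigitalBonesBox/${branch}/DataSkull/`
  );
}

const SKULL_JSON_URL = `${skullBaseUrl()}boneset/skull.json`;
```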

Current blocker I’m fixing: images not rendering

  • Skull images are reachable on the web (HTTP 200 from Raw GitHub), but the browser isn’t showing them because our front-end helper prefixes even absolute URLs with http://127.0.0.1:8000.
  • Pelvis images still point to local /images/... paths which we’re not serving yet (or they’re not present), so those show 404s.
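The prefixing bug described above has a small fix: only prepend the API origin to relative paths, and pass absolute http(s) URLs (e.g. raw GitHub links) through untouched. A sketch, where `resolveImageUrl` and the hard-coded `API_BASE` are stand-ins for the real front-end helper:

```javascript
const API_BASE = "http://127.0.0.1:8000"; // assumed current hard-coded origin

// Return src unchanged when it is already absolute; otherwise prefix it.
function resolveImageUrl(src) {
  if (/^https?:\/\//i.test(src)) return src;        // absolute: pass through
  return `${API_BASE}/${src.replace(/^\/+/, "")}`;  // relative: prefix origin
}
```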

@Taktar Taktar requested a review from UcheWendy October 3, 2025 04:48
@UcheWendy UcheWendy closed this Oct 7, 2025
@UcheWendy

Scope too large

@UcheWendy UcheWendy deleted the issue127-skull-boneset branch December 8, 2025 22:04