Server should Accept Recipe JSON by ascibisz · Pull Request #419 · mesoscope/cellpack

ascibisz · 2025-10-22T23:29:32Z

Problem

Currently, when a user packs an edited recipe via the cellpack client, we upload the edited recipe to firebase, pass that reference to the server, and then the server retrieves the recipe from firebase. To minimize firebase calls and improve efficiency of the client, we're changing that flow to instead send the edited recipe JSON as the body of the packing request to the server.
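For illustration, here is a minimal sketch of how a client might build the new request. It assumes the server is reachable at the local address used in docs/DOCKER.md and accepts the recipe as a JSON body on `POST /start-packing`. The helper name `build_packing_request` and the recipe content are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumed base URL, taken from the local Docker instructions; adjust as needed.
SERVER_URL = "http://0.0.0.0:80/start-packing"


def build_packing_request(recipe: dict, url: str = SERVER_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the recipe as its JSON body."""
    body = json.dumps(recipe).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical edited recipe; real cellPACK recipes have more fields.
edited_recipe = {"name": "one_sphere", "version": "1.0.0"}
request = build_packing_request(edited_recipe)
# urllib.request.urlopen(request) would send it to a running server.
```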

We also wanted a way to check if we have already packed a recipe and if we have, return that result file rather than running the whole packing again. To do this, we are calculating a hash for the recipe objects before they are packed. The hash for each recipe will be uploaded to firebase along with its result files, so we can query firebase for a given hash to see if that recipe has a packing result already.
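One possible way to compute such a hash is sketched below. This is an assumption for illustration only: the actual `DataDoc.generate_hash()` may use a different algorithm or canonicalization. The function name `recipe_hash` is hypothetical.

```python
import hashlib
import json


def recipe_hash(recipe: dict) -> str:
    """Return a deterministic SHA-256 hex digest for a JSON-serializable recipe."""
    # sort_keys and fixed separators make the serialization independent of key order
    canonical = json.dumps(recipe, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"name": "one_sphere", "objects": {"sphere": {"radius": 25}}}
b = {"objects": {"sphere": {"radius": 25}}, "name": "one_sphere"}
same = recipe_hash(a) == recipe_hash(b)  # key order does not change the hash
```

Sorting keys before hashing matters here because two clients can serialize the same recipe with different key order, and both should map to the same stored result.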

Key Server Improvements

1. Deduplication & Caching

BEFORE: Each request generated a unique UUID, no deduplication possible
AFTER: JSON recipes generate deterministic hash, enabling job deduplication

2. Input Flexibility & Backwards Compatibility

BEFORE: Only recipe file paths supported via query parameter
AFTER: Supports both recipe file paths AND direct JSON recipe objects in request body, plus optional config parameter

3. Smart Job Management

BEFORE: Generated a UUID for each job without deduplication, so every request created a new job regardless of content
AFTER: Uses deterministic hash for JSON recipes, enabling job reuse for identical recipes

4. Firebase Request Reduction

BEFORE: Every edited recipe was uploaded to firebase by the client and downloaded from firebase by the server
AFTER: Edited recipes are passed in the body of the packing request, so no firebase uploads or downloads occur

5. Unified Results Upload

BEFORE: Simularium result file was uploaded to S3 twice per job, once on its own and once as part of the full output files upload
AFTER: Only upload Simularium result file once by keeping track of its path when we upload all output files
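The single-upload idea can be sketched roughly as follows. This is not the PR's `upload_outputs_to_s3` code; `upload_outputs`, the `upload` callback, and the `.simularium` suffix check are assumptions used to show the pattern of recording the result path while walking the outputs.

```python
from pathlib import Path
from typing import Callable, Optional


def upload_outputs(output_dir: Path, upload: Callable[[Path], None]) -> Optional[Path]:
    """Upload every file under output_dir once; return the .simularium path if found."""
    simularium_path = None
    for path in sorted(output_dir.rglob("*")):
        if not path.is_file():
            continue
        upload(path)  # each file is uploaded exactly once
        if path.suffix == ".simularium":
            # remember where the result file is instead of uploading it again
            simularium_path = path
    return simularium_path
```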

Technical Implementation

New Server Components:

DataDoc.generate_hash() - Creates deterministic hash from recipe JSON
job_exists() - Checks if job already completed in Firebase
Enhanced request handling - Reads JSON from request body
Smart job ID generation - Uses hash for JSON recipes, UUID for file paths
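A rough sketch of the job ID rule in the last item, under stated assumptions: `choose_job_id` is a hypothetical name, SHA-256 stands in for whatever `DataDoc.generate_hash()` actually does, and giving the JSON body priority when both inputs are present is a guess rather than confirmed behavior.

```python
import hashlib
import json
import uuid
from typing import Optional


def choose_job_id(recipe_path: Optional[str], recipe_json: Optional[dict]) -> str:
    """Use a content hash for JSON recipes and a random UUID for file-path recipes."""
    if recipe_json is not None:
        # assumption: a JSON body takes priority over a recipe path
        canonical = json.dumps(recipe_json, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    if recipe_path:
        return str(uuid.uuid4())
    raise ValueError("no recipe provided")
```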

Request Flow Changes:

Input validation now checks both query params and request body
Hash-based deduplication for JSON recipes
Backward compatibility maintained for file-based recipes
Consistent job tracking with hash parameter
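The input check described above might look something like this framework-free sketch. The `recipe` query parameter appears in the docs URL; the `config` parameter name, the function name `resolve_recipe_input`, and the shape of the returned payload are assumptions.

```python
from typing import Mapping, Optional, Tuple


def resolve_recipe_input(
    query: Mapping[str, str], body: Optional[dict]
) -> Tuple[int, dict]:
    """Return an (HTTP status, payload) pair describing which recipe input was found."""
    config = query.get("config")  # optional; parameter name is assumed
    if body:
        return 200, {"source": "json", "recipe": body, "config": config}
    recipe_path = query.get("recipe")
    if recipe_path:
        return 200, {"source": "path", "recipe": recipe_path, "config": config}
    # neither a query parameter nor a request body was supplied
    return 400, {"error": "provide a recipe query parameter or a JSON recipe body"}
```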

Benefits

Reduced Server Load: Identical recipes are not reprocessed
Faster Client Response: Instant return for duplicate JSON requests
Better Resource Utilization: No redundant compute for same recipes
Improved API Design: JSON recipes easier for programmatic access
Reduced Firebase Usage: Passing recipe directly instead of uploading to firebase

CellPACK Server Job Workflow Changes

BEFORE: Original Server Workflow

graph TD
    A[Client Request] --> B[POST /start-packing]
    B --> C{Check for recipe URL param}
    C -->|Missing| D[Return 400 Error]
    C -->|Present| E[Generate UUID for job_id]
    E --> F[Create Background Task]
    F --> G[Return job_id immediately]
    F --> I[Initiate packing]
    I --> J[Load recipe from firebase<br>using file path from<br>URL param]
    J --> K[Execute packing]
    K --> L{Packing succeeds?}
    L -->|Success| M[S3: Upload outputs to S3<br>Firebase: Update job status to SUCCEEDED]
    L -->|Failure| N[Firebase: Update job status to FAILED]
    
    style A fill:#e1f5fe
    style G fill:#c8e6c9
    style M fill:#fff3e0
    style N fill:#ffcdd2

AFTER: Enhanced Server Workflow with JSON Recipe Support

graph TD
    A[Client Request] --> B[POST /start-packing]
    B --> C{Check inputs}
    C -->|No recipe - no URL param<br>and no request body| D[Return 400 Error]
    C -->|Has recipe path URL param| E[Generate UUID for job_id]
    C -->|Has recipe JSON in request body| F[Generate hash from JSON]
    F --> G{Packing result exists<br>in firebase for this hash?}
    G -->|Yes| H[Return existing hash<br>as job_id]
    G -->|No| I[Use hash as job_id]
    E --> J[Create Background Task]
    I --> J
    J --> K[Return job_id immediately]
    J --> L[Initiate packing]
    L --> M{Input type?}
    M -->|Recipe path| N[Load recipe from firebase<br>using file path from<br>URL param]
    M -->|JSON body| O[Load recipe from JSON dict<br>from request body]
    N --> P[Execute packing]
    O --> P
    P --> Q{Packing succeeds?}
    Q -->|Success| R[S3: Upload outputs to S3<br>Firebase: Update job status to SUCCEEDED]
    Q -->|Failure| S[Firebase: Update job status to FAILED]
    
    style A fill:#e1f5fe
    style K fill:#c8e6c9
    style R fill:#fff3e0
    style S fill:#ffcdd2
    style G fill:#ffeb3b
    style H fill:#c8e6c9
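The deduplication branch of the AFTER diagram can be sketched as below. An in-memory class stands in for the Firebase job status collection, the packing runs synchronously rather than as a background task, and treating only SUCCEEDED jobs as reusable is an assumption; all names except `job_exists` are hypothetical.

```python
from typing import Callable, Dict


class InMemoryJobStatus:
    """Stand-in for the Firebase job status collection (hypothetical)."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}

    def job_exists(self, dedup_hash: str) -> bool:
        # assumption: only a successful packing counts as an existing result
        return self._status.get(dedup_hash) == "SUCCEEDED"

    def update(self, dedup_hash: str, status: str) -> None:
        self._status[dedup_hash] = status


def start_packing(
    dedup_hash: str, store: InMemoryJobStatus, run_packing: Callable[[], None]
) -> str:
    """Run the packing only if no result is recorded; always return the job id."""
    if store.job_exists(dedup_hash):
        return dedup_hash  # reuse the earlier result, skip the packing
    try:
        run_packing()
        store.update(dedup_hash, "SUCCEEDED")
    except Exception:
        store.update(dedup_hash, "FAILED")
    return dedup_hash
```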

ascibisz · 2025-10-23T21:52:24Z

docs/DOCKER.md

-3. Try hitting the test endpoint on the server, by navigating to `http://0.0.0.0:8443/hello` in your browser.
-4. Try running a packing on the server, by hitting the `http://0.0.0.0:80/pack?recipe=firebase:recipes/one_sphere_v_1.0.0` in your browser.
+3. Try hitting the test endpoint on the server, by navigating to `http://0.0.0.0:80/hello` in your browser.
+4. Try running a packing on the server, by hitting the `http://0.0.0.0:80/start-packing?recipe=firebase:recipes/one_sphere_v_1.0.0` in your browser.


These instructions were slightly incorrect. This has nothing to do with the other code changes; I just ran into it when testing my code and wanted to fix it.

github-actions · 2025-10-27T23:48:49Z

Packing analysis report

Analysis for packing results located at cellpack/tests/outputs/test_spheres/spheresSST

Ingredient name	Encapsulating radius	Average number packed
ext_A	25	236.0

Packing image

Distance analysis

Expected minimum distance: 50.00
Actual minimum distance: 50.01

Ingredient key	Pairwise distance distribution
ext_A

github-actions · 2026-01-09T19:41:46Z

Packing analysis report

Analysis for packing results located at cellpack/tests/outputs/test_spheres/spheresSST

Ingredient name	Encapsulating radius	Average number packed
ext_A	25	236.0

Packing image

Distance analysis

Expected minimum distance: 50.00
Actual minimum distance: 50.01

Ingredient key	Pairwise distance distribution
ext_A

* remove os fetch for job_id
* use dedup_hash instead of job id
* proposal: get hash from recipe loader
* renaming and add TODOs
* format
* rename param to hash
* remove unused validate param and doc strings in pack
* simplify get_ dedup_hash
* refactor job_status update
* cleanup
* fix upload_job_status to handle awshandler
* pass dedup_pash to env for fetching across files
* add tests
* format1
* format test


ascibisz · 2026-03-19T21:18:27Z

cellpack/bin/pack.py

-    recipe_loader = RecipeLoader(
-        recipe, packing_config_data["save_converted_recipe"], docker
-    )
+    if isinstance(recipe, dict):


Try moving this to recipe_loader, it might be cleaner

mogres

Read through the changes and left some clarifying comments. I might do some testing later just to verify none of the other workflows are breaking.

Really awesome work putting this together! Excited to see the speedup in cellPACK studio.

mogres · 2026-03-20T19:34:10Z

cellpack/autopack/loaders/recipe_loader.py

One suggestion to handle multiple input streams:

Update the RecipeLoader init to accept a single input_data argument, which can be a path to a file or a dictionary. This should avoid potential bugs from conflicting inputs passed through input_file_path and json_recipe, since there is only one input now.

Handle resolving the path or dict inside the read() method. This might require getting rid of the file path and file extension attributes which are not really used I think
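A minimal sketch of this suggestion, assuming a local JSON file as the only path case. The real RecipeLoader also handles other sources and more setup, so the class name `SingleInputRecipeLoader` and its behavior here are hypothetical.

```python
import json
from pathlib import Path
from typing import Union


class SingleInputRecipeLoader:
    """Hypothetical loader taking one input: a path to a JSON file or a recipe dict."""

    def __init__(self, input_data: Union[str, Path, dict]) -> None:
        self.input_data = input_data

    def read(self) -> dict:
        """Resolve the single input into a recipe dictionary."""
        if isinstance(self.input_data, dict):
            # shallow copy so the caller's dict is not mutated by later processing
            return dict(self.input_data)
        path = Path(self.input_data)
        with path.open("r", encoding="utf-8") as handle:
            return json.load(handle)
```

With one argument there is no way to pass a path and a dict that disagree, which is the class of bug the comment is trying to rule out.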

mogres · 2026-03-20T21:53:47Z

cellpack/autopack/upy/simularium/simularium_helper.py

-        if file_name and url:
-            simulariumHelper.store_metadata(
-                file_name, url, db="firebase", job_id=job_id
+        if dedup_hash is None:


Can you clarify what happens here if dedup_hash is not None? Is the result not opened in browser if this is run locally?

For server jobs, the result file is uploaded as part of the outputs directory via upload_outputs_to_s3, the frontend can then fetch the result_path and display simularium result in Studio, so we intentionally skip the separate upload when dedup_hash is not None here.
For local packings, dedup_hash is None, so the existing upload + open in browser behavior is preserved.
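The branching described in this reply could be sketched as follows. The function name and the `upload` and `open_in_browser` callbacks are hypothetical stand-ins, not the actual simulariumHelper API.

```python
from typing import Callable, Optional


def post_simularium_result(
    file_path: str,
    dedup_hash: Optional[str],
    upload: Callable[[str], str],
    open_in_browser: Callable[[str], None],
) -> bool:
    """Upload and open the result only for local runs; return True if that happened."""
    if dedup_hash is not None:
        # Server job: the file is uploaded later together with the other outputs.
        return False
    url = upload(file_path)  # local run: upload the single result file
    open_in_browser(url)
    return True
```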

cellpack/autopack/writers/__init__.py

mogres · 2026-03-20T21:58:31Z

cellpack/autopack/DBRecipeHandler.py

+            # If db is AWSHandler, switch to firebase handler for job status updates
+            if hasattr(self.db, "s3_client"):
+                handler = DATABASE_IDS.handlers().get(DATABASE_IDS.FIREBASE)
+                db_handler = handler(default_db="staging")


Would this overwrite the self.db attribute? Might be better to create a new handler object or use .copy() just in case

mogres · 2026-03-20T21:59:58Z

cellpack/autopack/DBRecipeHandler.py

-            )
+            db_handler = self.db
+            # If db is AWSHandler, switch to firebase handler for job status updates
+            if hasattr(self.db, "s3_client"):


Is it possible to directly check if this is an AWSHandler using isinstance? Idk if it is possible for it to not have the s3_client attribute but just in case.

To your question above: no, self.db is not overwritten. db_handler is a local variable that gets rebound to a new Firebase handler, while self.db still points to the AWSHandler.

And I agree, isinstance would be more explicit and robust here, thanks for calling that out!
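Both points in this thread can be seen in a small sketch with stand-in classes. The names below are placeholders rather than the real handlers, and constructing the Firebase handler directly is a simplification of the `DATABASE_IDS.handlers()` lookup in the diff.

```python
class AWSHandler:
    """Stand-in for the real S3-backed handler."""


class FirebaseHandler:
    """Stand-in for the real Firebase handler."""

    def __init__(self, default_db: str = "staging") -> None:
        self.default_db = default_db


class JobStatusUploader:
    """Hypothetical holder of a db attribute, mirroring the shape of the diff."""

    def __init__(self, db) -> None:
        self.db = db

    def status_handler(self):
        db_handler = self.db  # local name; rebinding it does not touch self.db
        if isinstance(self.db, AWSHandler):
            # explicit type check instead of hasattr(self.db, "s3_client")
            db_handler = FirebaseHandler(default_db="staging")
        return db_handler
```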

mogres · 2026-03-20T22:47:21Z

cellpack/bin/pack.py

Might be good to have an integration test to check if the changes in this file don't break the local workflow: pack -r RECIPE_PATH -c CONFIG_PATH with a mock recipe and config. Not sure if we have this already

mogres · 2026-03-20T22:48:33Z

cellpack/bin/pack.py

    autopack.helper = helper
    env = Environment(config=packing_config_data, recipe=recipe_data)
    env.helper = helper
+    env.dedup_hash = hash


You might have to update the environment __init__ to initialize the dedup_hash attribute as None
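As a sketch of what this could mean, here is a heavily reduced, hypothetical stand-in for the Environment class. The real `__init__` does far more; only the `dedup_hash` default is the point.

```python
class Environment:
    """Hypothetical, heavily reduced stand-in for cellPACK's Environment."""

    def __init__(self, config: dict, recipe: dict) -> None:
        self.config = config
        self.recipe = recipe
        self.helper = None
        # Defaults to None so local runs can read the attribute safely;
        # the caller sets it only for server jobs.
        self.dedup_hash = None


env = Environment(config={}, recipe={})
is_server_job = env.dedup_hash is not None  # False until a hash is assigned
```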

mogres · 2026-03-20T22:49:57Z

cellpack/bin/pack.py

-                config_data=packing_config_data,
-                recipe_data=recipe_loader.serializable_recipe_data,
-            )
+    if docker and hash:


Can you add a short docstring saying that this branch runs if pack is called from cellPACK studio (i.e. docker and hash are both provided)?

docker/Dockerfile.ecs

mogres · 2026-03-20T22:55:32Z

docker/server.py

We check this dedup hash against those existing in the job_status firebase collection to see if this exact recipe has already been packed, and if we find a match, we return it rather than running the packing

Just to clarify, do we still run the packing even if the dedup hash exists in firebase? Or do we just pull the results?


rugeli · 2026-03-30T20:17:12Z

We check this dedup hash against those existing in the job_status firebase collection to see if this exact recipe has already been packed, and if we find a match, we return it rather than running the packing

Just to clarify, do we still run the packing even if the dedup hash exists in firebase? Or do we just pull the results?

@mogres Yeah, in that case we skip the packing entirely and return the dedup_hash as jobId immediately without creating a new packing. This only applies to server-initiated packings with a JSON body; local packings that go through the CLI are unaffected.


ascibisz changed the title ~~Feature/server passed recipe json~~ Server should Accept Recipe JSON Oct 23, 2025

ascibisz mentioned this pull request Oct 23, 2025

Send Edited Recipe JSON in Request Body AllenCell/cellpack-client#123

Closed

ascibisz commented Oct 23, 2025


ascibisz marked this pull request as ready for review November 5, 2025 17:12

ascibisz marked this pull request as draft November 20, 2025 17:39

Base automatically changed from feature/client-upload-script to main December 1, 2025 18:11

ascibisz added 7 commits January 9, 2026 11:35

add upload script

78ca86d

add example data and more documentation

78b9b0a

point to correct collection

a9b056b

have server accept recipe as json object in body of request

f5f7a69

update documentation

f87915a

remove accidential dockerfile changes

1f2d2e3

rename param json_recipe

bd8ec42

ascibisz force-pushed the feature/server-passed-recipe-json branch from 63ca5ae to bd8ec42 Compare January 9, 2026 19:40

ascibisz added 12 commits January 9, 2026 11:42

remove file that shouldn't be in this PR

358158e

remove accidential file

f0beaa1

lint fixes

a54ffa1

refactor to try to improve clarity of json recipe vs file path

3d01db3

lint fixes

529e15b

lint fix

63514c9

minimize changeset

b2440cd

minimize changeset

470e3a1

simplify changeset

8a34898

code cleanup

45d438a

minimize changeset

c8fe120

remove trailing comma

ecc645d

rugeli approved these changes Jan 23, 2026


ascibisz and others added 13 commits March 19, 2026 11:26

minimize changeset

8f0c468

minimize changeset

3fd95a4

simplify changeset

2784825

code cleanup

e140122

minimize changeset

cf3b9ed

remove trailing comma

071aaf9

Only upload simularium file once (#446)

ac34219

* proposal: get hash from recipe loader * simplify get_ dedup_hash * only post simularium results file once for server job runs * update code for rebase * code cleanup --------- Co-authored-by: Ruge Li <rugeli0605@gmail.com>

Maint/firebase collection cleanup (#448)

86b3104

* remove local metadata writes for auto-pop feature * remove cleanup firebase workflow * remove cleanup firebase code * 1. make doc url a constant 2.remove unused param

handle both recipe_path and json body requests (#449)

45a10ab

change error message body

801b86f

lint fixes

7a42705

add more checks when attempting to read json body

c2259af

ascibisz force-pushed the feature/server-passed-recipe-json branch from c1d5718 to c2259af Compare March 19, 2026 18:28

ascibisz commented Mar 19, 2026


ascibisz requested review from meganrm and mogres March 19, 2026 21:35

mogres reviewed Mar 20, 2026


rugeli and others added 5 commits March 25, 2026 13:30

Merge branch 'feature/server-passed-recipe-json' of https://github.co…

d3c9c33

…m/mesoscope/cellpack into feature/server-passed-recipe-json

let recipe loader check the input and key stripping

e86069e

Update cellpack/autopack/writers/__init__.py

bcdc065

Co-authored-by: Saurabh Mogre <saurabh.mogre@alleninstitute.org>

use isinstance for AWSHandler, and misc

e667517

update aws tests

f9570ff

rugeli added 4 commits March 30, 2026 13:25

initialize dedup_hash

b2db8ec

Merge branch 'feature/server-passed-recipe-json' of https://github.co…

4170c1e

…m/mesoscope/cellpack into feature/server-passed-recipe-json

add in-line comment

a8ab9ad

temp solution: use requirement.txt

fd12619

rugeli mentioned this pull request Mar 30, 2026

have an integration test to check if the changes in this file don't break the local workflow #460

Open

rugeli requested a review from mogres March 31, 2026 17:33
