Skip to content

Server should Accept Recipe JSON #419

Open
ascibisz wants to merge 62 commits intomainfrom
feature/server-passed-recipe-json
Open

Server should Accept Recipe JSON #419
ascibisz wants to merge 62 commits intomainfrom
feature/server-passed-recipe-json

Conversation

@ascibisz
Copy link
Copy Markdown
Collaborator

@ascibisz ascibisz commented Oct 22, 2025

Problem

Currently, when a user packs an edited recipe via the cellpack client, we upload the edited recipe to firebase, pass that reference to the server, and then the server retrieves the recipe from firebase. To minimize firebase calls and improve efficiency of the client, we're changing that flow to instead send the edited recipe JSON as the body of the packing request to the server.

We also wanted a way to check if we have already packed a recipe and if we have, return that result file rather than running the whole packing again. To do this, we are calculating a hash for the recipe objects before they are packed. The hash for each recipe will be uploaded to firebase along with its result files, so we can query firebase for a given hash to see if that recipe has a packing result already.

Key Server Improvements

1. Deduplication & Caching

  • BEFORE: Each request generated a unique UUID, no deduplication possible
  • AFTER: JSON recipes generate deterministic hash, enabling job deduplication

2. Input Flexibility & Backwards Compatibility

  • BEFORE: Only recipe file paths supported via query parameter
  • AFTER: Supports both recipe file paths AND direct JSON recipe objects in request body, plus optional config parameter

3. Smart Job Management

  • BEFORE: Generated UUID for each job without deduplication, every request creates new job regardless of content
  • AFTER: Uses deterministic hash for JSON recipes, enabling job reuse for identical recipes

4. Firebase Request Reduction

  • BEFORE: Every edited recipe was uploaded to firebase by the client and downloaded from firebase by the server
  • AFTER: Edited recipes are passed in the body of the packing request, so no firebase uploads or downloads occur

5: Unified Results Upload

  • BEFORE: Simularium result file was uploaded to S3 twice per job, once on its own and once as part of the full output files upload
  • AFTER: Only upload Simularium result file once by keeping track of its path when we upload all output files

Technical Implementation

New Server Components:

  1. DataDoc.generate_hash() - Creates deterministic hash from recipe JSON
  2. job_exists() - Checks if job already completed in Firebase
  3. Enhanced request handling - Reads JSON from request body
  4. Smart job ID generation - Uses hash for JSON recipes, UUID for file paths

Request Flow Changes:

  1. Input validation now checks both query params and request body
  2. Hash-based deduplication for JSON recipes
  3. Backward compatibility maintained for file-based recipes
  4. Consistent job tracking with hash parameter

Benefits

  1. Reduced Server Load: Identical recipes don't reprocess
  2. Faster Client Response: Instant return for duplicate JSON requests
  3. Better Resource Utilization: No redundant compute for same recipes
  4. Improved API Design: JSON recipes easier for programmatic access
  5. Reduced Firebase Usage: Passing recipe directly instead of uploading to firebase

CellPACK Server Job Workflow Changes

BEFORE: Original Server Workflow

graph TD
    A[Client Request] --> B[POST /start-packing]
    B --> C{Check for recipe URL param}
    C -->|Missing| D[Return 400 Error]
    C -->|Present| E[Generate UUID for job_id]
    E --> F[Create Background Task]
    F --> G[Return job_id immediately]
    F --> I[Initiate packing]
    I --> J[Load recipe from firebase<br>using file path from<br>URL param]
    J --> K[Execute packing]
    K --> L{Packing succeeds?}
    L -->|Success| M[S3: Upload outputs to S3<br>Firebase: Update job status to SUCCEEDED]
    L -->|Failure| N[Firebase: Update job status to FAILED]
    
    style A fill:#e1f5fe
    style G fill:#c8e6c9
    style M fill:#fff3e0
    style N fill:#ffcdd2
Loading

AFTER: Enhanced Server Workflow with JSON Recipe Support

graph TD
    A[Client Request] --> B[POST /start-packing]
    B --> C{Check inputs}
    C -->|No recipe - no URL param<br>and no request body| D[Return 400 Error]
    C -->|Has recipe path URL param| E[Generate UUID for job_id]
    C -->|Has recipe JSON in request body| F[Generate hash from JSON]
    F --> G{Packing result exists<br>in firebase for this hash?}
    G -->|Yes| H[Return existing hash<br>as job_id]
    G -->|No| I[Use hash as job_id]
    E --> J[Create Background Task]
    I --> J
    J --> K[Return job_id immediately]
    J --> L[Initiate packing]
    L --> M{Input type?}
    M -->|Recipe path| N[Load recipe from firebase<br>using file path from<br>URL param]
    M -->|JSON body| O[Load recipe from JSON dict<br>from request body]
    N --> P[Execute packing]
    O --> P
    P --> Q{Packing succeeds?}
    Q -->|Success| R[S3: Upload outputs to S3<br>Firebase: Update job status to SUCCEEDED]
    Q -->|Failure| S[Firebase: Update job status to FAILED]
    
    style A fill:#e1f5fe
    style K fill:#c8e6c9
    style R fill:#fff3e0
    style S fill:#ffcdd2
    style G fill:#ffeb3b
    style H fill:#c8e6c9
Loading

@ascibisz ascibisz changed the title Feature/server passed recipe json Server should Accept Recipe JSON Oct 23, 2025
3. Try hitting the test endpoint on the server, by navigating to `http://0.0.0.0:8443/hello` in your browser.
4. Try running a packing on the server, by hitting the `http://0.0.0.0:80/pack?recipe=firebase:recipes/one_sphere_v_1.0.0` in your browser.
3. Try hitting the test endpoint on the server, by navigating to `http://0.0.0.0:80/hello` in your browser.
4. Try running a packing on the server, by hitting the `http://0.0.0.0:80/start-packing?recipe=firebase:recipes/one_sphere_v_1.0.0` in your browser.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These instructions were just slightly incorrect, this has nothing to do with the other code changes, I just ran into it when testing my code and I wanted to fix it

@github-actions
Copy link
Copy Markdown
Contributor

Packing analysis report

Analysis for packing results located at cellpack/tests/outputs/test_spheres/spheresSST

Ingredient name Encapsulating radius Average number packed
ext_A 25 236.0

Packing image

Packing image

Distance analysis

Expected minimum distance: 50.00
Actual minimum distance: 50.01

Ingredient key Pairwise distance distribution
ext_A Distance distribution ext_A

@ascibisz ascibisz marked this pull request as ready for review November 5, 2025 17:12
@ascibisz ascibisz marked this pull request as draft November 20, 2025 17:39
Base automatically changed from feature/client-upload-script to main December 1, 2025 18:11
@ascibisz ascibisz force-pushed the feature/server-passed-recipe-json branch from 63ca5ae to bd8ec42 Compare January 9, 2026 19:40
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Jan 9, 2026

Packing analysis report

Analysis for packing results located at cellpack/tests/outputs/test_spheres/spheresSST

Ingredient name Encapsulating radius Average number packed
ext_A 25 236.0

Packing image

Packing image

Distance analysis

Expected minimum distance: 50.00
Actual minimum distance: 50.01

Ingredient key Pairwise distance distribution
ext_A Distance distribution ext_A

* remove os fetch for job_id

* use dedup_hash instead of job id

* proposal: get hash from recipe loader

* renaming and add TODOs

* format

* rename param to hash

* remove unused validate param and doc strings in pack

* simplify get_ dedup_hash

* refactor job_status update

* cleanup

* fix upload_job_status to handle awshandler

* pass dedup_pash to env for fetching across files

* add tests

* format1

* format test
ascibisz and others added 13 commits March 19, 2026 11:26
* remove os fetch for job_id

* use dedup_hash instead of job id

* proposal: get hash from recipe loader

* renaming and add TODOs

* format

* rename param to hash

* remove unused validate param and doc strings in pack

* simplify get_ dedup_hash

* refactor job_status update

* cleanup

* fix upload_job_status to handle awshandler

* pass dedup_pash to env for fetching across files

* add tests

* format1

* format test
* proposal: get hash from recipe loader

* simplify get_ dedup_hash

* only post simularium results file once for server job runs

* update code for rebase

* code cleanup

---------

Co-authored-by: Ruge Li <rugeli0605@gmail.com>
* remove local metadata writes for auto-pop feature

* remove cleanup firebase workflow

* remove cleanup firebase code

* 1. make doc url a constant 2.remove unused param
@ascibisz ascibisz force-pushed the feature/server-passed-recipe-json branch from c1d5718 to c2259af Compare March 19, 2026 18:28
recipe_loader = RecipeLoader(
recipe, packing_config_data["save_converted_recipe"], docker
)
if isinstance(recipe, dict):
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try moving this to recipe_loader, it might be cleaner

@ascibisz ascibisz requested review from meganrm and mogres March 19, 2026 21:35
Copy link
Copy Markdown
Collaborator

@mogres mogres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read through the changes and left some clarifying comments. I might do some testing later just to verify none of the other workflows are breaking.

Really awesome work putting this together! Excited to see the speedup in cellPACK studio.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion to handle multiple input streams:

  1. Update the RecipeLoader init to accept a single input_data argument which can be a path to a file or a dictionary. This should avoid potetntial bugs from conflicting inputs passed through input_file_path and json_recipe since there is only one input now.
  2. Handle resolving the path or dict inside the read() method. This might require getting rid of the file path and file extension attributes which are not really used I think

if file_name and url:
simulariumHelper.store_metadata(
file_name, url, db="firebase", job_id=job_id
if dedup_hash is None:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what happens here if dedup_hash is not None? Is the result not opened in browser if this is run locally?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For server jobs, the result file is uploaded as part of the outputs directory via upload_outputs_to_s3, the frontend can then fetch the result_path and display simularium result in Studio, so we intentionally skip the separate upload when dedup_hash is not None here.
For local packings, dedup_hash is None, so the existing upload +open in browser is preserved.

# If db is AWSHandler, switch to firebase handler for job status updates
if hasattr(self.db, "s3_client"):
handler = DATABASE_IDS.handlers().get(DATABASE_IDS.FIREBASE)
db_handler = handler(default_db="staging")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this overwrite the self.db attribute? Might be better to create a new handler object or use .copy() just in case

)
db_handler = self.db
# If db is AWSHandler, switch to firebase handler for job status updates
if hasattr(self.db, "s3_client"):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to directly check if this is an AWSHandler using isinstance? Idk if it is possible for it to not have the s3_client attribute but just in case.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to your question above: no, self.db is not overwritten. In this case, db_handler is reassigned to a local variable that points to the Firebase handler, self.db remains as s3_client.

And I agree, isinstance would be more explicit and robust here, thanks for calling that out!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to have an integration test to check if the changes in this file don't break the local workflow: pack -r RECIPE_PATH -c CONFIG_PATH with a mock recipe and config. Not sure if we have this already

autopack.helper = helper
env = Environment(config=packing_config_data, recipe=recipe_data)
env.helper = helper
env.dedup_hash = hash
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might have to update the environment __init__ to initialize the dedup_hash attribute as None

config_data=packing_config_data,
recipe_data=recipe_loader.serializable_recipe_data,
)
if docker and hash:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a short docstring saying that this branch runs if pack is called from cellPACK studio (i.e. docker and hash are both provided)?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check this dedup hash against those existing in the job_status firebase collection to see if this exact recipe has already been packed, and if we find a match, we return it rather than running the packing

Just to clarify, do we still run the packing even if the dedup hash exists in firebase? Or do we just pull the results?

@rugeli
Copy link
Copy Markdown
Collaborator

rugeli commented Mar 30, 2026

We check this dedup hash against those existing in the job_status firebase collection to see if this exact recipe has already been packed, and if we find a match, we return it rather than running the packing

Just to clarify, do we still run the packing even if the dedup hash exists in firebase? Or do we just pull the results?

@mogres Yeah in that case, we skip the packing entirely and return the debup_hash as jobId immediately without creating a new packing. This is only applies to server-initiated packings with a JSON body, local packings go through the CLI are unaffected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants