
vGPU: Add "Ignore Memory Limit" option to bypass GPU memory check on task submission #3343

Closed
QingengWei wants to merge 1 commit into develop from SCEM-11779



QingengWei (Contributor) commented Mar 10, 2026

Note

Medium Risk
Touches the end-to-end task submission path and alters server payloads, so mis-wiring could change execution behavior for vGPU jobs; additionally, the committed test script includes hard-coded credentials that should not ship.

Overview
Adds a new optional ignore_memory_limit flag that is threaded through the public web run surface (web.run, run_async, Job.start, Batch.run/start, and S-matrix modeler runs) down to task submission payloads (TaskCore.submit and RF batch submit), enabling vGPU users to bypass backend GPU memory limit checks when starting tasks.
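
As a rough illustration of what "threading" an optional flag through a call chain means, here is a minimal sketch. The three functions are stand-ins, not the real `web.run`, `Job.start`, or `TaskCore.submit`; the `taskId` key, the camelCase payload key, and the default value of `False` are assumptions made for the example.

```python
from typing import Any


def submit(task_id: str, *, ignore_memory_limit: bool = False) -> dict[str, Any]:
    """Lowest layer: build the (hypothetical) submission payload."""
    return {"taskId": task_id, "ignoreMemoryLimit": ignore_memory_limit}


def start(task_id: str, *, ignore_memory_limit: bool = False) -> dict[str, Any]:
    """Intermediate layer: forwards the flag unchanged."""
    return submit(task_id, ignore_memory_limit=ignore_memory_limit)


def run(task_id: str, *, ignore_memory_limit: bool = False) -> dict[str, Any]:
    """Public entry point: forwards the flag unchanged."""
    return start(task_id, ignore_memory_limit=ignore_memory_limit)


payload = run("task-123", ignore_memory_limit=True)
print(payload)
```

Making the flag keyword-only at every layer means a layer that forgets to forward it falls back to the default rather than shifting positional arguments, which is the failure mode a review of such a change has to look for.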

Includes new test fixtures: a large ModalComponentModeler JSON and a helper script to upload/run it against the web API (currently containing hard-coded API tokens).

Written by Cursor Bugbot for commit df1b836.

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 4 potential issues.


#web.configure("Ltrvqel7oCenUTH88Pqh99vn7ikCD25KFPZ0phz2Mxtgl5I4")

#Env.uat.active()
#web.configure("LmpSvRP0MGOuKOgm9ZJn97l9RE8t5I2ENTI9RLwbXlmmW89Z")

Hardcoded API keys committed in test file

High Severity

Production API keys are hardcoded in tests/tidy3d_modal_cm_fdtd_test.py. Three separate API keys (for prod, dev, and uat environments) are exposed in plaintext via web.configure(...) calls. These credentials will be visible in version control history even if later removed and could allow unauthorized access to the service.
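
One common way to keep credentials out of a script like this is to read them from the environment at run time. The sketch below assumes a hypothetical variable name, `TIDY3D_TEST_API_KEY`, chosen only for illustration; the returned string would then be passed to the `web.configure(...)` call seen in the diff.

```python
import os


def load_api_key(env_var: str = "TIDY3D_TEST_API_KEY") -> str:
    """Return the API key stored in `env_var`, failing loudly if it is unset.

    The default variable name is an assumption for this example, not a
    name defined by the project.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this script."
        )
    return key
```

Keys that have already been pushed remain in history, so they would still need to be revoked and reissued.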


"protocolVersion": protocol_version,
"workerGroup": worker_group,
"vgpu_allocation": vgpu_allocation,
"ignore_memory_limit": ignore_memory_limit,

Snake_case API keys instead of camelCase in submit

High Severity

In the BatchTaskCore.submit method, the HTTP POST body uses snake_case keys "vgpu_allocation" and "ignore_memory_limit" instead of the camelCase "vgpuAllocation" and "ignoreMemoryLimit" used by the other submit method (for Tidy3dTask) and by the other keys in the same dictionary ("solverVersion", "protocolVersion", "workerGroup"). The server likely won't recognize these misnamed keys, so the parameters will be silently ignored.
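
A minimal sketch of one way to avoid this class of mismatch: build the request body with Python-style names and convert every key to camelCase in a single place. The helper names are hypothetical, and whether the server actually expects `vgpuAllocation` and `ignoreMemoryLimit` is taken from the comment above rather than verified.

```python
def to_camel_case(name: str) -> str:
    """Convert snake_case to camelCase; names without underscores pass through."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camelize_keys(payload: dict) -> dict:
    """Return a copy of `payload` with every top-level key in camelCase."""
    return {to_camel_case(key): value for key, value in payload.items()}


body = camelize_keys(
    {"worker_group": None, "vgpu_allocation": 2, "ignore_memory_limit": True}
)
print(body)
# {'workerGroup': None, 'vgpuAllocation': 2, 'ignoreMemoryLimit': True}
```

The simpler fix for this PR is to rename the two keys by hand; a helper like this only pays off if many payloads are built the same way.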


max_num_adjoint_per_fwd=max_num_adjoint_per_fwd,
numerical_structures=numerical_structures,
custom_vjp=custom_vjp,
ignore_memory_limit=ignore_memory_limit,

ignore_memory_limit dropped in two run_custom code paths

High Severity

The run_custom function accepts ignore_memory_limit and passes it correctly for the component modeler path (line 478), but the autograd _run() call (lines 499–520) and the non-autograd webapi.run() call (lines 531–549) both omit it. For regular Simulation objects, the parameter is silently ignored regardless of the code path taken.
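
A dropped keyword argument of this kind can be caught with a small regression check that stubs each backend and asserts the flag arrives on every branch. The sketch below is self-contained and entirely hypothetical: the dispatcher only mimics the three-way branching described above and is not the real `run_custom`.

```python
calls = []  # records (path name, kwargs) for every stubbed backend call


def _backend(path):
    """Return a stub that records the keyword arguments it receives."""
    def stub(simulation, **kwargs):
        calls.append((path, kwargs))
        return path
    return stub


run_modeler = _backend("modeler")
run_autograd = _backend("autograd")
run_plain = _backend("plain")


def run_custom(simulation, *, use_autograd=False, ignore_memory_limit=False):
    """Hypothetical dispatcher: every branch forwards ignore_memory_limit."""
    if simulation == "modeler":
        return run_modeler(simulation, ignore_memory_limit=ignore_memory_limit)
    if use_autograd:
        return run_autograd(simulation, ignore_memory_limit=ignore_memory_limit)
    return run_plain(simulation, ignore_memory_limit=ignore_memory_limit)


def check_flag_reaches_every_path():
    """Exercise all three branches and fail if any of them drops the flag."""
    calls.clear()
    run_custom("modeler", ignore_memory_limit=True)
    run_custom("simulation", use_autograd=True, ignore_memory_limit=True)
    run_custom("simulation", ignore_memory_limit=True)
    dropped = [p for p, kw in calls if kw.get("ignore_memory_limit") is not True]
    assert not dropped, f"flag dropped in: {dropped}"
    return [path for path, _ in calls]
```

Removing the keyword from either of the last two branches makes the check fail and name the offending path, which is the behavior reported here for regular Simulation objects.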


"element_mappings": [],
"custom_source_time": null,
"type": "ModalComponentModeler"
}

Test data file accidentally committed to repository

Medium Severity

A large 1500-line simulation data JSON file (tests/modal_cm.json) appears to have been accidentally committed alongside the ad-hoc test script tests/tidy3d_modal_cm_fdtd_test.py. This is test fixture data for a manual debugging script, not part of the automated test suite, and adds unnecessary bulk to the repository.


daquinteroflex (Collaborator) commented Mar 11, 2026

Thanks, I will make sure this gets in for v2.11.0.dev2 by early next week.

shanguomagi commented

@daquinteroflex Please let me know when the test version is released next week, and I will deploy it to UAT so QA can start testing.

daquinteroflex (Collaborator) commented

@shanguomagi let's discuss in private, but if you can show me the dev branch I should be able to test there first, and then you should have a version available to you in flex, possibly by tomorrow.
