Conversation

allenwang28 (Contributor) commented Oct 10, 2025

This PR does a few things:

Provisioner changes

  • This PR originally started by adding VLLM_HOST_IP, world_size, and rank as environment variables during proc_mesh creation.
  • But then there was a clear need to inherit a few relevant environment variables (like TORCHSTORE_USE_RDMA) in the provisioner, so I also added:

Environment variable related changes

  • Renames env_constants.py to env.py
  • Introduces an EnvVar pattern where you can declare the name, default value, and description of a variable, and easily resolve its logical value in code. This reduces the boilerplate we had for string checks, etc.
  • Applies the changes to the relevant spots in the codebase
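The EnvVar pattern described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual src/forge/env.py API: the class and method names (EnvVar, get_value) and the exact fields are assumptions based on the PR description.

```python
# Hypothetical sketch of the EnvVar pattern; names are illustrative,
# not the actual forge/env.py implementation.
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvVar:
    """Declares an environment variable with a default and a description."""

    name: str
    default: str
    description: str

    def get_value(self) -> str:
        # The process environment wins; otherwise fall back to the
        # declared default.
        return os.environ.get(self.name, self.default)


# Example declaration, using an env var name mentioned in the PR.
TORCHSTORE_USE_RDMA = EnvVar(
    name="TORCHSTORE_USE_RDMA",
    default="false",
    description="Whether torchstore should use RDMA transfers.",
)
```

A call site then uses `TORCHSTORE_USE_RDMA.get_value()` instead of a scattered `os.getenv("TORCHSTORE_USE_RDMA", "false")`, and the description lives next to the declaration.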

meta-cla bot added the label "CLA Signed" (this label is managed by the Meta Open Source bot) on Oct 10, 2025.
joecummings (Member) left a comment:

Overall LGTM - huge improvement over our scattered env variables! Just would like to get @felipemello1 's quick thoughts on the perf_tracker stuff.

joecummings (Member) commented on src/forge/env.py:

    @dataclass
    class EnvVar:

Surprising to me that an abstraction like this doesn't exist in the Python world.

joecummings (Member) commented on src/forge/env.py (outdated):

    @functools.cache

What's the reasoning for caching this?

felipemello1 (Contributor) replied:

What happens if we do:

    all_env_vars()
    # some code changes an env var
    all_env_vars()

Would we get the first cached value or the updated one?

allenwang28 (Contributor, author) replied:

Hmm, I assumed that we wouldn't change env vars during the run itself. So, to avoid rebuilding this list every time we create a proc mesh, we cache it.

So in your example, we would get the first cached value. To avoid confusion I'll remove the cache for now.
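The staleness behavior being discussed is just how functools.cache works: a zero-argument cached function computes its result once and returns the same object forever. A minimal repro, where `all_env_vars` is a stand-in for the cached helper in the PR (not the actual forge implementation):

```python
# Minimal repro of the caching question: a cached snapshot of the
# environment does not see later mutations.
import functools
import os


@functools.cache
def all_env_vars() -> dict[str, str]:
    # Snapshot the environment once; every later call returns this
    # exact same dict object.
    return dict(os.environ)


first = all_env_vars()
os.environ["SOME_FLAG"] = "1"  # "some code changes an env var"
second = all_env_vars()
assert second is first  # the first cached snapshot is returned, not an update
```

Dropping the decorator (as the author decided to do) makes each call re-read `os.environ`, at the cost of rebuilding the list on every proc_mesh creation.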

joecummings (Member) commented on:

        os.getenv(METRIC_TIMER_USES_GPU, str(self.time_with_gpu)).lower() == "true"
    ) and torch.cuda.is_available()

    # TODO - follow up on if this env var behavior makes sense.

I don't exactly follow this - maybe @felipemello1 can weigh in?

felipemello1 (Contributor) replied:

I wanted a way to shut down the CUDA timing, in case we were worried that it was causing OOMs or blocking the GPU. Currently it lets you force everything to CPU, force everything to GPU, or keep things as they are (when unset).

We could reduce it to just "force everything to CPU" and "keep as is (unset)".
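The tri-state override described above can be sketched as a small helper. This is an assumption-laden illustration: the helper name `resolve_time_with_gpu` is invented, and the real code additionally ANDs the result with `torch.cuda.is_available()` as shown in the snippet under review.

```python
# Sketch of the tri-state env var override: force CPU timing, force GPU
# timing, or (when unset) keep the per-call-site default. The helper
# name is hypothetical; METRIC_TIMER_USES_GPU matches the PR snippet.
import os


def resolve_time_with_gpu(default: bool) -> bool:
    """Return whether to time with CUDA events, honoring the env override."""
    override = os.getenv("METRIC_TIMER_USES_GPU")
    if override is None:
        return default  # unset: keep whatever the call site configured
    return override.lower() == "true"
```

Collapsing this to the two-state version felipemello1 mentions would mean treating any set value as "force CPU" and unset as "keep as is".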

allenwang28 (Contributor, author) replied:

Yeah, I understand why it's implemented this way -- I don't think we need to change it now.

felipemello1 (Contributor) left a comment:

Looks good! I like how we can add descriptions to them :)

Just a small question on the cache portion.

allenwang28 merged commit 3303af5 into meta-pytorch:main on Oct 10, 2025 (8 checks passed), and deleted the vllm_multinode branch.