feat: remote environment support #56

devversion · 2025-09-25T16:49:42Z

This commit reorganizes the web-codegen-scorer to support remote
environments. A remote environment is similar to the existing concept of
environments, except that the lifecycle of the environment can be managed
by a hosted standalone server, for example within a corporate network.

The server would then provide additional features to the
web-codegen-scorer, like:

- different models for file generation
- different execution sandboxes for building and serving an app (e.g.
  consider a framework like Wiz that is internal to Google)

In practice, a remote environment exposes all of the important internal
hooks to advanced users, so that they can be fully in charge of:

- file generation via LLMs
- building generated apps
- repairing generated apps
- serving generated apps

Most users will never have to deal with this, but the architecture is highly
beneficial for further separation of concerns in the codebase, plus potentially
paving the way to support different languages (if we intend to do so), because
the logic for testing a "served app" is easy to disable with these changes.
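As a rough illustration of the hooks listed above, the following minimal sketch covers generation, building, and repairing with an in-process fake; serving is left out to keep it short. Every name here (`EnvironmentHooks`, `FakeHooks`, `runOnce`, the field names) is an assumption made for the example and is not taken from the PR's actual types.

```typescript
// Hypothetical shapes for illustration; the real types in the PR may differ.
interface GeneratedFile {
  path: string;
  code: string;
}

interface BuildOutcome {
  success: boolean;
  message: string;
}

/** Three of the hooks a remote environment would take over (names are assumptions). */
interface EnvironmentHooks {
  generateFiles(prompt: string): Promise<GeneratedFile[]>;
  build(files: GeneratedFile[]): Promise<BuildOutcome>;
  repair(files: GeneratedFile[], error: string): Promise<GeneratedFile[]>;
}

/** A fake implementation that never leaves the process. */
class FakeHooks implements EnvironmentHooks {
  async generateFiles(prompt: string): Promise<GeneratedFile[]> {
    // Pretend the "model" always leaves a TODO behind on the first attempt.
    return [{ path: 'main.ts', code: `// TODO: ${prompt}` }];
  }

  async build(files: GeneratedFile[]): Promise<BuildOutcome> {
    const broken = files.some((f) => f.code.includes('TODO'));
    return broken
      ? { success: false, message: 'unresolved TODO' }
      : { success: true, message: 'ok' };
  }

  async repair(files: GeneratedFile[], _error: string): Promise<GeneratedFile[]> {
    return files.map((f) => ({ ...f, code: f.code.replace('TODO', 'DONE') }));
  }
}

/** Drives one generate, build, then repair-and-rebuild cycle. */
async function runOnce(
  hooks: EnvironmentHooks,
  prompt: string,
  maxRepairs = 1
): Promise<BuildOutcome> {
  let files = await hooks.generateFiles(prompt);
  let outcome = await hooks.build(files);
  for (let i = 0; i < maxRepairs && !outcome.success; i++) {
    files = await hooks.repair(files, outcome.message);
    outcome = await hooks.build(files);
  }
  return outcome;
}
```

The driver only talks to the interface, so swapping `FakeHooks` for an implementation that calls a hosted service would not change `runOnce`.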

R: @mturco

devversion · 2025-09-26T14:45:30Z

runner/configuration/base-environment-config.ts

+import { mcpServerOptionsSchema } from '../codegen/llm-runner.js';
+import { getPossiblePackageManagers } from './environment-config.js';
+
+export const baseEnvironmentConfigSchema = z.strictObject({


This is a copy of the existing schema, just with fields such as buildCommand and serveCommand removed, because remote environments don't have these options. So this is the "shared base schema".

devversion · 2025-09-26T14:48:31Z

runner/configuration/base-environment.ts

@@ -0,0 +1,332 @@
+import { readdirSync, readFileSync, statSync } from 'fs';


This is a copy of the previous environment with the exception that e.g. buildCommand is no longer in there. This is the "base/shared class" for environments. Local + remote environments will extend from this.

Logic is the same as before, just with some of these options removed + notice the abstract gateway.
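The base/subclass split described here could look roughly like the sketch below. It is an assumption-laden illustration: the `Gateway` interface is reduced to a single made-up `describe()` method, and the constructor parameters are invented for the example. Only the class names `BaseEnvironment`, `LocalEnvironment`, and `LocalGateway`, and the fields `rootPath` and `buildCommand`, come from this PR.

```typescript
/** Hypothetical gateway contract; only one method is sketched. */
interface Gateway {
  describe(): string;
}

class LocalGateway implements Gateway {
  describe(): string {
    return 'local child processes';
  }
}

abstract class BaseEnvironment {
  constructor(
    readonly rootPath: string,
    readonly displayName: string
  ) {}

  /** Each subclass decides which gateway drives building and serving. */
  abstract readonly gateway: Gateway;

  summary(): string {
    return `${this.displayName} via ${this.gateway.describe()}`;
  }
}

class LocalEnvironment extends BaseEnvironment {
  // A local environment always uses the default local gateway.
  readonly gateway = new LocalGateway();

  constructor(
    rootPath: string,
    displayName: string,
    readonly buildCommand: string
  ) {
    super(rootPath, displayName);
  }
}

class RemoteEnvironment extends BaseEnvironment {
  // A remote environment is handed its gateway, e.g. from its config.
  constructor(
    rootPath: string,
    displayName: string,
    readonly gateway: Gateway
  ) {
    super(rootPath, displayName);
  }
}
```

Shared logic such as `summary()` lives in the base class and never needs to know which kind of environment it is running in.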

devversion · 2025-09-26T14:49:07Z

runner/configuration/environment-config.ts

-  /** When enabled, the system prompts for this environment won't be included in the report. */
-  classifyPrompts: z.boolean().optional(),
-});
+const environmentConfigSchema = z.union([


Environments can now be defined by either a LocalEnvironmentConfig or a RemoteEnvironmentConfig
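To make the either/or shape concrete, here is a plain-TypeScript sketch of such a union with a type guard. The PR itself uses a zod `z.union`; this sketch avoids the dependency. It assumes, purely for illustration, that a remote config is recognizable by a `gateway` field, and the `displayName` field and the `isRemoteConfig`/`describeConfig` helpers are hypothetical.

```typescript
interface BaseEnvironmentConfig {
  displayName: string;
}

interface LocalEnvironmentConfig extends BaseEnvironmentConfig {
  buildCommand: string;
  serveCommand: string;
}

interface RemoteEnvironmentConfig extends BaseEnvironmentConfig {
  // Assumption: remote configs carry their own gateway object.
  gateway: object;
}

type EnvironmentConfig = LocalEnvironmentConfig | RemoteEnvironmentConfig;

/** Narrows the union by checking for the field only remote configs have. */
function isRemoteConfig(config: EnvironmentConfig): config is RemoteEnvironmentConfig {
  return 'gateway' in config;
}

function describeConfig(config: EnvironmentConfig): string {
  return isRemoteConfig(config)
    ? `${config.displayName}: remote`
    : `${config.displayName}: local (${config.buildCommand})`;
}
```

After the guard, the compiler only allows `buildCommand` and `serveCommand` on the local branch, which mirrors why those fields were dropped from the shared base schema.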

devversion · 2025-09-26T14:51:47Z

runner/configuration/environment-local.ts

+>;
+
+/** Represents a single prompt evaluation environment. */
+export class LocalEnvironment extends BaseEnvironment {


This is the logic from the old environment that was pulled out of the files above!

The `new LocalGateway()` call is the new part here. A local environment always uses the "default" local gateway, which interacts with a project boilerplate for building etc.

devversion · 2025-09-26T14:53:21Z

runner/eval-cli.ts

  }

  try {
-    llm = await getRunnerByName(cliArgs.runner as RunnerName);


An actual LLM for generation is only created when a local environment is selected; so this logic moved into the generateCodeAndAssess call

devversion · 2025-09-26T14:59:32Z

runner/orchestration/serve-testing-worker.ts

+import { BrowserAgentTaskInput } from '../testing/browser-agent/models.js';
+
+/** Attempts to run & test an eval app. */
+export async function serveAndTestApp(


This is a new split from the previous build worker. After the build worker, serving can be initiated by the gateway. Remote might just serve from within the service, while local will start a child process from the serveCommand, e.g. ng serve --port 0.

devversion · 2025-09-26T15:00:38Z

runner/orchestration/serve-testing-worker.ts

+    rootPromptDef,
+    progress,
+    async (serveUrl) => {
+      const serveParams: ServeTestingWorkerMessage = {


Once the server is running (from the gateway), we start the serve-testing worker for connecting to the app and performing tests

devversion · 2025-09-26T15:02:05Z

runner/reporting/report-logging.ts

-    ` - MCP servers: ${options.startMcp && env.mcpServerOptions.length ? env.mcpServerOptions.length : 'none'}`,
+    ` - Runner: ${env instanceof LocalEnvironment ? env.llm.displayName : 'Remote'}`,
+    env instanceof LocalEnvironment
+      ? ` - MCP servers: ${options.startMcp && env.mcpServerOptions.length ? env.mcpServerOptions.length : 'none'}`


MCP servers are currently only launched by our local environments. We can look into this setting for remote environments in follow-ups
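The `instanceof` check in the diff above does double duty: it picks the wording and it narrows the type so local-only fields become accessible. A reduced sketch of that pattern follows; the classes are stand-ins with invented constructors, and `headerLines` is a hypothetical helper, not the function in report-logging.ts.

```typescript
// Stand-in classes; the real environments carry far more state.
abstract class BaseEnvironment {
  constructor(readonly displayName: string) {}
}

class LocalEnvironment extends BaseEnvironment {
  constructor(
    displayName: string,
    readonly llmDisplayName: string,
    readonly mcpServerOptions: string[]
  ) {
    super(displayName);
  }
}

class RemoteEnvironment extends BaseEnvironment {}

/** Builds report header lines; MCP info is only emitted for local environments. */
function headerLines(env: BaseEnvironment, startMcp: boolean): string[] {
  const lines: string[] = [];
  if (env instanceof LocalEnvironment) {
    // Inside this branch the compiler knows about the local-only fields.
    const count = startMcp && env.mcpServerOptions.length ? env.mcpServerOptions.length : 'none';
    lines.push(` - Runner: ${env.llmDisplayName}`);
    lines.push(` - MCP servers: ${count}`);
  } else {
    lines.push(' - Runner: Remote');
  }
  return lines;
}
```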

devversion · 2025-09-26T15:02:38Z

runner/shared-interfaces.ts

+  finalAttempt: {
+    buildResult: BuildResult;
+    serveTestingResult: ServeTestingResult | null;
+  };


These are breaking changes for older reports! We need to migrate older reports 😞

devversion · 2025-09-26T15:03:09Z

runner/workers/builder/builder-types.ts

@@ -0,0 +1,41 @@
+import { PackageSummary } from '@safety-web/types';


Same types as with the old builder worker, just that the logic for serving is gone here.

mturco

this was a ton of work — thank you for doing it!

I think this is exactly what I'm looking for / had in mind for remote envs, just have some non-blocking comments/questions

mturco · 2025-09-26T16:15:59Z

examples/environments/remote_env/config.js

@@ -0,0 +1,18 @@
+// @ts-check


can we just make this a .ts file? is that supported?

Unfortunately not I think 😞 We should look into this in a follow-up.

runner/codegen/llm-runner.ts

mturco · 2025-09-26T16:27:08Z

runner/configuration/base-environment.ts

+/** Represents a single prompt evaluation environment. */
+export abstract class BaseEnvironment {
+  /** Path at which the environment is defined. */
+  readonly rootPath: string;


wouldn't this just be the directory of the environment implementing this class?

Yeah, it's just the path where the environment config lives. I don't fully recall why this is needed, but it's unrelated to this PR (and needs to stay for remote envs too I think)

runner/eval-cli.ts

mturco · 2025-09-26T16:58:34Z

runner/orchestration/file-system.ts

  const directoriesToCopy: string[] = [];

-  if (env.projectTemplatePath) {
+  if (env instanceof LocalEnvironment && env.projectTemplatePath) {


this makes sense to me 👍

mturco · 2025-09-26T17:01:16Z

runner/orchestration/gateway.ts

+    rootPromptDef: RootPromptDefinition,
+    progress: ProgressLogger,
+    logicWhileServing: (serveUrl: string) => Promise<T>
+  ): Promise<T>;


just curious: why does this return value need to be generic?

The logic that runs while an app is served collects a result, and this result should be exposed to the runner. I guess we could expect the logicWhileServing callback to store that itself, but that feels a bit unclean. See e.g.

let result: ServeTestingResult;
await gateway.serveApp(..., (serverUrl) => {
  // logic for testing app
  result = bla;
})
// Now expect `result` to be available. Would need a `result!` for type safety.
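For contrast, a sketch of the generic variant shows how the callback's result flows straight back to the caller. This is a simplified stand-in: the real `serveApp` takes more parameters, and the placeholder URL, the `lifecycle` array, the `testServedApp` helper, and the `ServeTestingResult` fields are all assumptions for the example.

```typescript
// Hypothetical result shape; the real ServeTestingResult has different fields.
interface ServeTestingResult {
  testedUrl: string;
  errors: string[];
}

/** Records lifecycle events so the cleanup order is observable. */
const lifecycle: string[] = [];

/** Runs the callback while the "server" is up and returns whatever it produced. */
async function serveApp<T>(logicWhileServing: (serveUrl: string) => Promise<T>): Promise<T> {
  const serveUrl = 'http://localhost:4200'; // placeholder, no real server is started
  lifecycle.push('started');
  try {
    return await logicWhileServing(serveUrl);
  } finally {
    // The server is torn down even if the callback throws.
    lifecycle.push('stopped');
  }
}

async function testServedApp(): Promise<ServeTestingResult> {
  // T is inferred as ServeTestingResult; no outer `let` or `result!` is needed.
  return serveApp(async (serveUrl) => {
    const result: ServeTestingResult = { testedUrl: serveUrl, errors: [] };
    return result;
  });
}
```

Because `T` is inferred from the callback, the same `serveApp` can return any result type without the gateway knowing about testing at all.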

mturco · 2025-09-26T17:02:19Z

runner/orchestration/generate.ts

-  // via https://stackoverflow.com/questions/9006988/node-js-on-windows-how-to-clear-console
-  if (options.logging === 'dynamic') {
-    process.stdout.write('\x1Bc');
+  // TODO(devversion): Consider validating model names also for remote environments.


this would seemingly be difficult since remote envs could be using private models with unknown names, right?

Yeah, I would imagine we can either never do this (and remove the TODO), or eventually this validation can happen as part of initializeEval?

oh yeah that's a good idea 👍

runner/orchestration/generate.ts


devversion force-pushed the remote-envs branch from 86800ff to 8f80ce8 Compare September 26, 2025 14:40

refactor: pre-format report-viewer.html to reduce future diffs

de8288f

devversion force-pushed the remote-envs branch 2 times, most recently from 4122cf6 to 7107e6c Compare September 26, 2025 14:47

devversion commented Sep 26, 2025


devversion force-pushed the remote-envs branch from 7107e6c to 810afd7 Compare September 26, 2025 15:03

devversion changed the title ~~WIP: feat: remote environment support~~ feat: remote environment support Sep 26, 2025

devversion marked this pull request as ready for review September 26, 2025 15:04

devversion force-pushed the remote-envs branch 3 times, most recently from b8c2a8a to ee61ccb Compare September 26, 2025 15:49

devversion requested a review from mturco September 26, 2025 16:05

mturco approved these changes Sep 26, 2025


devversion added 2 commits September 26, 2025 17:28

refactor: improve remote environment example by providing a fake gateway

cfcb979

devversion force-pushed the remote-envs branch from ee61ccb to cfcb979 Compare September 26, 2025 17:28

devversion merged commit a4b50ec into angular:main Sep 26, 2025
3 checks passed

devversion deleted the remote-envs branch September 26, 2025 17:36
