-
Notifications
You must be signed in to change notification settings - Fork 36
feat: remote environment support #56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
86800ff
to
8f80ce8
Compare
4122cf6
to
7107e6c
Compare
import { mcpServerOptionsSchema } from '../codegen/llm-runner.js'; | ||
import { getPossiblePackageManagers } from './environment-config.js'; | ||
|
||
export const baseEnvironmentConfigSchema = z.strictObject({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a copy of the existing schema, just with all fields like buildCommand
, serveCommand
removed. This is because remote environments don't have these options; so this is the "shared base schema".
@@ -0,0 +1,332 @@ | |||
import { readdirSync, readFileSync, statSync } from 'fs'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a copy of the previous environment
with the exception that e.g. buildCommand
is no longer in there. This is the "base/shared class" for environments. Local + remote environments will extend from this.
Logic is the same as before, just with some of these options removed + notice the abstract gateway
.
/** When enabled, the system prompts for this environment won't be included in the report. */ | ||
classifyPrompts: z.boolean().optional(), | ||
}); | ||
const environmentConfigSchema = z.union([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Environments can now be defined by either a LocalEnvironmentConfig
or a RemoteEnvironmentConfig
>; | ||
|
||
/** Represents a single prompt evaluation environment. */ | ||
export class LocalEnvironment extends BaseEnvironment { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the logic from the old environment that was pulled out in the files before!
The new LocalGateway()
is new here. This is because a local environment always uses the "default" local gateway; interacting with a project boilerplate for building etc.
} | ||
|
||
try { | ||
llm = await getRunnerByName(cliArgs.runner as RunnerName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An actual LLM for generation is only created when a local environment is selected; so this logic moved into the generateCodeAndAssess
call
import { BrowserAgentTaskInput } from '../testing/browser-agent/models.js'; | ||
|
||
/** Attempts to run & test an eval app. */ | ||
export async function serveAndTestApp( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new split from the previous build worker. After the build worker, serving can be initiated by the gateway. Remote might just serve from within the service, while local will start a child process from the serveCommand
. Like ng serve --port 0
.
rootPromptDef, | ||
progress, | ||
async (serveUrl) => { | ||
const serveParams: ServeTestingWorkerMessage = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the server is running (from the gateway), we start the serve-testing worker for connecting to the app and performing tests
` - MCP servers: ${options.startMcp && env.mcpServerOptions.length ? env.mcpServerOptions.length : 'none'}`, | ||
` - Runner: ${env instanceof LocalEnvironment ? env.llm.displayName : 'Remote'}`, | ||
env instanceof LocalEnvironment | ||
? ` - MCP servers: ${options.startMcp && env.mcpServerOptions.length ? env.mcpServerOptions.length : 'none'}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MCP servers are currently only launched by our local environments. We can look into this setting for remote environments in follow-ups
finalAttempt: { | ||
buildResult: BuildResult; | ||
serveTestingResult: ServeTestingResult | null; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are breaking changes for older reports! We need to migrate older reports 😞
@@ -0,0 +1,41 @@ | |||
import { PackageSummary } from '@safety-web/types'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same types as with the old builder worker, just that the logic for serving is gone here.
7107e6c
to
810afd7
Compare
b8c2a8a
to
ee61ccb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was a ton of work — thank you for doing it!
I think this is exactly what I'm looking for / had in mind for remote envs, just have some non-blocking comments/questions
@@ -0,0 +1,18 @@ | |||
// @ts-check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just make this a .ts
file? is that supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not I think 😞 We should look into this in a follow-up.
/** Represents a single prompt evaluation environment. */ | ||
export abstract class BaseEnvironment { | ||
/** Path at which the environment is defined. */ | ||
readonly rootPath: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't this just be the directory of the environment implementing this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's just the path where the environment config lives. I don't fully recall why this is needed, but it's unrelated to this PR (and needs to stay for remote envs too I think)
const directoriesToCopy: string[] = []; | ||
|
||
if (env.projectTemplatePath) { | ||
if (env instanceof LocalEnvironment && env.projectTemplatePath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes sense to me 👍
rootPromptDef: RootPromptDefinition, | ||
progress: ProgressLogger, | ||
logicWhileServing: (serveUrl: string) => Promise<T> | ||
): Promise<T>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious: why does this return value need to be generic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic executing while an app is served, collects a result. This result should be exposed to the runner. I guess we could expect logicWhileServing
callback to store that itself, but feels a bit unclean. See e.g.
let result: ServeTestingResult;
await gateway.serveApp(..., (serverUrl) => {
// logic for testing app
result = bla;
})
// Now expect `result` to be available. Would need a `result!` for type safety.
// via https://stackoverflow.com/questions/9006988/node-js-on-windows-how-to-clear-console | ||
if (options.logging === 'dynamic') { | ||
process.stdout.write('\x1Bc'); | ||
// TODO(devversion): Consider validating model names also for remote environments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would seemingly be difficult since remote envs could be using private models with unknown names, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would imagine we can either never do this (and remove the TODO), or eventually this validation can happen as part of initializeEval
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yeah that's a good idea 👍
This commit reorganizes the `web-codegen-scorer` to support remote environments. A remote environment is similar to the existing concept of environments, with the exception that the lifecycle of an environment can be managed in a hosted standalone server within a e.g. corporate network. The server would then provide additional features to the web-codegen-scorer, like: - different models for file generation - different execution sandboxes for building and serving an app (e.g. consider a framework like Wiz that is internal to Google) In practice, a remote environment exposes all of the important internal hooks to advanced users, so that they can be fully in charge of: - file generation via LLMs - building generated apps - repairing generated apps - serving generated apps Most users will never have to deal with this, but the architecture is highly beneficial for further separation of concerns in the codebase, plus potentially paving the way to support different languages (if we intend to do so), because the logic for testing a "served app" is easy to disable with these changes.
ee61ccb
to
cfcb979
Compare
This commit reorganizes the
web-codegen-scorer
to support remoteenvironments. A remote environment is similar to the existing concept of
environments, with the exception that the lifecycle of an environment
can be managed in a hosted standalone server within a e.g. corporate
network.
The server would then provide additional features to the
web-codegen-scorer, like:
consider a framework like Wiz that is internal to Google)
In practice, a remote environment exposes all of the important internal
hooks to advanced users, so that they can be fully in charge of:
Most users will never have to deal with this, but the architecture is highly
beneficial for further separation of concerns in the codebase, plus potentially
paving the way to support different languages (if we intend to do so), because
the logic for testing a "served app" is easy to disable with these changes.
R: @mturco