Conversation
Force-pushed from d1a9b33 to 8baab20.
aluzzardi left a comment:
Looks good!
Just to make sure I have the right mental model: worktrees are now only used as a staging area to dump data to be committed. Reading data is performed directly on the forked repo.
Slight but important change.
nit without a better suggestion: initialSourceDirID took me a minute to understand. Now it makes sense but I don't have a better name to offer.
ID string
//TODO(braa): I think we only need this for Export, now. remove and pass explicitly.
Yep, AFAIK only Export() and Config.Locked() need this.
We could remove Config.Locked (undocumented, unused feature) and make this explicit on Export().
This would be much cleaner since now repository is the only thing dealing with worktrees
yeah, i intend to try this in a follow-up.
Strongly agree with this, I couldn't decide on a name so I went verbose. Other options include 'BaseSourceDirID', 'InitialCommitDirID' and variations... having typed it maybe now I'm partial to BaseSourceDirID?
Random collection of thoughts:
Unfinished chain of thoughts (and this can be done in a different work stream, we don't want to tie everything to this PR, this is more of a UX/architecture brainstorm).
ah you're right! that's a good call, means the hackery i was gonna do is both unnecessary and excessive.
true, and that's worrisome.
more incremental UX brainstorming, i wonder if there's some happy medium where environment update:
like conceptually, it'd be more like
why would i want to only or by default do it this way when I could just continue my current agent session to get this behavior?
as a user, this sounds like a big faff to me, like if every time i've gotta prompt the agent on which environment to use, I'm not gonna use this ever for my "baseline" case, only for parallel agents (which is not something I'm doing today) and even then i've gotta figure out how to script the prompting to mux my agents across pre-created envs.
@aluzzardi also the dumbest possible change to fix environment_update is still an option: base from the up-to-date source dir instead of the original one. we can even prompt engineer addService, run_command, etc to be like "to persist this change for future environment_create calls, use environment_update"
soooo, possible TODOs before merging, partially because i wanna give @grouville a second to get the tests in: A:
B:
with B, i think there are problems around what's in the initial context windows. like we can have reasonable defaults (llm-provided->environment.json->ubuntu:latest) BUT the agent's not gonna know that the default will work, so i suspect it'll usually try to override? A & B are in conflict with each other, unfortunately, unless we add another method to Repository to support the update case... edit: took a bike ride, i think i can break the tradeoff here.
From what I saw, LLMs are lazy and will do their best to never provide optional arguments :)
- stop saving initialSourceDir - it's saved in the LLB.
- have environment_update rebuild the base from workdir
- improve prompt engineering around internal/external ports
- improve run_command error handling so that failed commands can be saved into the container state
@aluzzardi C wasn't a real option because the run_command error handling was way too broken. but i fixed it 😈 b1e042c
}
newState := env.container().WithExec(args, dagger.ContainerWithExecOpts{
    UseEntrypoint: useEntrypoint,
    Expect:        dagger.ReturnTypeAny, // Don't treat non-zero exit as error
note: I think this can have weird caching side effects by caching failed operations.
Particularly annoying for transient failures that won't get busted by content (e.g. pip install foo --> connection timeout --> will always return connection timeout from cache without trying).
Although not 100% sure the engine doesn't have workarounds already in place for that (/cc @sipsma ?)
container.WithExec("pip install foo").WithExec("pip install foo")
because we're now saving the container on failures, too, doesn't it re-run because the second withexec is against a different container?
Right! ...Maybe? Continue-on-failure is uncharted territory to me ... For instance, I'm not sure what would happen on cache bust, if some operation on the chain fails it would just push through anyway?
e.g.
container.WithExec("pip install foo").WithExec("pip install foo")
on cache bust, this will actually run install foo twice, right?
container.WithExec("pip install foo").WithExec("touch foo")
on cache bust, if pip install foo fails, this will actually return success and not bubble up any error to the LLM, right?
As in -- on an operation-by-operation level it's all simple -- we manually check ExitCode and return accordingly.
However, when resuming from the Container state blob, we can't manually check ExitCodes for each exec in the chain and are relying on 1) the cache to not re-run execs OR 2) if it's not cached, the engine will re-exec and bubble up exec errors along the way, if any
I think dagger.ReturnTypeAny will prevent that, though. Emphasis on think.
in my mental model, yes, this gets kinda broken in the case of new failures on cache bust.
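To make the op-by-op path in this thread concrete, here's a rough sketch (not the PR's actual code) of checking the exit code manually when using dagger.ReturnTypeAny. It assumes the Go SDK's Container.ExitCode and Container.Stderr accessors; runCommand is a placeholder name.

```go
import (
	"context"
	"fmt"

	"dagger.io/dagger"
)

// Sketch only: run a command without failing the pipeline, then inspect the
// exit code ourselves so the failed layer is still saved into container state.
func runCommand(ctx context.Context, ctr *dagger.Container, args []string) (*dagger.Container, error) {
	next := ctr.WithExec(args, dagger.ContainerWithExecOpts{
		Expect: dagger.ReturnTypeAny, // keep the container even on non-zero exit
	})

	code, err := next.ExitCode(ctx)
	if err != nil {
		return nil, err
	}
	if code != 0 {
		stderr, _ := next.Stderr(ctx)
		// Surface the failure to the caller (e.g. the LLM), but still hand back
		// the container so the failed exec is part of the persisted state.
		return next, fmt.Errorf("command exited with code %d: %s", code, stderr)
	}
	return next, nil
}
```

Note this only covers the per-operation case; the cache-bust question above (re-evaluating a persisted chain with no per-op check) still stands.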
// Re-build the base image from the worktree
container, err := env.buildBase(ctx)
// Get current working directory from container to preserve changes
currentWorkdir := env.container().Directory(env.Config.Workdir)
dumb is sometimes best :)
this approach does mean that update no longer flattens the persisted container history, which was an accidental feature before - now it can get big and there's no way to collapse it. in the future we can maybe use the same AsGit().Ref().Tree() approach for update that we use for create.
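As a rough sketch of the "rebuild base, keep workdir changes" shape from the diff above (buildBase, container() and Config.Workdir come from the diff; the Environment type, function name and everything else are placeholders, not the PR's exact code):

```go
// Sketch only: rebuild the base image, then graft the container's current
// workdir back on top so in-progress changes survive the rebuild.
func rebuildWithWorkdir(ctx context.Context, env *Environment) (*dagger.Container, error) {
	// Snapshot the working directory from the existing container state.
	currentWorkdir := env.container().Directory(env.Config.Workdir)

	// Re-build the base image from the worktree.
	base, err := env.buildBase(ctx)
	if err != nil {
		return nil, err
	}

	// Overlay the preserved workdir onto the fresh base. As noted above, this
	// keeps the exec history growing rather than flattening it.
	return base.WithDirectory(env.Config.Workdir, currentWorkdir), nil
}
```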
it's working!
this PR implements "real" container resume where we store each Run WithExec in the container LLB definition and shores up the base of that container by saving off InitialSourceDir in state. InitialSourceDir is made consistent and re-hydratable by using dag.Host().Directory(forkRepoPath).AsGit().Ref(initialWorktreeHead).Tree(). because we can rely on that SHA always pointing at the exact right base content and the forkRepo always having that SHA, we can source the base of sourcedir without keeping a copy of the first commit on disk (see the sketch at the end of this description).

6/25: this PR does have a big piece of "magic" in it. we template the .git file to be gitdir: repos/name/worktrees/id so that, on export, we don't constantly override/reset the git history.

Other stuff buried in here:
additionally on 6/25 cc @aluzzardi:
closes #115.
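For reference, a minimal sketch of the InitialSourceDir rehydration chain described above; forkRepoPath and initialWorktreeHead stand in for whatever the environment keeps in state, and this is not the PR's exact code:

```go
// Sketch only: rebuild the initial source directory from the forked repo on
// the host, pinned to the worktree's original HEAD SHA. Because the fork
// always contains that SHA, no copy of the first commit is kept on disk.
initialSourceDir := dag.Host().
	Directory(forkRepoPath).
	AsGit().
	Ref(initialWorktreeHead).
	Tree()
```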