Skip to content
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
55 changes: 35 additions & 20 deletions pkg/skaffold/runner/dev.go
Original file line number Diff line number Diff line change
Expand Up @@ -20,10 +20,12 @@ import (
"context"
"fmt"
"io"
"os"
"strconv"
"time"

"github.com/cenkalti/backoff/v4"
"go.opentelemetry.io/otel/trace"

"github.com/GoogleContainerTools/skaffold/v2/pkg/skaffold/constants"
"github.com/GoogleContainerTools/skaffold/v2/pkg/skaffold/event"
Expand Down Expand Up @@ -93,29 +95,16 @@ func (r *SkaffoldRunner) doDev(ctx context.Context, out io.Writer) error {
instrumentation.AddDevIteration("sync")
meterUpdated = true

syncHandler := func(s *sync.Item) error {
fileCount := len(s.Copy) + len(s.Delete)
output.Default.Fprintf(out, "Syncing %d files for %s\n", fileCount, s.Image)
fileSyncInProgress(fileCount, s.Image)

if err := r.deployer.GetSyncer().Sync(childCtx, out, s); err != nil {
log.Entry(ctx).Warn("Skipping deploy due to sync error:", err)
fileSyncFailed(fileCount, s.Image, err)
event.DevLoopFailedInPhase(r.devIteration, constants.Sync, err)
eventV2.TaskFailed(constants.DevLoop, err)
endTrace(instrumentation.TraceEndError(err))

return err
}

fileSyncSucceeded(fileCount, s.Image)

return nil
}
syncHandler := r.getSyncHandler(ctx, childCtx, out, endTrace)
for _, s := range r.changeSet.NeedsResync() {
err := backoff.Retry(
func() error {
return syncHandler(s)
err := syncHandler(s)
if err == nil || os.IsNotExist(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this essentially swallows the NotExist error entirely, treats it as though it wasn't an error at all. Is that the correct thing to do here?

The other option is to wrap it in PermanentError which will stop the retries, but still return it from backoff.Retry, and eventually it'll get returned from doDev on line 112.

But maybe that's not the right thing to do because a deleted file is intended and not an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved the logic. This approach ensures that:

  • Deleted files stop retrying immediately and log an informative message
  • Other transient errors get retried (up to 3 times as configured)
  • Persistent errors are logged, but don't stop the entire dev loop
  • The dev loop continues processing other sync items even if one fails

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this just feels like the wrong place for this. We're way too high up in the stack.

The syncHandler handles syncing every file in s.Copy and s.Delete. There could be dozens of files there. If one of them ends in a IsNotExist error, we didn't actually necessarily finish syncing all the files, right? We just gave up halfway through. There could be other errors we don't know about and other files we never got around to syncing.

So it feels like the syncer is going to have to be the one who decides if an IsNotExist error is relevant or not? If it's not relevant it shouldn't even return the error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you mean this part:

if err != nil {
  log.Entry(ctx).Warnf("Sync failed after retries for %s: %v", s.Image, err)
  continue
}

Then yes, probably you're right, I can rollback the "return" statement

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean something like this:

  1. syncHandler calls Sync() with a list of (for example) 50 files
  2. Sync() starts looping through those files
  3. on the third file, it can't find the file and returns a file-not-found error
    • notice: Files 4-50 are never synced!
  4. your retry code gets this error, calls IsNotExist on it, stops the retry and returns nil, forever hiding the fact that there was an error
    if os.IsNotExist(err) {
      log.Entry(ctx).Infof("Skipping sync for %s: file no longer exists", s.Image)
      return nil
    }
    
  5. the caller thinks the files are synced, but most of them are not!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the clarification, you're right, it's not a good idea to swallow the error. Pushed commit where I wrapped not found error in Permanent error, as you suggested before, now it prints the error and continues syncing other files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, it is correct now, because it doesn't stop syncing other change sets if something happened with 1 change set.
Instead of return in case of error, it uses continue now

return nil
}

return err
}, backoff.WithContext(opts, childCtx),
)

Expand Down Expand Up @@ -232,6 +221,32 @@ func (r *SkaffoldRunner) doDev(ctx context.Context, out io.Writer) error {
return nil
}

func (r *SkaffoldRunner) getSyncHandler(
ctx, childCtx context.Context,
out io.Writer,
endTrace func(options ...trace.SpanEndOption),
) func(s *sync.Item) error {
return func(s *sync.Item) error {
fileCount := len(s.Copy) + len(s.Delete)
output.Default.Fprintf(out, "Syncing %d files for %s\n", fileCount, s.Image)
fileSyncInProgress(fileCount, s.Image)

if err := r.deployer.GetSyncer().Sync(childCtx, out, s); err != nil {
log.Entry(ctx).Warn("Skipping deploy due to sync error:", err)
fileSyncFailed(fileCount, s.Image, err)
event.DevLoopFailedInPhase(r.devIteration, constants.Sync, err)
eventV2.TaskFailed(constants.DevLoop, err)
endTrace(instrumentation.TraceEndError(err))

return err
}

fileSyncSucceeded(fileCount, s.Image)

return nil
}
}

// Dev watches for changes and runs the skaffold build, test and deploy
// config until interrupted by the user.
func (r *SkaffoldRunner) Dev(ctx context.Context, out io.Writer, artifacts []*latest.Artifact) error {
Expand Down
Loading