
estimate: Use the default GOPATH to cache downloads #304

Open
rhansen wants to merge 1 commit into Debian:master from rhansen:estimate-gopath


@rhansen (Contributor) commented Feb 23, 2026

This avoids re-downloading the same modules (potentially hundreds of megabytes for moderately sized binaries) over and over.

This makes `progressSize` no longer useful, so `go get`'s stdout and stderr are passed through, and `-x` is passed to `go get` to provide some progress indication.
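
For reference, the invocation described above can be sketched as follows. It mirrors the removed (`-`) lines of the diff later in this thread; `newGetCmd` is a hypothetical helper name, not a function in dh-make-golang. Because `cmd.Env` is left unset, the child process inherits the caller's environment, so `go get` uses the user's normal GOPATH and module cache.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// newGetCmd is a hypothetical helper that builds the `go get` invocation.
// cmd.Env is deliberately left nil: the child then inherits the caller's
// environment, and with it the default GOPATH and module cache.
func newGetCmd(repodir, pkg string) *exec.Cmd {
	// -x prints each command go get runs (a crude progress indicator);
	// -t also fetches the dependencies of test packages.
	cmd := exec.Command("go", "get", "-x", "-t", pkg)
	cmd.Dir = repodir
	// Pass output straight through instead of buffering it.
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	cmd := newGetCmd(os.TempDir(), "github.com/juanfont/headscale")
	fmt.Println(cmd.Args, "inherits env:", cmd.Env == nil)
}
```

The trade-off is the one discussed below: the shared cache makes repeated runs cheap, but nothing removes what it accumulates.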

@n-peugnet (Contributor) left a comment

The code looks alright at first glance, but I am not sure whether this is a good idea. The advantage is that subsequent runs will be faster, but it does not clean up after itself.
Also, I am not sure what the de-duplication behaviour of go get is when another version of a library is already downloaded.
I'm afraid this would only make the process faster when running estimate for the same package multiple times, at the cost of using disk space.

@ottok (Contributor) commented Feb 23, 2026

In what scenario is the re-downloading happening that this optimizes?

This avoids re-downloading the same modules (potentially hundreds of
megabytes for moderately sized binaries) over and over.  The downside
is wasted disk space that the user must manually clear (`go clean
-modcache`).

Why keeping the downloaded modules is believed to be more useful than
automatically deleting them:

  * It makes offline use possible.
  * Packaging a complex module with lots of unpackaged dependencies
    means a long slow process of chipping away at the dependencies.
    It would be nice to be able to re-run `dh-make-golang estimate`
    several times to track progress and plan next steps without
    incurring a huge download cost.
  * Developing `dh-make-golang estimate` itself involves re-running it
    over and over to test changes.  The interesting modules to test
    are the complex ones, and those take a while to download.
  * I believe that `golang-*-dev` package maintainers are already
    accustomed to running `go clean -modcache`, an occasional
    necessity with serious Go development.
  * The modules don't actually need to be fully downloaded and
    unpacked; only their `go.mod` files are necessary to construct a
    dependency graph.  Changing `dh-make-golang estimate` in a future
    commit to download just the `go.mod` files would address disk
    usage concerns.  It would also make re-downloading over and over
    less of a problem, but keeping the `go.mod` files avoids abusing
    proxy.golang.org and sum.golang.org.

This change makes `progressSize` not useful, so `go get`'s stdout and
stderr are passed through, and `-x` is passed to `go get` to get some
progress indication.

@rhansen (Contributor, Author) commented Feb 24, 2026

> In what scenario is the re-downloading happening that this optimizes?

Two scenarios that affect me:

  • Packaging a complex module with lots of unpackaged dependencies. It takes a while to chip away at the numerous dependencies, and it would be nice to be able to re-run dh-make-golang estimate multiple times to see progress and plan next steps.
  • Developing dh-make-golang itself. I have a handful of changes in the works to improve dh-make-golang (I'll discuss them on the mailing list before I open any PRs for substantial changes), and testing dh-make-golang with a complex package (github.com/juanfont/headscale) after every change means that every test run downloads the same ~175MiB.

and maybe a third scenario on occasion:

  • Offline use (especially on an airplane).

The modules don't actually need to be fully downloaded and unpacked; only their go.mod files are needed to construct a dependency graph. I have a change in the works that does just that. Keeping only the go.mod files mostly addresses the disk usage concern. Downloading just the go.mod files also means that re-downloading them over and over wouldn't be as big a deal, but keeping them avoids abusing proxy.golang.org and sum.golang.org.

(I'll update the commit message to include the above rationale.)

> The advantage is that subsequent runs will be faster, but it does not clean up after itself.

This is a problem with Go in general. I expect most Debian Go maintainers are also Go developers, and are already used to running go clean -modcache occasionally, so keeping the downloaded modules doesn't seem like a big deal to me. But maybe I'm wrong. As a compromise, this could be controlled by a command-line flag, but that adds complexity. WDYT?
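
The command-line-flag compromise could look roughly like this. Both `goGetEnv` and its `keepCache` switch are hypothetical. With the switch on, `go get` inherits the normal environment and therefore the shared, persistent cache. With it off, `GOMODCACHE` is pointed at a scratch directory.

```go
package main

import (
	"fmt"
	"os"
)

// goGetEnv is a hypothetical helper behind a hypothetical keep-cache flag.
//   - keepCache == true: return nil so exec.Cmd inherits the caller's
//     environment, and go get reuses the default, persistent module cache.
//   - keepCache == false: point GOMODCACHE at scratchDir. When a key appears
//     more than once in Cmd.Env, os/exec uses the last value, so the
//     appended entry overrides any inherited GOMODCACHE.
func goGetEnv(keepCache bool, scratchDir string) []string {
	if keepCache {
		return nil
	}
	return append(os.Environ(), "GOMODCACHE="+scratchDir)
}

func main() {
	fmt.Println(goGetEnv(true, "") == nil)
	env := goGetEnv(false, "/tmp/estimate-modcache")
	fmt.Println(env[len(env)-1])
}
```

One caveat with a scratch cache: the go command makes module cache directories read-only, so a plain `os.RemoveAll` of the scratch directory typically fails for a non-root user. It will only succeed if the cache was created with the `-modcacherw` build flag (for example through `GOFLAGS=-modcacherw`), or if `go clean -modcache` is run with the same `GOMODCACHE`.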

> I am not sure what the de-duplication behaviour of go get is when another version of a library is already downloaded.

Both versions are saved and unpacked in the module cache (no deduplication).
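
The cache layout shows why there is no de-duplication. Each unpacked module lives in a directory named after both its path and its version, and the go.mod-only downloads are kept separately under `cache/download`. The sketch below assumes the standard module-cache naming, in which uppercase letters are encoded as `!` plus the lowercase letter. `escape`, `extractedDir` and `modFile` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// escape applies the module cache's case-encoding: every ASCII uppercase
// letter becomes '!' followed by its lowercase form, so paths that differ
// only in case stay distinct on case-insensitive file systems. The real
// golang.org/x/mod/module.EscapePath also validates its input; this does not.
func escape(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// extractedDir is where a module version is unpacked: one directory per
// path@version, so two versions of one module never share files.
func extractedDir(modcache, path, version string) string {
	return filepath.Join(modcache, escape(path)+"@"+escape(version))
}

// modFile is where the cached go.mod of a module version is kept.
func modFile(modcache, path, version string) string {
	return filepath.Join(modcache, "cache", "download", escape(path), "@v", escape(version)+".mod")
}

func main() {
	for _, v := range []string{"v1.2.0", "v1.3.0"} {
		fmt.Println(extractedDir("/home/user/go/pkg/mod", "github.com/BurntSushi/toml", v))
		fmt.Println(modFile("/home/user/go/pkg/mod", "github.com/BurntSushi/toml", v))
	}
}
```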

@n-peugnet (Contributor) commented

> This is a problem with Go in general. I expect most Debian Go maintainers are also Go developers, and are already used to running go clean -modcache occasionally, so keeping the downloaded modules doesn't seem like a big deal to me. But maybe I'm wrong. As a compromise, this could be controlled by a command-line flag, but that adds complexity. WDYT?

After thinking about it a bit more, I think your current approach is alright. Maybe it just needs a note in the man page and/or the --help message of the estimate command explaining that one has to run go clean -modcache to remove the downloaded modules. Or maybe even a log line on each run (this is maybe too much 😅).

@n-peugnet (Contributor) commented Feb 26, 2026

In fact, after testing it a bit, I find that the -x option adds too much noise to the output, and even printing Stderr by default is too much IMO. I would instead add a single log line like this at the beginning of the download:

Downloading modules... (run "go clean -modcache" to clear downloaded modules)

And keep the get() function mostly as it was (printing Stderr only on error), simply removing the progress part, as it does not make sense anymore.

As the cache is reused, we can expect the downloads to usually be faster anyway.

So it would look something like this:

```diff
diff --git a/estimate.go b/estimate.go
index 81cc655..cf0ce74 100644
--- a/estimate.go
+++ b/estimate.go
@@ -70,12 +70,16 @@ func get(repodir, repo, rev string) error {
        if rev != "" {
                packages += "@" + rev
        }
-       cmd := exec.Command("go", "get", "-x", "-t", packages)
+       cmd := exec.Command("go", "get", "-t", packages)
 
+       out := bytes.Buffer{}
        cmd.Dir = repodir
-       cmd.Stderr = os.Stderr
-       cmd.Stdout = os.Stdout
-       return cmd.Run()
+       cmd.Stderr = &out
+       err := cmd.Run()
+       if err != nil {
+               fmt.Fprint(os.Stderr, out.String())
+       }
+       return err
 }
 
 // getModuleDir returns the path of the directory containing a module for the given repository dir.
@@ -204,6 +208,8 @@ func estimate(importpath, revision string) error {
                return fmt.Errorf("create dummymod: %w", err)
        }
 
+       log.Println(`Downloading modules... (run "go clean -modcache" to clear downloaded modules)`)
+
        if err := get(repodir, importpath, revision); err != nil {
                return fmt.Errorf("go get: %w", err)
        }
```


Labels

None yet

Projects

None yet

3 participants