Skip to content

Commit be73c6c

Browse files
ToreMerkelyclaude
andcommitted
feat(cloudrun): friendly auth and not-found error messages (#4986)
Slice 6 of the snapshot cloud-run feature. cloudrun.Classify maps gRPC Unauthenticated / PermissionDenied / NotFound responses into actionable messages and preserves the underlying error via %w; other codes pass through. The Unauthenticated message names all three credential sources (env var, gcloud command, GCE/GKE metadata server or Workload Identity) because the production deployment is a cluster cron job, not a local-dev gcloud session. Classification lives in internal/cloudrun (the package owns GCP knowledge) but is applied at the command boundary, not inside Client.ListServices. Doing it inside the Client would double-wrap real-call errors when the command also classified them, and it would bypass the stubbed test path entirely. Calling Classify once at the command boundary covers both real and stub error sources. cloudrun.New now wraps construction errors with a generic "GCP client setup failed" prefix; the cluster case rarely fails here, and the SDK message is preserved for diagnosis. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 20076e2 commit be73c6c

7 files changed

Lines changed: 135 additions & 7 deletions

File tree

cmd/kosli/snapshotCloudRun.go

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -92,7 +92,7 @@ func (o *snapshotCloudRunOptions) run(args []string) error {
9292
}
9393
services, err := client.ListServices(ctx, o.project, o.region)
9494
if err != nil {
95-
return err
95+
return cloudrun.Classify(err, o.project, o.region)
9696
}
9797

9898
filtered := services[:0]

cmd/kosli/snapshotCloudRun_test.go

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,8 @@ import (
99
"github.com/kosli-dev/cli/internal/cloudrun"
1010
"github.com/stretchr/testify/require"
1111
"github.com/stretchr/testify/suite"
12+
"google.golang.org/grpc/codes"
13+
"google.golang.org/grpc/status"
1214
)
1315

1416
type stubCloudRunLister struct {
@@ -170,6 +172,22 @@ func (suite *SnapshotCloudRunTestSuite) TestSnapshotCloudRunFilter_ExcludeRegex(
170172
require.Contains(suite.T(), out, `"service_name": "beta"`)
171173
}
172174

175+
// TestSnapshotCloudRunCmd_UnauthenticatedReturnsFriendlyError verifies that a
176+
// gRPC Unauthenticated error from GCP surfaces as the actionable ADC message
177+
// rather than a raw SDK string.
178+
func (suite *SnapshotCloudRunTestSuite) TestSnapshotCloudRunCmd_UnauthenticatedReturnsFriendlyError() {
179+
newCloudRunClient = func(_ context.Context) (cloudRunLister, error) {
180+
return stubCloudRunLister{err: status.Error(codes.Unauthenticated, "token expired")}, nil
181+
}
182+
183+
cmd := fmt.Sprintf(`snapshot cloud-run %s --project p --region r %s`, suite.envName, suite.defaultKosliArguments)
184+
_, combined, _, _, err := executeCommandC(cmd)
185+
186+
require.Error(suite.T(), err)
187+
require.Contains(suite.T(), combined, "GCP authentication failed")
188+
require.Contains(suite.T(), combined, "metadata server")
189+
}
190+
173191
func TestSnapshotCloudRunCommandTestSuite(t *testing.T) {
174192
suite.Run(t, new(SnapshotCloudRunTestSuite))
175193
}

docs/handover/2026-04-28-4986-google-cloud-run-1.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -36,6 +36,7 @@ GCP test environment (provisioned, ADC already configured on this machine):
3636
- Digest extraction follows the ECS pattern (`internal/aws/aws.go:670-693`): use a `@sha256:` substring if present, else leave the digest empty rather than calling Artifact Registry. Registry-lookup mode (analogous to Azure's `--digests-source acr`) is deferred until customers ask for it.
3737
- Wire payload mirrors `EcsEnvRequest` plus `project` and `region` fields per artifact (so each row is self-describing). Endpoint name is `report/cloud-run` (kebab-case, parallels `report/azure-apps`). Server-side endpoint does not yet exist; the forced dry-run means no network call is made.
3838
- Command depends on a local `cloudRunLister` interface and a package-level `newCloudRunClient` variable so tests can substitute a stub without touching ADC. The seam stays in `cmd/kosli/snapshotCloudRun.go` rather than being exposed from `internal/cloudrun` — keeps the public package surface minimal.
39+
- Error classification (`Classify`) lives in `internal/cloudrun` (GCP knowledge belongs to the package) but is *applied* at the command layer, not inside `Client.ListServices`. Why: applying it inside `ListServices` would double-wrap real-call errors when the command also classified them, and bypass the friendly path entirely for stub-driven tests. Calling it once at the command boundary covers both real and stub error sources.
3940

4041
---
4142

@@ -47,6 +48,6 @@ Slice plan (each slice is a separate, independently-mergeable branch):
4748
- [x] **Slice 2:** Internal `internal/cloudrun` package — wraps `cloud.google.com/go/run/apiv2` to list services in project+region; unit-tested with a fake. Done 2026-04-28: `Client.ListServices` returns `Service{Name, URI, Revisions}` with one `Revision{Name, Digests, CreatedAt}` per traffic-configured revision (any percent including 0%, with `LATEST` resolved via `LatestReadyRevision` and dupes removed). Digest extraction mirrors the ECS fallback (`@sha256:` parse, else empty string). 9 unit tests passing.
4849
- [x] **Slice 3:** End-to-end happy path — wire the package into `RunE`, build the snapshot payload, POST to the server `cloud-run` endpoint (still dry-run only). Done 2026-04-28: command now calls `cloudrun.New` + `ListServices`, builds an `EnvRequest` via `ToEnvRequest(services, project, region)`, and submits PUT `report/cloud-run` via `kosliClient.Do` (dry-run forced, so no network call leaves the client). Tested against the real `hello-world-cli-demo` GCP project — emits a digest-pinned artifact for the running `hello-world` service.
4950
- [x] **Slice 4:** Filtering flags — `--services`, `--services-regex`, `--exclude`, `--exclude-regex`. Done 2026-04-28: backed by `filters.ResourceFilterOptions` (same struct ECS uses); 4 mutex pairs validated in `PreRunE`. Filter is applied in the command after `cloudrun.ListServices` returns — services excluded by name still cost their revision-fetch round-trips. If that becomes a bottleneck, push the filter into `cloudrun.ListServices` so excluded services skip the per-revision API calls.
50-
- [ ] **Slice 5:** Multi-revision / traffic splitting — handle services with multiple active revisions and services with no active revisions.
51-
- [ ] **Slice 6:** Auth error UX — clear messages for ADC / `GOOGLE_APPLICATION_CREDENTIALS` failures and for missing project/region.
51+
- [x] **Slice 5:** ~~Multi-revision / traffic splitting — handle services with multiple active revisions and services with no active revisions.~~ Dropped 2026-04-28: multi-revision (traffic splitting), `LATEST` resolution, and dedup were all completed in Slice 2; the only remaining edge case (services with no active revisions emitting a placeholder artifact) was deferred until the server-side wire contract is defined, since picking the format unilaterally now risks rework. Re-open as a small slice once the server contract lands.
52+
- [x] **Slice 6:** Auth error UX — clear messages for ADC / `GOOGLE_APPLICATION_CREDENTIALS` failures and for missing project/region. Done 2026-04-28: `cloudrun.Classify(err, project, region)` maps gRPC `Unauthenticated` → ADC advice, `PermissionDenied``roles/run.viewer` advice, `NotFound` → "project or region not found"; other codes pass through. Auth message names all three credential sources (env var, `gcloud auth application-default login`, GCE/GKE metadata server / Workload Identity) since the production deployment is a GKE cron job. `cloudrun.New(ctx)` errors get a generic "GCP client setup failed" prefix.
5253
- [ ] **Slice 7:** Unhide the command, lift the forced dry-run, update CLI reference docs and examples.

go.mod

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -47,6 +47,7 @@ require (
4747
golang.org/x/oauth2 v0.36.0
4848
golang.org/x/term v0.42.0
4949
google.golang.org/api v0.274.0
50+
google.golang.org/grpc v1.80.0
5051
google.golang.org/protobuf v1.36.11
5152
gopkg.in/yaml.v3 v3.0.1
5253
k8s.io/api v0.35.0
@@ -231,7 +232,6 @@ require (
231232
google.golang.org/genproto v0.0.0-20260319201613-d00831a3d3e7 // indirect
232233
google.golang.org/genproto/googleapis/api v0.0.0-20260401024825-9d38bb4040a9 // indirect
233234
google.golang.org/genproto/googleapis/rpc v0.0.0-20260401024825-9d38bb4040a9 // indirect
234-
google.golang.org/grpc v1.80.0 // indirect
235235
gopkg.in/evanphx/json-patch.v4 v4.13.0 // indirect
236236
gopkg.in/inf.v0 v0.9.1 // indirect
237237
gopkg.in/warnings.v0 v0.1.2 // indirect

internal/cloudrun/cloudrun.go

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -46,15 +46,18 @@ type Client struct {
4646
}
4747

4848
// New returns a Client backed by the real Cloud Run Admin API v2 using
49-
// Application Default Credentials.
49+
// Application Default Credentials. Construction errors (typically rare in a
50+
// cluster cron job, since the metadata server provides credentials) are
51+
// wrapped with a generic "GCP client setup failed" prefix; the SDK's own
52+
// message is preserved via %w for diagnosis.
5053
func New(ctx context.Context) (*Client, error) {
5154
services, err := run.NewServicesClient(ctx)
5255
if err != nil {
53-
return nil, fmt.Errorf("creating Cloud Run services client: %w", err)
56+
return nil, fmt.Errorf("GCP client setup failed: %w", err)
5457
}
5558
revisions, err := run.NewRevisionsClient(ctx)
5659
if err != nil {
57-
return nil, fmt.Errorf("creating Cloud Run revisions client: %w", err)
60+
return nil, fmt.Errorf("GCP client setup failed: %w", err)
5861
}
5962
return &Client{api: &gcpAPI{services: services, revisions: revisions}}, nil
6063
}

internal/cloudrun/errors.go

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
package cloudrun
2+
3+
import (
4+
"fmt"
5+
6+
"google.golang.org/grpc/codes"
7+
"google.golang.org/grpc/status"
8+
)
9+
10+
// Classify wraps a Cloud Run SDK error with a user-actionable message based on
11+
// the gRPC status code. Errors without a recognised code (or non-gRPC errors)
12+
// pass through unchanged so callers can still inspect them.
13+
//
14+
// project and region are interpolated into messages where they help the user
15+
// localise the failure (e.g. NotFound on a misspelled project).
16+
func Classify(err error, project, region string) error {
17+
if err == nil {
18+
return nil
19+
}
20+
s, ok := status.FromError(err)
21+
if !ok {
22+
return err
23+
}
24+
switch s.Code() {
25+
case codes.Unauthenticated:
26+
return fmt.Errorf(
27+
"GCP authentication failed: ensure Application Default Credentials are available "+
28+
"(GOOGLE_APPLICATION_CREDENTIALS, 'gcloud auth application-default login', "+
29+
"or GCE/GKE metadata server / Workload Identity) (underlying error: %w)",
30+
err,
31+
)
32+
case codes.PermissionDenied:
33+
return fmt.Errorf(
34+
"GCP permission denied: the caller needs 'roles/run.viewer' on project %q (underlying error: %w)",
35+
project, err,
36+
)
37+
case codes.NotFound:
38+
return fmt.Errorf(
39+
"GCP project %q or region %q not found or not accessible (underlying error: %w)",
40+
project, region, err,
41+
)
42+
default:
43+
return err
44+
}
45+
}

internal/cloudrun/errors_test.go

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
1+
package cloudrun
2+
3+
import (
4+
"errors"
5+
"testing"
6+
7+
"github.com/stretchr/testify/require"
8+
"google.golang.org/grpc/codes"
9+
"google.golang.org/grpc/status"
10+
)
11+
12+
func TestClassify_NilStaysNil(t *testing.T) {
13+
require.NoError(t, Classify(nil, "p", "r"))
14+
}
15+
16+
func TestClassify_NonGRPCErrorPassesThrough(t *testing.T) {
17+
original := errors.New("some plain go error")
18+
got := Classify(original, "p", "r")
19+
require.Same(t, original, got)
20+
}
21+
22+
func TestClassify_UnknownGRPCCodePassesThrough(t *testing.T) {
23+
original := status.Error(codes.ResourceExhausted, "rate limited")
24+
got := Classify(original, "p", "r")
25+
require.Same(t, original, got)
26+
}
27+
28+
func TestClassify_UnauthenticatedReturnsADCAdvice(t *testing.T) {
29+
original := status.Error(codes.Unauthenticated, "token expired")
30+
got := Classify(original, "proj-1", "europe-west1")
31+
32+
require.Error(t, got)
33+
require.Contains(t, got.Error(), "GCP authentication failed")
34+
require.Contains(t, got.Error(), "GOOGLE_APPLICATION_CREDENTIALS")
35+
require.Contains(t, got.Error(), "gcloud auth application-default login")
36+
require.Contains(t, got.Error(), "metadata server")
37+
require.Contains(t, got.Error(), "Workload Identity")
38+
require.ErrorIs(t, got, original, "underlying error must be preserved via %%w")
39+
}
40+
41+
func TestClassify_PermissionDeniedNamesProjectAndRoleViewer(t *testing.T) {
42+
original := status.Error(codes.PermissionDenied, "missing iam role")
43+
got := Classify(original, "proj-1", "europe-west1")
44+
45+
require.Error(t, got)
46+
require.Contains(t, got.Error(), "GCP permission denied")
47+
require.Contains(t, got.Error(), "roles/run.viewer")
48+
require.Contains(t, got.Error(), `"proj-1"`)
49+
require.ErrorIs(t, got, original)
50+
}
51+
52+
func TestClassify_NotFoundNamesProjectAndRegion(t *testing.T) {
53+
original := status.Error(codes.NotFound, "no such resource")
54+
got := Classify(original, "bad-project", "europe-west1")
55+
56+
require.Error(t, got)
57+
require.Contains(t, got.Error(), "not found or not accessible")
58+
require.Contains(t, got.Error(), `"bad-project"`)
59+
require.Contains(t, got.Error(), `"europe-west1"`)
60+
require.ErrorIs(t, got, original)
61+
}

0 commit comments

Comments
 (0)