Skip to content

Latest commit

 

History

History
488 lines (386 loc) · 21.3 KB

File metadata and controls

488 lines (386 loc) · 21.3 KB

Configure Authentication Mechanism for GCP

Authentication Mechanism for the Control Plane

This is the manual auth configuration for the Control Plane. Refer to the installation guide for automated scripts.

For both methods (Workload Identity and Kubernetes Secret), the first manual step is creating a Google Cloud Service Account with the appropriate permissions needed for the control plane to manage native GCP resources.

You need to create a new Google Cloud Service Account named cloud-run-events with the following command:

gcloud iam service-accounts create cloud-run-events

Then, give that Google Cloud Service Account permissions on your project. The actual permissions needed will depend on the resources you are planning to use. The Table below enumerates such permissions:

Resource / Functionality Roles
CloudPubSubSource roles/pubsub.editor
CloudStorageSource roles/storage.admin
CloudSchedulerSource roles/cloudscheduler.admin
CloudAuditLogsSource roles/pubsub.admin, roles/logging.configWriter, roles/logging.privateLogViewer
CloudBuildSource roles/pubsub.subscriber
Channel roles/pubsub.editor
PullSubscription roles/pubsub.editor
Topic roles/pubsub.editor

In this guide, and for the sake of simplicity, we will just grant roles/owner privileges to the Google Cloud Service Account, which encompasses all of the above plus some other permissions. Note that if you prefer finer-grained privileges, you can just grant the ones described in the Table. Also, you can refer to managing multiple projects in case you want your Google Cloud Service Account to manage multiple projects.

gcloud projects add-iam-policy-binding $PROJECT_ID \
--member=serviceAccount:cloud-run-events@$PROJECT_ID.iam.gserviceaccount.com \
--role roles/owner

Option 1 (Recommended): Workload Identity

  1. Enable Workload Identity.

    1. Ensure that you have enabled the Cloud IAM Service Account Credentials API.

      gcloud services enable iamcredentials.googleapis.com
    2. If you didn't enable Workload Identity when you created your cluster, run the following commands:

      Modify the cluster to enable Workload Identity first:

      gcloud container clusters update $CLUSTER_NAME \
        --workload-pool=$PROJECT_ID.svc.id.goog

      Enable GKE_METADATA for your node pool(s).

       pools=$(gcloud container node-pools list --cluster=${CLUSTER_NAME} --format="value(name)")
       while read -r pool_name
       do
         gcloud container node-pools update "${pool_name}" \
           --cluster=${CLUSTER_NAME} \
           --workload-metadata=GKE_METADATA
       done <<<"${pools}"

      NOTE: These commands may take a long time to finish. Check Enable Workload Identity on an existing cluster for more information.

  2. Create a Kubernetes Service Account ksa-name in the namespace your resources will reside. If you are configuring Authentication Mechanism for the Control Plane, you can skip this step, and directly use Kubernetes Service Account controller which is already in the Control Plane namespace cloud-run-events

    kubectl create serviceaccount ksa-name -n ksa-namespace
  3. Bind the Kubernetes Service Account ksa-name with Google Cloud Service Account.

    MEMBER=serviceAccount:$PROJECT_ID.svc.id.goog[ksa-namespace/ksa-name]
    
    gcloud iam service-accounts add-iam-policy-binding \
    --role roles/iam.workloadIdentityUser \
    --member $MEMBER gsa-name@$PROJECT_ID.iam.gserviceaccount.com

    If you are configuring Authentication Mechanism for the Control Plane, you can replace ksa-namespace with cloud-run-events, ksa-name with controller, and gsa-name with cloud-run-events

  4. Annotate the Kubernetes Service Account ksa-name.

    kubectl annotate serviceaccount ksa-name iam.gke.io/gcp-service-account=gsa-name@$PROJECT_ID.iam.gserviceaccount.com \
    --namespace ksa-namespace

    If you are configuring Authentication Mechanism for the Control Plane, you can replace ksa-namespace with cloud-run-events, ksa-name with controller, and gsa-name with cloud-run-events

Option 2: Kubernetes Secrets

  1. Download a new JSON private key for that Service Account. Be sure not to check this key into source control!

    gcloud iam service-accounts keys create cloud-run-events.json \
    --iam-account=cloud-run-events@$PROJECT_ID.iam.gserviceaccount.com
  2. Create a Secret on the Kubernetes cluster in the cloud-run-events namespace with the downloaded key:

    kubectl --namespace cloud-run-events create secret generic google-cloud-key --from-file=key.json=cloud-run-events.json

    Note that google-cloud-key and key.json are default values expected by our control plane.

Authentication Mechanism for the Data Plane

Please check Installing a Service Account for the Data Plane.

Troubleshooting

Workload Identity

To ensure Workload Identity is correctly configured for a Google Cloud Service Account (GSA), and a Kubernetes Service Account (KSA), you need to check:

  1. If the Google Cloud Service Account has the desired iam-policy-binding.

    By running the following command:

    gcloud iam service-accounts get-iam-policy gsa-name@$PROJECT_ID.iam.gserviceaccount.com

    if the iam-policy-binding is correct, you'll find an iam-policy-binding similar to:

     bindings:
     - members:
       - serviceAccount:$PROJECT_ID.svc.id.goog[ksa-namespace/ksa-name]
       role: roles/iam.workloadIdentityUser
  2. If the Kubernetes Service Account has the desired annotation.

    By running the following command:

    kubectl get serviceaccount ksa-name -n ksa-namespace -o yaml

    if the annotation exists, you'll find an annotation similar to:

    metadata:
      annotations:
        iam.gke.io/gcp-service-account: gsa-name@$PROJECT_ID.iam.gserviceaccount.com

Kubernetes Secrets

To ensure kubernetes Secrets correctly are configured in a namespace, you need to check if a Kubernetes Secret with the correct name is in the namespace.

By default, this secret is named google-cloud-key, but it could be set to something else either in the custom object itself (e.g. the Source's spec.secret) or in the config-gcp-auth ConfigMap. Verify that a secret with the correct name is in the same namespace as the custom object by running the following command:

kubectl get secret -n namespace

Common Issues

  1. Resources are not READY

    If a resource instance is not READY due to an authentication configuration problem, you are likely to get permission related error for User not authorized to perform this action (for Workload Identity, the error might look like IAM returned 403 Forbidden: The caller does not have permission).

    This error indicates that the Control Plane may not configure authentication properly. You can find it in a resource instance by:

    kubectl describe resource-kind resource-instance-name -n resource-instance-namespace

    Here is a detailed example checking a CloudAuditlogsSource test in namespace default:

    After running the following command, you will find a couple of Conditions under this CloudAuditLogsSource's Status.

    kubectl describe cloudauditlogssource test -n default
    • If this CloudAuditLogsSource failed due to a missing Kubernetes Secret (for Kubernetes Secret authentication configuration), or a missing Kubernetes Service Account annotation (for Workload Identity authentication configuration), the Condition Ready would look like:

      Type:     Ready
      Status:   False
      Message:  Failed to reconcile Pub/Sub topic: rpc error: code = PermissionDenied desc = User not authorized to perform this action.
      Reason:   TopicReconcileFailed
    • If this CloudAuditLogsSource failed due to the JSON private key stored in the Kubernetes Secret having been revoked (for Kubernetes Secret authentication configuration), the Condition Ready would look like:

      Type:     Ready
      Status:   False
      Message:  Failed to reconcile Pub/Sub topic: rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: 400 Bad Request
                Response: {"error":"invalid_grant","error_description":"Invalid JWT Signature."}
      Reason:   TopicReconcileFailed
    • If this CloudAuditLogsSource failed due to the GSA having insufficient permissions (for either Kubernetes Secret authentication configuration or Workload Identity authentication configuration), the Condition Ready would look like:

      Type:     Ready
      Status:   False
      Message:  Failed to ensure creation of logging sink: rpc error: code = PermissionDenied desc = The caller does not have permission
      Reason:   SinkCreateFailed
    • If this CloudAuditLogsSource failed due to the iam-policy-binding being setup incorrectly on the Google Service Account (for Workload Identity authentication configuration), the Condition Ready would look like:

      Type:     Ready
      Status:   False
      Message:  Failed to reconcile Pub/Sub topic: rpc error: code = Unauthenticated desc = transport: compute: Received 403 `
                unable to generate token; IAM returned 403 Forbidden: The caller does not have permission
                his error could be caused by a missing IAM policy binding on the target IAM service account.
      Reason:   TopicReconcileFailed

      To solve this issue, you can:

    • Check the Google Cloud Service Account cloud-run-events for the Control Plane has all required permissions.

    • Check authentication configuration is correct for the Control Plane.

      • If you are using Workload Identity for the Control Plane, refer here to check the Google Cloud Service Account cloud-run-events, and the Kubernetes Service Account controller in namespace cloud-run-events.
      • If you are using Kubernetes Secret for the Control Plane, refer here to check the Kubernetes Secret google-cloud-key in namespace cloud-run-events.

    Note: For Kubernetes Secret, if the JSON private key no longer exists under your Google Cloud Service Account cloud-run-events. Then, even the Google Cloud Service Account cloud-run-events has all required permissions, and the corresponding Kubernetes Secret google-cloud-key is in namespace cloud-run-events, you still get permission related error. To such case, you have to re-download the JSON private key and re-create the Kubernetes Secret, refer here for instructions.

  2. Resources are READY, but can't receive Events

    Sometimes, a resource instance is READY, but it can't receive Events. It might be an authentication configuration problem, if the underlying Deployment doesn't have minimum availability.

    Typically, the name of an underlying Deployment for a resource instance could be a truncated version of (prefix)-(resource-instance-name)-(uid).

    1. If the resource instance is a Source, the prefix is cre-src.
    2. If the resource instance is a Channel, the prefix is cre-chan.
    3. If the resource instance is a Pullsubscription, the prefix is cre-ps

    You can use the following command to check if the underlying Deployment's available pod is zero.

    kubectl get deployment -n resource-instance-namespace

    Here is a detailed example checking a CloudAuditlogsSource test's underlying Deployment in namespace default:

    After running the following command, you will find a Deployment named as cre-src-cloudauditlogssource-te(UID).

    kubectl get deployment -n default
    • If this Deployment doesn't have minimum availability due to a missing Kubernetes Secret (for Kubernetes Secret authentication configuration), the Pod which belongs to this Deployment (Pod's name will have the same prefix cre-src-cloudauditlogssource- as the Deployment's name) will block at ContainerCreating status. Using kubectl describe pod pod-name -n namespace, you can find a Warning Event like this:

      Warning  FailedMount  27s (x2 over 2m40s)   kubelet, gke-knative-1-default-pool-73d30583-c1hv
      Unable to mount volumes for pod "cre-src-cloudauditlogssource-tef1a55bfca8e1d5620e0c4199495mwk6k_default(f35883bd-ee66-4497-94fa-05196a6043fe)":
      timeout expired waiting for volumes to attach or mount for pod "default"/"cre-src-cloudauditlogssource-tef1a55bfca8e1d5620e0c4199495mwk6k".
      list of unmounted volumes=[google-cloud-key]. list of unattached volumes=[google-cloud-key default-token-dd9cd]
    • If this Deployment doesn't have minimum availability due to the JSON private key stored in the Kubernetes Secret having been revoked (for Kubernetes Secret authentication configuration), the Pod which belongs to this Deployment (Pod's name will have the same prefix cre-src-cloudauditlogssource- as the Deployment's name) will block at Error status. Using kubectl log pod-name -n namespace, you can find a Log containing information like this:

      "msg":"failed to start adapter: ",
      "commit":"9e8388f",
      "error":"unable to create subscription \"cre-src_default_cloudauditlogssource-test_a2ca50b7-c951-4682-b8f0-e902a449dbb1\",
      rpc error: code = Unauthenticated desc = transport: oauth2: cannot fetch token: 400 Bad Request\nResponse: {\"error\":\"invalid_grant\",\"error_description\":\"Invalid JWT Signature.\"}"
    • If this Deployment doesn't have minimum availability due to the GSA having insufficient permissions (for either Kubernetes Secret authentication configuration or Workload Identity authentication configuration), the Pod which belongs to this Deployment (Pod's name will have the same prefix cre-src-cloudauditlogssource- as the Deployment's name) will block at Error status. Using kubectl log pod-name -n namespace, you can find a Log containing information like this:

      "msg":"failed to start adapter: ",
      "commit":"9e8388f",
      "error":"unable to create subscription \"cre-src_default_cloudauditlogssource-test_663a7f95-8b82-46aa-85b8-8fee4e65b26f\",
      rpc error: code = PermissionDenied desc = User not authorized to perform this action.
    • If this Deployment doesn't have minimum availability due to a missing Kubernetes Service Account (for Workload Identity authentication configuration), the Deployment can't create any Pod. Using kubectl describe deployment-name -n namespace, you can find a Condition ReplicaFailure under Status like this:

      type: ReplicaFailure
      status: "True"
      reason: FailedCreate
      message: 'pods "cre-src-cloudauditlogssource-tebc146e4349a11efaf078c20f8ab65089-6b6b57997d-"
        is forbidden: error looking up service account default/test1: serviceaccount
        "test1" not found'
    • If this Deployment doesn't have minimum availability due to a missing Kubernetes Service Account annotation (for Workload Identity authentication configuration), the Pod which belongs to this Deployment (Pod's name will have the same prefix cre-src-cloudauditlogssource- as the Deployment's name) will block at Error status. Using kubectl log pod-name -n namespace, you can find a Log containing information like this:

      "msg":"failed to start adapter: ",
      "commit":"9e8388f",
      "error":"unable to create subscription \"cre-src_default_cloudauditlogssource-test_663a7f95-8b82-46aa-85b8-8fee4e65b26f\",
      rpc error: code = PermissionDenied desc = User not authorized to perform this action.
    • If this Deployment doesn't have minimum availability due to the iam-policy-binding being setup incorrectly (for Workload Identity authentication configuration), the Pod which belongs to this Deployment (Pod's name will have the same prefix cre-src-cloudauditlogssource- as the Deployment's name) will block at Error status. Using kubectl log pod-name -n namespace, you can find a Log containing information like this:

      "msg":"failed to start adapter: ",
      "commit":"9e8388f",
      "error":"unable to create subscription \"cre-src_default_cloudauditlogssource-test_4daeacfe-6155-4f69-8d52-64745fa0591d\",
      rpc error: code = Unauthenticated desc = transport: compute: Received 403 `\nUnable to generate token; IAM returned 403 Forbidden:
      The caller does not have permission\n\nThis error could be caused by a missing IAM policy binding on the target IAM service account.

      To solve this issue, you can:

    • Check the Google Cloud Service Account cre-dataplane for the Data Plane has all required permissions.

    • Check authentication configuration is correct for this resource instance.

      • If you are using Workload Identity for this resource instance, refer here to check the Google Cloud Service Account cre-dataplane, and the Kubernetes Service Account in the namespace where this resource instance resides.
      • If you are using Kubernetes Secrets for this resource instance, refer here to check the Kubernetes Secret in namespace where this resource instance resides.

    Note: For Workload Identity, there is a known issue #759 for credential sync delay (~1 min) in resources' underlying components. You'll probably encounter this issue, if you immediately send Events after you finish Workload Identity setup for a resource instance.

  3. Resources are not READY, due to WorkloadIdentityReconcileFailed

    This error only exists when you use default scenario for Workload Identity. You can find detailed failure message by:

    kubectl describe resource-kind resource-instance-name -n resource-instance-namespace

    To solve this issue, you can:

    • Make sure the workloadIdentityMapping under default-auth-config in ConfigMap config-gcp-auth is correct (a correct Kubernetes Service Account paired with a correct Google Cloud Service Account).
    • If the Condition Ready has permission related error message like this:
      type: Ready
      status: "False"
      message: 'rpc error: code = PermissionDenied desc = Permission iam.serviceAccounts.setIamPolicy
        is required to perform this operation on service account projects/-/serviceAccounts/cre-dataplane@PROJECT_ID.iam.gserviceaccount.com.'
      reason: WorkloadIdentityFailed
      it is most likely that you didn't grant iam.serviceAccountAdmin permission of the Google Cloud Service Account to the Control Plane's Google Cloud Service Account cloud-run-events, refer to default scenario to grant permission.
    • If the Condition Ready has concurrency related error message like this:
      type: Ready
      status: "False"
      message: 'adding iam policy binding failed with: failed to set iam policy: googleapi: Error 409: There were concurrent policy changes.
        Please retry the whole read-modify-write with exponential backoff'
      reason: WorkloadIdentityFailed
      the controller will retry it in the next reconciliation loop (the maximum retry period is 5 min). You can also use non-default scenario if this error lasts for a long time.