@itsankit-google itsankit-google commented Sep 17, 2024

Program Failure Exception Class: reference

Tested in CDAP Sandbox:

GCS Batch Sink:

2024-09-27 08:46:57,006 - ERROR [SparkRunner-phase-1:i.c.c.i.a.r.ProgramControllerServiceAdapter@98] - Spark Program 'phase-1' failed.
java.util.concurrent.ExecutionException: io.cdap.cdap.api.exception.ProgramFailureException: xxxxx.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).
	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
	at io.cdap.cdap.app.runtime.spark.submit.AbstractSparkJobFuture.get(AbstractSparkJobFuture.java:119)
	at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.run(SparkRuntimeService.java:444)
	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52)
	at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.lambda$null$2(SparkRuntimeService.java:525)
	at java.lang.Thread.run(Thread.java:750)
Caused by: io.cdap.cdap.api.exception.ProgramFailureException: xxxxx.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).
	at io.cdap.cdap.api.exception.ProgramFailureException$Builder.build(ProgramFailureException.java:186)
	at io.cdap.cdap.api.exception.ErrorUtils.getProgramFailureException(ErrorUtils.java:161)
	at io.cdap.plugin.gcp.common.ExceptionUtils.getProgramFailureException(ExceptionUtils.java:135)
	at io.cdap.plugin.gcp.common.ExceptionUtils.getProgramFailureException(ExceptionUtils.java:150)
	at io.cdap.plugin.gcp.common.ExceptionUtils.invokeWithProgramFailureAndInterruptionHandling(ExceptionUtils.java:64)
	at io.cdap.plugin.gcp.gcs.sink.ForwardingOutputFormat.checkOutputSpecs(ForwardingOutputFormat.java:73)
	at io.cdap.cdap.etl.batch.DelegatingOutputFormat.checkOutputSpecs(DelegatingOutputFormat.java:46)
	at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.assertConf(SparkHadoopWriter.scala:403)
	at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:71)
	at org.apache.spark.rdd.PairRDDFunctions.$anonfun$saveAsNewAPIHadoopDataset$1(PairRDDFunctions.scala:1078)

GCS Multi Sink:

2024-09-27 10:41:32,695 - ERROR [SparkRunner-phase-1:i.c.c.i.a.r.ProgramControllerServiceAdapter@98] - Spark Program 'phase-1' failed.
java.util.concurrent.ExecutionException: io.cdap.cdap.api.exception.ProgramFailureException: 403 Forbidden
POST https://storage.googleapis.com/upload/storage/v1/b/cdf_example/o?ifGenerationMatch=0&uploadType=multipart
{
  "error": {
    "code": 403,
    "message": "xxxxx.iam.gserviceaccount.com does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist).",
    "errors": [
      {
        "message": "xxxxxx.iam.gserviceaccount.com does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist).",
        "domain": "global",
        "reason": "forbidden"
      }
    ]
  }
}

	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
	at io.cdap.cdap.app.runtime.spark.submit.AbstractSparkJobFuture.get(AbstractSparkJobFuture.java:119)
	at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.run(SparkRuntimeService.java:444)
	at com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:52)
	at io.cdap.cdap.app.runtime.spark.SparkRuntimeService.lambda$null$2(SparkRuntimeService.java:525)
	at java.lang.Thread.run(Thread.java:750)
Caused by: io.cdap.cdap.api.exception.ProgramFailureException: 403 Forbidden
POST https://storage.googleapis.com/upload/storage/v1/b/cdf_example/o?ifGenerationMatch=0&uploadType=multipart
{
  "error": {
    "code": 403,
    "message": "xxxxx.iam.gserviceaccount.com does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist).",
    "errors": [
      {
        "message": "xxxxx.iam.gserviceaccount.com does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist).",
        "domain": "global",
        "reason": "forbidden"
      }
    ]
  }
}

	at io.cdap.cdap.api.exception.ProgramFailureException$Builder.build(ProgramFailureException.java:186)
	at io.cdap.cdap.api.exception.ErrorUtils.getProgramFailureException(ErrorUtils.java:161)
	at io.cdap.plugin.gcp.common.ExceptionUtils.getProgramFailureException(ExceptionUtils.java:139)
	at io.cdap.plugin.gcp.common.ExceptionUtils.getProgramFailureException(ExceptionUtils.java:153)
	at io.cdap.plugin.gcp.common.ExceptionUtils.invokeWithProgramFailureHandling(ExceptionUtils.java:70)
	at io.cdap.plugin.gcp.gcs.sink.ForwardingOutputCommitter.setupJob(ForwardingOutputCommitter.java:45)
	at io.cdap.cdap.etl.spark.io.TrackingOutputCommitter.setupJob(TrackingOutputCommitter.java:40)

@itsankit-google itsankit-google self-assigned this Sep 17, 2024
@itsankit-google itsankit-google added the build Trigger unit test build label Sep 17, 2024
@itsankit-google itsankit-google force-pushed the PLUGIN-1807 branch 3 times, most recently from c98a86b to f1294a6 Compare September 17, 2024 14:09
@itsankit-google itsankit-google force-pushed the PLUGIN-1807 branch 7 times, most recently from 10c9cf1 to 123832f Compare September 27, 2024 11:32
@itsankit-google itsankit-google changed the title from "[PLUGIN-1807] Implement Program Failure Exception Handling in GCS plugins to catch known errors" to "[PLUGIN-1807] Implement Program Failure Exception Handling in GCS sink plugins to catch known errors" Sep 27, 2024

@tivv tivv left a comment

LGTM with minor comments

* @param <T> the return type of the function
*/
@FunctionalInterface
public interface IOFunction<T> {

nit: Consider using Callable / Runnable instead of Function / Operation. These terms are more common in Java.
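
For illustration, a minimal sketch of what the Callable / Runnable naming could look like. The names `IOCallable`, `IORunnable`, `callUnchecked` and `runUnchecked` are hypothetical and do not come from this PR, and `UncheckedIOException` is only a stand-in for whatever failure type the plugin actually throws:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class CallableNamingSketch {

  /** Hypothetical analogue of java.util.concurrent.Callable that may throw IOException. */
  @FunctionalInterface
  public interface IOCallable<T> {
    T call() throws IOException;
  }

  /** Hypothetical analogue of java.lang.Runnable that may throw IOException. */
  @FunctionalInterface
  public interface IORunnable {
    void run() throws IOException;
  }

  /** Runs the callable and rethrows any IOException as an unchecked exception (stand-in type). */
  public static <T> T callUnchecked(IOCallable<T> callable) {
    try {
      return callable.call();
    } catch (IOException e) {
      throw new UncheckedIOException(e.getMessage(), e);
    }
  }

  /** Void variant that delegates to the value-returning helper. */
  public static void runUnchecked(IORunnable runnable) {
    callUnchecked(() -> {
      runnable.run();
      return null;
    });
  }

  public static void main(String[] args) {
    String value = callUnchecked(() -> "ok");
    System.out.println(value);
    try {
      runUnchecked(() -> {
        throw new IOException("permission denied");
      });
    } catch (UncheckedIOException e) {
      System.out.println("wrapped: " + e.getMessage());
    }
  }
}
```

The method names `call()` and `run()` mirror the JDK interfaces, so a reader can guess at a glance whether a value is returned.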

}

// Helper method for handling both IOException and InterruptedException
public static void invokeWithProgramFailureAndInterruptionHandling(

You may be able to go with less boilerplate by using generic exceptions: https://www.mscharhag.com/java/java-exceptions-and-generic-types
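
One possible reading of the generic-exceptions suggestion, as a sketch only: the checked exception becomes a type parameter of the functional interface, so a single helper can cover callers that throw `IOException`, `InterruptedException`, or nothing checked at all. `ThrowingCallable`, `WrappedFailure` and `invokeWithFailureHandling` are hypothetical names; `WrappedFailure` stands in for the real failure exception and is not the CDAP class. Letting unchecked exceptions pass through unchanged is an assumption of this sketch, not something the PR specifies.

```java
import java.io.IOException;

public class GenericExceptionSketch {

  /** Hypothetical callable whose checked exception type is a type parameter. */
  @FunctionalInterface
  public interface ThrowingCallable<T, E extends Exception> {
    T call() throws E;
  }

  /** Stand-in for the failure exception; an assumption for this sketch only. */
  public static class WrappedFailure extends RuntimeException {
    public WrappedFailure(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** One helper for every checked exception type instead of one helper per type. */
  public static <T, E extends Exception> T invokeWithFailureHandling(
      ThrowingCallable<T, E> callable) {
    try {
      return callable.call();
    } catch (RuntimeException e) {
      // Unchecked exceptions pass through unchanged in this sketch.
      throw e;
    } catch (Exception e) {
      if (e instanceof InterruptedException) {
        // Restore the interrupt flag before wrapping.
        Thread.currentThread().interrupt();
      }
      throw new WrappedFailure(e.getMessage(), e);
    }
  }

  public static void main(String[] args) {
    String value = invokeWithFailureHandling(() -> "ok");
    System.out.println(value);
    try {
      invokeWithFailureHandling(() -> {
        throw new IOException("403 Forbidden");
      });
    } catch (WrappedFailure e) {
      System.out.println("wrapped: " + e.getMessage());
    }
  }
}
```

The compiler infers `E` from the lambda body, so call sites need no explicit type arguments.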

@itsankit-google

Closing the PR based on offline discussion with @albertshau & @tivv.

The new approach will be addressed in #1452.
