
[Bug]: Datastore's Read fails Pipeline validation and throws on DataflowRunner #28034

@rafaelsms

Description


What happened?

Hello!

Sorry if this is a duplicate. To be honest, I don't know much about Apache Beam and Dataflow yet, so I may be doing something wrong; please let me know :)

Below is a roughly minimal example. The pipeline only reads an entity from Datastore; it throws the exception below when run on DataflowRunner but succeeds on DirectRunner.

The validation that throws was added in PR #26675; it fails in `PipelineValidator.validateTransform` (see the stack trace below).

Example code:

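    import static com.google.datastore.v1.client.DatastoreHelper.makeFilter;
    import static com.google.datastore.v1.client.DatastoreHelper.makePropertyReference;
    import static com.google.datastore.v1.client.DatastoreHelper.makeValue;

    import com.google.datastore.v1.Projection;
    import com.google.datastore.v1.PropertyFilter.Operator;
    import com.google.datastore.v1.Query;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.PipelineResult;
    import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
    import org.apache.beam.sdk.io.gcp.datastore.DatastoreV1.Read;

    // Options (with getDataset()), the COLUMN_NAME_* constants, sourceKind and value
    // are defined elsewhere in our job and omitted here.

    private static Read buildQueryTransform(Options options, String kind, String column3Value) {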
        Query.Builder queryBuilder = Query.newBuilder();
        queryBuilder.addKindBuilder().setName(kind);
        queryBuilder.addProjection(makeProjection(COLUMN_NAME_1));
        queryBuilder.addProjection(makeProjection(COLUMN_NAME_2));
        queryBuilder.setFilter(makeFilter(COLUMN_NAME_3, Operator.EQUAL, makeValue(column3Value)));
        Query query = queryBuilder.build();
        return DatastoreIO.v1().read().withProjectId(options.getDataset()).withQuery(query);
    }

    private static Projection.Builder makeProjection(String propertyName) {
        Projection.Builder prjBuilder = Projection.newBuilder();
        prjBuilder.setProperty(makePropertyReference(propertyName));
        return prjBuilder;
    }

    public static void main(String[] args) {
        // ...

        Pipeline pipeline = Pipeline.create(options);

        // ...

        Read reader = buildQueryTransform(options, sourceKind, value);
        pipeline.apply("Read " + value + " from " + sourceKind, reader);

        // ...

        PipelineResult result = pipeline.run(); // throws IllegalArgumentException

        // ...
    }

Exception thrown:

SEVERE: Unexpected error in Dataflow job
java.lang.IllegalArgumentException: Transform Read-[column 3 value]-from-[datastore kind name]-Create-Values-Read-CreateSource--Impulse is not a composite transform but does not have a specified URN. outputs {
  key: "org.apache.beam.sdk.values.PCollection.<init>:397#bb20b45fd4d95138"
  value: "Read [column 3 value] from [datastore kind name]/Create.Values/Read(CreateSource)/Impulse.out"
}
unique_name: "Read [column 3 value] from [datastore kind name]/Create.Values/Read(CreateSource)/Impulse"

        at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
        at org.apache.beam.runners.core.construction.graph.PipelineValidator.validateTransform(PipelineValidator.java:219)
        at org.apache.beam.runners.core.construction.graph.PipelineValidator.validateComponents(PipelineValidator.java:121)
        at org.apache.beam.runners.core.construction.graph.PipelineValidator.validate(PipelineValidator.java:101)
        at org.apache.beam.runners.core.construction.PipelineTranslation.toProto(PipelineTranslation.java:106)
        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:1122)
        at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:198)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:321)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:307)
        at [our main class].main(MainDataflowJob.java:120)
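To make the failing check concrete, here is a simplified, hypothetical model of the rule implied by the error message: a transform with no subtransforms (a primitive such as `Impulse`) must carry a non-empty URN, while a composite may leave its own URN empty. `TransformNode` and `validateTransform` are illustrative names only, not Beam's API; the real check runs on the pipeline proto inside `PipelineValidator.validateTransform`.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of the rule behind the error message.
 * This is NOT Beam's PipelineValidator; it only illustrates the condition.
 */
final class UrnCheckSketch {

    // Illustrative stand-in for one transform entry in the pipeline proto.
    record TransformNode(String id, String urn, List<String> subtransforms) {}

    static void validateTransform(TransformNode t) {
        boolean isComposite = !t.subtransforms().isEmpty();
        boolean hasUrn = t.urn() != null && !t.urn().isEmpty();
        // A primitive (no children) without a URN cannot be executed by a runner.
        if (!isComposite && !hasUrn) {
            throw new IllegalArgumentException(
                "Transform " + t.id()
                    + " is not a composite transform but does not have a specified URN.");
        }
    }

    public static void main(String[] args) {
        // A primitive that was translated with a URN passes.
        validateTransform(new TransformNode("Impulse", "beam:transform:impulse:v1", List.of()));

        // A composite with an empty URN passes because it has subtransforms.
        validateTransform(new TransformNode("Read", "", List.of("Read/Impulse")));

        // A primitive with an empty URN fails, mirroring the exception above.
        try {
            validateTransform(new TransformNode("Read/Impulse", "", List.of()));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this report, the `Impulse` inside `Create.Values` is visited as a primitive node (see the log records below) but reaches the validator with an empty URN, which is the failing branch of this model.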

Other log records related to this unique name:

<record>
  <date>2023-08-16T13:56:55</date>
  <millis>1692205015473</millis>
  <sequence>278</sequence>
  <logger>org.apache.beam.sdk.runners.TransformHierarchy</logger>
  <level>FINE</level>
  <class>org.apache.beam.sdk.runners.TransformHierarchy$Node</class>
  <method>visit</method>
  <thread>1</thread>
  <message>Visiting primitive node Node{fullName=Read [column 3 value] from [datastore kind name]/Create.Values/Read(CreateSource)/Impulse, transform=Impulse}</message>
</record>
<record>
  <date>2023-08-16T13:56:55</date>
  <millis>1692205015473</millis>
  <sequence>279</sequence>
  <logger>org.apache.beam.sdk.runners.TransformHierarchy</logger>
  <level>FINE</level>
  <class>org.apache.beam.sdk.runners.TransformHierarchy$Node</class>
  <method>visit</method>
  <thread>1</thread>
  <message>Visiting output value Read [column 3 value] from [datastore kind name]/Create.Values/Read(CreateSource)/Impulse.out [PCollection@1917161212]</message>
</record>
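The transform ID in the exception (`Read-...-CreateSource--Impulse`) appears to be the unique name from these log records with every non-alphanumeric character replaced by `-`. The sketch below reproduces that mapping so IDs in validation errors can be matched to node names in the logs; the rule is an assumption inferred only from the two names shown here, and Beam's actual ID generation may differ. `TransformIdSketch` and `toTransformId` are hypothetical names.

```java
/**
 * Hypothetical helper: maps a transform's unique name to the ID format seen
 * in the exception. Assumption: each character that is not an ASCII letter
 * or digit becomes '-'.
 */
final class TransformIdSketch {

    static String toTransformId(String uniqueName) {
        return uniqueName.replaceAll("[^A-Za-z0-9]", "-");
    }

    public static void main(String[] args) {
        String uniqueName = "Read x from y/Create.Values/Read(CreateSource)/Impulse";
        // ")/" becomes "--", matching "CreateSource--Impulse" in the exception.
        System.out.println(toTransformId(uniqueName));
    }
}
```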

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
