Skip to content

Transaction.commit and Transaction.createReadStream unconditionally retry by invoke Transaction.begin() if any non-ABORTED error is returned; really it should firstly check for error better #2159

@odeke-em

Description

@odeke-em

Discovered while implementing tracing for this package that we have this code

.on('error', err => {
setSpanError(span, err as Error);
const isServiceError =
err && typeof err === 'object' && 'code' in err;
if (
!this.id &&
this._useInRunner &&
!(
isServiceError &&
(err as grpc.ServiceError).code === grpc.status.ABORTED
)
) {
this.begin();
}

which when interpreted will always invoke Transaction.begin() as long this was the transaction was not previously run aka doesn't have an id and the error returned in .commit() was not ABORTED; but really if we just tried to insert data that already exists and the gRPC backend returns 6 ALREADY_EXISTS: Failed to insert row with primary key ({pk#SingerId:100}) due to previously existing row
we shall ALWAYS try to invoke .begin() which then will again invoke .commit()

Reproducing code

'use strict';
const {
  NodeTracerProvider,
  TraceIdRatioBasedSampler,
} = require('@opentelemetry/sdk-trace-node');
const {BatchSpanProcessor} = require('@opentelemetry/sdk-trace-base');

const {
  TraceExporter,
} = require('@google-cloud/opentelemetry-cloud-trace-exporter');
const exporter = new TraceExporter();

const provider = new NodeTracerProvider({
  sampler: new TraceIdRatioBasedSampler(1.0),
});
provider.addSpanProcessor(new BatchSpanProcessor(exporter));

async function main(
  projectId = 'my-project-id',
  instanceId = 'my-instance-id',
  databaseId = 'my-project-id'
) {
  // Create the Cloud Spanner Client.
  const {Spanner} = require('@google-cloud/spanner');
  const spanner = new Spanner({
    projectId: projectId,
    observabilityOptions: {
      tracerProvider: provider,
      enableExtendedTracing: true,
    },
  });

  const instance = spanner.instance(instanceId);
  const database = instance.database(databaseId);

  const update = {
    sql: "INSERT INTO Singers(firstName, SingerId) VALUES('Foo', 100)",
  };
  database.runTransaction(async (err, transaction) => {
    if (err) {
      console.error(err);
      return;
    }
    try {
      await transaction.run(update);
    } catch (err) {
      console.error(err);
    } finally {
      await transaction.commit();
      await new Promise(resolve => {
        setTimeout(() => {
          resolve();
        }, 2000);
      });
      provider.forceFlush();
      // This sleep gives ample time for the trace
      // spans to be exported to Google Cloud Trace.
      await new Promise(resolve => {
        setTimeout(() => {
          resolve();
        }, 4000);
      });
      spanner.close();
    }
  });
}

process.on('unhandledRejection', err => {
  console.error(err.message);
  process.exitCode = 1;
});
main(...process.argv.slice(2));

and here is a trace of the problems to confirm visualization

Currently wrong

Screenshot 2024-10-13 at 6 03 43 AM

Expected

Screenshot 2024-10-13 at 6 04 23 AM

I'd classify as a P0 because this adds extra calls and latency to customer calls on the most minor inconvenience like just a typo or data existing

Suggestion

We should only invoke Transaction.begin() on errors that are retryable for example a transient error or a timeout or server unknown internal errors, deadline_exceeded, status_unavailable, resource_exhaused

Kindly /cc-ing @surbhigarg92 @alkatrivedi

Metadata

Metadata

Assignees

No one assigned

    Labels

    api: spannerIssues related to the googleapis/nodejs-spanner API.priority: p2Moderately-important priority. Fix may not be included in next release.type: bugError or flaw in code with unintended results or allowing sub-optimal usage patterns.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions