Description
Our program uploads parquet files to stages from multiple goroutines. Each goroutine uploads to a different Unity volume via PUT '{{.LocalFilePath}}' INTO '{{.UnityVolumePath}}' OVERWRITE. When one goroutine fails with a rate limit error, we nondeterministically encounter a panic (and therefore a crash of the program) in other concurrent goroutines, with the message fatal error: found pointer to free object. This error is raised by the Go runtime's reportZombies function. Sometimes the error is instead runtime: marked free object in span 0x7e6b8843ecd8, elemsize=48 freeindex=0 (bad use of unsafe.Pointer? try -d=checkptr). As far as we can tell, both errors occur nondeterministically.
The stack trace confirms that the goroutines running when the panic occurs are all Databricks PUT executions. All other goroutines in the program are sleeping on channels and mutexes. We are also not running enough goroutines to trigger a rate limit on the bucket or Databricks: we run on the order of 10 to 20 concurrent uploads. The rate limit seems to be due to other traffic on the same bucket, unrelated to our program.
Reproduction
This is hard to reproduce with standalone test code, as it's tightly coupled to our other business logic and requires the bucket underlying the Unity volume to return a rate limit error.
Expected behavior
The other goroutines should successfully upload their files regardless of a rate limit error in one of the goroutines.
Is it a regression?
The code running on this deployment hasn't changed in a few months, including dependencies. So it is unlikely to be a regression.
Debug Logs
Here is the original rate limit error we received:
staging operation over HTTP was unsuccessful: 503-<?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>ServerBusy</Code><Message>Operations per second is over the account limit.\nRequestId:[redacted]\nTime:2025-06-04T19:37:48.5111742Z</Message></Error>","time":"2025-06-04T19:37:48Z","message":"databricks: failed to execute query: query \n\tPUT '/tmp/1913351475.parquet' INTO '/Volumes/finance/[redacted]/invoice_pbrszrkigdtsnob_0_temp/1913351475.parquet' OVERWRITE
It appears to be a rate limit passed through by the underlying Azure storage account backing the Unity volume.
Other Information
- OS: Debian (Bullseye)
- Versions:
- go 1.23.6
- github.com/databricks/databricks-sql-go v1.6.1
- github.com/databricks/databricks-sdk-go v0.54.0