@architch architch commented Jun 23, 2025

Description

This change optimises byte array to hex string conversion.
The current implementation uses string formatting, which has to parse the format string for every byte. This costs a lot of time and CPU.
The new implementation uses bit operations instead.
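As an illustration of the bit-operation approach, here is a minimal sketch. It is not necessarily the code in this patch: the object name `HexSketch`, the lookup table, and the assumption that the result carries the `0x` prefix are all illustrative. Each byte is masked to an unsigned value, then split into a high and a low nibble that index into a table of hex digits, so no format string is parsed.

```scala
object HexSketch {
  // Lookup table mapping a nibble value (0..15) to its lowercase hex digit.
  private val HexDigits: Array[Char] = "0123456789abcdef".toCharArray

  // Sketch only: the patch's actual implementation may differ.
  // Assumes the output should start with "0x", as the old code's output does.
  def byteArrayToHexString(bytes: Array[Byte]): String = {
    val out = new Array[Char](2 + bytes.length * 2)
    out(0) = '0'
    out(1) = 'x'
    var i = 0
    while (i < bytes.length) {
      val v = bytes(i) & 0xFF              // widen to Int, dropping sign extension
      out(2 + i * 2) = HexDigits(v >>> 4)  // high nibble
      out(3 + i * 2) = HexDigits(v & 0x0F) // low nibble
      i += 1
    }
    new String(out)
  }
}
```

For example, `HexSketch.byteArrayToHexString(Array[Byte](0, 15, -1))` returns `"0x000fff"`. Writing into a preallocated `Array[Char]` also avoids creating one intermediate string per byte.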

How did the Spark Cassandra Connector Work or Not Work Before this Patch

The code below compares the time taken by the two implementations (from over 600 ms to under 20 ms for a 1-million-element byte array).

// Create a deterministic test array of 1 million bytes with alternating signs
val byteArray = (1 to 1000000).map(x => ((x % 128) * (if (x % 2 == 0) -1 else 1)).toByte).toArray

// Time taken by current code ~ 800 ms
var start = System.currentTimeMillis()
var hexString = "0x" + byteArray.map("%02x" format _).mkString
var end = System.currentTimeMillis()
println("time taken : " + (end - start)) // time taken : 854


// Time taken by new code ~ 15 ms
start = System.currentTimeMillis()
hexString = byteArrayToHexString(byteArray) // new implementation
end = System.currentTimeMillis()
println("time taken : " + (end - start)) // time taken : 14

General Design of the patch

Why pursue this particular fix?
In our application, we found many threads busy parsing the format string while converting byte arrays to strings.

Fixes: CASSANALYTICS-68

How Has This Been Tested?

Unit Tests.

Checklist:

  • I have a ticket in the JIRA
  • I have performed a self-review of my own code
  • Locally all tests pass (make sure tests fail without your patch)

@architch architch changed the title optimise byte array to hex string conversion CASSANALYTICS-68: Optimise byte array to hex string conversion Jun 23, 2025