


@alexcheng1982 commented Sep 23, 2024

This PR fixes an issue with the parallel stream used in JsonReader.

Problem

I wrote a simple program to test JsonReader against the following JSON content:

[
  {
    "name": "Alex",
    "email": "[email protected]",
    "jobTitle": "Software Engineer"
  },
  {
    "name": "Bob",
    "email": "[email protected]",
    "jobTitle": "System Admin"
  }
]

The code below uses the values of the name and jobTitle keys as the Document content and puts each item's email into the Document metadata.

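import java.nio.file.Path;
import java.util.Map;

import org.springframework.ai.reader.JsonMetadataGenerator;
import org.springframework.ai.reader.JsonReader;
import org.springframework.core.io.FileSystemResource;

public class JsonReaderSample {

  void read() {
    // Copy each item's "email" value into the Document metadata
    var metadataGenerator = new JsonMetadataGenerator() {
      @Override
      public Map<String, Object> generate(Map<String, Object> jsonMap) {
        return Map.of("email", jsonMap.getOrDefault("email", ""));
      }
    };
    var resource = new FileSystemResource(
        Path.of(".", "data", "json-array.json"));
    // Use the "name" and "jobTitle" values as the Document content
    var reader = new JsonReader(resource, metadataGenerator, "name",
        "jobTitle");
    var docs = reader.read();
    docs.forEach(System.out::println);
  }

  public static void main(String[] args) {
    new JsonReaderSample().read();
  }
}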

When running this program, the output may look like the following. In the second Document, the keys and values from different keys are mingled (jobTitle: name: System AdminBob).

Document{id='a55153c1-09ab-4fc6-aa18-07a6f20e94d6', metadata={[email protected]}, content='name: Alex
jobTitle: Software Engineer
', media=[]}
Document{id='46b27f32-1ea3-4ca9-a005-9c93195bb335', metadata={[email protected]}, content='jobTitle: name: System AdminBob

', media=[]}

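Using parallelStream makes the StringBuffer.append calls for different keys run on different threads, where they interleave on the shared StringBuffer. StringBuffer only makes each individual append call thread-safe, not the sequence of appends that writes one "key: value" line, so fragments from different keys can end up mixed together. Because this depends on thread scheduling, the first Document above happened to come out correct.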
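The race can be reproduced in plain Java, without Spring AI. The sketch below is a simplified, hypothetical stand-in for the content-building step (ParallelAppendRace and buildContentParallel are illustrative names, not JsonReader's actual code): a parallel stream writes each "key: value" line to one shared StringBuffer through several chained append calls. Each call locks the buffer only for its own duration, so whether a given run mingles the output depends on scheduling; the program therefore just counts mingled runs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParallelAppendRace {

  // Hypothetical, simplified stand-in for the buggy content builder.
  // This is NOT the real JsonReader code; names are illustrative.
  static String buildContentParallel(Map<String, Object> item, List<String> keys) {
    StringBuffer sb = new StringBuffer();
    keys.parallelStream()
        .filter(item::containsKey)
        .forEach(key ->
            // Each chained append() is synchronized on its own, but the chain
            // as a whole is not, so another thread can append in between.
            sb.append(key)
                .append(": ")
                .append(item.get(key))
                .append(System.lineSeparator()));
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, Object> item = new LinkedHashMap<>();
    item.put("name", "Bob");
    item.put("jobTitle", "System Admin");
    List<String> keys = List.of("name", "jobTitle");

    String nl = System.lineSeparator();
    // forEach on a parallel stream does not keep encounter order, so either
    // key order counts as a correct (non-mingled) result.
    String nameFirst = "name: Bob" + nl + "jobTitle: System Admin" + nl;
    String jobTitleFirst = "jobTitle: System Admin" + nl + "name: Bob" + nl;

    int mingled = 0;
    for (int i = 0; i < 10_000; i++) {
      String content = buildContentParallel(item, keys);
      if (!content.equals(nameFirst) && !content.equals(jobTitleFirst)) {
        mingled++;
      }
    }
    // The count varies between runs and machines and may be zero.
    System.out.println("Runs with mingled content: " + mingled);
  }
}
```

The total length of the content is always right, because every individual append completes atomically; only the order of the fragments breaks, which is consistent with the "jobTitle: name: System AdminBob" output above.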

Fix

Processing the keys in parallel seems unnecessary: the documents in the JSON array are already processed in parallel, and all the data is in memory. So I switched to a sequential stream and replaced StringBuffer with StringBuilder, which is sufficient once only one thread appends to the buffer.
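A minimal sketch of the fixed approach, under the same assumptions (SequentialContentBuilder and buildContent are hypothetical names, not the actual JsonReader source): the keys go through a sequential stream, so only the calling thread touches the buffer, and a plain StringBuilder is enough.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SequentialContentBuilder {

  // Hypothetical sketch of the fix: sequential stream + StringBuilder.
  // Not copied from JsonReader; names are illustrative.
  static String buildContent(Map<String, Object> item, List<String> keys) {
    StringBuilder sb = new StringBuilder();
    keys.stream()
        .filter(item::containsKey) // skip keys missing from this item
        .forEach(key -> sb.append(key)
            .append(": ")
            .append(item.get(key))
            .append(System.lineSeparator()));
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, Object> item = new LinkedHashMap<>();
    item.put("name", "Bob");
    item.put("jobTitle", "System Admin");
    // Prints "name: Bob" and "jobTitle: System Admin" on separate lines,
    // always in the order the keys were given.
    System.out.print(buildContent(item, List.of("name", "jobTitle")));
  }
}
```

If the keys ever did need parallel processing, a safer shape would be to map each key to its own complete line and combine the lines with Collectors.joining(), which avoids shared mutable state and keeps encounter order even on a parallel stream. For a handful of in-memory keys, the sequential version is simpler.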

Parallel stream used in JsonReader causes values from different keys mingled
@markpollack

Yikes! thanks a ton.

merged in c205c7d

@markpollack added this to the 1.0.0-M3 milestone Sep 24, 2024