(feature) Multithreading and batches in export#91
(feature) Multithreading and batches in export#91DaniilSmirnov wants to merge 1 commit intoeroshenkoam:mainfrom
Conversation
irdkwmnsb
left a comment
There was a problem hiding this comment.
Since all the items are already being stored in memory - using a List<CompletableFuture>> pattern doesn't impact performance that much - we use the same amount of memory, just in a shorter timespan. Although if there is a lot of testRefIds or testSummaries or testAttachments - there will be a lot of lambdas on the heap. ThreadPoolExecutor's task queue should be bounded to a reasonable number: https://github.com/eroshenkoam/xcresults/pull/91/changes#diff-d989955202a14b833a38506d59b6fd65123f68c0fe4f1062994cdd4253b34ea7R85
| description = "Thread count for export processor", | ||
| defaultValue = "1" | ||
| ) | ||
| protected String threadCount; |
There was a problem hiding this comment.
This should probably be an Integer, instead of a String
| final List<JsonNode> summaries = new ArrayList<>(); | ||
| summaries.addAll(getAttributeValues(entry.getKey(), ACTIVITY_SUMMARIES)); | ||
| summaries.addAll(getAttributeValues(entry.getKey(), FAILURE_SUMMARIES)); | ||
| summaries.forEach(summary -> { |
There was a problem hiding this comment.
No reason to create a summaries list. You can instead concat two streams and then do forEach on that stream.
| if (attachmentSources.containsKey(name)) { | ||
| final List<String> sources = attachmentSources.get(name); | ||
| sources.forEach(source -> attachmentsRefs.putIfAbsent(source, ref)); | ||
| } |
There was a problem hiding this comment.
This can be replaced with attachmentSources.computeIfPresent
| }, customThreadPool); | ||
| testFutures.add(future); | ||
| } | ||
| CompletableFuture.allOf(testFutures.toArray(new CompletableFuture[0])).join(); |
There was a problem hiding this comment.
I'm not sure if allOf is anyhow better than testFutures.forEach(fut -> fut.join()), while the latter is simpler to read
| private void exportAttachmentsInBatches(Map<String, String> attachmentsRefs) { | ||
| List<Map.Entry<String, String>> entries = new ArrayList<>(attachmentsRefs.entrySet()); | ||
| if (entries.isEmpty()) return; | ||
|
|
||
| int batchSize = Math.max(1, Math.min(entries.size() / (threadCount * 2), 100)); | ||
| List<List<Map.Entry<String, String>>> batches = partition(entries, batchSize); | ||
|
|
||
| List<CompletableFuture<Void>> batchFutures = new ArrayList<>(); | ||
| for (List<Map.Entry<String, String>> batch : batches) { | ||
| CompletableFuture<Void> future = CompletableFuture.runAsync(() -> { | ||
| for (Map.Entry<String, String> entry : batch) { | ||
| try { | ||
| Path attachmentPath = outputPath.resolve(entry.getKey()); | ||
| exportReference(entry.getValue(), attachmentPath); | ||
| } catch (Exception e) { | ||
| System.err.printf("Failed to export attachment %s: %s%n", entry.getKey(), e.getMessage()); | ||
| } | ||
| } | ||
| }, customThreadPool); | ||
| batchFutures.add(future); | ||
| } | ||
| CompletableFuture.allOf(batchFutures.toArray(new CompletableFuture[0])).join(); | ||
| } |
There was a problem hiding this comment.
I dont see a reason to extract this to a different method. It's only littering the namespace
| exitCode, heicPath); | ||
| } | ||
| } catch (IOException e) { | ||
| System.err.printf("IO error converting HEIC to JPEG %s: %s%n", heicPath, e.getMessage()); |
There was a problem hiding this comment.
It's a bad idea to supress the error and not throw it up the chain. The callers of this method might expect the jpeg to exist, if the function exited successfully.
Based on idea from https://github.com/eroshenkoam/xcresults/pull/82/changes
I add cli option for thread count (with default value - 1) and add batches and multithreading in report generation
As a result it shows 30-50% speed up depends on report size and threads count
In attachment you can see diagram with difference between versions and their speed