Add approaches for Parallel Letter Frequency #2863
Changes from 7 commits

@@ -2,8 +2,6 @@
<!-- Your content goes here: -->
<!-- DO NOT EDIT BELOW THIS LINE! -->
---

@@ -0,0 +1,27 @@
{
  "introduction": {
    "authors": [
      "masiljangajji"
    ]
  },
  "approaches": [
    {
      "uuid": "dee2a79d-3e64-4220-b99f-55667549c12c",
      "slug": "fork-join",
      "title": "Fork/Join",
      "blurb": "Parallel Computation Using Fork/Join",
      "authors": [
        "masiljangajji"
      ]
    },
    {
      "uuid": "75e9e93b-4da4-4474-8b6e-3c0cb9b3a9bb",
      "slug": "parallel-stream",
      "title": "Parallel Stream",
      "blurb": "Parallel Computation Using Parallel Stream",
      "authors": [
        "masiljangajji"
      ]
    }
  ]
}

@@ -0,0 +1,94 @@
# `Fork/Join`

```java
import java.util.Map;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ParallelLetterFrequency {

    List<String> texts;
    ConcurrentMap<Character, Integer> letterCount;

    ParallelLetterFrequency(String[] texts) {
        this.texts = List.of(texts);
        letterCount = new ConcurrentHashMap<>();
    }

    Map<Character, Integer> countLetters() {
        if (texts.isEmpty()) {
            return letterCount;
        }

        ForkJoinPool forkJoinPool = new ForkJoinPool();
        forkJoinPool.invoke(new LetterCountTask(texts, 0, texts.size(), letterCount));
        forkJoinPool.shutdown();

        return letterCount;
    }

    private static class LetterCountTask extends RecursiveTask<Void> {
        private static final int THRESHOLD = 10;
        private final List<String> texts;
        private final int start;
        private final int end;
        private final ConcurrentMap<Character, Integer> letterCount;

        LetterCountTask(List<String> texts, int start, int end, ConcurrentMap<Character, Integer> letterCount) {
            this.texts = texts;
            this.start = start;
            this.end = end;
            this.letterCount = letterCount;
        }

        @Override
        protected Void compute() {
            if (end - start <= THRESHOLD) {
                // Small enough: count the letters of texts[start, end) sequentially
                for (int i = start; i < end; i++) {
                    for (char c : texts.get(i).toLowerCase().toCharArray()) {
                        if (Character.isAlphabetic(c)) {
                            letterCount.merge(c, 1, Integer::sum);
                        }
                    }
                }
            } else {
                // Too large: split the range in half and run both halves in parallel
                int mid = (start + end) / 2;
                LetterCountTask leftTask = new LetterCountTask(texts, start, mid, letterCount);
                LetterCountTask rightTask = new LetterCountTask(texts, mid, end, letterCount);
                invokeAll(leftTask, rightTask);
            }
            return null;
        }
    }
}
```
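
Because `compute()` in `LetterCountTask` always returns `null`, the task could also extend `RecursiveAction`, the `ForkJoinTask` subclass intended for tasks that produce no result. The sketch below is a hypothetical variant, not part of the submitted solution: the class name `LetterCountAction` and its static `countLetters(List<String>)` helper are assumptions. It keeps the same shared `ConcurrentMap` and `THRESHOLD` of 10, and it calls `shutdown()` in a `finally` block so the pool's threads are released even if a task throws.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical variant of LetterCountTask built on RecursiveAction (no result value).
class LetterCountAction extends RecursiveAction {
    private static final int THRESHOLD = 10;
    private final List<String> texts;
    private final int start;
    private final int end;
    private final ConcurrentMap<Character, Integer> letterCount;

    LetterCountAction(List<String> texts, int start, int end, ConcurrentMap<Character, Integer> letterCount) {
        this.texts = texts;
        this.start = start;
        this.end = end;
        this.letterCount = letterCount;
    }

    @Override
    protected void compute() {
        if (end - start <= THRESHOLD) {
            // Small enough: count the letters directly
            for (int i = start; i < end; i++) {
                for (char c : texts.get(i).toLowerCase().toCharArray()) {
                    if (Character.isAlphabetic(c)) {
                        letterCount.merge(c, 1, Integer::sum);
                    }
                }
            }
        } else {
            // Too large: split in half and run both halves in parallel
            int mid = (start + end) / 2;
            invokeAll(new LetterCountAction(texts, start, mid, letterCount),
                      new LetterCountAction(texts, mid, end, letterCount));
        }
    }

    // Hypothetical entry point: creates a pool, runs the root action, always shuts the pool down
    static Map<Character, Integer> countLetters(List<String> texts) {
        ConcurrentMap<Character, Integer> letterCount = new ConcurrentHashMap<>();
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new LetterCountAction(texts, 0, texts.size(), letterCount));
        } finally {
            pool.shutdown();
        }
        return letterCount;
    }
}
```

An empty list needs no special case here: the root range `[0, 0)` is below the threshold, so the loop simply does nothing.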
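
Using [`ConcurrentHashMap`][ConcurrentHashMap] ensures that the frequency counts are updated safely in a parallel environment: its `merge` method performs each read-modify-write atomically, so no increment is lost when several threads update the same letter.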
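
To see why `merge` matters, the sketch below (the class `MergeDemo` and its `count` method are hypothetical, not part of the exercise) has several threads increment the same key in one shared `ConcurrentHashMap`. Because `merge` performs the whole update atomically, the final count is exact; a separate `get` followed by `put` would not give this guarantee.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: not part of the exercise's API.
class MergeDemo {

    // Runs `threads` workers that each call merge('a', 1, Integer::sum)
    // `perThread` times on one shared ConcurrentHashMap.
    static Map<Character, Integer> count(int threads, int perThread) throws InterruptedException {
        ConcurrentMap<Character, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            executor.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    // merge performs the read-modify-write as one atomic step
                    counts.merge('a', 1, Integer::sum);
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        // Prints {a=40000}: no increments are lost
        System.out.println(count(4, 10_000));
    }
}
```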

If there are no strings, the method returns the empty map right away, avoiding unnecessary processing.

A [`ForkJoinPool`][ForkJoinPool] is then created and used to `invoke` a root `LetterCountTask` covering the whole list; the pool is shut down once `invoke` returns.

The core of [`ForkJoinPool`][ForkJoinPool] is the Fork/Join mechanism, which recursively divides a task into smaller subtasks and processes them in parallel.

`THRESHOLD` is the criterion for task division.
If the range of texts exceeds the `THRESHOLD`, the task is divided into two subtasks, and [`invokeAll(leftTask, rightTask)`][invokeAll] is called to execute both tasks in parallel.
Each subtask's `compute()` divides its range again until the range contains at most `THRESHOLD` texts, at which point it counts the letters in those texts directly.
Additionally, since uppercase and lowercase letters are treated as the same character (e.g., `A` and `a`), each text is converted to lowercase before its characters are counted.
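
To make the splitting rule concrete, the following sequential sketch (the class `SplitDemo` is hypothetical, not part of the solution) applies the same `THRESHOLD` test and midpoint split as `compute()`, but only records which `[start, end)` ranges end up being counted directly. In the real task, the two halves run in parallel through `invokeAll` rather than one after the other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper mirroring the splitting rule in LetterCountTask.compute():
// it records the [start, end) ranges that would be counted without further splitting.
class SplitDemo {
    static final int THRESHOLD = 10;

    static void collectLeaves(int start, int end, List<String> leaves) {
        if (end - start <= THRESHOLD) {
            leaves.add("[" + start + ", " + end + ")");
        } else {
            int mid = (start + end) / 2;
            collectLeaves(start, mid, leaves);
            collectLeaves(mid, end, leaves);
        }
    }

    static List<String> leaves(int size) {
        List<String> leaves = new ArrayList<>();
        collectLeaves(0, size, leaves);
        return leaves;
    }

    public static void main(String[] args) {
        // Prints [[0, 6), [6, 12), [12, 18), [18, 25)]
        System.out.println(leaves(25));
    }
}
```

With 25 texts, the root range is split twice on each side, giving four leaves of 6 or 7 texts each; with 10 or fewer texts, the root task counts everything itself.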
To make the links consistent, could you use the Java 8 Javadocs for the invokeAll link?

@@ -0,0 +1,7 @@
for (int i = start; i < end; i++) {
    for (char c : texts.get(i).toLowerCase().toCharArray()) {
        if (Character.isAlphabetic(c)) {
            letterCount.merge(c, 1, Integer::sum);
        }
    }
}

@@ -0,0 +1,150 @@
# Introduction

There are multiple ways to solve the Parallel Letter Frequency problem.

With `parallelStream()`, developers can use the `ForkJoinPool` model for workload division and parallel execution without managing threads manually or creating custom thread pools.
However, `parallelStream()` uses the common `ForkJoinPool` by default, so, unless configured otherwise, all parallel streams in an application share the same thread pool.
Therefore, a custom `ForkJoinPool` approach is also provided below.

When tasks are simple or do not require a dedicated thread pool (as in this case), the `parallelStream` approach is recommended.
If, on the other hand, the work is complex or the thread pool must be isolated from other concurrent tasks, the `ForkJoinPool` approach is preferable.
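
A middle ground that is sometimes used keeps the `parallelStream` code but starts it from inside a dedicated `ForkJoinPool`. This is a different technique from the Fork/Join approach, and it relies on how the JDK currently executes parallel streams (their subtasks are forked into the pool of the worker thread that starts the terminal operation) rather than on a documented guarantee, so treat the sketch below as an assumption. The class `CustomPoolLetterFrequency` and its `parallelism` parameter are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;

// Hypothetical example: runs the parallelStream counting inside a dedicated pool.
class CustomPoolLetterFrequency {

    static Map<Character, Integer> countLetters(List<String> texts, int parallelism)
            throws InterruptedException, ExecutionException {
        ConcurrentMap<Character, Integer> letterCount = new ConcurrentHashMap<>();
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // The whole pipeline is submitted as one task, so it starts on a worker of `pool`;
            // get() waits for it to finish and rethrows any failure.
            pool.submit(() -> texts.parallelStream().forEach(text -> {
                for (char c : text.toLowerCase().toCharArray()) {
                    if (Character.isAlphabetic(c)) {
                        letterCount.merge(c, 1, Integer::sum);
                    }
                }
            })).get();
        } finally {
            pool.shutdown();
        }
        return letterCount;
    }
}
```

The counting logic is unchanged; only the pool that executes it differs, which keeps this work from competing with other users of the common pool.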

@@ -0,0 +1,49 @@
# `parallelStream`

```java
import java.util.Map;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentHashMap;

class ParallelLetterFrequency {

    List<String> texts;
    ConcurrentMap<Character, Integer> letterCount;

    ParallelLetterFrequency(String[] texts) {
        this.texts = List.of(texts);
        letterCount = new ConcurrentHashMap<>();
    }

    Map<Character, Integer> countLetters() {
        if (texts.isEmpty()) {
            return letterCount;
        }
        texts.parallelStream().forEach(text -> {
            for (char c : text.toLowerCase().toCharArray()) {
                if (Character.isAlphabetic(c)) {
                    letterCount.merge(c, 1, Integer::sum);
                }
            }
        });
        return letterCount;
    }
}
```

Using [`ConcurrentHashMap`][ConcurrentHashMap] ensures that the frequency counts are updated safely in a parallel environment: its `merge` method performs each read-modify-write atomically, so no increment is lost when several threads update the same letter.

If there are no strings to process, the method returns the empty map right away, avoiding unnecessary computation.
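
To calculate the letter frequency, the texts are processed with a parallel stream: `forEach` is executed in parallel using the common `ForkJoinPool`, and the letters of each text are merged into the shared map.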
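
For comparison, the same count can be expressed entirely as a stream pipeline, letting `Collectors.groupingByConcurrent` build the map instead of merging into a field. This is a swapped-in technique, not the approach described here; the class `CollectorLetterFrequency` is hypothetical. `summingInt(c -> 1)` is used so that the values stay `Integer`, matching the exercise's `Map<Character, Integer>`.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical alternative: the whole count expressed as one parallel stream pipeline.
class CollectorLetterFrequency {

    static Map<Character, Integer> countLetters(List<String> texts) {
        return texts.parallelStream()
                // Turn each text into a stream of its lowercased alphabetic chars
                .flatMap(text -> text.toLowerCase().chars()
                        .filter(Character::isAlphabetic)
                        .mapToObj(c -> (char) c))
                // groupingByConcurrent lets all threads insert into one ConcurrentMap
                .collect(Collectors.groupingByConcurrent(Function.identity(),
                        Collectors.summingInt(c -> 1)));
    }
}
```

Like the original loop, `chars()` yields UTF-16 code units, so both versions classify characters the same way; the empty-input check is unnecessary because an empty stream simply produces an empty map.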

The [`Character.isAlphabetic`][isAlphabetic] method returns `true` for every character that Unicode classifies as alphabetic, which covers letters from many languages, such as English, Korean, Japanese, and Chinese.
Non-alphabetic characters, including digits, punctuation, and whitespace, return `false`.
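
A quick way to check which characters pass this filter is to test a few samples directly. The sketch below (the class `AlphabeticDemo` and its `isCounted` method are hypothetical) uses Unicode escapes for the non-Latin samples, a Hangul syllable, a CJK ideograph, and a Cyrillic letter, so the source compiles regardless of file encoding.

```java
// Hypothetical demo of which characters Character.isAlphabetic accepts.
class AlphabeticDemo {

    // The same filter both solutions apply to each character
    static boolean isCounted(char c) {
        return Character.isAlphabetic(c);
    }

    public static void main(String[] args) {
        // Letters: Latin, Hangul syllable HAN, CJK ideograph "sun", Cyrillic zhe
        char[] letters = {'a', 'Z', '\uD55C', '\u65E5', '\u0436'};
        // Digit, space, and punctuation
        char[] others = {'7', ' ', '!', ','};
        for (char c : letters) {
            System.out.printf("U+%04X counted: %b%n", (int) c, isCounted(c)); // true
        }
        for (char c : others) {
            System.out.printf("U+%04X counted: %b%n", (int) c, isCounted(c)); // false
        }
    }
}
```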

Since uppercase and lowercase letters are treated as the same character (e.g., `A` and `a`), each text is converted to lowercase before its characters are counted.
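
One detail worth knowing: `String.toLowerCase()` without arguments uses the default locale, so results can depend on the machine. Its Javadoc notes that in a Turkish locale `I` lowercases to the dotless `\u0131` rather than `i`, and `Character.isAlphabetic` would count that as a separate letter. A minimal sketch of locale-independent case folding with `Locale.ROOT` (the class `CaseFoldingDemo` is hypothetical; a `TreeMap` is used only so the printed order is stable):

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts the letters of one text after locale-independent lowercasing.
class CaseFoldingDemo {

    static Map<Character, Integer> count(String text) {
        Map<Character, Integer> counts = new TreeMap<>(); // sorted keys for readable output
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            if (Character.isAlphabetic(c)) {
                counts.merge(c, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // 'A' and 'a' (and 'I' and 'i') land on the same key: prints {a=2, i=2}
        System.out.println(count("AaIi"));
        // With a Turkish locale, "I" lowercases to dotless i (U+0131): prints true
        System.out.println("I".toLowerCase(Locale.forLanguageTag("tr")).equals("\u0131"));
    }
}
```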

@@ -0,0 +1,7 @@
texts.parallelStream().forEach(text -> {
    for (char c : text.toLowerCase().toCharArray()) {
        if (Character.isAlphabetic(c)) {
            letterCount.merge(c, 1, Integer::sum);
        }
    }
});

I don't think this is related to the PR. Could you please undo this change?

The following error will occur.
Should we go ahead and revert it anyway?

Yeah, I think we should. I've just tried running our Markdown Lint action on our main branch (before your change). I think the rules are a bit different with that one.