Commit e28da7b

Add approaches for Parallel Letter Frequency (#2863)
1 parent 3369915 commit e28da7b

6 files changed, with 323 additions and 0 deletions
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
{
  "introduction": {
    "authors": [
      "masiljangajji"
    ]
  },
  "approaches": [
    {
      "uuid": "dee2a79d-3e64-4220-b99f-55667549c12c",
      "slug": "fork-join",
      "title": "Fork/Join",
      "blurb": "Parallel Computation Using Fork/Join",
      "authors": [
        "masiljangajji"
      ]
    },
    {
      "uuid": "75e9e93b-4da4-4474-8b6e-3c0cb9b3a9bb",
      "slug": "parallel-stream",
      "title": "Parallel Stream",
      "blurb": "Parallel Computation Using Parallel Stream",
      "authors": [
        "masiljangajji"
      ]
    }
  ]
}
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
# `Fork/Join`

```java
import java.util.Map;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ParallelLetterFrequency {

    List<String> texts;
    ConcurrentMap<Character, Integer> letterCount;

    ParallelLetterFrequency(String[] texts) {
        this.texts = List.of(texts);
        letterCount = new ConcurrentHashMap<>();
    }

    Map<Character, Integer> countLetters() {
        if (texts.isEmpty()) {
            return letterCount;
        }

        ForkJoinPool forkJoinPool = new ForkJoinPool();
        forkJoinPool.invoke(new LetterCountTask(texts, 0, texts.size(), letterCount));
        forkJoinPool.shutdown();

        return letterCount;
    }

    private static class LetterCountTask extends RecursiveTask<Void> {
        private static final int THRESHOLD = 10;
        private final List<String> texts;
        private final int start;
        private final int end;
        private final ConcurrentMap<Character, Integer> letterCount;

        LetterCountTask(List<String> texts, int start, int end, ConcurrentMap<Character, Integer> letterCount) {
            this.texts = texts;
            this.start = start;
            this.end = end;
            this.letterCount = letterCount;
        }

        @Override
        protected Void compute() {
            if (end - start <= THRESHOLD) {
                for (int i = start; i < end; i++) {
                    for (char c : texts.get(i).toLowerCase().toCharArray()) {
                        if (Character.isAlphabetic(c)) {
                            letterCount.merge(c, 1, Integer::sum);
                        }
                    }
                }
            } else {
                int mid = (start + end) / 2;
                LetterCountTask leftTask = new LetterCountTask(texts, start, mid, letterCount);
                LetterCountTask rightTask = new LetterCountTask(texts, mid, end, letterCount);
                invokeAll(leftTask, rightTask);
            }
            return null;
        }
    }
}
```

Using [`ConcurrentHashMap`][ConcurrentHashMap] ensures that frequency counting and updates are safely handled in a parallel environment, because its `merge` method updates each count atomically.

If there are no strings, `countLetters()` returns early and skips unnecessary processing.

A [`ForkJoinPool`][ForkJoinPool] is then created.
The core of [`ForkJoinPool`][ForkJoinPool] is the Fork/Join mechanism, which divides tasks into smaller units and processes them in parallel.

`THRESHOLD` is the largest number of texts that a single task processes without splitting.
If the range of texts exceeds the `THRESHOLD`, the task is divided into two subtasks, and [`invokeAll(leftTask, rightTask)`][invokeAll] is called to execute both tasks in parallel.
Each subtask's `compute()` keeps splitting in the same way (again via `invokeAll(leftTask, rightTask)`) until its range is smaller than or equal to the `THRESHOLD`.
Tasks whose range is within the `THRESHOLD` count the letters of their texts directly.
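
The splitting rule can be looked at in isolation with a minimal sketch.
The names `RangeSplitDemo`, `SplitTask`, and `leafRanges` are hypothetical and not part of the solution above.
The sketch uses `RecursiveAction`, the result-less counterpart of `RecursiveTask`, and records which index ranges end up being processed directly instead of counting letters.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.stream.Collectors;

class RangeSplitDemo {

    // Hypothetical task: records each leaf range instead of counting letters.
    static class SplitTask extends RecursiveAction {
        private static final int THRESHOLD = 10;
        private final int start;
        private final int end;
        private final ConcurrentLinkedQueue<int[]> leaves;

        SplitTask(int start, int end, ConcurrentLinkedQueue<int[]> leaves) {
            this.start = start;
            this.end = end;
            this.leaves = leaves;
        }

        @Override
        protected void compute() {
            if (end - start <= THRESHOLD) {
                // Small enough: this is where the real task would count letters.
                leaves.add(new int[] {start, end});
            } else {
                // Too large: split in half and run both halves in parallel.
                int mid = (start + end) / 2;
                invokeAll(new SplitTask(start, mid, leaves), new SplitTask(mid, end, leaves));
            }
        }
    }

    // Returns the leaf ranges for `size` texts as "start-end" strings, ordered by start index.
    static List<String> leafRanges(int size) {
        ConcurrentLinkedQueue<int[]> leaves = new ConcurrentLinkedQueue<>();
        ForkJoinPool pool = new ForkJoinPool();
        pool.invoke(new SplitTask(0, size, leaves));
        pool.shutdown();
        return leaves.stream()
                .sorted((int[] a, int[] b) -> Integer.compare(a[0], b[0]))
                .map(range -> range[0] + "-" + range[1])
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(leafRanges(25)); // [0-6, 6-12, 12-18, 18-25]
    }
}
```

With 25 texts, the range is halved twice before every piece holds at most 10 texts, so four leaf tasks do the actual work.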

The [`Character.isAlphabetic`][isAlphabetic] method returns `true` for every character that Unicode classifies as alphabetic, which covers letters from languages such as English, Korean, Japanese, and Chinese.
For non-alphabetic characters, including digits, punctuation, and spaces, it returns `false`.
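
As a small illustration of that filter, the sketch below counts only the characters that `Character.isAlphabetic` accepts.
The class `AlphabeticCheckDemo` and its method are hypothetical names used for this example only, and the Korean, Japanese, and Chinese samples are written as Unicode escapes so the file compiles regardless of source encoding.

```java
class AlphabeticCheckDemo {

    // Counts the characters of `text` that Unicode classifies as alphabetic.
    static int countAlphabetic(String text) {
        int count = 0;
        for (char c : text.toCharArray()) {
            if (Character.isAlphabetic(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a', 'B', U+D55C (Hangul), U+3042 (Hiragana), U+4E2D (CJK) are alphabetic;
        // '1', ' ', and '!' are not.
        System.out.println(countAlphabetic("a1 B!\uD55C\u3042\u4E2D")); // 5
    }
}
```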

Additionally, since uppercase and lowercase letters are treated as the same character (e.g., `A` and `a`), each text is converted to lowercase before counting.

Once `invoke` returns, every subtask has finished updating the letter frequencies, and the final map is returned.

[ConcurrentHashMap]: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html
[ForkJoinPool]: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html
[isAlphabetic]: https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isAlphabetic-int-
[invokeAll]: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinTask.html#invokeAll-java.util.concurrent.ForkJoinTask-java.util.concurrent.ForkJoinTask-
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
for (int i = start; i < end; i++) {
    for (char c : texts.get(i).toLowerCase().toCharArray()) {
        if (Character.isAlphabetic(c)) {
            letterCount.merge(c, 1, Integer::sum);
        }
    }
}
Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
# Introduction

There are multiple ways to solve the Parallel Letter Frequency problem.
One approach is to use [`Collection.parallelStream`][stream], and another involves using [`ForkJoinPool`][ForkJoinPool].

## General guidance

To count occurrences of items, a map data structure is often used, though arrays and lists can work as well.
A [`Map`][map], being a key-value pair structure, is suitable for recording frequency by incrementing the value for each key.
If the data being counted has a limited range (e.g., `char` values or small `int` values), an `int[]` array or a [`List<Integer>`][list] indexed by value can be used to record frequencies.
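
The two bookkeeping styles can be compared in a short single-threaded sketch.
The class and method names are hypothetical, and the array variant assumes that only the 26 letters `a` to `z` need counting, which is narrower than what the approaches below handle.

```java
import java.util.HashMap;
import java.util.Map;

class FrequencyStructuresDemo {

    // Map-based counting: works for any character, one entry per distinct key.
    static Map<Character, Integer> countWithMap(String text) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : text.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    // Array-based counting: assumes a fixed, small key range ('a'..'z' here).
    static int[] countWithArray(String text) {
        int[] counts = new int[26];
        for (char c : text.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWithMap("banana").get('a'));   // 3
        System.out.println(countWithArray("banana")['n' - 'a']); // 2
    }
}
```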

Parallel processing typically takes place in a multi-[`thread`][thread] environment.
The Java 8 [`stream`][stream] API provides methods that make parallel processing easier, including the [`parallelStream()`][stream] method.
With [`parallelStream()`][stream], developers can use the [`ForkJoinPool`][ForkJoinPool] model for workload division and parallel execution, without the need to manually manage threads or create custom thread pools.

The [`ForkJoinPool`][ForkJoinPool] class, optimized for dividing and managing tasks, makes parallel processing efficient.
However, [`parallelStream()`][stream] uses the common [`ForkJoinPool`][ForkJoinPool] by default, meaning multiple [`parallelStream`][stream] instances share the same thread pool unless configured otherwise.

As a result, parallel streams may interfere with each other when sharing this thread pool, potentially affecting performance.
Although this doesn't directly impact solving the Parallel Letter Frequency problem, it can cause problems in applications where other work also relies on the common pool.
Therefore, a custom [`ForkJoinPool`][ForkJoinPool] approach is also provided below.
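
The isolation a custom pool provides can be observed with a minimal sketch.
`PoolIsolationDemo` and `WhichPoolTask` are hypothetical names, and the parallelism of 2 is an arbitrary choice for the example.
The task only reports which pool is executing it, using the static `ForkJoinTask.getPool()` method.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

class PoolIsolationDemo {

    // Hypothetical task that returns the pool hosting the thread it runs on.
    static class WhichPoolTask extends RecursiveTask<ForkJoinPool> {
        @Override
        protected ForkJoinPool compute() {
            return ForkJoinTask.getPool();
        }
    }

    // Returns true when the task ran in the dedicated pool rather than the common pool.
    static boolean runsInDedicatedPool() {
        ForkJoinPool dedicated = new ForkJoinPool(2);
        try {
            ForkJoinPool executing = dedicated.invoke(new WhichPoolTask());
            return executing == dedicated && executing != ForkJoinPool.commonPool();
        } finally {
            dedicated.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runsInDedicatedPool());
    }
}
```

Work submitted this way does not compete for the common pool's worker threads, which is the motivation for the Fork/Join approach.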

## Approach: `parallelStream`

```java
import java.util.Map;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentHashMap;

class ParallelLetterFrequency {

    List<String> texts;
    ConcurrentMap<Character, Integer> letterCount;

    ParallelLetterFrequency(String[] texts) {
        this.texts = List.of(texts);
        letterCount = new ConcurrentHashMap<>();
    }

    Map<Character, Integer> countLetters() {
        if (!letterCount.isEmpty() || texts.isEmpty()) {
            return letterCount;
        }
        texts.parallelStream().forEach(text -> {
            for (char c: text.toLowerCase().toCharArray()) {
                if (Character.isAlphabetic(c)) {
                    letterCount.merge(c, 1, Integer::sum);
                }
            }
        });
        return letterCount;
    }

}
```

For more information, check the [`parallelStream` approach][approach-parallel-stream].

## Approach: `Fork/Join`

```java
import java.util.Map;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ParallelLetterFrequency {

    List<String> texts;
    ConcurrentMap<Character, Integer> letterCount;

    ParallelLetterFrequency(String[] texts) {
        this.texts = List.of(texts);
        letterCount = new ConcurrentHashMap<>();
    }

    Map<Character, Integer> countLetters() {
        if (!letterCount.isEmpty() || texts.isEmpty()) {
            return letterCount;
        }

        ForkJoinPool forkJoinPool = new ForkJoinPool();
        forkJoinPool.invoke(new LetterCountTask(texts, 0, texts.size(), letterCount));
        forkJoinPool.shutdown();

        return letterCount;
    }

    private static class LetterCountTask extends RecursiveTask<Void> {
        private static final int THRESHOLD = 10;
        private final List<String> texts;
        private final int start;
        private final int end;
        private final ConcurrentMap<Character, Integer> letterCount;

        LetterCountTask(List<String> texts, int start, int end, ConcurrentMap<Character, Integer> letterCount) {
            this.texts = texts;
            this.start = start;
            this.end = end;
            this.letterCount = letterCount;
        }

        @Override
        protected Void compute() {
            if (end - start <= THRESHOLD) {
                for (int i = start; i < end; i++) {
                    for (char c : texts.get(i).toLowerCase().toCharArray()) {
                        if (Character.isAlphabetic(c)) {
                            letterCount.merge(c, 1, Integer::sum);
                        }
                    }
                }
            } else {
                int mid = (start + end) / 2;
                LetterCountTask leftTask = new LetterCountTask(texts, start, mid, letterCount);
                LetterCountTask rightTask = new LetterCountTask(texts, mid, end, letterCount);
                invokeAll(leftTask, rightTask);
            }
            return null;
        }
    }
}
```

For more information, check the [`fork/join` approach][approach-fork-join].

## Which approach to use?

When tasks are simple or do not require a dedicated thread pool (such as in this case), the [`parallelStream`][stream] approach is recommended.
However, if the work is complex or there is a need to isolate thread pools from other concurrent tasks, the [`ForkJoinPool`][ForkJoinPool] approach is preferable.

[thread]: https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html
[stream]: https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
[ForkJoinPool]: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html
[map]: https://docs.oracle.com/javase/8/docs/api/?java/util/Map.html
[list]: https://docs.oracle.com/javase/8/docs/api/?java/util/List.html
[approach-parallel-stream]: https://exercism.org/tracks/java/exercises/parallel-letter-frequency/approaches/parallel-stream
[approach-fork-join]: https://exercism.org/tracks/java/exercises/parallel-letter-frequency/approaches/fork-join
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
# `parallelStream`

```java
import java.util.Map;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentHashMap;

class ParallelLetterFrequency {

    List<String> texts;
    ConcurrentMap<Character, Integer> letterCount;

    ParallelLetterFrequency(String[] texts) {
        this.texts = List.of(texts);
        letterCount = new ConcurrentHashMap<>();
    }

    Map<Character, Integer> countLetters() {
        if (texts.isEmpty()) {
            return letterCount;
        }
        texts.parallelStream().forEach(text -> {
            for (char c: text.toLowerCase().toCharArray()) {
                if (Character.isAlphabetic(c)) {
                    letterCount.merge(c, 1, Integer::sum);
                }
            }
        });
        return letterCount;
    }

}
```

Using [`ConcurrentHashMap`][ConcurrentHashMap] ensures that frequency counting and updates are safely handled in a parallel environment, because its `merge` method updates each count atomically.
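
A minimal sketch of why this matters: many parallel increments of the same key still add up exactly when they go through `ConcurrentHashMap.merge`.
The class `AtomicMergeDemo` and its method are hypothetical names, and the key `'x'` is an arbitrary choice.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.IntStream;

class AtomicMergeDemo {

    // Increments the counter for 'x' `times` times from a parallel stream
    // and returns the final count.
    static int incrementInParallel(int times) {
        ConcurrentMap<Character, Integer> counts = new ConcurrentHashMap<>();
        IntStream.range(0, times)
                .parallel()
                .forEach(i -> counts.merge('x', 1, Integer::sum));
        return counts.getOrDefault('x', 0);
    }

    public static void main(String[] args) {
        System.out.println(incrementInParallel(10_000)); // 10000
    }
}
```

With a plain `HashMap` the same loop would not be safe, since concurrent updates to one key could be lost.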

If there are no strings to process, `countLetters()` returns early and avoids unnecessary computation.

To calculate letter frequency, a parallel stream processes the texts concurrently.
The [`Character.isAlphabetic`][isAlphabetic] method returns `true` for every character that Unicode classifies as alphabetic, which covers letters from languages such as English, Korean, Japanese, and Chinese.
For non-alphabetic characters, including digits, punctuation, and spaces, it returns `false`.

Since we treat uppercase and lowercase letters as the same character (e.g., `A` and `a`), each text is converted to lowercase before counting.

After updating letter frequencies, the final map is returned.

[ConcurrentHashMap]: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html
[isAlphabetic]: https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isAlphabetic-int-
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
texts.parallelStream().forEach(text -> {
    for (char c: text.toLowerCase().toCharArray()) {
        if (Character.isAlphabetic(c)) {
            letterCount.merge(c, 1, Integer::sum);
        }
    }
});
