Commit 27db8c2

feat(algorithms, dynamic programming): word break puzzle using dp
1 parent c0348e1 commit 27db8c2

16 files changed: +460 -0 lines changed
Lines changed: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
# Word Break

You are given a string, `s`, and an array of strings, `word_dict`, representing a dictionary. Your task is to add spaces to
`s` to break it up into a sequence of valid words from `word_dict`. We are required to return an array of all possible
sequences of words (sentences). The order in which the sentences are listed is not significant.

> Note: The same dictionary word may be reused multiple times in the segmentation.

## Constraints

- 1 <= s.length <= 20
- 1 <= word_dict.length <= 1000
- 1 <= word_dict[i].length <= 10
- s and word_dict[i] consist of only lowercase English letters.
- All the strings in word_dict are unique.

## Topics

- Array
- Hash Table
- String
- Dynamic Programming
- Backtracking
- Trie
- Memoization

## Solutions

### Naive Approach

The naive approach to solve this problem is to use a traditional recursive strategy in which we take each prefix of the
input string, `s`, and compare it to each word in the dictionary. If it matches, we take the string’s suffix and repeat
the process.

Here is how the algorithm works:

1. **Base case**: If the string is empty, there are no characters left to process, so no sentences can be formed.
   Hence, we return an empty array.
2. Otherwise, the string is not empty, so we iterate over every word of the dictionary and check whether the
   string starts with the current dictionary word. This ensures that only valid word combinations are considered:
   - If it doesn’t start with the current dictionary word, no valid combinations can be formed from this word, so we
     move on to the next dictionary word.
   - If it does start with the current dictionary word, we have two options:
     - If the length of the current dictionary word is equal to the length of the string, the entire string
       is formed by the current dictionary word. In this case, the string is added directly to the result without
       any further processing.
     - **Recursive case**: Otherwise, the current dictionary word is shorter than the string, which means that the
       string can be broken down further. Therefore, we make a recursive call to evaluate the remaining portion
       (suffix) of the string. We then concatenate the prefix with each sentence returned for the suffix by the
       recursive call and store the combined sentences in the result.

3. After all possible combinations have been explored, we return the result.
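
The steps above can be sketched roughly as follows. This is a minimal illustration rather than code from this commit; the function name `word_break_naive` is hypothetical.

```python
from typing import List


def word_break_naive(s: str, word_dict: List[str]) -> List[str]:
    """Plain recursion: try every dictionary word as a prefix of s."""
    # Base case: nothing left to process, so no sentences can be formed.
    if not s:
        return []

    result = []
    for word in word_dict:
        # Skip words that are not a prefix of the remaining string.
        if not s.startswith(word):
            continue
        if len(word) == len(s):
            # The word covers the whole string, so it is a sentence on its own.
            result.append(s)
        else:
            # Recursive case: solve the suffix, then prepend the matched word.
            for sentence in word_break_naive(s[len(word):], word_dict):
                result.append(f"{word} {sentence}")
    return result


print(sorted(word_break_naive("catsanddog", ["cat", "cats", "and", "sand", "dog"])))
```

Nothing is cached here, so every suffix is recomputed each time it is reached. That repeated work is what makes the approach expensive.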

The time complexity of this solution is O(k^n * m), where `k` is the number of words in the dictionary, `n` is the length
of the string, and `m` is the length of the longest word in the dictionary.

The space complexity is O(k^n * n), where `k` is the number of words in the dictionary and `n` is the length of the string.

### Optimized approach using dynamic programming - tabulation

Since the recursive solution to this problem is very costly, let’s see if we can reduce this cost in any way. Dynamic
programming helps us avoid recomputing the same subproblems. Therefore, let’s analyze our recursive solution to see if
it has the properties needed for conversion to dynamic programming.

- **Optimal substructure**: Given an input string, `s`, that we want to break up into dictionary words, we find the first
  word that matches a word from the dictionary, and then repeat the process for the remaining, shorter input string.
  This means that, to solve the problem for input `q`, we need to solve the same problem for `p`, where `p` is at
  least one character shorter than `q`. Therefore, this problem obeys the optimal substructure property.

- **Overlapping subproblems**: The algorithm solves the same subproblems repeatedly. Consider input string “ancookbook”
  and the dictionary [“an”, “book”, “cook”, “cookbook”]. The following is the partial call tree for the naive recursive
  solution:

```text
                "ancookbook"
               /            \
      "ancookbook"        "cookbook"
       /        \          /      \
 "cookbook"     ...     "book"    ...
```

From the tree above, it can be seen that the subproblem “cookbook” is evaluated twice. To take advantage of these
opportunities for optimization, we will use bottom-up dynamic programming, also known as the tabulation approach. This
is an iterative method of solving dynamic programming problems. The idea is that if a prefix of the input string matches
any word `w` in the dictionary, we can split the string into two parts: the matching word and the suffix of the input
string. We start from an empty prefix, which is the base case. The prefix eventually grows into the complete
input string.
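
One way to observe repeated subproblems directly is to count how often the naive recursion is invoked on each suffix. The sketch below is illustrative only: the helper `count_suffix_calls` and the input `"aaaa"` with dictionary `["a", "aa"]` are hypothetical choices, picked because the overlap is easy to see on a small input.

```python
from collections import Counter
from typing import List


def count_suffix_calls(s: str, word_dict: List[str]) -> Counter:
    """Count how often the naive recursion is invoked on each suffix of s."""
    calls: Counter = Counter()

    def recurse(rest: str) -> None:
        calls[rest] += 1
        for word in word_dict:
            # Only recurse when the word is a proper prefix of what remains.
            if rest.startswith(word) and len(word) < len(rest):
                recurse(rest[len(word):])

    recurse(s)
    return calls


calls = count_suffix_calls("aaaa", ["a", "aa"])
for suffix, count in sorted(calls.items(), key=lambda item: -len(item[0])):
    print(repr(suffix), count)
```

For this input, the suffix `"aa"` is visited twice and `"a"` three times. A table indexed by position lets each of those be computed once.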

> The tabulation approach is often faster in practice than plain backtracking and top-down memoization
because it avoids the overhead of recursive calls and stack usage. It also eliminates the need for a separate
memoization map, as the table itself serves as the storage for the subproblem solutions.
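
For contrast, a top-down memoized version might look like the sketch below. It is not part of this commit: the name `word_break_memo` is hypothetical, and `functools.lru_cache` plays the role of the separate memoization map mentioned in the note.

```python
from functools import lru_cache
from typing import List, Tuple


def word_break_memo(s: str, word_dict: List[str]) -> List[str]:
    """Top-down recursion over start indices, cached with lru_cache."""
    words = set(word_dict)

    @lru_cache(maxsize=None)
    def sentences_from(start: int) -> Tuple[str, ...]:
        # Reaching the end of s yields one empty sentence to build on.
        if start == len(s):
            return ("",)
        found = []
        for end in range(start + 1, len(s) + 1):
            word = s[start:end]
            if word in words:
                for rest in sentences_from(end):
                    # strip() drops the trailing space when rest is empty.
                    found.append(f"{word} {rest}".strip())
        return tuple(found)

    return list(sentences_from(0))


print(sorted(word_break_memo("catsanddog", ["cat", "cats", "and", "sand", "dog"])))
```

Each start index is still solved only once, but the work is driven by recursive calls rather than by an explicit loop over the table.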

Here’s how the algorithm works:

- We initialize an empty lookup table, `dp`, of length n+1, where `dp[i]` corresponds to the prefix of length i. This
  table will be used to store the solutions to previously solved subproblems. It will have the following properties:
  - The first entry of the table will represent a prefix of length 0, i.e., an empty string “”.
  - The rest of the entries will represent the other prefixes of the string `s`. For example, the input string “vegan”
    will have the prefixes “v”, “ve”, “veg”, “vega”, and “vegan”.
  - Each entry of the table will contain an array containing the sentences that can be formed from the respective prefix.
    At this point, all the arrays are empty.
- For the base case, we add an empty string to the array corresponding to the first entry of the `dp` table. This is
  because the only sentence that can be formed from an empty string is an empty string itself.
- Next, we traverse the input string by breaking it into its prefixes by including a single character, one at a time,
  in each iteration.
  - For the current prefix, we initialize an array, `temp`, that will store the valid sentences formed from that prefix.
    Let’s suppose that the input string is “vegan”, and that the current prefix is “vega”.
  - For all possible suffixes of the current prefix, we check if the suffix exists in the given dictionary. In our
    example, this would mean checking the dictionary for the suffixes “vega”, “ega”, “ga”, and “a”. For each suffix, it
    will either match a dictionary word, or not:
    - If it does, we know that the suffix is a valid word from the dictionary and can be used as part of the solution.
      Therefore, in the `dp` table, we retrieve all the possible sentences for the prefix to the left of this suffix.
      Supposing that the current suffix of “vega” is “a”, and that “a” is present in the dictionary, we would retrieve
      all the sentences already found for “veg”. This means that we reuse the solutions of the subproblem smaller than
      the current subproblem. Now, we form new sentences for the current prefix by appending a space character and the
      current suffix (which is a valid dictionary word) to each of the retrieved sentences. When the retrieved sentence
      is the empty base-case string, the suffix is used on its own. Supposing that the valid
      sentences for the subproblem “veg” are “veg”, “v eg”, and “ve g”, we will add these new sentences for the current
      subproblem, “vega”: “veg a”, “v eg a”, and “ve g a”. We add the new sentences to the `temp` array of this prefix.
    - If the suffix is not present in the dictionary, no sentence ending in this suffix can be formed, so nothing is
      added to the `temp` array for this suffix.
  - We repeat the above steps for all suffixes of the current prefix.
  - We set the entry corresponding to the current prefix in the `dp` table equal to the `temp` array.
- We repeat the steps above for all prefixes of the input string.
- After all the prefixes have been evaluated, the last entry of the `dp` table will be an array containing all the
  sentences formed from the largest prefix, i.e., the complete string. Therefore, we return this array.
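
To make the table concrete, the sketch below fills it for the prefix “vega” and prints each entry. The helper name `build_table` and the dictionary used here are assumptions chosen to match the “veg” example in the walkthrough. The loop structure mirrors the steps above but returns the whole table instead of only its last entry.

```python
from typing import List


def build_table(s: str, word_dict: List[str]) -> List[List[str]]:
    """Return the full dp table, where dp[i] holds the sentences for s[:i]."""
    words = set(word_dict)
    dp: List[List[str]] = [[] for _ in range(len(s) + 1)]
    dp[0] = [""]  # base case: the empty prefix has one (empty) sentence

    for i in range(1, len(s) + 1):
        temp = []
        for j in range(i):
            suffix = s[j:i]  # suffix of the current prefix s[:i]
            if suffix in words:
                # Extend every sentence of the shorter prefix s[:j].
                for sentence in dp[j]:
                    temp.append(f"{sentence} {suffix}".strip())
        dp[i] = temp
    return dp


# Hypothetical dictionary chosen so that "veg" has three segmentations.
table = build_table("vega", ["v", "ve", "veg", "eg", "g", "a"])
for length, sentences in enumerate(table):
    print(length, sentences)
```

With this dictionary, the entry for “veg” holds “veg”, “v eg”, and “ve g”, and the entry for “vega” is obtained by appending “a” to each of them.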

#### Solution summary

To recap, the solution to this problem can be divided into the following six main steps:

1. We create a 2D table where each entry corresponds to a prefix of the input string. At this point, each entry contains
   an empty array.
2. We iterate over all prefixes of the input string. For each prefix, we iterate over all of its suffixes.
3. For each suffix, we check whether it’s a valid word, i.e., whether it’s present in the provided dictionary.
4. If the suffix is a valid word, we combine it with all valid sentences from the corresponding entry (in the table) of
   the prefix to the left of it.
5. We store the array of all possible sentences that can be formed using the current prefix in the corresponding entry
   of the table.
6. After processing all prefixes of the input string, we return the array in the last entry of our table.

![Solution 1](images/solution/word_break_dynamic_programming_tabulation_solution_1.png)
![Solution 2](images/solution/word_break_dynamic_programming_tabulation_solution_2.png)
![Solution 3](images/solution/word_break_dynamic_programming_tabulation_solution_3.png)
![Solution 4](images/solution/word_break_dynamic_programming_tabulation_solution_4.png)
![Solution 5](images/solution/word_break_dynamic_programming_tabulation_solution_5.png)
![Solution 6](images/solution/word_break_dynamic_programming_tabulation_solution_6.png)
![Solution 7](images/solution/word_break_dynamic_programming_tabulation_solution_7.png)
![Solution 8](images/solution/word_break_dynamic_programming_tabulation_solution_8.png)
![Solution 9](images/solution/word_break_dynamic_programming_tabulation_solution_9.png)
![Solution 10](images/solution/word_break_dynamic_programming_tabulation_solution_10.png)
![Solution 11](images/solution/word_break_dynamic_programming_tabulation_solution_11.png)
![Solution 12](images/solution/word_break_dynamic_programming_tabulation_solution_12.png)
![Solution 13](images/solution/word_break_dynamic_programming_tabulation_solution_13.png)

#### Time Complexity

The time complexity of this solution is O(n^2 * v), where `n` is the length of the string `s` and `v` is the number of valid
combinations.

#### Space Complexity

The space complexity is O(n * v), where `n` is the length of the string and `v` is the number of valid combinations stored in
the `dp` array.
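
Because both bounds depend on `v`, it helps to see how quickly `v` can grow. The sketch below counts segmentations without building them. The helper `count_sentences` is hypothetical, and the input is a worst-case construction (a string of repeated `a` characters with every run length in the dictionary), not taken from this commit.

```python
from typing import List


def count_sentences(s: str, word_dict: List[str]) -> int:
    """Count segmentations of s without materialising the sentences."""
    words = set(word_dict)
    ways = [0] * (len(s) + 1)
    ways[0] = 1  # one way to segment the empty prefix
    for i in range(1, len(s) + 1):
        for j in range(i):
            # s[j:i] is a word, so every segmentation of s[:j] extends to s[:i].
            if ways[j] and s[j:i] in words:
                ways[i] += ways[j]
    return ways[len(s)]


for n in (1, 2, 4, 8, 10):
    dictionary = ["a" * length for length in range(1, n + 1)]
    print(n, count_sentences("a" * n, dictionary))
```

For this construction every split point is valid, so the count is 2^(n-1): each of the n-1 gaps between characters either holds a space or does not. Any solution that returns all sentences is therefore exponential in the worst case.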
Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
from typing import List, Dict
from datastructures.trees.trie import AlphabetTrie


def word_break_trie(s: str, word_dict: List[str]) -> List[str]:
    """
    This adds spaces to s to break it up into a sequence of valid words from word_dict.

    This uses a Trie to store the words in the dictionary and a map to store the results of subproblems.

    Complexity:
        Time: O(n*2^n): where n is the length of the string
        Space: O(n*2^n): where n is the length of the string

    Args:
        s: The input string
        word_dict: The dictionary of words
    Returns:
        List of valid sentences
    """
    # build the Trie from the word dictionary
    trie = AlphabetTrie()
    for word in word_dict:
        trie.insert(word)

    # map to store results of subproblems
    results: Dict[int, List[str]] = dict()

    # iterate from the end to the start of the string
    for start_idx in range(len(s), -1, -1):
        # store valid sentences starting from start_idx
        valid_sentences = []

        # initialize current node to the root of the Trie
        current_node = trie.root

        # iterate from start_idx to the end of the string
        for end_idx in range(start_idx, len(s)):
            char = s[end_idx]
            index = ord(char) - ord("a")

            # check if the current character exists in the trie
            if not current_node.children[index]:
                break

            # move to the next node in the trie
            current_node = current_node.children[index]

            # check if we have found a valid word
            if current_node.is_end_of_word:
                current_word = s[start_idx : end_idx + 1]

                # if it is the last word, add it as a valid sentence
                if end_idx == len(s) - 1:
                    valid_sentences.append(current_word)
                else:
                    # if it's not the last word, append it to each sentence formed by the remaining substring
                    sentences_from_next_index = results.get(end_idx + 1, [])
                    for sentence in sentences_from_next_index:
                        valid_sentences.append(f"{current_word} {sentence}")

        # store the valid sentences for the current start index
        results[start_idx] = valid_sentences

    # return the sentences formed from the entire string
    return results.get(0, [])


def word_break_dp(s: str, word_dict: List[str]) -> List[str]:
    """
    This adds spaces to s to break it up into a sequence of valid words from word_dict.

    This uses dynamic programming with tabulation: dp[i] stores every sentence that can be formed from the
    prefix of s of length i.

    Complexity:
        Time: O(n*2^n): where n is the length of the string
        Space: O(n*2^n): where n is the length of the string

    Args:
        s: The input string
        word_dict: The dictionary of words
    Returns:
        List of valid sentences
    """
    # Use a set so that each membership check is O(1) on average instead of a scan of the list
    words = set(word_dict)

    # Initializing the dp table of size s.length + 1, with a distinct list per entry
    dp: List[List[str]] = [[] for _ in range(len(s) + 1)]
    # Setting the base case
    dp[0] = [""]

    # For each prefix of the input string, repeat the process.
    for i in range(1, len(s) + 1):
        prefix = s[:i]

        # An array to store the valid sentences formed from the current prefix being checked.
        temp = []

        # Iterate over the current prefix and break it down into all possible suffixes.
        for j in range(0, i):
            suffix = prefix[j:]

            # Check if the current suffix exists in the dictionary. If it does, we know that it is a valid word
            # and can be used as part of the solution.
            if suffix in words:
                # Retrieve the valid sentences from the previously computed subproblem
                for substring in dp[j]:
                    # Merge the suffix with the already calculated results
                    temp.append((substring + " " + suffix).strip())
        dp[i] = temp

    # returning all the sentences formed from the complete string s
    return dp[len(s)]


def word_break_dp_2(s: str, word_dict: List[str]) -> List[str]:
    """
    This adds spaces to s to break it up into a sequence of valid words from word_dict.

    This uses bottom-up dynamic programming over start indices, working from the end of the string to the
    beginning, with a map that stores the results of subproblems.

    Complexity:
        Time: O(n*2^n): where n is the length of the string
        Space: O(n*2^n): where n is the length of the string

    Args:
        s: The input string
        word_dict: The dictionary of words
    Returns:
        List of valid sentences
    """
    # Use a set so that each membership check is O(1) on average instead of a scan of the list
    words = set(word_dict)

    # map to store results of the subproblems
    dp: Dict[int, List[str]] = dict()

    # iterate from the end of the string to the beginning
    for start_idx in range(len(s), -1, -1):
        # store valid sentences starting from start_idx
        valid_sentences = []

        # Iterate from start index to the end of the string
        for end_idx in range(start_idx, len(s)):
            # extract substring from start_idx to end_idx
            current_word = s[start_idx : end_idx + 1]

            # Check if the current substring is a valid word
            if current_word in words:
                # If it's the last word, add it as a valid sentence
                if end_idx == len(s) - 1:
                    valid_sentences.append(current_word)
                else:
                    # If it's not the last word, append it to each sentence formed by the remaining substring
                    sentences_from_next_index = dp.get(end_idx + 1, [])
                    for sentence in sentences_from_next_index:
                        valid_sentences.append(f"{current_word} {sentence}")

        # Store the valid sentences in dp
        dp[start_idx] = valid_sentences

    # returning all the sentences formed from the complete string s
    return dp.get(0, [])