# Word Break

You are given a string, `s`, and an array of strings, `word_dict`, representing a dictionary. Your task is to add spaces
to `s` to break it up into a sequence of valid words from `word_dict`, and to return an array of all possible sequences
of words (sentences). The order in which the sentences are listed is not significant.

> Note: The same dictionary word may be reused multiple times in the segmentation.

## Constraints

- `1 <= s.length <= 20`
- `1 <= word_dict.length <= 1000`
- `1 <= word_dict[i].length <= 10`
- `s` and `word_dict[i]` consist of only lowercase English letters.
- All the strings in `word_dict` are unique.

## Topics

- Array
- Hash Table
- String
- Dynamic Programming
- Backtracking
- Trie
- Memoization

## Solutions

### Naive Approach

The naive approach to solve this problem is to use a traditional recursive strategy in which we take each prefix of the
input string, `s`, and compare it to each word in the dictionary. If it matches, we take the string’s suffix and repeat
the process.

Here is how the algorithm works:

1. **Base case**: If the string is empty, there are no characters left to process, so no sentences can be formed.
   Hence, we return an empty array.
2. Otherwise, we iterate over every word of the dictionary and check whether the string starts with the current
   dictionary word. This ensures that only valid word combinations are considered:
   - If it doesn’t start with the current dictionary word, no valid combination can begin with this word, so we move on
     to the next dictionary word.
   - If it does start with the current dictionary word, there are two cases:
     - If the length of the current dictionary word equals the length of the string, the entire string is the current
       dictionary word. In this case, `s` is added directly to the result without any further processing.
     - **Recursive case**: Otherwise, the current dictionary word is shorter than the string, so the string can be
       broken down further. We make a recursive call to evaluate the remaining portion (suffix) of the string. Then,
       for each sentence returned by that call, we join the prefix word and the returned sentence with a space and
       store the new sentence in the result.
3. After all possible combinations have been explored, we return the result.
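
The steps above can be sketched in Python as follows. This is a minimal, unoptimized sketch; the function name
`word_break_naive` is our own choice rather than part of the original solution.

```python
from typing import List


def word_break_naive(s: str, word_dict: List[str]) -> List[str]:
    """Return every way to split s into words from word_dict using plain recursion."""
    # Base case: an empty string yields no sentences in this formulation.
    if not s:
        return []

    result: List[str] = []
    for word in word_dict:
        # Only a word that is a prefix of s can start a valid sentence.
        if not s.startswith(word):
            continue
        if len(word) == len(s):
            # The whole string is a single dictionary word.
            result.append(s)
        else:
            # Recursive case: break up the remaining suffix, then prepend the prefix word.
            for sentence in word_break_naive(s[len(word):], word_dict):
                result.append(word + " " + sentence)
    return result
```

For example, `word_break_naive("catsanddog", ["cat", "cats", "and", "sand", "dog"])` returns
`["cat sand dog", "cats and dog"]`. Every call re-solves its suffix from scratch, which is where the exponential cost
comes from.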

The time complexity of this solution is O(k^n * m), where `k` is the number of words in the dictionary, `n` is the
length of the string, and `m` is the length of the longest word in the dictionary.

The space complexity is O(k^n * n), where `k` is the number of words in the dictionary and `n` is the length of the
string.

### Optimized approach using dynamic programming - tabulation

Since the recursive solution to this problem takes exponential time, let’s see if we can reduce this cost. Dynamic
programming helps us avoid recomputing the same subproblems. Therefore, let’s analyze our recursive solution to see
whether it has the properties needed for conversion to dynamic programming.

- **Optimal substructure**: Given an input string, `s`, that we want to break up into dictionary words, we find the
  first word that matches a word from the dictionary, and then repeat the process for the remaining, shorter input
  string. This means that, to solve the problem for input `q`, we need to solve the same problem for `p`, where `p` is
  at least one character shorter than `q`. Therefore, this problem obeys the optimal substructure property.

- **Overlapping subproblems**: The algorithm solves the same subproblems repeatedly. Consider the input string
  “vegancookbook” and the dictionary [“an”, “book”, “cook”, “cookbook”, “veg”, “vegan”]. The following is the partial
  call tree for the naive recursive solution, where each node is the suffix that still has to be broken up:
  ```text
                  "vegancookbook"
                  /             \
           "ancookbook"       "cookbook"
            /       \          /      \
      "cookbook"    ...     "book"    ...
  ```
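
To check the overlap concretely, the sketch below instruments the naive recursion with a counter that records how many
times each suffix is evaluated. It assumes the input “vegancookbook” and the dictionary
[“an”, “book”, “cook”, “cookbook”, “veg”, “vegan”]; the helper name `count_subproblem_calls` is hypothetical.

```python
from collections import Counter
from typing import List


def count_subproblem_calls(s: str, word_dict: List[str]) -> Counter:
    """Run the naive recursion on s and count how often each suffix is evaluated."""
    calls: Counter = Counter()

    def solve(suffix: str) -> List[str]:
        calls[suffix] += 1  # record one evaluation of this subproblem
        if not suffix:
            return []
        result: List[str] = []
        for word in word_dict:
            if not suffix.startswith(word):
                continue
            if len(word) == len(suffix):
                result.append(suffix)
            else:
                for sentence in solve(suffix[len(word):]):
                    result.append(word + " " + sentence)
        return result

    solve(s)
    return calls


calls = count_subproblem_calls("vegancookbook", ["an", "book", "cook", "cookbook", "veg", "vegan"])
print(calls["cookbook"])  # 2: reached via "veg" + "an" and via "vegan"
print(calls["book"])      # 2: once under each evaluation of "cookbook"
```

The duplicated work compounds: every repeated evaluation of “cookbook” also repeats all of the calls beneath it.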

From the tree above, it can be seen that the subproblem “cookbook” is evaluated twice. To take advantage of these
opportunities for optimization, we will use bottom-up dynamic programming, also known as the tabulation approach, which
solves the subproblems iteratively. The idea is that if a prefix of the input string matches any word `w` in the
dictionary, we can split the string into two parts: the matching word and the suffix of the input string. We start from
the empty prefix, which is the base case, and grow the prefix one character at a time until it covers the complete
input string.

> The tabulation approach is often faster in practice than backtracking with memoization because it avoids the overhead
> of recursive calls and stack usage, even though the asymptotic complexity is usually the same. It also eliminates the
> need for a separate memoization map, as the table itself serves as the storage for the subproblem solutions.
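
For comparison with the note above, here is a minimal sketch of the top-down alternative (recursion plus memoization),
which is a different technique from the tabulation described next. It caches results per starting index with
`functools.lru_cache`. Unlike the naive approach, it assumes an empty suffix yields one empty sentence, which removes
the need for a separate full-length check. The name `word_break_memo` is hypothetical.

```python
from functools import lru_cache
from typing import List, Tuple


def word_break_memo(s: str, word_dict: List[str]) -> List[str]:
    """Top-down recursion with memoization: each start index is solved at most once."""
    words = set(word_dict)  # fast membership checks for candidate words

    @lru_cache(maxsize=None)
    def solve(start: int) -> Tuple[str, ...]:
        # Base case: an empty suffix can be formed in exactly one way, the empty sentence.
        if start == len(s):
            return ("",)
        sentences = []
        for end in range(start + 1, len(s) + 1):
            word = s[start:end]
            if word in words:
                # Prepend the matched word to every sentence of the remaining suffix.
                for rest in solve(end):
                    sentences.append(word if not rest else word + " " + rest)
        # Tuples are immutable, so cached results cannot be modified by callers.
        return tuple(sentences)

    return list(solve(0))
```

Because the cache is keyed by start index, a suffix such as “cookbook” is expanded only once, no matter how many
prefixes lead to it.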

Here’s how the algorithm works:

- We initialize a lookup table, `dp`, of length `n + 1`, where `n` is the length of `s` and `dp[i]` corresponds to the
  prefix of length `i`. This table stores the solutions to previously solved subproblems. It has the following
  properties:
  - The first entry of the table represents the prefix of length 0, i.e., the empty string “”.
  - The remaining entries represent the other prefixes of the string `s`. For example, the input string “vegan” has
    the prefixes “v”, “ve”, “veg”, “vega”, and “vegan”.
  - Each entry of the table contains an array of the sentences that can be formed from the respective prefix. At this
    point, all the arrays are empty.
- For the base case, we add an empty string to the array in the first entry of the `dp` table. This is because the only
  sentence that can be formed from an empty string is the empty string itself.
- Next, we traverse the input string, growing the current prefix by a single character in each iteration.
  - For the current prefix, we initialize an array, `temp`, that will store the valid sentences formed from that prefix.
    Let’s suppose that the input string is “vegan”, and that the current prefix is “vega”.
  - For every suffix of the current prefix, we check whether the suffix exists in the given dictionary. In our example,
    this means checking the dictionary for the suffixes “vega”, “ega”, “ga”, and “a”. Each suffix either matches a
    dictionary word or it doesn’t:
    - If it does, the suffix is a valid word from the dictionary and can be used as the last word of a sentence.
      Therefore, we retrieve from the `dp` table all the sentences for the prefix to the left of this suffix. Supposing
      that the current suffix of “vega” is “a”, and that “a” is present in the dictionary, we would retrieve all the
      sentences already found for “veg”. This means that we reuse the solutions of a subproblem smaller than the
      current subproblem. We then form new sentences for the current prefix by appending a space character and the
      current suffix to each of the retrieved sentences (when the retrieved sentence is the empty string from the base
      case, we append only the suffix, without a leading space). Supposing that the valid sentences for the subproblem
      “veg” are “veg”, “v eg”, and “ve g”, we form these new sentences for the current subproblem, “vega”: “veg a”,
      “v eg a”, and “ve g a”. We add the new sentences to the `temp` array of this prefix.
    - If the suffix is not present in the dictionary, no sentence ending with this suffix can be formed, so we add
      nothing to `temp` and move on to the next suffix.
  - We repeat the above steps for all suffixes of the current prefix.
  - We set the entry corresponding to the current prefix in the `dp` table equal to the `temp` array.
- We repeat the steps above for all prefixes of the input string.
- After all the prefixes have been evaluated, the last entry of the `dp` table is an array containing all the sentences
  formed from the longest prefix, i.e., the complete string. Therefore, we return this array.
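
The algorithm above can be sketched in Python as follows. The function name `word_break_tabulation` is our own, and
converting the dictionary to a set for average constant-time lookups is an assumption the description leaves open.

```python
from typing import List


def word_break_tabulation(s: str, word_dict: List[str]) -> List[str]:
    """Bottom-up DP: dp[i] holds every sentence that spells the prefix s[:i]."""
    words = set(word_dict)  # average O(1) membership checks
    n = len(s)
    dp: List[List[str]] = [[] for _ in range(n + 1)]
    # Base case: the empty prefix forms exactly one sentence, the empty string.
    dp[0] = [""]

    for end in range(1, n + 1):          # current prefix is s[:end]
        temp: List[str] = []
        for start in range(end):          # candidate suffix is s[start:end]
            suffix = s[start:end]
            if suffix not in words:
                continue                  # this suffix adds nothing; try the next one
            for sentence in dp[start]:    # sentences for the prefix left of the suffix
                # Avoid a leading space when extending the empty base-case sentence.
                temp.append(suffix if not sentence else sentence + " " + suffix)
        dp[end] = temp

    return dp[n]
```

As a usage example, with a dictionary chosen so that “veg” has the three sentences from the walkthrough,
`word_break_tabulation("vega", ["v", "ve", "veg", "eg", "g", "a"])` returns `["veg a", "v eg a", "ve g a"]`. The inner
loop visits the suffixes in the same order as the text: “vega”, “ega”, “ga”, then “a”.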

#### Solution summary

To recap, the solution to this problem can be divided into the following six main steps:

1. We create a table (an array of arrays) where each entry corresponds to a prefix of the input string. Each entry
   starts as an empty array, except the first one (the empty prefix), which holds the empty string as the base case.
2. We iterate over all prefixes of the input string. For each prefix, we iterate over all of its suffixes.
3. For each suffix, we check whether it’s a valid word, i.e., whether it’s present in the provided dictionary.
4. If the suffix is a valid word, we combine it with all valid sentences from the table entry of the prefix to the left
   of it.
5. We store the array of all possible sentences that can be formed using the current prefix in the corresponding entry
   of the table.
6. After processing all prefixes of the input string, we return the array in the last entry of our table.

#### Time Complexity

The time complexity of this solution is O(n^2 * v), where `n` is the length of the string `s` and `v` is the number of
valid combinations.
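
The `v` term can dominate, because the number of valid sentences can grow exponentially with `n` even within the stated
constraints. The sketch below reuses the tabulation structure but stores only a count per prefix instead of the
sentences themselves (the helper `count_sentences` is hypothetical). With the dictionary [“a”, “aa”], the counts follow
the Fibonacci sequence.

```python
from typing import List


def count_sentences(s: str, word_dict: List[str]) -> int:
    """Count segmentations without building them: counts[i] = number of sentences for s[:i]."""
    words = set(word_dict)
    counts = [0] * (len(s) + 1)
    counts[0] = 1  # the empty prefix has exactly one (empty) sentence
    for end in range(1, len(s) + 1):
        for start in range(end):
            if s[start:end] in words:
                # Every sentence of s[:start] extends to a sentence of s[:end].
                counts[end] += counts[start]
    return counts[len(s)]


print([count_sentences("a" * n, ["a", "aa"]) for n in range(1, 8)])  # [1, 2, 3, 5, 8, 13, 21]
print(count_sentences("a" * 20, ["a", "aa"]))                         # 10946
```

So a 20-character input can already require materializing 10,946 sentences, which is why the output size `v` appears in
both complexity bounds.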

#### Space Complexity

The space complexity is O(n * v), where `n` is the length of the string and `v` is the number of valid combinations
stored in the `dp` array.