
Commit 524c6a1

feat(algorithms, graphs, topoligical-sort): alien dictionary variation 2
1 parent 90e79aa commit 524c6a1

13 files changed, with 303 additions and 3 deletions

algorithms/graphs/alien_dictionary/README.md

Lines changed: 263 additions & 0 deletions
@@ -27,3 +27,266 @@ If multiple valid orderings exist, you may return any of them.
![Example 3](./images/examples/alien_dictionary_example_3.png)
![Example 4](./images/examples/alien_dictionary_example_4.png)
![Example 5](./images/examples/alien_dictionary_example_5.png)

## Solution

We can solve this problem using the topological sort pattern. Topological sort finds a linear ordering of elements
that have dependencies on, or priority over, each other. For example, if A depends on B, or B has priority over A,
then B is listed before A in topological order.

Using the list of words, we identify the relative precedence of the letters and build a graph to represent this
ordering. We can then traverse the graph with breadth-first search to find the letters’ order.

The problem maps naturally onto a graph problem, but before exploring the details of the solution, there are a few
things that we need to keep in mind:

1. The letters within a single word don’t tell us anything about the relative order. For example, the word “educative”
   in the list doesn’t tell us that the letter “e” comes before the letter “d.”
2. The input can contain a word followed by its own prefix, such as “educated” and then “educate.” These cases never
   correspond to a valid alphabet, because in a correctly sorted list a prefix always comes before the longer word.
   We need to make sure our solution detects these cases correctly.
3. There can be more than one valid alphabet ordering. It’s fine for our algorithm to return any one of them.
4. The output must contain every unique letter in the words list, including letters that could be placed anywhere in
   the ordering. It shouldn’t contain any additional letters that weren’t in the input.
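
The prefix case from item 2 can be checked on its own. The following is a minimal sketch, not the repository's implementation; the helper name `has_invalid_prefix_pair` is hypothetical and introduced only for illustration.

```python
from typing import List


def has_invalid_prefix_pair(words: List[str]) -> bool:
    """Return True if some word is immediately followed by its own proper prefix.

    Hypothetical helper for illustration: such a pair can never appear in a
    correctly sorted list, so no valid alphabet exists for the input.
    """
    for first, second in zip(words, words[1:]):
        if len(second) < len(first) and first.startswith(second):
            return True
    return False


print(has_invalid_prefix_pair(["educated", "educate"]))  # True
print(has_invalid_prefix_pair(["educate", "educated"]))  # False
```

In the full solution this check is usually folded into the loop that compares adjacent words, so the input is only scanned once.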

### Step-by-step solution construction

We can break this problem into three parts:

1. Extract the information needed to identify the dependency rules from the words. For example, in the words
   [“patterns”, “interview”], the letter “p” comes before “i.”
2. Put these dependency rules into a directed graph with the letters as nodes and the dependencies (order) as the
   edges.
3. Sort the graph nodes topologically to generate the letter ordering (dictionary).

Let’s look at each part in more depth.
#### Part 1: Identifying the dependencies
66+
67+
Let’s start with example words and observe the initial ordering through simple reasoning:
68+
69+
`["mzosr", "mqov", "xxsvq", "xazv", "xazau", "xaqu", "suvzu", "suvxq", "suam", "suax", "rom", "rwx", "rwv"]`
70+
71+
As in the English language dictionary, where all the words starting with “a” come at the start followed by the words
72+
starting with “b,” “c,” “d,” and so on, we can expect the first letters of each word to be in alphabetical order.
73+
74+
`["m", "m", "x", "x", "x", "x", "s", "s", "s", "s", "r", "r", "r"]`
75+
76+
Removing the duplicates, we get the following:
77+
78+
`["m", "x", "s", "r"]`
79+
80+
Following the intuition explained above, we can assume that the first letters in the messages are in alphabetical order:
81+
82+
![Solution 1](./images/solutions/alien_dictionary_solution_1.png)
83+
84+
Looking at the letters above, we know the relative order of these letters, but we don’t know how these letters fit in
85+
with the rest of the letters. To get more information, we need to look further into our English dictionary analogy. The
86+
word “dirt” comes before “dorm.” This is because we look at the second letter when the first letter is the same. In this
87+
case, “i” comes before “o” in the alphabet.
88+
89+
We can apply the same logic to our alien words and look at the first two words, “mzsor” and “mqov.” As the first letter
90+
is the same in both words, we look at the second letter. The first word has “z,” and the second one has “q.” Therefore,
91+
we can safely say that “z” comes before “q” in this alien language. We now have two fragments of the letter order:
92+
93+
![Solution 2](./images/solutions/alien_dictionary_solution_2.png)
94+
95+
> Note: Notice that we didn’t mention rules such as “m -> a”. This is fine because we can derive this relation from
96+
> “m -> x”, “x -> a”.
97+
98+
This is it for the first part. Let’s put the pieces that we have in place.
99+
100+
#### Part 2: Representing the dependencies
101+
102+
We now have a set of relations mentioning the relative order of the pairs of letters:
103+
104+
`["z -> q", "m -> x", "x -> a", "x -> v", "x -> s", "z -> x", "v -> a", "s -> r", "o -> w"]`
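
These relations can be reproduced mechanically by comparing each pair of adjacent words and stopping at the first position where they differ. The sketch below is illustrative only; `extract_relations` is a hypothetical name, and the prefix case described earlier is deliberately ignored here.

```python
from typing import List, Tuple


def extract_relations(words: List[str]) -> List[Tuple[str, str]]:
    """Collect the unique (before, after) letter pairs implied by adjacent words."""
    relations: List[Tuple[str, str]] = []
    for first, second in zip(words, words[1:]):
        for a, b in zip(first, second):
            if a != b:
                # Only the first differing position carries ordering information.
                if (a, b) not in relations:
                    relations.append((a, b))
                break
    return relations


words = ["mzosr", "mqov", "xxsvq", "xazv", "xazau", "xaqu", "suvzu",
         "suvxq", "suam", "suax", "rom", "rwx", "rwv"]
for before, after in extract_relations(words):
    print(f"{before} -> {after}")
```

The relations come out in the order in which they are discovered, which differs from the listing above, but the set of nine relations is the same.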

Now the question arises: how can we put these relations together? It might be tempting to start chaining them all
together. Let’s look at a few possible chains:

![Solution 3](./images/solutions/alien_dictionary_solution_3.png)

We can observe from the chains above that some letters appear in more than one chain, so putting the chains into the
output list one after the other won’t work: some letters would be duplicated, resulting in an invalid ordering. Let’s
visualize the relations with a graph instead. The nodes are the letters, and a directed edge from “x” to “y” means
that “x” comes before “y” in the alien alphabet.

![Solution 4](./images/solutions/alien_dictionary_solution_4.png)
118+
#### Part 3: Generating the dictionary
119+
120+
As we can see from the graph, four of the letters have no incoming arrows. This means that there are no letters that
121+
have to come before any of these four.
122+
123+
> Remember: There could be multiple valid dictionaries, and if there are, then it’s fine for us to return any of them.
124+
125+
Therefore, a valid start to the ordering we return would be as follows:
126+
`["o", "m", "u", "z"]`
127+
128+
We can now remove these letters and edges from the graph because any other letters that required them first will now have
129+
this requirement satisfied.
130+
131+
![Solution 5](./images/solutions/alien_dictionary_solution_5.png)
132+
133+
There are now three new letters on this new graph that have no in arrows. We can add these to our output list.
134+
135+
`["o", "m", "u", "z", "x", "q", "w"]`
136+
137+
Again, we can remove these from the graph.
138+
139+
![Solution 6](./images/solutions/alien_dictionary_solution_6.png)
140+
141+
Then, we add the two new letters with no in arrows.
142+
143+
`["o", "m", "u", "z", "x", "q", "w", "v", "s"]`
144+
This leaves the following graph:
145+
146+
![Solution 7](./images/solutions/alien_dictionary_solution_7.png)
147+
148+
We can place the final two letters in our output list and return the ordering:
149+
150+
`["o", "m", "u", "z", "x", "q", "w", "v", "s", "a", "r"]`
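
The round-by-round removal above can be mimicked directly in code. The sketch below is a deliberately naive illustration rather than the final algorithm; `peel_layers` is a hypothetical name, and the edges are copied by hand from the relations listed in Part 2.

```python
from typing import Dict, List, Set

# Edges copied from the relations in Part 2; "u" appears in the words but has no relations.
edges: Dict[str, Set[str]] = {
    "z": {"q", "x"},
    "m": {"x"},
    "x": {"a", "v", "s"},
    "v": {"a"},
    "s": {"r"},
    "o": {"w"},
}
letters: Set[str] = set("mzosrqvxauw")


def peel_layers(letters: Set[str], edges: Dict[str, Set[str]]) -> List[Set[str]]:
    """Repeatedly remove every letter that has no incoming edge from a remaining letter."""
    remaining = set(letters)
    layers: List[Set[str]] = []
    while remaining:
        layer = {
            c for c in remaining
            if not any(c in edges.get(p, set()) for p in remaining)
        }
        if not layer:
            break  # every remaining letter is blocked, so the graph has a cycle
        layers.append(layer)
        remaining -= layer
    return layers


for layer in peel_layers(letters, edges):
    print(sorted(layer))
```

Each printed layer matches one round of the walkthrough. Letters within a layer can be emitted in any order, which is why several valid dictionaries exist.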

Let’s now review how we can implement this approach.

Identifying the dependencies and representing them in the form of a graph is straightforward. We extract the
relations and insert them into an adjacency list:

![Solution 8](./images/solutions/alien_dictionary_solution_8.png)

Next, we need to generate the dictionary from the extracted relations, which starts with identifying the letters
(nodes) with no incoming links. Determining whether a particular letter has any incoming links from our adjacency list
format is a little awkward. A naive approach is to iterate over the adjacency lists of all the other nodes, check
whether any of them contains a link to that particular node, and repeat this for every node.
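
A minimal sketch of that naive check follows. The function name `has_incoming_edge` is hypothetical, and the adjacency list is written out by hand from the example relations.

```python
from typing import Dict, Set

# Forward adjacency list for the example relations from Part 2.
adj_list: Dict[str, Set[str]] = {
    "z": {"q", "x"},
    "m": {"x"},
    "x": {"a", "v", "s"},
    "v": {"a"},
    "s": {"r"},
    "o": {"w"},
}


def has_incoming_edge(node: str, adj_list: Dict[str, Set[str]]) -> bool:
    """Scan every adjacency set to see whether any of them points at `node`."""
    return any(node in neighbours for neighbours in adj_list.values())


print(has_incoming_edge("x", adj_list))  # True: both "m" and "z" point at "x"
print(has_incoming_edge("m", adj_list))  # False: nothing points at "m"
```

Each call inspects every adjacency set, so answering the question for all letters after every removal repeats a lot of work.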

This naive method would be fine for our case, but we can do better.

An alternative is to keep two adjacency lists:

- One with the same contents as the one above.
- One reversed, which shows the incoming links.

This way, every time we traverse an edge, we can remove the corresponding edge from the reversed adjacency list:

![Solution 9](./images/solutions/alien_dictionary_solution_9.png)

We can do better still. Instead of tracking the incoming links themselves for every letter, we can track only how many
incoming edges each letter has. We keep the in-degree count of every letter along with the forward adjacency list.

> In-degree corresponds to the number of incoming edges of a node.

It will look like this:

![Solution 10](./images/solutions/alien_dictionary_solution_10.png)
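
As a rough sketch, the forward adjacency list and the in-degree counts can be built in a single pass over adjacent words. `build_graph` is a hypothetical helper name; the logic mirrors the first loop of `alien_order_2` in this commit but omits the prefix check.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple


def build_graph(words: List[str]) -> Tuple[DefaultDict[str, Set[str]], Dict[str, int]]:
    """Return (forward adjacency list, in-degree per letter) for a sorted word list."""
    adj_list: DefaultDict[str, Set[str]] = defaultdict(set)
    # Every letter gets an entry, even if it never appears in a relation.
    in_degree: Dict[str, int] = {c: 0 for word in words for c in word}
    for first, second in zip(words, words[1:]):
        for a, b in zip(first, second):
            if a != b:
                if b not in adj_list[a]:  # count each distinct edge once
                    adj_list[a].add(b)
                    in_degree[b] += 1
                break
    return adj_list, in_degree


words = ["mzosr", "mqov", "xxsvq", "xazv", "xazau", "xaqu", "suvzu",
         "suvxq", "suam", "suax", "rom", "rwx", "rwv"]
adj_list, in_degree = build_graph(words)
print(sorted(c for c, count in in_degree.items() if count == 0))  # ['m', 'o', 'u', 'z']
```

Guarding with `b not in adj_list[a]` matters: without it, a relation seen twice (such as “z -> q” here) would inflate the in-degree and the letter would never reach zero.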

Now, we can decrement the in-degree count of a node instead of removing an edge from the reverse adjacency list. When
the in-degree of a node reaches 0, the node has no incoming links left.

We perform BFS over the letters that are reachable, that is, the letters whose in-degree count is zero. A letter only
becomes reachable once all the letters that need to be before it have been added to the output, `result`.

We use a queue to keep track of reachable nodes. Initially, we enqueue the letters that have an in-degree count of
zero. We keep adding letters to the queue as their in-degree counts become zero.

We continue this until the queue is empty. Next, we check whether all the letters in the words have been added to the
output. If some are missing, those letters still have incoming edges left, which means there is a cycle. In this case,
we return an empty string.
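
The queue-based traversal and the cycle check can be sketched as follows. This is an illustration under the assumption that the adjacency list and in-degree counts have already been built; `topological_order` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Dict, List, Set


def topological_order(adj_list: Dict[str, Set[str]], in_degree: Dict[str, int]) -> str:
    """Return the letters in a valid order, or "" if the graph contains a cycle."""
    remaining = dict(in_degree)  # work on a copy so the caller's counts are untouched
    queue: Deque[str] = deque(c for c, count in remaining.items() if count == 0)
    result: List[str] = []
    while queue:
        c = queue.popleft()
        result.append(c)
        for d in adj_list.get(c, ()):
            remaining[d] -= 1
            if remaining[d] == 0:  # all prerequisites of d are now in the output
                queue.append(d)
    # Letters stuck with a positive in-degree never entered the queue: a cycle.
    return "".join(result) if len(result) == len(remaining) else ""


print(topological_order({"a": {"b"}, "b": {"c"}}, {"a": 0, "b": 1, "c": 1}))  # abc
print(repr(topological_order({"a": {"b"}, "b": {"a"}}, {"a": 1, "b": 1})))    # ''
```

In the second call neither letter ever reaches an in-degree of zero, so the queue starts empty and the length check reports the cycle.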

> Remember: Several letters can have no incoming edges at the same time. This can result in different orderings for
> the same set of words, and that’s all right.
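
Because several answers can be correct, comparing the output against one fixed string is fragile. One option, sketched here with a hypothetical helper `is_sorted_in_alphabet`, is to check that the words really are sorted under the returned ordering. The helper assumes every letter of every word appears in the alphabet and raises `KeyError` otherwise.

```python
from typing import List


def is_sorted_in_alphabet(words: List[str], alphabet: str) -> bool:
    """Check that `words` is in non-decreasing order under the given letter ordering."""
    rank = {letter: position for position, letter in enumerate(alphabet)}
    # Lists of ranks compare lexicographically, and a prefix sorts before a longer word.
    keys = [[rank[c] for c in word] for word in words]
    return all(a <= b for a, b in zip(keys, keys[1:]))


words = ["mzosr", "mqov", "xxsvq", "xazv", "xazau", "xaqu", "suvzu",
         "suvxq", "suam", "suax", "rom", "rwx", "rwv"]
print(is_sorted_in_alphabet(words, "omuzxqwvsar"))  # True: the ordering from the walkthrough
print(is_sorted_in_alphabet(words, "rasvwqxzumo"))  # False: the same ordering reversed
```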

### Solution summary

To recap, the solution to this problem can be divided into the following parts:

1. Build a graph from the given words and keep track of the in-degree of each letter in a dictionary.
2. Add the sources (letters with an in-degree of 0) to a result list.
3. Remove the sources and update the in-degrees of their children. If the in-degree of a child becomes 0, it’s the next
   source.
4. Repeat until all letters are covered.
211+
### Time Complexity
212+
213+
There are three parts to the algorithm:
214+
215+
- Identifying all the relations.
216+
- Putting them into an adjacency list.
217+
- Converting it into a valid alphabet ordering.
218+
219+
In the worst case, the identification and initialization parts require checking every letter of every word, which is
220+
O(c), where c is the total length of all the words in the input list added together.
221+
222+
For the generation part, we can recall that a breadth-first search has a cost of O(v+e), where v is the number of vertices
223+
and e is the number of edges. Our algorithm has the same cost as BFS because it visits each edge and node once.
224+
225+
> Note: A node is visited once all of its edges are visited, unlike the traditional BFS where it’s visited once any edge
226+
> is visited.
227+
228+
Therefore, determining the cost of our algorithm requires determining how many nodes and edges there are in the graph.
229+
230+
**Nodes**: We know that there’s one vertex for each unique letter, that is, O(u) vertices, where u is the total number of
231+
unique letters in words. While this is limited to 26 in our case, we still look at how it would impact the complexity if
232+
this weren’t the case.
233+
234+
**Edges**: We generate each edge in the graph by comparing two adjacent words in the input list. There are n−1 pairs of
235+
adjacent words, and only one edge can be generated from each pair, where n is the total number of words in the input list.
236+
We can again look back at the English dictionary analogy to make sense of this:
237+
238+
"dirt"
239+
"dorm"
240+
241+
The only conclusion we can draw is that “i” is before “o.” This is the reason "dirt" appears before "dorm" in an English
242+
dictionary. The solution explains that the remaining letters “rt” and “rm” are irrelevant for determining the alphabetical
243+
ordering.
244+
245+
> Remember: We only generate rules for adjacent words and don’t add the “implied” rules to the adjacency list.
246+
247+
So with this, we know that there are at most n−1 edges.
248+
249+
We can place one additional upper limit on the number of edges since it’s impossible to have more than one edge between
250+
each pair of nodes. With u nodes, this means there can’t be more than u^2 edges.
251+
252+
Because the number of edges has to be lower than both n−1 and u^2, we know it’s at most the smallest of these two values:
253+
min(u^2 ,n−1).

We can now substitute the number of nodes and the number of edges in our breadth-first search cost:

- v=u
- e=min(u^2, n−1)

This gives us the following:

> O(v+e) = O(u + min(u^2, n−1)) = O(u + min(u^2, n))

Finally, we combine the three parts: O(c) for the first two parts and O(u + min(u^2, n)) for the third part. Since
these parts run one after the other, we add their costs and then look at whether the result can be simplified:

> O(c) + O(u + min(u^2, n)) = O(c + u + min(u^2, n))

So, what do we know about the relative values of n, c, and u? We can deduce that both n, the total number of words, and
u, the total number of unique letters, are at most the total number of letters, c, because each word contains at least
one character and there can’t be more unique characters than there are characters.

We know that c is the biggest of the three, but we don’t know the relation between n and u.

Let’s simplify our formulation a little, since we know that the u term is insignificant compared to c:

> O(c + u + min(u^2, n)) -> O(c + min(u^2, n))

Let’s now consider two cases to simplify it a little further:

- If u^2 is smaller than n, then min(u^2, n)=u^2. Since u^2 is smaller than n, which is, in turn, at most c, u^2 is
  definitely less than c. This leaves us with O(c).
- Otherwise, u^2 is at least n, and min(u^2, n)=n. Because c≥n, we’re again left with O(c).

So in all cases, we know that c≥min(u^2, n). This gives us a final time complexity of O(c).
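
The same simplification can be written as one chain of inequalities. This only restates the argument above, using the facts that n ≤ c and u ≤ c:

```latex
\min(u^2, n) \le n \le c, \qquad u \le c
\;\Longrightarrow\;
c + u + \min(u^2, n) \le c + c + c = 3c
\;\Longrightarrow\;
O\bigl(c + u + \min(u^2, n)\bigr) = O(c)
```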

### Space Complexity

The space complexity is O(u + min(u^2, n)) in general and O(1) for a fixed-size alphabet. The adjacency list uses
O(v+e) memory, where v=u and, as explained in the time complexity analysis, e is at most min(u^2, n). So in total, the
adjacency list takes O(u + min(u^2, n)) space, and the in-degree counts add only O(u) to that. This is the space
complexity for an arbitrarily large number of letters. However, for our use case, u is capped at 26, so the number of
relations is capped at 26^2, and O(26 + min(26^2, n)) = O(26 + 26^2) = O(1).

algorithms/graphs/alien_dictionary/__init__.py

Lines changed: 32 additions & 0 deletions
@@ -71,3 +71,35 @@ def alien_order(words: List[str]) -> str:
                queue.append(next_character)

    return "" if len(word) < len(in_degree.keys()) else word


def alien_order_2(words: List[str]) -> str:
    """Return one valid alien alphabet for the sorted words, or "" if none exists."""
    adj_list: DefaultDict[str, Set[str]] = defaultdict(set)
    # Every letter starts with an in-degree of 0, including letters with no relations.
    counts: Counter[str] = Counter({c: 0 for word in words for c in word})

    for word1, word2 in zip(words, words[1:]):
        for c, d in zip(word1, word2):
            if c != d:
                # The first differing position is the only rule this pair gives us.
                if d not in adj_list[c]:
                    adj_list[c].add(d)
                    counts[d] += 1
                break
        else:
            # No difference found: word2 must not be a proper prefix of word1.
            if len(word2) < len(word1):
                return ""

    result: List[str] = []
    # Start the BFS from the letters that have no incoming edges.
    sources_queue: Deque[str] = deque([c for c in counts if counts[c] == 0])
    while sources_queue:
        c = sources_queue.popleft()
        result.append(c)

        for d in adj_list[c]:
            counts[d] -= 1
            if counts[d] == 0:
                sources_queue.append(d)

    # Letters left with a positive in-degree are part of a cycle.
    if len(result) < len(counts):
        return ""
    return "".join(result)