Commit 65524ad

Add greedy set cover approximation algorithm
1 parent ae68a78 commit 65524ad

1 file changed: +183 -0 lines changed

greedy_methods/greedy_set_cover.py

Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
"""
Greedy approximation algorithm for the minimum set cover problem.

Author: Ben Chaddha (https://github.com/benchaddha)

Problem Definition:
    Given a universe U and a collection S of subsets of U such that the union
    of all subsets equals U, find the minimum number of subsets whose union
    covers U.

    The decision version of this problem is NP-complete (Karp, 1972) and the
    optimization version is NP-hard, so no polynomial-time exact algorithm is
    known and exact solutions are impractical for large instances.
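
Algorithm:
    This implementation uses the standard greedy heuristic: repeatedly select
    the subset that covers the most still-uncovered elements until every
    element is covered.

Complexity (n = sum of |S_i| over all subsets, |S| = number of subsets):
    Time:  O(k * (|S| + n)), where k <= min(|U|, |S|) is the number of greedy
           rounds; each round intersects every unchosen subset with the
           uncovered set.
    Space: O(|U| + |S| + n), because the inputs are copied into sets.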

Approximation Guarantee:
    The greedy algorithm achieves an H_d-approximation where:
    - d = max_i |S_i| (maximum subset size)
    - H_d = sum(1/i for i in 1..d) (d-th harmonic number)
    - H_d ≤ ln(d) + 1

    Since d ≤ |U|, this gives a ln(|U|) + 1 approximation in general.
    Feige (1998) proved this is essentially optimal: no polynomial-time
    algorithm can achieve a (1 - ε) * ln(|U|) approximation for any ε > 0
    unless NP is contained in DTIME(n^O(log log n)). Dinur and Steurer (2014)
    later weakened that assumption to P != NP.

Limitations:
    - This implementation assumes all subsets have equal cost (unweighted).
    - For the weighted set cover variant, subset selection should be based on
      the cost-effectiveness ratio (uncovered elements / cost).
    - The greedy approach may be far from optimal for specific instances,
      though it provides the theoretical guarantee stated above.

References:
    - R. M. Karp, "Reducibility Among Combinatorial Problems", 1972.
    - D. S. Johnson, "Approximation algorithms for combinatorial problems", 1974.
    - T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein,
      "Introduction to Algorithms", Chapter 35.3, Set Cover.
    - U. Feige, "A Threshold of ln n for Approximating Set Cover",
      Journal of the ACM, 45(4), 634-652, 1998.

Note:
    Element and subset identifier types must be hashable for set operations.

Example:
    >>> universe = {1, 2, 3, 4, 5}
    >>> subsets = {
    ...     "A": {1, 2},
    ...     "B": {2, 3, 4},
    ...     "C": {3, 4},
    ...     "D": {4, 5},
    ... }
    >>> cover = greedy_set_cover(universe, subsets)
    >>> # Verify the cover is valid
    >>> set().union(*(subsets[i] for i in cover)) == universe
    True
    >>> # Greedy picks B first (3 new elements), then A and D, which are the
    >>> # only subsets containing 1 and 5 respectively.
    >>> cover == {"A", "B", "D"}
    True

    >>> # Optimal case: one subset covers everything
    >>> universe2 = {1, 2, 3}
    >>> subsets2 = {"A": {1, 2, 3}, "B": {1}, "C": {2}}
    >>> cover2 = greedy_set_cover(universe2, subsets2)
    >>> cover2 == {"A"}
    True

    >>> # Example where greedy is not optimal: the optimum is {B, C}, but
    >>> # greedy picks A first (4 new elements) and then needs both B and C.
    >>> universe3 = {1, 2, 3, 4, 5, 6}
    >>> subsets3 = {"A": {1, 2, 3, 4}, "B": {1, 3, 5}, "C": {2, 4, 6}}
    >>> cover3 = greedy_set_cover(universe3, subsets3)
    >>> len(cover3)
    3
    >>> set().union(*(subsets3[i] for i in cover3)) == universe3
    True

    >>> # Error handling - empty universe
    >>> greedy_set_cover(set(), {"A": {1}})
    Traceback (most recent call last):
        ...
    ValueError: Universe must be non-empty.

    >>> # Error handling - empty subsets
    >>> greedy_set_cover({1, 2}, {})
    Traceback (most recent call last):
        ...
    ValueError: Subsets mapping must be non-empty.

    >>> # Error handling - subsets don't cover universe
    >>> greedy_set_cover({1, 2, 3}, {"A": {1}, "B": {2}})
    Traceback (most recent call last):
        ...
    ValueError: The provided subsets do not cover the universe.
"""

from __future__ import annotations

from collections.abc import Hashable, Iterable, Mapping


def greedy_set_cover(
    universe: Iterable[Hashable],
    subsets: Mapping[Hashable, Iterable[Hashable]],
) -> set[Hashable]:
    """
    Greedy approximation for minimum set cover.

    Args:
        universe: The set of elements to be covered.
        subsets: A mapping from subset identifiers to their elements.

    Returns:
        A set of subset identifiers that covers the universe.

    Raises:
        ValueError: If the universe is empty, subsets is empty, or if the
            provided subsets cannot cover the universe.

    Time Complexity: O(min(|universe|, |subsets|) * (total size of all subsets))
    Space Complexity: O(|universe| + total size of all subsets)
    """

    # Normalize inputs to sets so we do not mutate user-provided structures.
    universe_set = set(universe)
    if not universe_set:
        raise ValueError("Universe must be non-empty.")

    if not subsets:
        raise ValueError("Subsets mapping must be non-empty.")

    normalized_subsets: dict[Hashable, set[Hashable]] = {
        key: set(s) for key, s in subsets.items()
    }

    # Quick feasibility check: if the union of all subsets does not cover U,
    # we can terminate early. This is preferable to silently returning
    # an incomplete "cover".
    union_of_subsets: set[Hashable] = set().union(*normalized_subsets.values())
    if not universe_set.issubset(union_of_subsets):
        raise ValueError("The provided subsets do not cover the universe.")

    uncovered: set[Hashable] = set(universe_set)
    chosen_subsets: set[Hashable] = set()

    # Standard greedy loop: at each step, select the subset that covers
    # the largest number of remaining uncovered elements.
    while uncovered:
        best_key: Hashable | None = None
        best_gain = 0

        for key, subset in normalized_subsets.items():
            if key in chosen_subsets:
                continue  # already selected
            # Intersection with uncovered elements gives the marginal gain.
            gain = len(uncovered & subset)
            if gain > best_gain:
                best_gain = gain
                best_key = key

        # If no subset yields a positive gain, but uncovered is non-empty,
        # the instance is effectively uncoverable (should not happen if the
        # feasibility check above passed, but we keep this for robustness).
        if best_key is None or best_gain == 0:
            raise ValueError("The provided subsets do not cover the universe.")

        # Commit to the chosen subset and mark its elements as covered.
        chosen_subsets.add(best_key)
        uncovered -= normalized_subsets[best_key]

    return chosen_subsets


if __name__ == "__main__":
    import doctest

    doctest.testmod()
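
The Limitations note in the module docstring says a weighted variant should select subsets by cost-effectiveness (uncovered elements / cost). The sketch below shows one way that rule might look. It is not part of this commit: the function name `weighted_greedy_set_cover` and the `costs` parameter are hypothetical, costs are assumed to be strictly positive, and, unlike the committed function, an empty universe simply yields an empty cover.

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable, Mapping


def weighted_greedy_set_cover(
    universe: Iterable[Hashable],
    subsets: Mapping[Hashable, Iterable[Hashable]],
    costs: Mapping[Hashable, float],
) -> set[Hashable]:
    """Hypothetical weighted variant (illustration only, not in the commit)."""
    uncovered = set(universe)
    normalized = {key: set(s) for key, s in subsets.items()}

    # Assumption: every subset has a strictly positive cost.
    if any(costs[key] <= 0 for key in normalized):
        raise ValueError("All costs must be positive.")

    chosen: set[Hashable] = set()
    while uncovered:
        best_key = None
        best_ratio = 0.0
        for key, subset in normalized.items():
            if key in chosen:
                continue
            # Newly covered elements per unit of cost.
            ratio = len(uncovered & subset) / costs[key]
            if ratio > best_ratio:
                best_ratio = ratio
                best_key = key
        if best_key is None:
            # No remaining subset covers anything that is still uncovered.
            raise ValueError("The provided subsets do not cover the universe.")
        chosen.add(best_key)
        uncovered -= normalized[best_key]
    return chosen


universe = {1, 2, 3, 4}
subsets = {"big": {1, 2, 3, 4}, "left": {1, 2}, "right": {3, 4}}
costs = {"big": 5.0, "left": 1.0, "right": 1.0}
print(sorted(weighted_greedy_set_cover(universe, subsets, costs)))
# prints ['left', 'right']
```

With these costs, "big" covers four elements for a cost of 5 (ratio 0.8), while "left" and "right" each cover two elements for a cost of 1 (ratio 2.0), so the two cheap subsets are chosen for a total cost of 2. With unit costs the rule reduces to the unweighted selection used in the commit and picks "big" alone.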
