9.27
Greedy-Approximation for Set Cover
Given a universe U of n elements and a family S = {S1, … , Sm} of subsets whose union is U, repeatedly pick the set that covers the largest number of still-uncovered elements until every element is covered.
Why it works. Although NP-hard to solve optimally, this greedy rule guarantees an H(n) ≈ ln n + 1 approximation factor (H is the n-th harmonic number), which is the best possible in polynomial time unless P = NP.
Pseudocode (unit costs):
GREEDY-SET-COVER(U, S)
C ← ∅ // chosen sets
U_left ← U // uncovered elements
while U_left ≠ ∅
pick T ∈ S that maximizes |T ∩ U_left|
C ← C ∪ {T}
U_left ← U_left \ T
return C
Python implementation (each set is a Python
set
):
def greedy_set_cover(universe, subsets):
uncovered = set(universe)
chosen = []
while uncovered:
best = max(subsets, key=lambda s: len(s & uncovered))
chosen.append(best)
uncovered -= best
return chosen
Complexities: Each iteration scans all m sets and performs set intersections costing O(n). In the worst case we iterate |C| ≤ n times, giving O(mn²) naïvely, or O(mn) if element counts are cached and updated incrementally.
Thus, the presented greedy algorithm is easy to code (see above), runs in polynomial time, and returns a solution no worse than a factor H(n) from the optimum.
Back to
Chapter 9