4.7

From The Algorithm Design Manual Solution Wiki
Jump to navigation Jump to search

Idea. If a researcher has [math]n[/math] papers, her [math]h[/math]-index is the largest integer [math]h[/math] such that at least [math]h[/math] papers have ≥ [math]h[/math] citations.

Because [math]h \le n[/math], we can count how many papers reach each citation level 0 … [math]n[/math] and scan this summary once—avoiding the [math]O(n\log n)[/math] sort.

Algorithm (bucket counting, time [math]O(n)[/math], space [math]O(n)[/math]):
1. Let bucket = [0]*(n+1).
2. For every citation count c:
  • if c ≥ n add 1 to bucket[n] (all larger values collapse here);
  • else add 1 to bucket[c].
3. Starting from i = n down to 0, maintain papers += bucket[i]. The first index where papers ≥ i is the answer [math]h[/math].

Python code.
def h_index(citations): n = len(citations) bucket = [0]*(n+1) for c in citations: bucket[min(c, n)] += 1 # step 2 papers = 0 for i in range(n, -1, -1): # step 3 papers += bucket[i] if papers >= i: return i

Correctness sketch. The downwards scan keeps papers = number of articles with at least i citations. • When we first meet papers ≥ i, by construction there are ≥ i such papers, so [math]h\ge i[/math]. • For any larger [math]j>i[/math], papers counted at step [math]j[/math] is < [math]j[/math], so the definition fails; hence [math]i[/math] is maximal and equals the [math]h[/math]-index.

Complexities. • Building buckets: [math]O(n)[/math] time. • One backward pass: [math]O(n)[/math] time. • Extra array of size [math]n+1[/math]: [math]O(n)[/math] space.

Thus the algorithm computes the researcher’s [math]h[/math]-index in linear time.


Back to Chapter 4