4.7
Idea. If a researcher has [math]n[/math] papers, her [math]h[/math]-index is the largest integer [math]h[/math] such that at least [math]h[/math] papers have ≥ [math]h[/math] citations.
Because [math]h \le n[/math], we can count how many papers reach each citation level 0 … [math]n[/math] and scan this summary once—avoiding the [math]O(n\log n)[/math] sort.
Algorithm (bucket counting, time [math]O(n)[/math], space [math]O(n)[/math]):
1. Let bucket = [0]*(n+1)
.
2. For every citation count c
:
• if c ≥ n
add 1 to bucket[n]
(all larger values collapse here);
• else add 1 to bucket[c]
.
3. Starting from i = n
down to 0, maintain papers += bucket[i]
.
The first index where papers ≥ i
is the answer [math]h[/math].
Python code.
def h_index(citations):
n = len(citations)
bucket = [0]*(n+1)
for c in citations:
bucket[min(c, n)] += 1 # step 2
papers = 0
for i in range(n, -1, -1): # step 3
papers += bucket[i]
if papers >= i:
return i
Correctness sketch.
The downwards scan keeps papers
= number of articles with at least i
citations.
• When we first meet papers ≥ i
, by construction there are ≥ i such papers, so [math]h\ge i[/math].
• For any larger [math]j>i[/math], papers
counted at step [math]j[/math] is < [math]j[/math], so the definition fails; hence [math]i[/math] is maximal and equals the [math]h[/math]-index.
Complexities.
• Building buckets: [math]O(n)[/math] time.
• One backward pass: [math]O(n)[/math] time.
• Extra array of size [math]n+1[/math]: [math]O(n)[/math] space.
Thus the algorithm computes the researcher’s [math]h[/math]-index in linear time.
Back to
Chapter 4