# Sorting and Searching

Applications of Sorting

4-1. The Grinch is given the job of partitioning $2n$ players into two teams of $n$ players each. Each player has a numerical rating that measures how good he/she is at the game. He seeks to divide the players as unfairly as possible, so as to create the biggest possible talent imbalance between team $A$ and team $B$. Show how the Grinch can do the job in $O(n \log n)$ time. Grinch, The

4-2. For each of the following problems, give an algorithm that finds the desired numbers within the given amount of time. To keep your answers brief, feel free to use algorithms from the book as subroutines. For the example, $S = \{6, 13, 19, 3, 8\}$, $19-3$ maximizes the difference, while $8-6$ minimizes the difference.
(a) Let $S$ be an unsorted array of $n$ integers. Give an algorithm that finds the pair $x, y \in S$ that maximizes $|x-y|$. Your algorithm must run in $O(n)$ worst-case time.
(b) Let $S$ be a sorted array of $n$ integers. Give an algorithm that finds the pair $x, y \in S$ that maximizes $|x-y|$. Your algorithm must run in $O(1)$ worst-case time.
(c) Let $S$ be an unsorted array of $n$ integers. Give an algorithm that finds the pair $x, y \in S$ that minimizes $|x-y|$, for $x \neq y$. Your algorithm must run in $O(n \log n)$ worst-case time.
(d) Let $S$ be a sorted array of $n$ integers. Give an algorithm that finds the pair $x, y \in S$ that minimizes $|x-y|$, for $x \neq y$. Your algorithm must run in $O(n)$ worst-case time.

4-3. Take a sequence of $2n$ real numbers as input. Design an $O(n \log n)$ algorithm that partitions the numbers into $n$ pairs, with the property that the partition minimizes the maximum sum of a pair. For example, say we are given the numbers (1,3,5,9). The possible partitions are ((1,3),(5,9)), ((1,5),(3,9)), and ((1,9),(3,5)). The pair sums for these partitions are (4,14), (6,12), and (10,8). Thus the third partition has 10 as its maximum sum, which is the minimum over the three partitions.

4-4. Assume that we are given $n$ pairs of items as input, where the first item is a number and the second item is one of three colors (red, blue, or yellow). Further assume that the items are sorted by number. Give an $O(n)$ algorithm to sort the items by color (all reds before all blues before all yellows) such that the numbers for identical colors stay sorted. For example: (1,blue), (3,red), (4,blue), (6,yellow), (9,red) should become (3,red), (9,red), (1,blue), (4,blue), (6,yellow).

4-5. The mode of a set of numbers is the number that occurs most frequently in the set. The set $(4,6,2,4,3,1)$ has a mode of 4. Give an efficient and correct algorithm to compute the mode of a set of $n$ numbers.

4-6. Given two sets $S_1$ and $S_2$ (each of size $n$), and a number $x$, describe an $O(n \log n)$ algorithm for finding whether there exists a pair of elements, one from $S_1$ and one from $S_2$, that add up to $x$. (For partial credit, give a $\Theta(n^2)$ algorithm for this problem.)

4-7. Outline a reasonable method of solving each of the following problems. Give the order of the worst-case complexity of your methods.

1. You are given a pile of thousands of telephone bills and thousands of checks sent in to pay the bills. Find out who did not pay.
2. You are given a list containing the title, author, call number and publisher of all the books in a school library and another list of 30 publishers. Find out how many of the books in the library were published by each company.
3. You are given all the book checkout cards used in the campus library during the past year, each of which contains the name of the person who took out the book. Determine how many distinct people checked out at least one book.

4-8. Given a set of $S$ containing $n$ real numbers, and a real number $x$. We seek an algorithm to determine whether two elements of $S$ exist whose sum is exactly $x$.

1. Assume that $S$ is unsorted. Give an $O(n \log n)$ algorithm for the problem.
2. Assume that $S$ is sorted. Give an $O(n)$ algorithm for the problem.

4-9. Give an efficient algorithm to compute the union of sets $A$ and $B$, where $n = \max(|A|,|B|)$. The output should be an array of distinct elements that form the union of the sets, such that they appear more than once in the union.

1. Assume that $A$ and $B$ are unsorted. Give an $O(n \log n)$ algorithm for the problem.
2. Assume that $A$ and $B$ are sorted. Give an $O(n)$ algorithm for the problem.

4-10. Given a set $S$ of $n$ integers and an integer $T$, give an $O( n^{k-1} \log n )$ algorithm to test whether $k$ of the integers in $S$ add up to $T$.

4-11. Design an $O(n)$ algorithm that, given a list of $n$ elements, finds all the elements that appear more than $n/2$ times in the list. Then, design an $O(n)$ algorithm that, given a list of $n$ elements, finds all the elements that appear more than $n/4$ times.

Heaps

4-12. Devise an algorithm for finding the $k$ smallest elements of an unsorted set of $n$ integers in $O(n + k \log n)$.

4-13. You wish to store a set of $n$ numbers in either a max-heap or a sorted array. For each application below, state which data structure is better, or if it does not matter. Explain your answers.

1. Want to find the maximum element quickly.
2. Want to be able to delete an element quickly.
3. Want to be able to form the structure quickly.
4. Want to find the minimum element quickly.

4-14. Give an $O(n \log k)$-time algorithm that merges $k$ sorted lists with a total of $n$ elements into one sorted list. (Hint: use a heap to speed up the elementary $O(k n)$-time algorithm).

4-15. (a) Give an efficient algorithm to find the second-largest key among $n$ keys. You can do better than $2n-3$ comparisons. (b) Then, give an efficient algorithm to find the third-largest key among $n$ keys. How many key comparisons does your algorithm do in the worst case? Must your algorithm determine which key is largest and second-largest in the process?

Quicksort

4-16. Use the partitioning idea of quicksort to give an algorithm that finds the median element of an array of $n$ integers in expected $O(n)$ time. (Hint: must you look at both sides of the partition?)

4-17. The median of a set of $n$ values is the $\lceil n/2 \rceil$th smallest value.

1. Suppose quicksort always pivoted on the median of the current sub-array. How many comparisons would Quicksort make then in the worst case?
2. Suppose quicksort were always to pivot on the $\lceil n/3 \rceil$th smallest value of the current sub-array. How many comparisons would be made then in the worst case?

4-18. Suppose an array $A$ consists of $n$ elements, each of which is red, white, or blue. We seek to sort the elements so that all the reds come before all the whites, which come before all the blues The only operation permitted on the keys are

1. Examine(A,i) -- report the color of the $i$th element of $A$.
2. Swap(A,i,j) -- swap the $i$th element of $A$ with the $j$th element.

Find a correct and efficient algorithm for red-white-blue sorting. There is a linear-time solution. This is also known as the Dutch national flag problem. The simplest linear time solution performs two passes of the partition operation from Quicksort. The first pass treats the red and white elements as indistinguishable, and separates them from the blue. The second pass is just separates the elements within the red/white sub-array.

4-19. An inversion of a permutation is a pair of elements that are out of order.

1. Show that a permutation of $n$ items has at most $n(n-1)/2$ inversions. Which permutation(s) have exactly $n(n-1)/2$ inversions?
2. Let $P$ be a permutation and $P^r$ be the reversal of this permutation. Show that $P$ and $P^r$ have a total of exactly $n(n-1)/2$ inversions.
3. Use the previous result to argue that the expected number of inversions in a random permutation is $n(n-1)/4$.

4-20. Give an efficient algorithm to rearrange an array of $n$ keys so that all the negative keys precede all the nonnegative keys. Your algorithm must be in-place, meaning you cannot allocate another array to temporarily hold the items. How fast is your algorithm?

Other Sorting Algorithms

4-21. Stable sorting algorithms leave equal-key items in the same relative order as in the original permutation. Explain what must be done to ensure that mergesort is a stable sorting algorithm.

4-22. Show that $n$ positive integers in the range 1 to $k$ can be sorted in $O(n \log k)$ time. The interesting case is when $k << n$.

4-23. We seek to sort a sequence $S$ of $n$ integers with many duplications, such that the number of distinct integers in $S$ is $O(\log n)$. Give an $O(n \log \log n)$ worst-case time algorithm to sort such sequences.

4-24. Let $A[1..n]$ be an array such that the first $n-\sqrt n$ elements are already sorted (though we know nothing about the remaining elements). Give an algorithm that sorts $A$ in substantially better than $n\log n$ steps.

4-25. Assume that the array $A[1..n]$ only has numbers from $\{1,\ldots, n^2\}$ but that at most $\log\log n$ of these numbers ever appear. Devise an algorithm that sorts $A$ in substantially less than $O(n\log n)$.

4-26. Consider the problem of sorting a sequence of $n$ 0's and 1's using comparisons. For each comparison of two values $x$ and $y$, the algorithm learns which of $x < y$, $x = y$, or $x > y$ holds.

1. Give an algorithm to sort in $n-1$ comparisons in the worst case. Show that your algorithm is optimal.
2. Give an algorithm to sort in $2n/3$ comparisons in the average case (assuming each of the $n$ inputs is 0 or 1 with equal probability). Show that your algorithm is optimal.

4-27. Let $P$ be a simple, but not necessarily convex, polygon and $q$ an arbitrary point not necessarily in $P$. Design an efficient algorithm to find a line segment originating from $q$ that intersects the maximum number of edges of $P$. In other words, if standing at point $q$, in what direction should you aim a gun so the bullet will go through the largest number of walls. A bullet through a vertex of $P$ gets credit for only one wall. An $O(n \log n)$ algorithm is possible.

Lower Bounds

4-28. sorting algorithm that runs in $O( n \log (\sqrt n) )$. Given the existence of an $\Omega(n \log n)$ lower bound for sorting, how can this be possible? lower bound

4-29. Mr. B. C. Dull claims to have developed a new data structure for priority queues that supports the operations Insert, Maximum, and Extract-Max---all in $O(1)$ worst-case time. Prove that he is mistaken. (Hint: the argument does not involve a lot of gory details---just think about what this would imply about the $\Omega(n \log n)$ lower bound for sorting.)

Searching

4-30. A company database consists of 10,000 sorted names, 40% of whom are known as good customers and who together account for 60% of the accesses to the database. There are two data structure options to consider for representing the database:

1. Put all the names in a single array and use binary search.
2. Put the good customers in one array and the rest of them in a second array.

Only if we do not find the query name on a binary search of the first array do we do a binary search of the second array. Demonstrate which option gives better expected performance. Does this change if linear search on an unsorted array is used instead of binary search for both options?

4-31. Suppose you are given an array $A$ of $n$ sorted numbers that has been circularly shifted $k$ positions to the right. For example, $\{35, 42, 5, 15, 27, 29\}$ is a sorted array that has been circularly shifted $k=2$ positions, while $\{27, 29, 35, 42, 5, 15\}$ has been shifted $k=4$ positions.

1. Suppose you know what $k$ is. Give an $O(1)$ algorithm to find the largest number in $A$.
2. Suppose you do not know what $k$ is. Give an $O(\lg n)$ algorithm to find the largest number in $A$. For partial credit, you may give an $O(n)$ algorithm.

4-32. Consider the numerical 20 Questions game. In this game, Player 1 thinks of a number in the range 1 to $n$. Player 2 has to figure out this number by asking the fewest number of true/false questions. Assume that nobody cheats.

1. What is an optimal strategy if $n$ is known?
2. What is a good strategy if $n$ is not known?

4-33. Suppose that you are given a sorted sequence of distinct integers $\{a_1, a_2, \ldots, a_n\}$. Give an $O(\lg n)$ algorithm to determine whether there exists an $i$ index such as $a_i = i$. For example, in $\{-10,-3,3,5,7\}$, $a_3 = 3$. In $\{2,3,4,5,6,7\}$, there is no such $i$.

4-34. Suppose that you are given a sorted sequence of distinct integers $\{a_1, a_2, \ldots, a_n\}$, drawn from $1$ to $m$ where $n < m$. Give an $O(\lg n)$ algorithm to find an integer $\leq m$ that is not present in $a$. For full credit, find the smallest such integer.

4-35. Let $M$ be an $n \times m$ integer matrix in which the entries of each row are sorted in increasing order (from left to right) and the entries in each column are in increasing order (from top to bottom). Give an efficient algorithm to find the position of an integer $x$ in $M$, or to determine that $x$ is not there. How many comparisons of $x$ with matrix entries does your algorithm use in worst case?

Implementation Challenges

4-36. Consider an $n \times n$ array $A$ containing integer elements (positive, negative, and zero). Assume that the elements in each row of $A$ are in strictly increasing order, and the elements of each column of $A$ are in strictly decreasing order. (Hence there cannot be two zeroes in the same row or the same column.) Describe an efficient algorithm that counts the number of occurrences of the element $0$ in $A$. Analyze its running time.

4-37. Implement versions of several different sorting algorithms, such as selection sort, insertion sort, heapsort, mergesort, and quicksort. Conduct experiments to assess the relative performance of these algorithms in a simple application that reads a large text file and reports exactly one instance of each word that appears within it. This application can be efficiently implemented by sorting all the words that occur in the text and then passing through the sorted sequence to identify one instance of each distinct word. Write a brief report with your conclusions.

4-38. Implement an external sort, which uses intermediate files to sort files bigger than main memory. Mergesort is a good algorithm to base such an implementation on. Test your program both on files with small records and on files with large records.

4-39. Design and implement a parallel sorting algorithm that distributes data across several processors. An appropriate variation of mergesort is a likely candidate. Measure the speedup of this algorithm as the number of processors increases. Later, compare the execution time to that of a purely sequential mergesort implementation. What are your experiences?

Interview Problems

4-40. If you are given a million integers to sort, what algorithm would you use to sort them? How much time and memory would that consume?

4-42. Implement an algorithm that takes an input array and returns only the unique elements in it.

4-43. You have a computer with only 2Mb of main memory. How do you use it to sort a large file of 500 Mb that is on disk?

4-44. Design a stack that supports push, pop, and retrieving the minimum element in constant time. Can you do this?

4-45. Given a search string of three words, find the smallest snippet of the document that contains all three of the search words---i.e., the snippet with smallest number of words in it. You are given the index positions where these words occur in the document, such as word1: (1, 4, 5), word2: (3, 9, 10), and word3: (2, 6, 15). Each of the lists are in sorted order, as above.

4-46. You are given 12 coins. One of them is heavier or lighter than the rest. Identify this coin in just three weighings.

Note - weighings are with a balance, where you can have a greater than, equal to, or less than result. You can't do this with a digital scale.