# Difference between revisions of "Sorting-searching-TADM2E"

m (question clarification) |
(adding a solution for 4.40) |
||

Line 437: | Line 437: | ||

How much time and | How much time and | ||

memory would that consume? | memory would that consume? | ||

+ | |||

+ | [[TADM2E 4.40|(Solution 4.40)]] | ||

<br>4-41. | <br>4-41. |

## Revision as of 21:41, 8 September 2015

# Sorting and Searching

**Applications of Sorting**

4-1.
The Grinch is given the job of partitioning $ 2n $ players into
two teams of $ n $ players each.
Each player has a numerical rating that measures how good he/she is
at the game.
He seeks to divide the players as *unfairly*
as possible, so as to create the biggest possible talent imbalance between team
$ A $ and team $ B $.
Show how the Grinch can do the job in $ O(n \log n) $ time.
*Grinch, The*

4-2.
For each of the following problems, give an algorithm that finds the
desired numbers within the given amount of time.
To keep your answers brief, feel free to use algorithms from the
book as subroutines.
For the example, $ S = \{6, 13, 19, 3, 8\} $, $ 19-3 $ maximizes the difference,
while $ 8-6 $ minimizes the difference.

(a) Let $ S $ be an *unsorted* array of $ n $ integers.
Give an algorithm that finds the pair $ x, y \in S $ that *maximizes*
$ |x-y| $.
Your algorithm must run in $ O(n) $ worst-case time.

(b) Let $ S $ be a *sorted* array of $ n $ integers.
Give an algorithm that finds the pair $ x, y \in S $ that *maximizes*
$ |x-y| $.
Your algorithm must run in $ O(1) $ worst-case time.

(c) Let $ S $ be an *unsorted* array of $ n $ integers.
Give an algorithm that finds the pair $ x, y \in S $ that *minimizes*
$ |x-y| $, for $ x \neq y $.
Your algorithm must run in $ O(n \log n) $ worst-case time.

(d) Let $ S $ be a *sorted* array of $ n $ integers.
Give an algorithm that finds the pair $ x, y \in S $ that *minimizes*
$ |x-y| $, for $ x \neq y $.
Your algorithm must run in $ O(n) $ worst-case time.

4-3.
Take a sequence of $ 2n $ real numbers as input.
Design an $ O(n \log n) $ algorithm that partitions the numbers into $ n $ pairs,
with the property that the partition minimizes the maximum sum of a pair.
For example, say we are given the numbers (1,3,5,9).
The possible partitions are ((1,3),(5,9)), ((1,5),(3,9)), and ((1,9),(3,5)).
The pair sums for these partitions are (4,14), (6,12), and (10,8).
Thus the third partition has 10 as its maximum sum, which is the minimum
over the three partitions.

4-4.
Assume that we are given $ n $ pairs of items as input, where the first
item is a number and the second item is one of three colors
(red, blue, or yellow). Further assume that the items are sorted by number.
Give an $ O(n) $ algorithm to sort the items by color (all reds before all blues
before all yellows) such that the numbers for identical colors stay sorted.
For example: (1,blue), (3,red), (4,blue), (6,yellow), (9,red) should
become (3,red), (9,red), (1,blue), (4,blue), (6,yellow).

4-5.
The *mode* of a set of numbers is the number that occurs most frequently
in the set.
The set $ (4,6,2,4,3,1) $ has a mode of 4.
Give an efficient and correct algorithm to compute the mode of a set of
$ n $ numbers.

4-6.
Given two sets $ S_1 $ and $ S_2 $ (each of size $ n $), and a number $ x $,
describe an $ O(n \log n) $
algorithm for finding whether there exists a pair of elements, one from
$ S_1 $ and one from $ S_2 $, that add up to $ x $.
(For partial credit, give a $ \Theta(n^2) $ algorithm for this problem.)

4-7.
Outline a reasonable method of solving each of the
following problems.
Give the order of the worst-case complexity of your
methods.

- You are given a pile of thousands of telephone bills and thousands of checks sent in to pay the bills. Find out who did not pay.
- You are given a list containing the title, author, call number and publisher of all the books in a school library and another list of 30 publishers. Find out how many of the books in the library were published by each company.
- You are given all the book checkout cards used in the campus library during the past year, each of which contains the name of the person who took out the book. Determine how many distinct people checked out at least one book.

4-8.
Given a set of $ S $ containing $ n $ real numbers, and a real number $ x $.
We seek an algorithm to determine whether two elements
of $ S $ exist whose sum is exactly $ x $.

- Assume that $ S $ is unsorted. Give an $ O(n \log n) $ algorithm for the problem.
- Assume that $ S $ is sorted. Give an $ O(n) $ algorithm for the problem.

4-9.
Give an efficient algorithm to compute the union of sets $ A $ and $ B $,
where $ n = \max(|A|,|B|) $.
The output should be an array of distinct
elements that form the union of the sets, such that
they appear more than once in the union.

- Assume that $ A $ and $ B $ are unsorted. Give an $ O(n \log n) $ algorithm for the problem.
- Assume that $ A $ and $ B $ are sorted. Give an $ O(n) $ algorithm for the problem.

4-10.
Given a set $ S $ of $ n $ integers and an integer $ T $, give
an $ O( n^{k-1} \log n ) $
algorithm to test whether $ k $ of the integers in $ S $ add up to $ T $.

4-11.
Design an $ O(n) $ algorithm that, given a list of $ n $ elements,
finds all the elements that appear more than $ n/2 $ times in the list.
*Then*,
design an $ O(n) $ algorithm that, given a list of $ n $ elements,
finds all the elements
that appear more than $ n/4 $ times.

**Heaps**

4-12.
Devise an algorithm for finding the $ k $ smallest elements of an unsorted
set of $ n $ integers in $ O(n + k \log n) $.

4-13.
You wish to store a set of $ n $ numbers in
either a max-heap or a sorted array.
For each application below,
state which data structure is better, or
if it does not matter.
Explain your answers.

- Want to find the maximum element quickly.
- Want to be able to delete an element quickly.
- Want to be able to form the structure quickly.
- Want to find the minimum element quickly.

4-14.
Give an $ O(n \log k) $-time algorithm that merges $ k $ sorted lists with a
total of $ n $ elements into one sorted list.
(Hint: use a heap to speed up the elementary $ O(k n) $-time algorithm).

4-15.
(a) Give an efficient algorithm to find the second-largest key among $ n $ keys.
You can do better than $ 2n-3 $ comparisons.
(b) Then, give an efficient algorithm to find the third-largest key among $ n $ keys.
How many key comparisons does your algorithm do in the worst case?
Must your algorithm determine which key is largest and second-largest in
the process?

**Quicksort**

4-16.
Use the partitioning idea of quicksort to give an algorithm that
finds the *median* element of an array of $ n $ integers in expected
$ O(n) $ time. (Hint: must you look at both sides of the partition?)

4-17.
The *median* of a set of $ n $ values is the $ \lceil n/2 \rceil $th
smallest value.

- Suppose quicksort always pivoted on the median of the current sub-array. How many comparisons would Quicksort make then in the worst case?
- Suppose quicksort were always to pivot on the $ \lceil n/3 \rceil $th smallest value of the current sub-array. How many comparisons would be made then in the worst case?

4-18.
Suppose an array $ A $ consists of $ n $ elements, each of which is
*red*, *white*, or *blue*.
We seek to sort the elements so that all the *reds*
come before all the *whites*, which
come before all the *blues*
The only operation permitted on the
keys are

*Examine(A,i)*-- report the color of the $ i $th element of $ A $.*Swap(A,i,j)*-- swap the $ i $th element of $ A $ with the $ j $th element.

Find a correct and efficient algorithm for red-white-blue sorting. There is a linear-time solution. This is also known as the Dutch national flag problem. The simplest linear time solution performs two passes of the partition operation from Quicksort. The first pass treats the red and white elements as indistinguishable, and separates them from the blue. The second pass is just separates the elements within the red/white sub-array.

4-19.
An *inversion* of a permutation is a pair of elements
that are out of order.

- Show that a permutation of $ n $ items has at most $ n(n-1)/2 $ inversions. Which permutation(s) have exactly $ n(n-1)/2 $ inversions?
- Let $ P $ be a permutation and $ P^r $ be the reversal of this permutation. Show that $ P $ and $ P^r $ have a total of exactly $ n(n-1)/2 $ inversions.
- Use the previous result to argue that the expected number of inversions in a random permutation is $ n(n-1)/4 $.

4-20.
Give an efficient algorithm to rearrange an array of $ n $ keys so
that all the negative keys precede all the nonnegative keys.
Your algorithm must be in-place, meaning you cannot allocate another
array to temporarily hold the items.
How fast is your algorithm?

**Other Sorting Algorithms**

4-21.
Stable sorting algorithms
leave equal-key items in the same relative order as
in the original permutation.
Explain what must be done to ensure that mergesort is a stable sorting algorithm.

4-22.
Show that $ n $ positive integers in the range 1 to $ k $ can be sorted in
$ O(n \log k) $ time.
The interesting case is when $ k << n $.

4-23.
We seek to sort a sequence $ S $ of $ n $ integers with many duplications,
such that the number of distinct integers in $ S $ is $ O(\log n) $.
Give an $ O(n \log \log n) $ worst-case time algorithm to sort such sequences.

4-24.
Let $ A[1..n] $ be an array such that the first $ n-\sqrt n $ elements
are already sorted (though we know nothing about the remaining
elements).
Give an algorithm that sorts $ A $ in substantially better than
$ n\log n $ steps.

4-25.
Assume that the array $ A[1..n] $ only has numbers
from $ \{1,\ldots, n^2\} $ but that at most $ \log\log n $ of
these numbers ever appear.
Devise an algorithm that sorts $ A $ in substantially less
than $ O(n\log n) $.

4-26.
Consider the problem of sorting a sequence of $ n $ 0's and 1's
using comparisons.
For each comparison of two values $ x $ and $ y $, the algorithm learns
which of $ x < y $, $ x = y $, or $ x > y $ holds.

- Give an algorithm to sort in $ n-1 $ comparisons in the worst case. Show that your algorithm is optimal.
- Give an algorithm to sort in $ 2n/3 $ comparisons in the average case (assuming each of the $ n $ inputs is 0 or 1 with equal probability). Show that your algorithm is optimal.

4-27.
Let $ P $ be a simple, but not necessarily convex, polygon and $ q $
an arbitrary point not necessarily in $ P $.
Design an efficient algorithm to find a line segment originating from $ q $
that intersects the maximum number of edges of $ P $.
In other words, if standing at point $ q $, in what direction should you aim
a gun so the bullet will go through the largest number of walls.
A bullet through a vertex of $ P $ gets credit for only one wall.
An $ O(n \log n) $ algorithm is possible.

**Lower Bounds**

4-28.
sorting algorithm that runs in $ O( n \log (\sqrt n) ) $.
Given the existence of an
$ \Omega(n \log n) $ lower bound for sorting, how can this be possible?
*lower bound*

4-29.
Mr. B. C. Dull claims to have developed a new data structure for
priority queues that supports the operations *Insert*,
*Maximum*, and *Extract-Max*---all in $ O(1) $ worst-case time.
Prove that he is mistaken.
(Hint: the argument does not involve a lot of gory details---just think
about what this would imply about the $ \Omega(n \log n) $ lower bound for
sorting.)

**Searching**

4-30.
A company database consists of 10,000 sorted names, 40% of whom are known as
good customers and who together account for 60% of the accesses to the
database.
There are two data structure options to consider for representing the database:

- Put all the names in a single array and use binary search.
- Put the good customers in one array and the rest of them in a second array.

Only if we do not find the query name on a binary search of the first array do we do a binary search of the second array. Demonstrate which option gives better expected performance. Does this change if linear search on an unsorted array is used instead of binary search for both options?

4-31.
Suppose you are given an array $ A $ of $ n $ sorted numbers that has been
*circularly shifted* $ k $ positions to the right.
For example, $ \{35, 42, 5, 15, 27, 29\} $ is a sorted array that has been
circularly shifted $ k=2 $ positions, while $ \{27, 29, 35, 42, 5, 15\} $
has been shifted $ k=4 $ positions.

- Suppose you know what $ k $ is. Give an $ O(1) $ algorithm to find the largest number in $ A $.
- Suppose you
*do not*know what $ k $ is. Give an $ O(\lg n) $ algorithm to find the largest number in $ A $. For partial credit, you may give an $ O(n) $ algorithm.

4-32.
Consider the numerical 20 Questions game.
In this game, Player 1 thinks of a number in the range 1 to $ n $.
Player 2 has to figure out this number by asking the fewest number of
true/false questions.
Assume that nobody cheats.

- What is an optimal strategy if $ n $ is known?
- What is a good strategy if $ n $ is not known?

4-33.
Suppose that you are given a sorted sequence of *distinct* integers
$ \{a_1, a_2, \ldots, a_n\} $.
Give an $ O(\lg n) $ algorithm to determine whether there exists an $ i $ index
such as $ a_i = i $.
For example, in $ \{-10,-3,3,5,7\} $, $ a_3 = 3 $.
In $ \{2,3,4,5,6,7\} $, there is no such $ i $.

4-34.
Suppose that you are given a sorted sequence of *distinct* integers
$ \{a_1, a_2, \ldots, a_n\} $, drawn from $ 1 $ to $ m $ where $ n < m $.
Give an $ O(\lg n) $ algorithm to find an integer $ \leq m $ that is not present in $ a $.
For full credit, find the smallest such integer.

4-35.
Let $ M $ be an $ n \times m $ integer matrix in which the entries
of each row are sorted in increasing order (from left to right)
and the entries in each column are
in increasing order (from top to bottom).
Give an efficient algorithm
to find the position of an integer $ x $ in $ M $, or to determine that $ x $ is
not there.
How many comparisons of $ x $ with matrix entries does your
algorithm use in worst case?

**Implementation Challenges**

4-36.
Consider an $ n \times n $ array $ A $ containing integer elements (positive,
negative, and zero).
Assume that the elements in each row of $ A $ are in strictly increasing
order, and the elements of each column of $ A $ are in strictly decreasing
order.
(Hence there cannot be two zeroes in the same row or the same column.)
Describe an efficient algorithm that counts the number of occurrences
of the element $ 0 $ in $ A $. Analyze its running time.

4-37.
Implement versions of several different
sorting algorithms, such as selection sort, insertion sort, heapsort,
mergesort, and quicksort.
Conduct experiments to assess the relative performance of these algorithms
in a simple application that reads a large text file
and reports exactly one instance of each word that appears within it.
This application can be efficiently implemented by sorting all the
words that occur in the text and then passing through the sorted sequence
to identify one instance of each distinct word.
Write a brief report with your conclusions.

4-38.
Implement an external sort, which uses intermediate files
to sort files bigger than main memory.
Mergesort is a good algorithm to base such an implementation on.
Test your program both on files with
small records and on files with large records.

4-39.
Design and implement a parallel sorting algorithm that distributes data
across several processors.
An appropriate variation of mergesort is a likely candidate.
Measure the speedup of this algorithm as the number of processors increases.
Later, compare the execution time to that of a purely sequential mergesort
implementation.
What are your experiences?

**Interview Problems**

4-40.
If you are given a million integers to sort,
what algorithm would you use to sort them?
How much time and
memory would that consume?

4-41.
Describe advantages and disadvantages of the most popular sorting
algorithms.

4-42.
Implement an algorithm that takes an input array and returns only the
unique elements in it.

4-43.
You have a computer with only 2Mb of main memory.
How do you use it to sort a large file of 500
Mb that is on disk?

4-44.
Design a stack that supports push, pop, and retrieving the minimum
element in constant time.
Can you do this?

4-45.
Given a search string of three words,
find the smallest snippet of the document that contains all three of the
search
words---i.e., the snippet with smallest number of words in it.
You are given the
index positions where these words occur in the document, such as
*word1: (1, 4, 5)*, *word2: (3, 9, 10)*, and *word3: (2, 6, 15)*.
Each of the lists are in sorted order, as above.

4-46.
You are given 12 coins. One of them is heavier or lighter than the rest.
Identify this coin in just three weighings.

Note - weighings are with a balance, where you can have a greater than, equal to, or less than result. You can't do this with a digital scale.