# Difference between revisions of "Sorting-searching-TADM2E"

(Recovering wiki) |
(Recovering wiki) |
||

Line 5: | Line 5: | ||

'''Applications of Sorting''' | '''Applications of Sorting''' | ||

− | + | <br>4-1. | |

− | The Grinch is given the job of partitioning | + | The Grinch is given the job of partitioning <math>2n</math> players into |

− | two teams of | + | two teams of <math>n</math> players each. |

Each player has a numerical rating that measures how good he/she is | Each player has a numerical rating that measures how good he/she is | ||

at the game. | at the game. | ||

He seeks to divide the players as ''unfairly'' | He seeks to divide the players as ''unfairly'' | ||

as possible, so as to create the biggest possible talent imbalance between team | as possible, so as to create the biggest possible talent imbalance between team | ||

− | + | <math>A</math> and team <math>B</math>. | |

− | Show how the Grinch can do the job in | + | Show how the Grinch can do the job in <math>O(n \log n)</math> time. |

''Grinch, The'' | ''Grinch, The'' | ||

[[TADM2E 4.1|(Solution 4.1)]] | [[TADM2E 4.1|(Solution 4.1)]] | ||

− | + | <br>4-2. | |

For each of the following problems, give an algorithm that finds the | For each of the following problems, give an algorithm that finds the | ||

desired numbers within the given amount of time. | desired numbers within the given amount of time. | ||

To keep your answers brief, feel free to use algorithms from the | To keep your answers brief, feel free to use algorithms from the | ||

book as subroutines. | book as subroutines. | ||

− | For the example, | + | For the example, <math>S = \{6, 13, 19, 3, 8\}</math>, <math>19-3</math> maximizes the difference, |

− | while | + | while <math>8-6</math> minimizes the difference. |

− | + | <br> | |

− | (a) Let | + | (a) Let <math>S</math> be an ''unsorted'' array of <math>n</math> integers. |

− | Give an algorithm that finds the pair | + | Give an algorithm that finds the pair <math>x, y \in S</math> that ''maximizes'' |

− | + | <math>|x-y|</math>. | |

− | Your algorithm must run in | + | Your algorithm must run in <math>O(n)</math> worst-case time. |

− | + | <br> | |

− | (b) Let | + | (b) Let <math>S</math> be a ''sorted'' array of <math>n</math> integers. |

− | Give an algorithm that finds the pair | + | Give an algorithm that finds the pair <math>x, y \in S</math> that ''maximizes'' |

− | + | <math>|x-y|</math>. | |

− | Your algorithm must run in | + | Your algorithm must run in <math>O(1)</math> worst-case time. |

− | + | <br> | |

− | (c) Let | + | (c) Let <math>S</math> be an ''unsorted'' array of <math>n</math> integers. |

− | Give an algorithm that finds the pair | + | Give an algorithm that finds the pair <math>x, y \in S</math> that ''minimizes'' |

− | + | <math>|x-y|</math>, for <math>x \neq y</math>. | |

− | Your algorithm must run in | + | Your algorithm must run in <math>O(n \log n)</math> worst-case time. |

− | + | <br> | |

− | (d) Let | + | (d) Let <math>S</math> be a ''sorted'' array of <math>n</math> integers. |

− | Give an algorithm that finds the pair | + | Give an algorithm that finds the pair <math>x, y \in S</math> that ''minimizes'' |

− | + | <math>|x-y|</math>, for <math>x \neq y</math>. | |

− | Your algorithm must run in | + | Your algorithm must run in <math>O(n)</math> worst-case time. |

[[TADM2E 4.2|(Solution 4.2)]] | [[TADM2E 4.2|(Solution 4.2)]] | ||

− | + | <br>4-3. | |

− | Take a sequence of | + | Take a sequence of <math>2n</math> real numbers as input. |

− | Design an | + | Design an <math>O(n \log n)</math> algorithm that partitions the numbers into <math>n</math> pairs, |

with the property that the partition minimizes the maximum sum of a pair. | with the property that the partition minimizes the maximum sum of a pair. | ||

For example, say we are given the numbers (1,3,5,9). | For example, say we are given the numbers (1,3,5,9). | ||

Line 60: | Line 60: | ||

[[TADM2E 4.3|(Solution 4.3)]] | [[TADM2E 4.3|(Solution 4.3)]] | ||

− | + | <br>4-4. | |

− | Assume that we are given | + | Assume that we are given <math>n</math> pairs of items as input, where the first |

item is a number and the second item is one of three colors | item is a number and the second item is one of three colors | ||

(red, blue, or yellow). Further assume that the items are sorted by number. | (red, blue, or yellow). Further assume that the items are sorted by number. | ||

− | Give an | + | Give an <math>O(n)</math> algorithm to sort the items by color (all reds before all blues |

before all yellows) such that the numbers for identical colors stay sorted. | before all yellows) such that the numbers for identical colors stay sorted. | ||

For example: (1,blue), (3,red), (4,blue), (6,yellow), (9,red) should | For example: (1,blue), (3,red), (4,blue), (6,yellow), (9,red) should | ||

Line 71: | Line 71: | ||

[[TADM2E 4.4| (Solution 4.4)]] | [[TADM2E 4.4| (Solution 4.4)]] | ||

− | + | <br>4-5. | |

The ''mode'' of a set of numbers is the number that occurs most frequently | The ''mode'' of a set of numbers is the number that occurs most frequently | ||

in the set. | in the set. | ||

− | The set | + | The set <math>(4,6,2,4,3,1)</math> has a mode of 4. |

Give an efficient and correct algorithm to compute the mode of a set of | Give an efficient and correct algorithm to compute the mode of a set of | ||

− | + | <math>n</math> numbers. | |

[[TADM2E 4.5|(Solution 4.5)]] | [[TADM2E 4.5|(Solution 4.5)]] | ||

− | + | <br>4-6. | |

− | Given two sets | + | Given two sets <math>S_1</math> and <math>S_2</math> (each of size <math>n</math>), and a number <math>x</math>, |

− | describe an | + | describe an <math>O(n \log n)</math> |

algorithm for finding whether there exists a pair of elements, one from | algorithm for finding whether there exists a pair of elements, one from | ||

− | + | <math>S_1</math> and one from <math>S_2</math>, that add up to <math>x</math>. | |

− | (For partial credit, give a | + | (For partial credit, give a <math>\Theta(n^2)</math> algorithm for this problem.) |

[[TADM2E 4.6|(Solution 4.6)]] | [[TADM2E 4.6|(Solution 4.6)]] | ||

− | + | <br>4-7. | |

Outline a reasonable method of solving each of the | Outline a reasonable method of solving each of the | ||

following problems. | following problems. | ||

Line 100: | Line 100: | ||

[[TADM2E 4.7|(Solution 4.7)]] | [[TADM2E 4.7|(Solution 4.7)]] | ||

− | + | <br>4-8. | |

− | Given a set of | + | Given a set of <math>S</math> containing <math>n</math> real numbers, and a real number <math>x</math>. |

We seek an algorithm to determine whether two elements | We seek an algorithm to determine whether two elements | ||

− | of | + | of <math>S</math> exist whose sum is exactly <math>x</math>. |

− | #Assume that | + | #Assume that <math>S</math> is unsorted. Give an <math>O(n \log n)</math> algorithm for the problem. |

− | #Assume that | + | #Assume that <math>S</math> is sorted. Give an <math>O(n)</math> algorithm for the problem. |

[[TADM2E 4.8|(Solution 4.8)]] | [[TADM2E 4.8|(Solution 4.8)]] | ||

− | + | <br>4-9. | |

− | Give an efficient algorithm to compute the union of sets | + | Give an efficient algorithm to compute the union of sets <math>A</math> and <math>B</math>, |

− | where | + | where <math>n = \max(|A|,|B|)</math>. |

The output should be an array of distinct | The output should be an array of distinct | ||

elements that form the union of the sets, such that | elements that form the union of the sets, such that | ||

they appear more than once in the union. | they appear more than once in the union. | ||

− | #Assume that | + | #Assume that <math>A</math> and <math>B</math> are unsorted. Give an <math>O(n \log n)</math> algorithm for the problem. |

− | #Assume that | + | #Assume that <math>A</math> and <math>B</math> are sorted. Give an <math>O(n)</math> algorithm for the problem. |

[[TADM2E 4.9|(Solution 4.9)]] | [[TADM2E 4.9|(Solution 4.9)]] | ||

− | + | <br>4-10. | |

− | Given a set | + | Given a set <math>S</math> of <math>n</math> integers and an integer <math>T</math>, give |

− | an | + | an <math>O( n^{k-1} \log n )</math> |

− | algorithm to test whether | + | algorithm to test whether <math>k</math> of the integers in <math>S</math> add up to <math>T</math>. |

− | + | <br>4-11. | |

− | Design an | + | Design an <math>O(n)</math> algorithm that, given a list of <math>n</math> elements, |

− | finds all the elements that appear more than | + | finds all the elements that appear more than <math>n/2</math> times in the list. |

''Then'', | ''Then'', | ||

− | design an | + | design an <math>O(n)</math> algorithm that, given a list of <math>n</math> elements, |

finds all the elements | finds all the elements | ||

− | that appear more than | + | that appear more than <math>n/4</math> times. |

[[TADM2E 4.11|(Solution 4.11)]] | [[TADM2E 4.11|(Solution 4.11)]] | ||

Line 139: | Line 139: | ||

'''Heaps''' | '''Heaps''' | ||

− | + | <br>4-12. | |

− | Devise an algorithm for finding the | + | Devise an algorithm for finding the <math>k</math> smallest elements of an unsorted |

− | set of | + | set of <math>n</math> integers in <math>O(n + k \log n)</math>. |

[[TADM2E 4.12|(Solution 4.12)]] | [[TADM2E 4.12|(Solution 4.12)]] | ||

− | + | <br>4-13. | |

− | You wish to store a set of | + | You wish to store a set of <math>n</math> numbers in |

either a max-heap or a sorted array. | either a max-heap or a sorted array. | ||

For each application below, | For each application below, | ||

Line 159: | Line 159: | ||

[[TADM2E 4.13|(Solution 4.13)]] | [[TADM2E 4.13|(Solution 4.13)]] | ||

− | + | <br>4-14. | |

− | Give an | + | Give an <math>O(n \log k)</math>-time algorithm that merges <math>k</math> sorted lists with a |

− | total of | + | total of <math>n</math> elements into one sorted list. |

− | (Hint: use a heap to speed up the elementary | + | (Hint: use a heap to speed up the elementary <math>O(k n)</math>-time algorithm). |

[[TADM2E 4.14|(Solution 4.14)]] | [[TADM2E 4.14|(Solution 4.14)]] | ||

− | + | <br>4-15. | |

− | (a) Give an efficient algorithm to find the second-largest key among | + | (a) Give an efficient algorithm to find the second-largest key among <math>n</math> keys. |

− | You can do better than | + | You can do better than <math>2n-3</math> comparisons. |

− | (b) Then, give an efficient algorithm to find the third-largest key among | + | (b) Then, give an efficient algorithm to find the third-largest key among <math>n</math> keys. |

How many key comparisons does your algorithm do in the worst case? | How many key comparisons does your algorithm do in the worst case? | ||

Must your algorithm determine which key is largest and second-largest in | Must your algorithm determine which key is largest and second-largest in | ||

Line 180: | Line 180: | ||

'''Quicksort''' | '''Quicksort''' | ||

− | + | <br>4-16. | |

Use the partitioning idea of quicksort to give an algorithm that | Use the partitioning idea of quicksort to give an algorithm that | ||

− | finds the ''median'' element of an array of | + | finds the ''median'' element of an array of <math>n</math> integers in expected |

− | + | <math>O(n)</math> time. (Hint: must you look at both sides of the partition?) | |

− | + | <br>4-17. | |

− | The ''median'' of a set of | + | The ''median'' of a set of <math>n</math> values is the <math>\lceil n/2 \rceil</math>th |

smallest value. | smallest value. | ||

#Suppose quicksort always pivoted on the median of the current sub-array. How many comparisons would Quicksort make then in the worst case? | #Suppose quicksort always pivoted on the median of the current sub-array. How many comparisons would Quicksort make then in the worst case? | ||

− | #Suppose quicksort were always to pivot on the | + | #Suppose quicksort were always to pivot on the <math>\lceil n/3 \rceil</math>th smallest value of the current sub-array. How many comparisons would be made then in the worst case? |

[[TADM2E 4.17|(Solution 4.17)]] | [[TADM2E 4.17|(Solution 4.17)]] | ||

− | + | <br>4-18. | |

− | Suppose an array | + | Suppose an array <math>A</math> consists of <math>n</math> elements, each of which is |

''red'', ''white'', or ''blue''. | ''red'', ''white'', or ''blue''. | ||

We seek to sort the elements so that all the ''reds'' | We seek to sort the elements so that all the ''reds'' | ||

Line 201: | Line 201: | ||

The only operation permitted on the | The only operation permitted on the | ||

keys are | keys are | ||

− | #''Examine(A,i)'' -- report the color of the | + | #''Examine(A,i)'' -- report the color of the <math>i</math>th element of <math>A</math>. |

− | #''Swap(A,i,j)'' -- swap the | + | #''Swap(A,i,j)'' -- swap the <math>i</math>th element of <math>A</math> with the <math>j</math>th element. |

Find a correct and efficient algorithm for red-white-blue sorting. | Find a correct and efficient algorithm for red-white-blue sorting. | ||

There is a | There is a | ||

Line 213: | Line 213: | ||

The second pass is just separates the elements within the red/white sub-array. | The second pass is just separates the elements within the red/white sub-array. | ||

− | + | <br>4-19. | |

An ''inversion'' of a permutation is a pair of elements | An ''inversion'' of a permutation is a pair of elements | ||

that are out of order. | that are out of order. | ||

− | #Show that a permutation of | + | #Show that a permutation of <math>n</math> items has at most <math>n(n-1)/2</math> inversions. Which permutation(s) have exactly <math>n(n-1)/2</math> inversions? |

− | #Let | + | #Let <math>P</math> be a permutation and <math>P^r</math> be the reversal of this permutation. Show that <math>P</math> and <math>P^r</math> have a total of exactly <math>n(n-1)/2</math> inversions. |

− | #Use the previous result to argue that the expected number of inversions in a random permutation is | + | #Use the previous result to argue that the expected number of inversions in a random permutation is <math>n(n-1)/4</math>. |

[[TADM2E 4.19|(Solution 4.19)]] | [[TADM2E 4.19|(Solution 4.19)]] | ||

− | + | <br>4-20. | |

− | Give an efficient algorithm to rearrange an array of | + | Give an efficient algorithm to rearrange an array of <math>n</math> keys so |

that all the negative keys precede all the nonnegative keys. | that all the negative keys precede all the nonnegative keys. | ||

Your algorithm must be in-place, meaning you cannot allocate another | Your algorithm must be in-place, meaning you cannot allocate another | ||

Line 233: | Line 233: | ||

'''Other Sorting Algorithms''' | '''Other Sorting Algorithms''' | ||

− | + | <br>4-21. | |

Stable sorting algorithms | Stable sorting algorithms | ||

leave equal-key items in the same relative order as | leave equal-key items in the same relative order as | ||

Line 241: | Line 241: | ||

[[TADM2E 4.21|(Solution 4.21)]] | [[TADM2E 4.21|(Solution 4.21)]] | ||

− | + | <br>4-22. | |

− | Show that | + | Show that <math>n</math> positive integers in the range 1 to <math>k</math> can be sorted in |

− | + | <math>O(n \log k)</math> time. | |

− | The interesting case is when | + | The interesting case is when <math>k << n</math>. |

− | + | <br>4-23. | |

− | We seek to sort a sequence | + | We seek to sort a sequence <math>S</math> of <math>n</math> integers with many duplications, |

− | such that the number of distinct integers in | + | such that the number of distinct integers in <math>S</math> is <math>O(\log n)</math>. |

− | Give an | + | Give an <math>O(n \log \log n)</math> worst-case time algorithm to sort such sequences. |

[[TADM2E 4.23|(Solution 4.23)]] | [[TADM2E 4.23|(Solution 4.23)]] | ||

− | + | <br>4-24. | |

− | Let | + | Let <math>A[1..n]</math> be an array such that the first <math>n-\sqrt n </math> elements |

are already sorted (though we know nothing about the remaining | are already sorted (though we know nothing about the remaining | ||

elements). | elements). | ||

− | Give an algorithm that sorts | + | Give an algorithm that sorts <math>A</math> in substantially better than |

− | + | <math>n\log n</math> steps. | |

[[TADM2E 4.24|(Solution 4.24)]] | [[TADM2E 4.24|(Solution 4.24)]] | ||

− | + | <br>4-25. | |

− | Assume that the array | + | Assume that the array <math>A[1..n]</math> only has numbers |

− | from | + | from <math>\{1,\ldots, n^2\}</math> but that at most <math>\log\log n</math> of |

these numbers ever appear. | these numbers ever appear. | ||

− | Devise an algorithm that sorts | + | Devise an algorithm that sorts <math>A</math> in substantially less |

− | than | + | than <math>O(n\log n)</math>. |

[[TADM2E 4.25|(Solution 4.25)]] | [[TADM2E 4.25|(Solution 4.25)]] | ||

− | + | <br>4-26. | |

− | Consider the problem of sorting a sequence of | + | Consider the problem of sorting a sequence of <math>n</math> 0's and 1's |

using comparisons. | using comparisons. | ||

− | For each comparison of two values | + | For each comparison of two values <math>x</math> and <math>y</math>, the algorithm learns |

− | which of | + | which of <math>x < y</math>, <math>x = y</math>, or <math>x > y</math> holds. |

− | #Give an algorithm to sort in | + | #Give an algorithm to sort in <math>n-1</math> comparisons in the worst case. Show that your algorithm is optimal. |

− | #Give an algorithm to sort in | + | #Give an algorithm to sort in <math>2n/3</math> comparisons in the average case (assuming each of the <math>n</math> inputs is 0 or 1 with equal probability). Show that your algorithm is optimal. |

− | + | <br>4-27. | |

− | Let | + | Let <math>P</math> be a simple, but not necessarily convex, polygon and <math>q</math> |

− | an arbitrary point not necessarily in | + | an arbitrary point not necessarily in <math>P</math>. |

− | Design an efficient algorithm to find a line segment originating from | + | Design an efficient algorithm to find a line segment originating from <math>q</math> |

− | that intersects the maximum number of edges of | + | that intersects the maximum number of edges of <math>P</math>. |

− | In other words, if standing at point | + | In other words, if standing at point <math>q</math>, in what direction should you aim |

a gun so the bullet will go through the largest number of walls. | a gun so the bullet will go through the largest number of walls. | ||

− | A bullet through a vertex of | + | A bullet through a vertex of <math>P</math> gets credit for only one wall. |

− | An | + | An <math>O(n \log n)</math> algorithm is possible. |

[[TADM2E 4.27|(Solution 4.27)]] | [[TADM2E 4.27|(Solution 4.27)]] | ||

Line 295: | Line 295: | ||

'''Lower Bounds''' | '''Lower Bounds''' | ||

− | + | <br>4-28. | |

− | sorting algorithm that runs in | + | sorting algorithm that runs in <math>O( n \log (\sqrt n) )</math>. |

Given the existence of an | Given the existence of an | ||

− | + | <math>\Omega(n \log n)</math> lower bound for sorting, how can this be possible? | |

''lower bound'' | ''lower bound'' | ||

− | + | <br>4-29. | |

Mr. B. C. Dull claims to have developed a new data structure for | Mr. B. C. Dull claims to have developed a new data structure for | ||

priority queues that supports the operations ''Insert'', | priority queues that supports the operations ''Insert'', | ||

− | ''Maximum'', and ''Extract-Max''---all in | + | ''Maximum'', and ''Extract-Max''---all in <math>O(1)</math> worst-case time. |

Prove that he is mistaken. | Prove that he is mistaken. | ||

(Hint: the argument does not involve a lot of gory details---just think | (Hint: the argument does not involve a lot of gory details---just think | ||

− | about what this would imply about the | + | about what this would imply about the <math>\Omega(n \log n)</math> lower bound for |

sorting.) | sorting.) | ||

Line 316: | Line 316: | ||

'''Searching''' | '''Searching''' | ||

− | + | <br>4-30. | |

A company database consists of 10,000 sorted names, 40% of whom are known as | A company database consists of 10,000 sorted names, 40% of whom are known as | ||

good customers and who together account for 60% of the accesses to the | good customers and who together account for 60% of the accesses to the | ||

Line 329: | Line 329: | ||

is used instead of binary search for both options? | is used instead of binary search for both options? | ||

− | + | <br>4-31. | |

− | Suppose you are given an array | + | Suppose you are given an array <math>A</math> of <math>n</math> sorted numbers that has been |

− | ''circularly shifted'' | + | ''circularly shifted'' <math>k</math> positions to the right. |

− | For example, | + | For example, <math>\{35, 42, 5, 15, 27, 29\}</math> is a sorted array that has been |

− | circularly shifted | + | circularly shifted <math>k=2</math> positions, while <math>\{27, 29, 35, 42, 5, 15\}</math> |

− | has been shifted | + | has been shifted <math>k=4</math> positions. |

− | #Suppose you know what | + | #Suppose you know what <math>k</math> is. Give an <math>O(1)</math> algorithm to find the largest number in <math>A</math>. |

− | #Suppose you ''do not'' know what | + | #Suppose you ''do not'' know what <math>k</math> is. Give an <math>O(\lg n)</math> algorithm to find the largest number in <math>A</math>. For partial credit, you may give an <math>O(n)</math> algorithm. |

[[TADM2E 4.31|(Solution 4.31)]] | [[TADM2E 4.31|(Solution 4.31)]] | ||

− | + | <br>4-32. | |

Consider the numerical 20 Questions game. | Consider the numerical 20 Questions game. | ||

− | In this game, Player 1 thinks of a number in the range 1 to | + | In this game, Player 1 thinks of a number in the range 1 to <math>n</math>. |

Player 2 has to figure out this number by asking the fewest number of | Player 2 has to figure out this number by asking the fewest number of | ||

true/false questions. | true/false questions. | ||

Assume that nobody cheats. | Assume that nobody cheats. | ||

− | #What is an optimal strategy if | + | #What is an optimal strategy if <math>n</math> is known? |

− | #What is a good strategy if | + | #What is a good strategy if <math>n</math> is not known? |

[[TADM2E 4.32|(Solution 4.32)]] | [[TADM2E 4.32|(Solution 4.32)]] | ||

− | + | <br>4-33. | |

Suppose that you are given a sorted sequence of ''distinct'' integers | Suppose that you are given a sorted sequence of ''distinct'' integers | ||

− | + | <math>\{a_1, a_2, \ldots, a_n\}</math>. | |

− | Give an | + | Give an <math>O(\lg n)</math> algorithm to determine whether there exists an <math>i</math> index |

− | such as | + | such as <math>a_i = i</math>. |

− | For example, in | + | For example, in <math>\{-10,-3,3,5,7\}</math>, <math>a_3 = 3</math>. |

− | In | + | In <math>\{2,3,4,5,6,7\}</math>, there is no such <math>i</math>. |

[[TADM2E 4.33|(Solution 4.33)]] | [[TADM2E 4.33|(Solution 4.33)]] | ||

− | + | <br>4-34. | |

Suppose that you are given a sorted sequence of ''distinct'' integers | Suppose that you are given a sorted sequence of ''distinct'' integers | ||

− | + | <math>\{a_1, a_2, \ldots, a_n\}</math>, drawn from <math>1</math> to <math>m</math> where <math>n < m</math>. | |

− | Give an | + | Give an <math>O(\lg n)</math> algorithm to find an integer <math>\leq m</math> that is not present in <math>a</math>. |

For full credit, find the smallest such integer. | For full credit, find the smallest such integer. | ||

− | + | <br>4-35. | |

− | Let | + | Let <math>M</math> be an <math>n \times m</math> integer matrix in which the entries |

of each row are sorted in increasing order (from left to right) | of each row are sorted in increasing order (from left to right) | ||

and the entries in each column are | and the entries in each column are | ||

in increasing order (from top to bottom). | in increasing order (from top to bottom). | ||

Give an efficient algorithm | Give an efficient algorithm | ||

− | to find the position of an integer | + | to find the position of an integer <math>x</math> in <math>M</math>, or to determine that <math>x</math> is |

not there. | not there. | ||

− | How many comparisons of | + | How many comparisons of <math>x</math> with matrix entries does your |

algorithm use in worst case? | algorithm use in worst case? | ||

Line 384: | Line 384: | ||

'''Implementation Challenges''' | '''Implementation Challenges''' | ||

− | + | <br>4-36. | |

− | Consider an | + | Consider an <math>n \times n</math> array <math>A</math> containing integer elements (positive, |

negative, and zero). | negative, and zero). | ||

− | Assume that the elements in each row of | + | Assume that the elements in each row of <math>A</math> are in strictly increasing |

− | order, and the elements of each column of | + | order, and the elements of each column of <math>A</math> are in strictly decreasing |

order. | order. | ||

(Hence there cannot be two zeroes in the same row or the same column.) | (Hence there cannot be two zeroes in the same row or the same column.) | ||

Describe an efficient algorithm that counts the number of occurrences | Describe an efficient algorithm that counts the number of occurrences | ||

− | of the element | + | of the element <math>0</math> in <math>A</math>. Analyze its running time. |

− | + | <br>4-37. | |

Implement versions of several different | Implement versions of several different | ||

sorting algorithms, such as selection sort, insertion sort, heapsort, | sorting algorithms, such as selection sort, insertion sort, heapsort, | ||

Line 408: | Line 408: | ||

[[TADM2E 4.37|(Solution 4.37)]] | [[TADM2E 4.37|(Solution 4.37)]] | ||

− | + | <br>4-38. | |

Implement an external sort, which uses intermediate files | Implement an external sort, which uses intermediate files | ||

to sort files bigger than main memory. | to sort files bigger than main memory. | ||

Line 415: | Line 415: | ||

small records and on files with large records. | small records and on files with large records. | ||

− | + | <br>4-39. | |

Design and implement a parallel sorting algorithm that distributes data | Design and implement a parallel sorting algorithm that distributes data | ||

across several processors. | across several processors. | ||

Line 428: | Line 428: | ||

'''Interview Problems''' | '''Interview Problems''' | ||

− | + | <br>4-40. | |

If you are given a million integers to sort, | If you are given a million integers to sort, | ||

what algorithm would you use to sort them? | what algorithm would you use to sort them? | ||

Line 434: | Line 434: | ||

memory would that consume? | memory would that consume? | ||

− | + | <br>4-41. | |

Describe advantages and disadvantages of the most popular sorting | Describe advantages and disadvantages of the most popular sorting | ||

algorithms. | algorithms. | ||

Line 440: | Line 440: | ||

[[TADM2E 4.41|(Solution 4.41)]] | [[TADM2E 4.41|(Solution 4.41)]] | ||

− | + | <br>4-42. | |

Implement an algorithm that takes an input array and returns only the | Implement an algorithm that takes an input array and returns only the | ||

unique elements in it. | unique elements in it. | ||

− | + | <br>4-43. | |

You have a computer with only 2Mb of main memory. | You have a computer with only 2Mb of main memory. | ||

How do you use it to sort a large file of 500 | How do you use it to sort a large file of 500 | ||

Line 451: | Line 451: | ||

[[TADM2E 4.43|(Solution 4.43)]] | [[TADM2E 4.43|(Solution 4.43)]] | ||

− | + | <br>4-44. | |

Design a stack that supports push, pop, and retrieving the minimum | Design a stack that supports push, pop, and retrieving the minimum | ||

element in constant time. | element in constant time. | ||

Line 458: | Line 458: | ||

[[TADM2E 4.44|(Solution 4.44)]] | [[TADM2E 4.44|(Solution 4.44)]] | ||

− | + | <br>4-45. | |

Given a search string of three words, | Given a search string of three words, | ||

find the smallest snippet of the document that contains all three of the | find the smallest snippet of the document that contains all three of the | ||

Line 470: | Line 470: | ||

[[TADM2E 4.45|(Solution 4.45)]] | [[TADM2E 4.45|(Solution 4.45)]] | ||

− | + | <br>4-46. | |

You are given 12 coins. One of them is heavier or lighter than the rest. | You are given 12 coins. One of them is heavier or lighter than the rest. | ||

Identify this coin in just three weighings. | Identify this coin in just three weighings. | ||

[[TADM2E 4.46|(Solution 4.46)]] | [[TADM2E 4.46|(Solution 4.46)]] |

## Revision as of 18:23, 11 September 2014

# Sorting and Searching

**Applications of Sorting**

4-1.
The Grinch is given the job of partitioning $ 2n $ players into
two teams of $ n $ players each.
Each player has a numerical rating that measures how good he/she is
at the game.
He seeks to divide the players as *unfairly*
as possible, so as to create the biggest possible talent imbalance between team
$ A $ and team $ B $.
Show how the Grinch can do the job in $ O(n \log n) $ time.
*Grinch, The*

4-2.
For each of the following problems, give an algorithm that finds the
desired numbers within the given amount of time.
To keep your answers brief, feel free to use algorithms from the
book as subroutines.
For the example, $ S = \{6, 13, 19, 3, 8\} $, $ 19-3 $ maximizes the difference,
while $ 8-6 $ minimizes the difference.

(a) Let $ S $ be an *unsorted* array of $ n $ integers.
Give an algorithm that finds the pair $ x, y \in S $ that *maximizes*
$ |x-y| $.
Your algorithm must run in $ O(n) $ worst-case time.

(b) Let $ S $ be a *sorted* array of $ n $ integers.
Give an algorithm that finds the pair $ x, y \in S $ that *maximizes*
$ |x-y| $.
Your algorithm must run in $ O(1) $ worst-case time.

(c) Let $ S $ be an *unsorted* array of $ n $ integers.
Give an algorithm that finds the pair $ x, y \in S $ that *minimizes*
$ |x-y| $, for $ x \neq y $.
Your algorithm must run in $ O(n \log n) $ worst-case time.

(d) Let $ S $ be a *sorted* array of $ n $ integers.
Give an algorithm that finds the pair $ x, y \in S $ that *minimizes*
$ |x-y| $, for $ x \neq y $.
Your algorithm must run in $ O(n) $ worst-case time.

4-3.
Take a sequence of $ 2n $ real numbers as input.
Design an $ O(n \log n) $ algorithm that partitions the numbers into $ n $ pairs,
with the property that the partition minimizes the maximum sum of a pair.
For example, say we are given the numbers (1,3,5,9).
The possible partitions are ((1,3),(5,9)), ((1,5),(3,9)), and ((1,9),(3,5)).
The pair sums for these partitions are (4,14), (6,12), and (10,8).
Thus the third partition has 10 as its maximum sum, which is the minimum
over the three partitions.

4-4.
Assume that we are given $ n $ pairs of items as input, where the first
item is a number and the second item is one of three colors
(red, blue, or yellow). Further assume that the items are sorted by number.
Give an $ O(n) $ algorithm to sort the items by color (all reds before all blues
before all yellows) such that the numbers for identical colors stay sorted.
For example: (1,blue), (3,red), (4,blue), (6,yellow), (9,red) should
become (3,red), (9,red), (1,blue), (4,blue), (6,yellow).

4-5.
The *mode* of a set of numbers is the number that occurs most frequently
in the set.
The set $ (4,6,2,4,3,1) $ has a mode of 4.
Give an efficient and correct algorithm to compute the mode of a set of
$ n $ numbers.

4-6.
Given two sets $ S_1 $ and $ S_2 $ (each of size $ n $), and a number $ x $,
describe an $ O(n \log n) $
algorithm for finding whether there exists a pair of elements, one from
$ S_1 $ and one from $ S_2 $, that add up to $ x $.
(For partial credit, give a $ \Theta(n^2) $ algorithm for this problem.)

4-7.
Outline a reasonable method of solving each of the
following problems.
Give the order of the worst-case complexity of your
methods.

- You are given a pile of thousands of telephone bills and thousands of checks sent in to pay the bills. Find out who did not pay.
- You are given a list containing the title, author, call number and publisher of all the books in a school library and another list of 30 publishers. Find out how many of the books in the library were published by each company.
- You are given all the book checkout cards used in the campus library during the past year, each of which contains the name of the person who took out the book. Determine how many distinct people checked out at least one book.

4-8.
Given a set of $ S $ containing $ n $ real numbers, and a real number $ x $.
We seek an algorithm to determine whether two elements
of $ S $ exist whose sum is exactly $ x $.

- Assume that $ S $ is unsorted. Give an $ O(n \log n) $ algorithm for the problem.
- Assume that $ S $ is sorted. Give an $ O(n) $ algorithm for the problem.

4-9.
Give an efficient algorithm to compute the union of sets $ A $ and $ B $,
where $ n = \max(|A|,|B|) $.
The output should be an array of distinct
elements that form the union of the sets, such that
they appear more than once in the union.

- Assume that $ A $ and $ B $ are unsorted. Give an $ O(n \log n) $ algorithm for the problem.
- Assume that $ A $ and $ B $ are sorted. Give an $ O(n) $ algorithm for the problem.

4-10.
Given a set $ S $ of $ n $ integers and an integer $ T $, give
an $ O( n^{k-1} \log n ) $
algorithm to test whether $ k $ of the integers in $ S $ add up to $ T $.

4-11.
Design an $ O(n) $ algorithm that, given a list of $ n $ elements,
finds all the elements that appear more than $ n/2 $ times in the list.
*Then*,
design an $ O(n) $ algorithm that, given a list of $ n $ elements,
finds all the elements
that appear more than $ n/4 $ times.

**Heaps**

4-12.
Devise an algorithm for finding the $ k $ smallest elements of an unsorted
set of $ n $ integers in $ O(n + k \log n) $.

4-13.
You wish to store a set of $ n $ numbers in
either a max-heap or a sorted array.
For each application below,
state which data structure is better, or
if it does not matter.
Explain your answers.

- Want to find the maximum element quickly.
- Want to be able to delete an element quickly.
- Want to be able to form the structure quickly.
- Want to find the minimum element quickly.

4-14.
Give an $ O(n \log k) $-time algorithm that merges $ k $ sorted lists with a
total of $ n $ elements into one sorted list.
(Hint: use a heap to speed up the elementary $ O(k n) $-time algorithm).

4-15.
(a) Give an efficient algorithm to find the second-largest key among $ n $ keys.
You can do better than $ 2n-3 $ comparisons.
(b) Then, give an efficient algorithm to find the third-largest key among $ n $ keys.
How many key comparisons does your algorithm do in the worst case?
Must your algorithm determine which key is largest and second-largest in
the process?

**Quicksort**

4-16.
Use the partitioning idea of quicksort to give an algorithm that
finds the *median* element of an array of $ n $ integers in expected
$ O(n) $ time. (Hint: must you look at both sides of the partition?)

4-17.
The *median* of a set of $ n $ values is the $ \lceil n/2 \rceil $th
smallest value.

- Suppose quicksort always pivoted on the median of the current sub-array. How many comparisons would Quicksort make then in the worst case?
- Suppose quicksort were always to pivot on the $ \lceil n/3 \rceil $th smallest value of the current sub-array. How many comparisons would be made then in the worst case?

4-18.
Suppose an array $ A $ consists of $ n $ elements, each of which is
*red*, *white*, or *blue*.
We seek to sort the elements so that all the *reds*
come before all the *whites*, which
come before all the *blues*
The only operation permitted on the
keys are

*Examine(A,i)*-- report the color of the $ i $th element of $ A $.*Swap(A,i,j)*-- swap the $ i $th element of $ A $ with the $ j $th element.

Find a correct and efficient algorithm for red-white-blue sorting. There is a linear-time solution. This is also known as the Dutch national flag problem. The simplest linear time solution performs two passes of the partition operation from Quicksort. The first pass treats the red and white elements as indistinguishable, and separates them from the blue. The second pass is just separates the elements within the red/white sub-array.

4-19.
An *inversion* of a permutation is a pair of elements
that are out of order.

- Show that a permutation of $ n $ items has at most $ n(n-1)/2 $ inversions. Which permutation(s) have exactly $ n(n-1)/2 $ inversions?
- Let $ P $ be a permutation and $ P^r $ be the reversal of this permutation. Show that $ P $ and $ P^r $ have a total of exactly $ n(n-1)/2 $ inversions.
- Use the previous result to argue that the expected number of inversions in a random permutation is $ n(n-1)/4 $.

4-20.
Give an efficient algorithm to rearrange an array of $ n $ keys so
that all the negative keys precede all the nonnegative keys.
Your algorithm must be in-place, meaning you cannot allocate another
array to temporarily hold the items.
How fast is your algorithm?

**Other Sorting Algorithms**

4-21.
Stable sorting algorithms
leave equal-key items in the same relative order as
in the original permutation.
Explain what must be done to ensure that mergesort is a stable sorting algorithm.

4-22.
Show that $ n $ positive integers in the range 1 to $ k $ can be sorted in
$ O(n \log k) $ time.
The interesting case is when $ k << n $.

4-23.
We seek to sort a sequence $ S $ of $ n $ integers with many duplications,
such that the number of distinct integers in $ S $ is $ O(\log n) $.
Give an $ O(n \log \log n) $ worst-case time algorithm to sort such sequences.

4-24.
Let $ A[1..n] $ be an array such that the first $ n-\sqrt n $ elements
are already sorted (though we know nothing about the remaining
elements).
Give an algorithm that sorts $ A $ in substantially better than
$ n\log n $ steps.

4-25.
Assume that the array $ A[1..n] $ only has numbers
from $ \{1,\ldots, n^2\} $ but that at most $ \log\log n $ of
these numbers ever appear.
Devise an algorithm that sorts $ A $ in substantially less
than $ O(n\log n) $.

4-26.
Consider the problem of sorting a sequence of $ n $ 0's and 1's
using comparisons.
For each comparison of two values $ x $ and $ y $, the algorithm learns
which of $ x < y $, $ x = y $, or $ x > y $ holds.

- Give an algorithm to sort in $ n-1 $ comparisons in the worst case. Show that your algorithm is optimal.
- Give an algorithm to sort in $ 2n/3 $ comparisons in the average case (assuming each of the $ n $ inputs is 0 or 1 with equal probability). Show that your algorithm is optimal.

4-27.
Let $ P $ be a simple, but not necessarily convex, polygon and $ q $
an arbitrary point not necessarily in $ P $.
Design an efficient algorithm to find a line segment originating from $ q $
that intersects the maximum number of edges of $ P $.
In other words, if standing at point $ q $, in what direction should you aim
a gun so the bullet will go through the largest number of walls.
A bullet through a vertex of $ P $ gets credit for only one wall.
An $ O(n \log n) $ algorithm is possible.

**Lower Bounds**

4-28.
sorting algorithm that runs in $ O( n \log (\sqrt n) ) $.
Given the existence of an
$ \Omega(n \log n) $ lower bound for sorting, how can this be possible?
*lower bound*

4-29.
Mr. B. C. Dull claims to have developed a new data structure for
priority queues that supports the operations *Insert*,
*Maximum*, and *Extract-Max*---all in $ O(1) $ worst-case time.
Prove that he is mistaken.
(Hint: the argument does not involve a lot of gory details---just think
about what this would imply about the $ \Omega(n \log n) $ lower bound for
sorting.)

**Searching**

4-30.
A company database consists of 10,000 sorted names, 40% of whom are known as
good customers and who together account for 60% of the accesses to the
database.
There are two data structure options to consider for representing the database:

- Put all the names in a single array and use binary search.
- Put the good customers in one array and the rest of them in a second array.

Only if we do not find the query name on a binary search of the first array do we do a binary search of the second array. Demonstrate which option gives better expected performance. Does this change if linear search on an unsorted array is used instead of binary search for both options?

4-31.
Suppose you are given an array $ A $ of $ n $ sorted numbers that has been
*circularly shifted* $ k $ positions to the right.
For example, $ \{35, 42, 5, 15, 27, 29\} $ is a sorted array that has been
circularly shifted $ k=2 $ positions, while $ \{27, 29, 35, 42, 5, 15\} $
has been shifted $ k=4 $ positions.

- Suppose you know what $ k $ is. Give an $ O(1) $ algorithm to find the largest number in $ A $.
- Suppose you
*do not*know what $ k $ is. Give an $ O(\lg n) $ algorithm to find the largest number in $ A $. For partial credit, you may give an $ O(n) $ algorithm.

4-32.
Consider the numerical 20 Questions game.
In this game, Player 1 thinks of a number in the range 1 to $ n $.
Player 2 has to figure out this number by asking the fewest number of
true/false questions.
Assume that nobody cheats.

- What is an optimal strategy if $ n $ is known?
- What is a good strategy if $ n $ is not known?

4-33.
Suppose that you are given a sorted sequence of *distinct* integers
$ \{a_1, a_2, \ldots, a_n\} $.
Give an $ O(\lg n) $ algorithm to determine whether there exists an $ i $ index
such as $ a_i = i $.
For example, in $ \{-10,-3,3,5,7\} $, $ a_3 = 3 $.
In $ \{2,3,4,5,6,7\} $, there is no such $ i $.

4-34.
Suppose that you are given a sorted sequence of *distinct* integers
$ \{a_1, a_2, \ldots, a_n\} $, drawn from $ 1 $ to $ m $ where $ n < m $.
Give an $ O(\lg n) $ algorithm to find an integer $ \leq m $ that is not present in $ a $.
For full credit, find the smallest such integer.

4-35.
Let $ M $ be an $ n \times m $ integer matrix in which the entries
of each row are sorted in increasing order (from left to right)
and the entries in each column are
in increasing order (from top to bottom).
Give an efficient algorithm
to find the position of an integer $ x $ in $ M $, or to determine that $ x $ is
not there.
How many comparisons of $ x $ with matrix entries does your
algorithm use in worst case?

**Implementation Challenges**

4-36.
Consider an $ n \times n $ array $ A $ containing integer elements (positive,
negative, and zero).
Assume that the elements in each row of $ A $ are in strictly increasing
order, and the elements of each column of $ A $ are in strictly decreasing
order.
(Hence there cannot be two zeroes in the same row or the same column.)
Describe an efficient algorithm that counts the number of occurrences
of the element $ 0 $ in $ A $. Analyze its running time.

4-37.
Implement versions of several different
sorting algorithms, such as selection sort, insertion sort, heapsort,
mergesort, and quicksort.
Conduct experiments to assess the relative performance of these algorithms
in a simple application that reads a large text file
and reports exactly one instance of each word that appears within it.
This application can be efficiently implemented by sorting all the
words that occur in the text and then passing through the sorted sequence
to identify one instance of each distinct word.
Write a brief report with your conclusions.

4-38.
Implement an external sort, which uses intermediate files
to sort files bigger than main memory.
Mergesort is a good algorithm to base such an implementation on.
Test your program both on files with
small records and on files with large records.

4-39.
Design and implement a parallel sorting algorithm that distributes data
across several processors.
An appropriate variation of mergesort is a likely candidate.
Measure the speedup of this algorithm as the number of processors increases.
Later, compare the execution time to that of a purely sequential mergesort
implementation.
What are your experiences?

**Interview Problems**

4-40.
If you are given a million integers to sort,
what algorithm would you use to sort them?
How much time and
memory would that consume?

4-41.
Describe advantages and disadvantages of the most popular sorting
algorithms.

4-42.
Implement an algorithm that takes an input array and returns only the
unique elements in it.

4-43.
You have a computer with only 2Mb of main memory.
How do you use it to sort a large file of 500
Mb that is on disk?

4-44.
Design a stack that supports push, pop, and retrieving the minimum
element in constant time.
Can you do this?

4-45.
Given a search string of three words,
find the smallest snippet of the document that contains all three of the
search
words---i.e., the snippet with smallest number of words in it.
You are given the
index positions where these words occur in the document, such as
*word1: (1, 4, 5)*, *word2: (3, 9, 10)*, and *word3: (2, 6, 15)*.
Each of the lists are in sorted order, as above.

4-46.
You are given 12 coins. One of them is heavier or lighter than the rest.
Identify this coin in just three weighings.