# Dynamic Programming

## Edit Distance

8-1. Typists often make transposition errors exchanging neighboring characters, such as typing setve when you mean steve. This requires two substitutions to fix under the conventional definition of edit distance. Incorporate a swap operation into our edit distance function, so that such neighboring transposition errors can be fixed at the cost of one operation.
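As a point of reference, here is a minimal sketch of one way a swap could be folded into the usual edit distance table. It assumes unit cost for every operation and only considers swaps of two adjacent characters that are not edited again afterwards; the function name `edit_distance_with_swap` is made up for illustration.

```python
def edit_distance_with_swap(s: str, t: str) -> int:
    """Edit distance with insert, delete, substitute, and adjacent swap (all cost 1)."""
    n, m = len(s), len(t)
    # d[i][j] = cost of turning s[:i] into t[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i          # delete everything
    for j in range(m + 1):
        d[0][j] = j          # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match_cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,               # deletion
                d[i][j - 1] + 1,               # insertion
                d[i - 1][j - 1] + match_cost,  # match or substitution
            )
            # Swap: the last two characters of each prefix are exchanged.
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


print(edit_distance_with_swap("setve", "steve"))
```

Under these assumptions the setve/steve example costs one operation instead of two.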

8-2. Suppose you are given three strings of characters: $X$, $Y$, and $Z$, where $|X| = n$, $|Y|=m$, and $|Z|=n+m$. $Z$ is said to be a shuffle of $X$ and $Y$ iff $Z$ can be formed by interleaving the characters from $X$ and $Y$ in a way that maintains the left-to-right ordering of the characters from each string.

1. Show that cchocohilaptes is a shuffle of chocolate and chips, but chocochilatspe is not.
2. Give an efficient dynamic-programming algorithm that determines whether $Z$ is a shuffle of $X$ and $Y$. Hint: the values of the dynamic programming matrix you construct should be Boolean, not numeric.
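The Boolean matrix suggested by the hint might look like the following sketch. It is one possible formulation rather than the intended solution, and the name `is_shuffle` is an assumption.

```python
def is_shuffle(x: str, y: str, z: str) -> bool:
    """Return True if z interleaves x and y while preserving each string's order."""
    n, m = len(x), len(y)
    if len(z) != n + m:
        return False
    # ok[i][j] is True when z[:i+j] is a shuffle of x[:i] and y[:j].
    ok = [[False] * (m + 1) for _ in range(n + 1)]
    ok[0][0] = True
    for i in range(n + 1):
        for j in range(m + 1):
            # Last character of z[:i+j] taken from x.
            if i > 0 and ok[i - 1][j] and x[i - 1] == z[i + j - 1]:
                ok[i][j] = True
            # Last character of z[:i+j] taken from y.
            if j > 0 and ok[i][j - 1] and y[j - 1] == z[i + j - 1]:
                ok[i][j] = True
    return ok[n][m]


print(is_shuffle("chocolate", "chips", "cchocohilaptes"))
print(is_shuffle("chocolate", "chips", "chocochilatspe"))
```

The table has $(n+1)(m+1)$ entries and each is filled in constant time.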

8-3. The longest common substring (not subsequence) of two strings $X$ and $Y$ is the longest string that appears as a run of consecutive letters in both strings. For example, the longest common substring of photograph and tomography is ograph.

1. Let $n=|X|$ and $m=|Y|$. Give a $\Theta(nm)$ dynamic programming algorithm for longest common substring based on the longest common subsequence/edit distance algorithm.
2. Give a simpler $\Theta(nm)$ algorithm that does not rely on dynamic programming.
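For the dynamic programming variant, a hedged sketch follows. It tracks the length of the longest common suffix of each pair of prefixes, which is one common way to adapt the LCS table; the function name is hypothetical.

```python
def longest_common_substring(x: str, y: str) -> str:
    """Longest run of consecutive characters appearing in both x and y."""
    best_len, best_end = 0, 0
    # prev[j] = length of the longest common suffix of x[:i-1] and y[:j]
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the matching run
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return x[best_end - best_len:best_end]


print(longest_common_substring("photograph", "tomography"))
```

Only two rows are kept at a time, so the sketch uses $O(m)$ space while still doing $\Theta(nm)$ work.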

8-4. The longest common subsequence (LCS) of two sequences $T$ and $P$ is the longest sequence $L$ such that $L$ is a subsequence of both $T$ and $P$. The shortest common supersequence (SCS) of $T$ and $P$ is the shortest sequence $L$ such that both $T$ and $P$ are subsequences of $L$.

1. Give efficient algorithms to find the LCS and SCS of two given sequences.
2. Let $d(T,P)$ be the minimum edit distance between $T$ and $P$ when no substitutions are allowed (i.e., the only changes are character insertion and deletion). Prove that $d(T,P)=|SCS(T,P)|-|LCS(T,P)|$, where $|SCS(T,P)|$ and $|LCS(T,P)|$ are the lengths of the shortest common supersequence and the longest common subsequence of $T$ and $P$, respectively.

## Greedy Algorithms

8-5. Let $P_1, P_2, \ldots, P_n$ be $n$ programs to be stored on a disk with capacity $D$ megabytes. Program $P_i$ requires $s_i$ megabytes of storage. We cannot store them all because $D < \sum_{i=1}^n s_i$.

1. Does a greedy algorithm that selects programs in order of nondecreasing $s_i$ maximize the number of programs held on the disk? Prove or give a counter-example.
2. Does a greedy algorithm that selects programs in order of nonincreasing $s_i$ use as much of the capacity of the disk as possible? Prove or give a counter-example.

8-6. Coins in the United States are minted with denominations of 1, 5, 10, 25, and 50 cents. Now consider a country whose coins are minted with denominations of $\{d_1, \ldots, d_k \}$ units. We seek an algorithm to make change of $n$ units using the minimum number of coins for this country.

1. The greedy algorithm repeatedly selects the biggest coin no bigger than the amount to be changed and repeats until it is zero. Show that the greedy algorithm does not always use the minimum number of coins in a country whose denominations are $\{1,6,10\}$.
2. Give an efficient algorithm that correctly determines the minimum number of coins needed to make change of $n$ units using denominations $\{d_1, \ldots, d_k \}$. Analyze its running time.
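To make the contrast concrete, the sketch below places the greedy rule next to a straightforward table-filling alternative. Both function names are illustrative, and the greedy version assumes a 1-unit coin exists so that it always terminates with exact change.

```python
def greedy_coins(n: int, denoms: list[int]) -> int:
    """Number of coins used by the largest-coin-first rule (assumes 1 is a denomination)."""
    count = 0
    for d in sorted(denoms, reverse=True):
        count += n // d   # take as many of this coin as fit
        n %= d
    return count


def min_coins(n: int, denoms: list[int]) -> float:
    """Fewest coins summing to n, or infinity if n cannot be formed."""
    inf = float("inf")
    best = [0] + [inf] * n   # best[a] = fewest coins that make amount a
    for amount in range(1, n + 1):
        for d in denoms:
            if d <= amount and best[amount - d] + 1 < best[amount]:
                best[amount] = best[amount - d] + 1
    return best[n]


# With denominations {1, 6, 10}, an amount of 12 separates the two methods:
print(greedy_coins(12, [1, 6, 10]))  # 10 + 1 + 1
print(min_coins(12, [1, 6, 10]))     # 6 + 6
```

The table has $n+1$ entries and each considers all $k$ denominations, so this sketch runs in $O(nk)$ time.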

8-7. In the United States, coins are minted with denominations of 1, 5, 10, 25, and 50 cents. Now consider a country whose coins are minted with denominations of $\{d_1, \ldots, d_k \}$ units. We want to count how many distinct ways $C(n)$ there are to make change of $n$ units. For example, in a country whose denominations are $\{1,6,10\}$, $C(5) = 1$, $C(6)$ through $C(9)$ all equal $2$, $C(10)=3$, and $C(12)=4$.

1. How many ways are there to make change of 20 units from $\{1,6,10\}$?
2. Give an efficient algorithm to compute $C(n)$, and analyze its complexity. (Hint: think in terms of computing $C(n,d)$, the number of ways to make change of $n$ units with highest denomination $d$. Be careful to avoid overcounting.)
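One way to act on the hint is to introduce denominations one at a time, so that each multiset of coins is counted exactly once regardless of order. The following is a sketch under that assumption; `count_change` is a hypothetical name.

```python
def count_change(n: int, denoms: list[int]) -> int:
    """Number of distinct multisets of coins from denoms that sum to n."""
    ways = [1] + [0] * n   # one way to make 0: use no coins
    # Processing one denomination at a time fixes a canonical order,
    # which avoids counting 1+6 and 6+1 as different ways.
    for d in denoms:
        for amount in range(d, n + 1):
            ways[amount] += ways[amount - d]
    return ways[n]


for amount in (5, 6, 9, 10, 12):
    print(amount, count_change(amount, [1, 6, 10]))
```

The printed values can be checked against the examples given in the problem statement.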

8-8. In the single-processor scheduling problem, we are given a set of $n$ jobs $J$. Each job $i$ has a processing time $t_i$, and a deadline $d_i$. A feasible schedule is a permutation of the jobs such that when the jobs are performed in that order, every job is finished before its deadline. The greedy algorithm for single-processor scheduling selects the job with the earliest deadline first. Show that if a feasible schedule exists, then the schedule produced by this greedy algorithm is feasible.
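The greedy rule itself is short enough to sketch. The representation of a job as a `(processing_time, deadline)` pair and the choice to treat finishing exactly at the deadline as on time are assumptions made for this illustration.

```python
def earliest_deadline_first(jobs: list[tuple[int, int]]):
    """Order jobs by deadline; return the order if every job meets its deadline, else None."""
    order = sorted(jobs, key=lambda job: job[1])  # earliest deadline first
    clock = 0
    for processing_time, deadline in order:
        clock += processing_time
        if clock > deadline:   # this job would finish late
            return None
    return order


print(earliest_deadline_first([(2, 5), (1, 1), (2, 3)]))
print(earliest_deadline_first([(2, 2), (2, 3)]))
```

The code only constructs and checks the greedy schedule; the exercise still asks for the argument that no other ordering can succeed when this one fails.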

## Number Problems

8-9. The knapsack problem is as follows: given a set of integers $S = \{s_1, s_2, \ldots, s_n\}$, and a given target number $T$, find a subset of $S$ that adds up exactly to $T$. For example, within $S = \{1, 2, 5, 9, 10\}$ there is a subset that adds up to $T=22$ but not $T=23$. Give a correct dynamic programming algorithm for knapsack that runs in $O(n T)$ time.
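A minimal sketch of a table over reachable totals is given below. It assumes every element of $S$ is a positive integer, and the name `subset_sum` and the `used` bookkeeping array are illustrative choices.

```python
def subset_sum(values: list[int], target: int):
    """Return a sublist of values summing to target, or None (assumes positive integers)."""
    reachable = [True] + [False] * target
    used = [None] * (target + 1)   # used[t] = element that first made total t reachable
    for v in values:
        # Scan totals downward so each element is used at most once.
        for total in range(target, v - 1, -1):
            if not reachable[total] and reachable[total - v]:
                reachable[total] = True
                used[total] = v
    if not reachable[target]:
        return None
    subset, total = [], target
    while total > 0:               # walk back through the recorded choices
        subset.append(used[total])
        total -= used[total]
    return subset


print(subset_sum([1, 2, 5, 9, 10], 22))
print(subset_sum([1, 2, 5, 9, 10], 23))
```

Each of the $n$ elements triggers one pass over at most $T+1$ totals, which matches the $O(nT)$ bound requested.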

8-10. The integer partition problem takes a set of positive integers $S=\{s_1, \ldots, s_n\}$ and asks if there is a subset of indices $I \subseteq \{1, \ldots, n\}$ such that $\sum_{i \in I} s_i= \sum_{ i \notin I} s_i$. Let $\sum_{i=1}^{n} s_i = M$. Give an $O(nM)$ dynamic programming algorithm to solve the integer partition problem.

8-11. Assume that there are $n$ numbers (some possibly negative) on a circle, and we wish to find the maximum contiguous sum along an arc of the circle. Give an efficient algorithm for solving this problem.

8-12. A certain string processing language allows the programmer to break a string into two pieces. It costs $n$ units of time to break a string of $n$ characters into two pieces, since this involves copying the old string. A programmer wants to break a string into many pieces, and the order in which the breaks are made can affect the total amount of time used. For example, suppose we wish to break a 20-character string after characters 3, 8, and 10. If the breaks are made in left-right order, then the first break costs 20 units of time, the second break costs 17 units of time, and the third break costs 12 units of time, for a total of 49 steps. If the breaks are made in right-left order, the first break costs 20 units of time, the second break costs 10 units of time, and the third break costs 8 units of time, for a total of only 38 steps. Give a dynamic programming algorithm that takes a list of character positions after which to break and determines the cheapest break cost in $O(n^3)$ time.
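The two totals in the example can be reproduced by simulating a given break order, and the same cost model leads to an interval-style recurrence. The sketch below shows both; the function names are hypothetical, and break positions are assumed to be distinct and strictly inside the string.

```python
import bisect


def break_cost(length: int, order: list[int]) -> int:
    """Total cost of making the breaks in the given order."""
    cuts = [0, length]   # sorted boundaries of the current pieces
    total = 0
    for p in order:
        k = bisect.bisect_left(cuts, p)
        total += cuts[k] - cuts[k - 1]   # size of the piece being broken
        cuts.insert(k, p)
    return total


def min_break_cost(length: int, breaks: list[int]) -> int:
    """Cheapest total cost over all break orders, by interval dynamic programming."""
    pts = [0] + sorted(breaks) + [length]
    m = len(pts)
    # cost[i][j] = cheapest way to make all breaks strictly between pts[i] and pts[j]
    cost = [[0] * m for _ in range(m)]
    for span in range(2, m):
        for i in range(m - span):
            j = i + span
            cost[i][j] = (pts[j] - pts[i]) + min(
                cost[i][k] + cost[k][j] for k in range(i + 1, j)
            )
    return cost[0][m - 1]


print(break_cost(20, [3, 8, 10]))   # left-to-right order
print(break_cost(20, [10, 8, 3]))   # right-to-left order
print(min_break_cost(20, [3, 8, 10]))
```

With $n$ break positions there are $O(n^2)$ intervals and each tries $O(n)$ first breaks, giving the $O(n^3)$ bound.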

8-13. Consider the following data compression technique. We have a table of $m$ text strings, each at most $k$ in length. We want to encode a data string $D$ of length $n$ using as few text strings as possible. For example, if our table contains *(a, ba, abab, b)* and the data string is bababbaababa, the best way to encode it is *(b, abab, ba, abab, a)*, a total of five code words. Give an $O(nmk)$ algorithm to find the length of the best encoding. You may assume that every string has at least one encoding in terms of the table.
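A prefix-by-prefix table is one natural way to approach this. The sketch below is offered under that assumption, with `min_code_words` as an illustrative name.

```python
def min_code_words(data: str, table: list[str]) -> float:
    """Fewest table strings whose concatenation equals data (infinity if impossible)."""
    inf = float("inf")
    best = [0] + [inf] * len(data)   # best[e] = fewest words encoding data[:e]
    for end in range(1, len(data) + 1):
        for word in table:
            start = end - len(word)
            # Does `word` occupy data[start:end], and does using it improve best[end]?
            if start >= 0 and best[start] + 1 < best[end] and data.startswith(word, start):
                best[end] = best[start] + 1
    return best[len(data)]


print(min_code_words("bababbaababa", ["a", "ba", "abab", "b"]))
```

There are $n$ prefixes, $m$ candidate words per prefix, and each comparison costs at most $k$ character checks.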

8-14. The traditional world chess championship is a match of 24 games. The current champion retains the title in case the match is a tie. Each game ends in a win, loss, or draw (tie) where wins count as $1$, losses as $0$, and draws as $1/2$. The players take turns playing white and black. White has an advantage, because he moves first. The champion plays white in the first game. He has probabilities $w_{\mbox{w}}$, $w_{\mbox{d}}$, and $w_{\mbox{l}}$ of winning, drawing, and losing playing white, and has probabilities $b_{\mbox{w}}$, $b_{\mbox{d}}$, and $b_{\mbox{l}}$ of winning, drawing, and losing playing black.

1. Write a recurrence for the probability that the champion retains the title. Assume that there are $g$ games left to play in the match and that the champion still needs $i$ points to retain the title (where $i$ may be a multiple of $1/2$).
2. Based on your recurrence, give a dynamic programming algorithm to calculate the champion's probability of retaining the title.
3. Analyze its running time for an $n$ game match.

8-15. Eggs break when dropped from great enough height. Specifically, there must be a floor $f$ in any sufficiently tall building such that an egg dropped from the $f$th floor breaks, but one dropped from the $(f-1)$st floor will not. If the egg always breaks, then $f=1$. If the egg never breaks, then $f=n+1$. You seek to find the critical floor $f$ using an $n$-story building. The only operation you can perform is to drop an egg off some floor and see what happens. You start out with $k$ eggs, and seek to drop eggs as few times as possible. Broken eggs cannot be reused. Let $E(k,n)$ be the minimum number of egg droppings that will always suffice.

1. Show that $E(1,n)=n$.
2. Show that $E(k,n)=\Theta(n^{\frac{1}{k}})$.
3. Find a recurrence for $E(k,n)$. What is the running time of the dynamic program to find $E(k,n)$?
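For experimenting with small cases, the sketch below tabulates one candidate recurrence: try the first drop from floor $x$, then continue with one fewer egg on the $x-1$ floors below if it breaks, or with the same eggs on the $n-x$ floors above if it survives. This formulation and the name `egg_drops` are assumptions, not the book's stated answer.

```python
def egg_drops(k: int, n: int) -> int:
    """Worst-case number of drops needed with k >= 1 eggs and n floors."""
    # e[eggs][floors] = drops that always suffice
    e = [[0] * (n + 1) for _ in range(k + 1)]
    for floors in range(1, n + 1):
        e[1][floors] = floors   # one egg: must test floors bottom-up
    for eggs in range(2, k + 1):
        for floors in range(1, n + 1):
            e[eggs][floors] = 1 + min(
                max(e[eggs - 1][x - 1],      # egg breaks at floor x
                    e[eggs][floors - x])     # egg survives floor x
                for x in range(1, floors + 1)
            )
    return e[k][n]


print(egg_drops(1, 10))
print(egg_drops(2, 100))
```

As written the table has $kn$ entries and each takes $O(n)$ time to fill, for $O(kn^2)$ overall.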

## Graph Problems

8-16. Consider a city whose streets are defined by an $X \times Y$ grid. We are interested in walking from the upper left-hand corner of the grid to the lower right-hand corner. Unfortunately, the city has bad neighborhoods, whose intersections we do not want to walk in. We are given an $X \times Y$ matrix BAD, where BAD[i,j] = "yes" if and only if the intersection between streets $i$ and $j$ is in a neighborhood to avoid.

1. Give an example of the contents of BAD such that there is no path across the grid avoiding bad neighborhoods.
2. Give an $O( X Y )$ algorithm to find a path across the grid that avoids bad neighborhoods.
3. Give an $O( X Y )$ algorithm to find the shortest path across the grid that avoids bad neighborhoods. You may assume that all blocks are of equal length. For partial credit, give an $O(X^2 Y^2)$ algorithm.

8-17. Consider the same situation as the previous problem. We have a city whose streets are defined by an $X \times Y$ grid. We are interested in walking from the upper left-hand corner of the grid to the lower right-hand corner. We are given an $X \times Y$ matrix BAD, where BAD[i,j] = "yes" if and only if the intersection between streets $i$ and $j$ is somewhere we want to avoid. If there were no bad neighborhoods to contend with, the shortest path across the grid would have length $(X-1) + (Y-1)$ blocks, and indeed there would be many such paths across the grid. Each path would consist of only rightward and downward moves. Give an algorithm that takes the array BAD and returns the number of safe paths of length $X+Y-2$. For full credit, your algorithm must run in $O( X Y )$ time.
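Because every path of length $X+Y-2$ uses only rightward and downward moves, the count at an intersection depends only on its upper and left neighbors. The sketch below builds on that observation; it assumes BAD is supplied as a non-empty list of rows of Booleans (`True` standing in for "yes"), and `count_safe_paths` is a made-up name.

```python
def count_safe_paths(bad: list[list[bool]]) -> int:
    """Count right/down paths from the top-left to the bottom-right avoiding bad cells."""
    rows, cols = len(bad), len(bad[0])
    paths = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if bad[i][j]:
                continue                 # no safe path passes through here
            if i == 0 and j == 0:
                paths[i][j] = 1          # the starting corner
            else:
                from_above = paths[i - 1][j] if i > 0 else 0
                from_left = paths[i][j - 1] if j > 0 else 0
                paths[i][j] = from_above + from_left
    return paths[rows - 1][cols - 1]


clear = [[False] * 3 for _ in range(3)]
print(count_safe_paths(clear))
```

Each of the $XY$ intersections is handled once in constant time.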

## Design Problems

8-18. Consider the problem of storing $n$ books on shelves in a library. The order of the books is fixed by the cataloging system and so cannot be rearranged. Therefore, we can speak of a book $b_i$, where $1 \leq i \leq n$, that has a thickness $t_i$ and height $h_i$. The length of each bookshelf at this library is $L$. Suppose all the books have the same height $h$ (i.e., $h = h_i = h_j$ for all $i, j$) and the shelves are all separated by a distance of greater than $h$, so any book fits on any shelf. The greedy algorithm would fill the first shelf with as many books as we can until we get the smallest $i$ such that $b_i$ does not fit, and then repeat with subsequent shelves. Show that the greedy algorithm always finds the optimal shelf placement, and analyze its time complexity.

8-19. This is a generalization of the previous problem. Now consider the case where the height of the books is not constant, but we have the freedom to adjust the height of each shelf to that of the tallest book on the shelf. Thus the cost of a particular layout is the sum of the heights of the tallest book on each shelf.

1. Give an example to show that the greedy algorithm of stuffing each shelf as full as possible does not always give the minimum overall height.
2. Give an algorithm for this problem, and analyze its time complexity. Hint: use dynamic programming.

8-20. We wish to compute the laziest way to dial a given $n$-digit number on a standard push-button telephone using two fingers. We assume that the two fingers start out on the * and \# keys, and that the effort required to move a finger from one button to another is proportional to the Euclidean distance between them. Design an algorithm that computes the method of dialing that involves moving your fingers the smallest amount of total distance, where $k$ is the number of distinct keys on the keypad ($k=16$ for standard telephones). Try to use $O(n k^3)$ time.

8-21. Given an array of $n$ real numbers, consider the problem of finding the maximum sum in any contiguous subvector of the input. For example, in the array $\{31,-41,59,26,-53,58,97,-93,-23,84\}$ the maximum is achieved by summing the third through seventh elements, where $59+26+(-53)+58+97 = 187$. When all numbers are positive, the entire array is the answer, while when all numbers are negative, the empty array maximizes the total at 0.

1. Give a simple, clear, and correct $\Theta(n^2)$-time algorithm to find the maximum contiguous subvector.
2. Now give a $\Theta(n)$-time dynamic programming algorithm for this problem. To get partial credit, you may instead give a correct $O(n \log n)$ divide-and-conquer algorithm.
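For the linear-time part, a running-sum scan in the style of what is commonly called Kadane's algorithm is one option. The sketch below follows the convention stated in the problem that the empty subvector has sum 0; the function name is illustrative.

```python
def max_subvector_sum(a: list[float]) -> float:
    """Largest sum of any contiguous subvector, where the empty subvector counts as 0."""
    best = 0
    ending_here = 0   # best sum of a subvector ending at the current position
    for value in a:
        # Either extend the previous run or start over with the empty subvector.
        ending_here = max(0, ending_here + value)
        best = max(best, ending_here)
    return best


print(max_subvector_sum([31, -41, 59, 26, -53, 58, 97, -93, -23, 84]))  # 187
```

The loop touches each element once and keeps only two running values.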

8-22. Consider the problem of examining a string $x = x_1 x_2 \ldots x_n$ from an alphabet of $k$ symbols, and a multiplication table over this alphabet. Decide whether or not it is possible to parenthesize $x$ in such a way that the value of the resulting expression is $a$, where $a$ belongs to the alphabet. The multiplication table is neither commutative nor associative, so the order of multiplication matters.

$\begin{array}{c|ccc} & a & b & c \\ \hline a & a & c & c \\ b & a & a & b \\ c & c & c & c \\ \end{array}$

For example, consider the above multiplication table and the string $bbbba$. Parenthesizing it $(b(bb))(ba)$ gives $a$, but $((((bb)b)b)a)$ gives $c$. Give an algorithm, with time polynomial in $n$ and $k$, to decide whether such a parenthesization exists for a given string, multiplication table, and goal element.
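One possible approach records, for every substring, the set of symbols that some parenthesization can produce. The sketch below assumes the table is stored as a nested dictionary with `table[left][right]` giving the product (row times column, as in the array above) and that the string is non-empty; `reachable_values` is a hypothetical name.

```python
def reachable_values(x: str, table: dict[str, dict[str, str]]) -> set[str]:
    """Set of symbols obtainable by fully parenthesizing the non-empty string x."""
    n = len(x)
    # vals[i][j] = symbols obtainable from x[i..j] inclusive
    vals = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(x):
        vals[i][i].add(symbol)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):   # outermost product splits after position k
                for left in vals[i][k]:
                    for right in vals[k + 1][j]:
                        vals[i][j].add(table[left][right])
    return vals[0][n - 1]


table = {
    "a": {"a": "a", "b": "c", "c": "c"},
    "b": {"a": "a", "b": "a", "c": "b"},
    "c": {"a": "c", "b": "c", "c": "c"},
}
print("a" in reachable_values("bbbba", table))
```

There are $O(n^2)$ substrings, $O(n)$ split points each, and at most $k^2$ symbol pairs per split, so the sketch is polynomial in $n$ and $k$.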

8-23. Let $\alpha$ and $\beta$ be constants. Assume that it costs $\alpha$ to go left in a tree, and $\beta$ to go right. Devise an algorithm that builds a tree with optimal worst case cost, given keys $k_1,\ldots,k_n$ and the probabilities that each will be searched $p_1,\ldots,p_n$.

## Interview Problems

8-24. Given a set of coin denominations, find the minimum number of coins to make a certain amount of change.

8-25. You are given an array of $n$ numbers, each of which may be positive, negative, or zero. Give an efficient algorithm to identify the index positions $i$ and $j$ that maximize the sum of the $i$th through $j$th numbers.

8-26. Observe that when you cut a character out of a magazine, the character on the reverse side of the page is also removed. Give an algorithm to determine whether you can generate a given string by pasting cutouts from a given magazine. Assume that you are given a function that will identify the character and its position on the reverse side of the page for any given character position.