In the algorithm SELECT, the input elements are divided into groups of 5. Will the algorithm work in linear time if they are divided into groups of 7? Argue that SELECT does not run in linear time if groups of 3 are used.
solution:
(1) groups of 7 are used
# of elements greater than the median of median is at least
$$4(\lceil\frac{1}{2}\lceil\frac{n}{7}\rceil\rceil - 2) \geq \frac{2n}{7} - 8$$.
The recurrence can be rewritten as
$$T(n) \leq T(\lceil\frac{n}{7}\rceil) + T(\frac{5n}{7} + 8) + O(n)$$.
The process of proving is similar to the textbook, and we will prove that $T(n) = O(n)$ finally.
(2) groups of 3 are used
# of elements greater than the median of median is at least
$$2(\lceil\frac{1}{2}\lceil\frac{n}{3}\rceil\rceil - 2) \geq \frac{n}{3} - 4$$.
The recurrence can be rewritten as
$$T(n) \leq T(\lceil\frac{n}{3}\rceil) + T(\frac{2n}{3} + 4) + O(n)$$.
Before we derive the following steps, we can glance at the equation and find out the right hand side $n/3 + 2n/3 + 4 \geq n$. Suppose that $T(n) \leq cn$ for some positive constant c. By substitution, we have
Since $5c + an \geq 0$, we cannot derive that $T(n) \leq cn$. Hence, SELECT does not run in linear time if groups of 3 are used.
Exercise 9.3-2
Analyze SELECT to show that if $n \geq 140$, then at least $\lceil n/4 \rceil$ elements are greater than the median-of-medians x and at least $\lceil n/4 \rceil$ elements are less than x.
solution:
Suppose that at most $\lceil n/4 \rceil$ elements are greater than x, then we can obtain that
$$\lceil n/4 \rceil \geq \frac{3n}{10} - 6 \Rightarrow \frac{n}{4} + 1 > \frac{3n}{10} - 6 \Rightarrow n < 140$$.
It contradicts $n \geq 140$! Hence, we can prove that if $n \geq 140$, then at least $\lceil n/4 \rceil$ elements are greater than the median-of-medians x and at least $\lceil n/4 \rceil$ elements are less than x.
Exercise 9.3-3
Show how quicksort can be made to run in $O(n\lg n)$ time in the worst case, assuming that all elements are distinct.
solution:
Before PARTITION is performed, we call SELECT to pick up the median of the input array. The median is used for a pivot to split the array into two subarrays. Because of the median, it guarantees the left hand side and right hand side must be of the same length. Let $T(n)$ be the running time needed by the modified quicksort to perform on input array of n elements. We can derive the recurrence as follows: $T(n) \leq 2T(n/2) + \Theta(n)$. By master theorem, $T(n) = \Theta(n\lg n)$.
MODIFIED-PARTITION(A, p, r)
1 n = r - p + 1
2 find an index k by calling SELECT(A, p, r, $\lfloor(n+1)/2\rfloor$) such that A[k] = x, where x is the median
3 exchange A[r] and A[k]
4 return PARTITION(A, p, r)
3 exchange A[r] and A[k]
4 return PARTITION(A, p, r)
Exercise 9.3-4
Suppose that an algorithm uses only comparisons to find the ith smallest element in a set of n elements. Show that it can also find the i -1 smaller elements and n - i larger elements without performing any additional comparisons.
solution:
Suppose that the algorithm uses m comparisons to find the ith smallest element x in a set of n elements. If we trace these m comparisons in the log, we can easily find the i -1 smaller elements by transitivity. If x > a and a > b, then x > b can be deduced without actually performing a comparison between x and b. Similarly, the n - i larger elements can be found, too. Is it possible there exists a number, say p, we cannot decide whether x is greater than p or not by the comparison log? That's impossible! Otherwise, the algorithm does not work correctly.
Exercise 9.3-5
Suppose that you have a "black-box" worst-case linear-time median subroutine. Give a simple, linear-time algorithm that solves the selection problem for an arbitrary order statistic.
solution:
MODIFIED-SELECT(A, p, r, i)
1 if p == r
2 return A[p]
3 x = MEDIAN(A, p, r)
4 q = PARTITION'(A, p, r, x)
5 k = q - p + 1
6 if i == k
7 return A[q]
8 else if i < k
9 return MODIFIED-SELECT(A, p, q-1, i)
10 else
11 return MODIFIED-SELECT(A, q+1, r, i-k)
Exercise 9.3-6
The kth quantiles of an n-element set are the k-1 order statistics that divide the sorted set into k equal-sized sets (to within 1). Give an $O(n\lg k)$-time algorithm to list the kth quantiles of a set.
solution:
The k quantiles of an n-element array A are
$A[\lceil 1\cdot n/k \rceil], A[\lceil 2\cdot n/k \rceil],\cdots,A[\lceil (k-1)\cdot n/k \rceil]$.
The following algorithm finds kth quantiles of an array A, and these (k-1) elements are put just like the positions they reside if entire range is sorted.
QUANTILE-BASE(A, p, r, Q, k)
1 if k <= 0
2 return
3 else if k == 1
4 x = SELECT(A, p, r, Q[1])
5 PARTITION(A, p, r, x)
6 else
7 i = $\lfloor (k + 1)/2 \rfloor$
8 x = SELECT(A, p, r, Q[i])
9 q = PARTITION(A, p, r, x)
10 QUANTILE-BASE(A, p, q - 1, Q[1..(i-1)], i - 1)
11 for j = (i + 1) to length[Q]
12 Q[j] = Q[j] - Q[i]
13 QUANTILE-BASE(A, q + 1, r, Q[(i+1)..length[Q]], k - i)
QUANTILE(A, p, r, k)
1 if k <= 1
2 return
3 n = r - p + 1
4 QUANTILE-BASE(A, p, r, $[\lceil 1\cdot n/k \rceil, \lceil 2\cdot n/k \rceil,\cdots,\lceil (k-1)\cdot n/k \rceil]$, k - 1)
Next, we prove the running time is $O(n\lg k)$. Let $T(n, k)$ be the running time of the algorithm needs to find k-quantiles of an n-element array. We have,
Have no energy to draw the recursion tree to illustrate the complexity. Maybe tomorrow...
Exercise 9.3-7
Describe an $O(n)$-time algorithm that, given a set S of n distinct numbers and a positive integer $k \leq n$, determines the k numbers in S that are closest to the median of S.
solution:
The median x can be found by the procedure SELECT in $O(n)$ time. After finding the median x, we call the a modified procedure SELECT' to find the kth element. In the modified procedure SELECT', the elements, for instance $a, b$ are compared by the distance to the median x($$|a-x| - |b-x|$$). The comparison takes $\Theta(1)$ time. So the modified SELECT' still runs in $O(n)$ time. After the kth element is found by SELECT', it is used as the pivot to partition the set S, and S[1..k] is the k numbers that are closeset to the median.
Exercise 9.3-8
Let $X[1..n]$ and $Y[1..n]$ be two arrays, each containing n numbers already in sorted order. Give an $O(\lg n)$-time algorithm to find the median of all 2n elements in array X and Y.
solution:
Let Z be the union of array X and array Y and in sorted order. The median m of Z is the nth order statistics, that is, m must be greater than exactly (n-1) numbers. Suppose that m is in the array X, then we can claim that there must exist an index p such that $X[p] = m$ and $Y[n-p] \leq X[p] \leq Y[n-p+1]$. Why? X[p] is greater than exactly p-1 numbers in X. And $Y[n-p] \leq X[p] \leq Y[n-p+1]$, then X[p] is greater than exactly n-p numbers in Y. Thus, X[p] is greater than exactly n-1 numbers in total and X[p] must be the median m.
The following algorithm can help us find the median m. It tries to find the median in X first. If it fails, the median must be in Y. The procedure then tries to find the median in Y. Since the median must be either in X or Y. The procedure would not be called infinitely.
FIND-MEDIAN(X, Y, start_index, end_index, n)
1 if start_index > end_index
2 return FIND-MEDIAN(Y, X, 1, n, n)
3 p = (start_index + end_index)/2
4 a = X[p]
5 if a >= Y[n - p] and a <= Y[n - p + 1]
6 return X[p]
7 else if a < Y[n - p]
8 return FIND-MEDIAN(X, Y, p+1, end_index, n)
9 else
10 return FIND-MEDIAN(X, Y, start_index, p-1, n)
Then, we are going to show that the algorithm needs $O(\lg n)$ time. Each time the procedure is called, the distance between start_index and end_index is divided by 2. If the median is in X, the running time is at most $O(\lg n)$. Otherwise, it continues to search into Y, and the running time is $O(\lg n) + O(\lg n) = O(\lg n)$.
Exercise 9.3-9
Professor Olay is consulting for an oil company, which is planning a large pipeline running east to west through an oil field of n wells. From each well, a spur pipeline is to be connected directly to the main pipeline along a shortest path (either north or south), as shown in Figure 9.2. Given x- and y-coordinates of the wells, how should the professor pick the optimal location of the main pipeline (the one that minimizes the total length of the spurs)? Show that the optimal location can be determined in linear time.
solution:
Let $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$ be the coordinates of n wells. We want to find a number $y^*$ such that $$d(y^*) = \sum_{i = 1}^n |y_i - y^*|$$ achieves the minimum. Note that $$|y_i - y^*|$$ is the distance between $y_i$ and $y^*$. The median of $y_1, y_2, \ldots, y_n$ will always have less value than any other fixed number substituted into function $d(\cdot)$. So the optimal location can be determined by SELECT in $O(n)$ time.
Exercise 9.3.4
回覆刪除I don't know. The argument that it has to do it because it works, is kinda flaky. Besides, comparisons by the eye still count as comparisons and the brain as software.
I think what they are looking after is to use randomized selection. On success A[q] will return the ith smallest and q-p the i-1 smallest elements.Still trying to get the n-i largest elements, though.
Exercise 9.3.4
回覆刪除Pretty straight forward. To get the n-i largest elements you have to return the right subarray (r-q+1) concatted to the rest of the right subarrays created in the process.
Exercise 9.3.6
回覆刪除I think that this is a simpler solution:
QUANTILE-BASE(A, p, r, Q, k)
if k == 0
return
else
x = SELECT(A, p, r, Q[k]) Ti k-1
QUANTILE-BASE(A, p, A[Q[k]], Q, k-1)
QUANTILE(A, p, r, k)
Let Q[0...k-1] be a new array
for i = 0 to k-1
Q[i] = i*(r-p + 1)/k ci k-1
QUANTILE-BASE(A, p, r, Q, k-1)
Exercise 9.3.9
回覆刪除Good guess, but no cake!. In this 1D problem there are 5 spur lines north of the pipeline and 4 south of it. The more you move the pipeline north the more you save on spur lines, since you save 5x the same amount from the north wells and you loose only 4x the same amount from the south wells.
Until the situation changes. The minimum is when the pipeline passes through well 6 (closest one from the north). This all can be verified with first order derivatives.
This is a bad problem that doesn't have much to do with algorithms, but with math, where you can get an exact solution.
So..that is why it should be in the median, if its in the median there will only be 4 spurlines north and 4 spurlines south.. if you move the line north, there will be a 5x increase on the south and 4x decrease to the north, and viceversa..
刪除Hey RMan,
回覆刪除Nice job. Is this blog supported/sponsored by Google?
Also I don't see any Solutions to Ch 10 exercises. Why so?
Can i contribute them?
Nikos