2010年7月18日 星期日

[Algo] 7.3 A randomized version of quicksort

在分析 quicksort 的 average-case 的時候,我們明白到 pivot 選得好不好會決定整個 quicksort 的 running time。在這邊我們隨機的從 $A[p \ldots r]$ 中挑選一個 element 出來跟 $A[r]$ 做交換,因為 pivot 是隨機亂挑的,我們可以期待,以平均來講 split 會是 well balanced 的。


@{
\proc{Randomized-Partition}(A, p, r)
\li i = \proc{Random}(p, r)
\li \textup{exchange}\;A[r]\;\textup{and}\;A[i]
\li \Return \proc{Partition}(A, p, r)
}@


@{
\proc{Randomized-Quicksort}(A, p, r)
\li \If p < r
\li \;\;\;\;q = \proc{Randomized-Partition}(A, p, r)
\li \;\;\;\;\proc{Randomized-Quicksort}(A, p, q - 1)
\li \;\;\;\;\proc{Randomized-Quicksort}(A, q + 1, r)
}@

2010年7月5日 星期一

[Algo] Exercise 7.2

Exercise 7.2-1
Use the substitution method to prove that the recurrence $T(n) = T(n-1) + \Theta(n)$ has the solution $T(n) = \Theta(n^2)$, as claimed at the beginning of section 7.2.
solution:
For the upper bound, we guess that $T(n) \leq c_1 n^2$ for some constant $c_1$. Substituting this guess into recurrence, we obtain $T(n) \leq c_1(n-1)^2 + \Theta(n) \leq c_1 n^2 - c_1(2n-1) + \Theta(n) \leq c_1 n^2$, since we can pick the constant $c_1$ large enough so that the $c_1(2n - 1)$ term dominates the $\Theta(n)$ term. Thus, $T(n) = O(n^2)$.
For the lower bound, we guess that $T(n) \geq c_2 n^2$ for some constant $c_2$ in a similar manner as the upper bound. Substituting this guess into recurrence, we obtain $T(n) \geq c_2(n - 1)^2 + \Theta(n) \geq c_2 n^2 - c_2(2n - 1) + \Theta(n) \geq c_2 n^2$, since we can pick the constant $c_2$ small enough so that the $\Theta(n)$  term dominates $c_2(2n - 1)$ term. Thus, $T(n) = \Omega(n^2)$, and we can conclude that $T(n) = \Theta(n^2)$.


Exercise 7.2-2
What is the running time of QUICKSORT when all elements of array A have the same value?
solution:
Since all elements of array A have the same value, subroutine PARTITION will always split into a subproblem of size $n-1$ and a subproblem of size 0. This is the worst-case. So we expect that running time is $\Theta(n^2)$.


Exercise 7.2-3
Show the running time of QUICKSORT is $\Theta(n^2)$ when the array A contains distinct elements and is sorted in decreasing order.
solution:
Since array A is in decreasing order and all elements are distinct, the pivot will be the smallest element in the subroutine PARTITION. It will always split into a subproblem of size $n-1$ and a subproblem of size 0. This is the worst-case. So we expect that running time is $\Theta(n^2)$.


Exercise 7.2-4
Banks often record transactions on an account in order of the times of the transactions, but many people like to receive their bank statements with checks listed in order by check number. People usually write checks in order by check number, and merchants usually cash them with reasonable dispatch. The problem of converting time-of-transaction ordering to check-number ordering is therefore the problem of sorting almost-sorted input. Argue that the procedure INSERTION-SORT would tend to beat the procedure QUICKSORT on this problem.
solution:
INSERTION-SORT requires almost $O(n)$ running time to sort an almost-sorted input, however, QUICKSORT reuqires almost $O(n^2)$ running time. So we use INSERTION-SORT rather than QUICKSORT in this situation.


Exercise 7.2-5
Suppose that the splits at every level of quicksort are in the proportion $1 - \alpha$ to $\alpha$, where $0 < \alpha \leq 1/2$ is a constant. Show that the minimum depth of a leaf in the recursion tree is approximately $-\lg n / \lg \alpha$ and the maximum depth is approximately $-\lg n / \lg (1-\alpha)$. (Don't worry about integer round-off.)
solution:
Suppose that the minimum depth of a leaf is $d_m$, we can obtain the following equation: $n(\alpha)^{d_m} = 1$. So we can get $d_m = -\log_{\alpha} n = -\lg n / \lg \alpha$. Suppose that the maximum depth of a leaf is $d_M$. We can solve the following equation: $n(1 - \alpha)^{d_M} = 1$to get $d_M$. Thus, $d_M = -\log_{1 - \alpha} n = -\lg n / \lg (1 - \alpha)$.


Exercise 7.2-6
Argue that for any constant $0 < \alpha \leq 1/2$, the probability is approximately $1 - 2\alpha$ that on a random input array, PARTITION produces a split more balanced than $1 - \alpha$ to $\alpha$.
solution:
Suppose that the split is distributed uniformly, and it is more balanced if the split is 1:1. If the split is $1-\alpha$ : $\alpha$ now, then we can get a more balanced split with the probability $2(1/2 - \alpha) = 1 - 2\alpha$.

2010年7月4日 星期日

[Algo] 7.2 Performence of quicksort

worst-case partitioning
quicksort 的 worst-case 發生於 partitioning 完之後,產生一個 $n-1$ 跟 一個 0 的 subproblems,這代表切割點選的不是很好,這樣子我們可以把它的 running time 寫成如下的形式:
$T(n) = T(n - 1) + T(0) + \Theta(n) = T(n - 1) + \Theta(n)$
所以我們可以知道它的 running time 是 $\Theta(n^2)$,這代表 quicksort 並沒有比 insertion sort 好到哪裡去,而且更慘的是會發生在這種 worst-case 情況通常發生在 array 是已經呈現排序好的狀態,但是在這樣的情況下 insertion sort 只要 $O(n)$ 的 running time。


best-case partitioning
在 best-case 的情況下,代表 partitioning 完之後,產生一個 $\lfloor n/2 \rfloor$ 以及一個 $\lceil n/2 \rceil -1$ 的 subproblems,所以我們可以把它的 running time 寫成如下的形式:
$T(n) \leq 2T(n/2) + \Theta(n)$
根據 master theorem,我們可以知道 running time $T(n) = O(n \lg n)$


average-case partitioning
其實 average-case 跟 best-case 沒有差太多,我們來解釋一下為什麼。假設 PARTITION 演算法總是產生 9 比 1 的分割,我們可以得到如下的形式:
$T(n) \leq T(9n/10) + T(n/10) + cn$

看附圖可以知道整個 running time 還是 $O(n \lg n)$,即使是 99 比 1 的分割,整個 running time 還是 $O(n \lg n)$,因為產生任何常數比例的分割,都會讓整個 recursion tree 有 $\Theta(\lg n)$ 的高度,而每一層的 running time 都是 $O(n)$,所以 running time 都是 $O(n \lg n)$。