rightwayman's blog: [Algo] Problem 5

Problem 5-1 Probabilistic counting
With a b-bit counter, we can ordinarily only count up to $2^b - 1$. With R. Morris's probabilistic counting, we can count up to a much larger value at the expense of some loss of precision.
We let a counter value of i represent a count of $n_i$ for $i = 0, 1, \ldots , 2^b-1$, where the $n_i$ form an increasing sequence of nonnegative values. We assume that the initial value of the counter is 0, representing a count of $n_0 = 0$. The INCREMENT operation works on a counter containing the value i in a probabilistic manner. If $i = 2^b - 1$, then an overflow error is reported. Otherwise, the counter is increased by 1 with probability $1/(n_{i+1} - n_i)$, and it remains unchanged with probability $1 - 1/(n_{i+1} - n_i)$.
If we select $n_i = i$ for all $i \geq 0$, then the counter is an ordinary one. More interesting situations arise if we select, say, $n_i = 2^{i - 1}$ for $i > 0$ or $n_i = F_i$ (the ith Fibonacci number-see Section 3.2).
For this problem, assume that $n_{2^b - 1}$ is large enough that the probability of an overflow error is negligible.

a. Show that the expected value represented by the counter after n INCREMENT operations have been performed is exactly n.
b. The analysis of the variance of the count represented by the counter depends on the sequence of the $n_i$. Let us consider a simple case: $n_i = 100i$ for all $i \geq 0$. Estimate the variance in the value represented by the register after n INCREMENT operations have been performed.

a. 令 $X$ 為經過 n 次 INCREMENT operation 後 counter 所代表的值的 random variable，$X_i$為經過第 i 次 INCREMENT operation 的 counter 所代表值的增加量的random variable。注意：唯有把增加量當作一個 random variable，我們才有辦法利用到$E[X] = E[X_1 + X_2 + \cdots ]$的特性。所以我們可以得到：
$\begin{align*} E[X] &= 0 + E[X_1] + E[X_2] + \cdots + E[X_n]\\ &= (n_2 - n_1) \times \frac{1}{n_2 - n_1} + (n_3 - n_2) \times \frac{1}{n_3 - n_2} + \cdots + (n_{i+1} - n_i) \times \frac{1}{n_{i+1} - n_i}\\ &= 1 + 1 + \cdots + 1\\ &= n \end{align*}$

b. 根據上題我們可以假設類似的條件，令$X$為經過 n 次 INCREMENT operations 後 counter所代表的值的 random variable，$X_i$為經過第 i 次 INCREMENT operation 的 counter 所代表值的增加量的random variable。因為任兩個 random variable $X_i$, $X_j$都是彼此獨立的，所以我們也可以利用這個特性：
$Var(X) = Var(X_1) + Var(X_2) + \cdots + Var(X_n)$ 來計算 $Var(X)$。
$\begin{align*} Var(X_i) &= E[X_i^2] - E^2[X_i] \\ &= (n_{i+1} - n_i)^2 \cdot \frac{1}{n_{i+1} - n_i} - 1^2 \\ &= 100^2 \cdot \frac{1}{100} - 1 = 99\\ \end{align*}$
$\begin{align*} \Rightarrow Var(X) &= Var(X_1) + Var(X_2) + \cdots + Var(X_n)\\ &= 99 + 99 + \cdots + 99 \\ &= 99n \end{align*}$

Problem 5-2 Searching an unsorted array
This problem examines three algorithms for searching for a value x in an unsorted array A consisting of n elements.
Consider the following randomized strategy: pick a random index i into A. If $A[i] = x$, then we terminate; otherwise, we continue the search by picking a new random index into A. We continue picking random indices into A until we find an index j such that $A[j] = x$ or until we have checked every element of A. Note that we pick from the whole set of indices each time, so that we may examine a given element more than once.
a. Write pseudocode for a procedure RANDOM-SEARCH to implement the strategy above. Be sure that your algorithm terminates when all indices into A have been picked.
b. Suppose that there is exactly one index i such that $A[i] = x$. What is the expected number of indices into A that must be picked before x is found and RANDOM-SEARCH terminates?
c. Generalizing your solution to part (b), suppose that there are $k \geq 1$ indices i such that $A[i] = x$. What is the expected number of indices into A that must be picked before x is found and RANDOM-SEARCH terminates? Your answer should be a function of n and k.
d. Suppose that there are no indices i such that $A[i] = x$. What is the expected number of indices into A that must be picked before all elements of A have been checked and RANDOM-SEARCH terminates?

Now consider a deterministic linear search algorithm, which we refer to as DETERMINISTIC-SEARCH. Specifically, the algorithm searches A for x in order, considering $A[1], A[2], A[3], \ldots, A[n]$ until either $A[i] = x$ is found or the end of the array is reached. Assume that all possible permutations of the input array are equally likely.
e. Suppose that there is exactly one index i such that $A[i] = x$. What is the expected running time of DETERMINISTIC-SEARCH? What is the worst-case running time of DETERMINISTIC-SEARCH?
f. Generalizing your solution to part (e), suppose that there are $k \geq 1$ indices i such that $A[i] = x$. What is the expected running time of DETERMINISTIC-SEARCH? What is the worst-case running time of DETERMINISTIC-SEARCH? Your answer should be a function of n and k.
g. Suppose that there are no indices i such that $A[i] = x$. What is the expected running time of DETERMINISTIC-SEARCH? What is the worst-case running time of DETERMINISTIC-SEARCH?

Finally, consider a randomized algorithm SCRAMBLE-SEARCH that works by first randomly permuting the input array and then running the deterministic linear search given above on the resulting permuted array.
h. Letting k be the number of indices i such that $A[i] = k$, give the worst-case and expected running times of SCRAMBLE-SEARCH for the cases in which $k = 0$ and $k = 1$. Generalize your solution to handle the case in which $k \geq 1$.
i. Which of the three searching algorithms would you use? Explain your answer.

a.
@{
\proc{Random-Search}(A, n, x)
\li check = 0
\li \textup{create arrays}\;C[1..n]
\li \For i = 1 \;\To n
\li \;\;\;\;C[i] = \;\textup{false}
\li \While check \not= n
\li \;\;\;\;index = \proc{Random}(1, n)
\li \;\;\;\;\If A[index] = x
\li \;\;\;\;\;\;\;\;\Return \textup{true}
\li \;\;\;\;\Elif C[index] = \;\textup{false}
\li \;\;\;\;\;\;\;\;C[index] = \;\textup{true}
\li \;\;\;\;\;\;\;\;check = check + 1
\li \Return \textup{false}
}@

b.
$\begin{align*} E[X] &= \sum_{i = 1}^\infty (\frac{n-1}{n})^{i-1} \cdot \frac{1}{n} \cdot i\\ &= \frac{1}{n-1}\sum_{i = 1}^\infty (\frac{n-1}{n})^i \cdot i\\ &= \frac{1}{n-1} \cdot \frac{\frac{n-1}{n}}{(1 - \frac{n-1}{n})^2}\;\;\textup{(By Taylor theorem)}\\ &= n \end{align*}$

c.
$\begin{align*} E[X] &= \sum_{i = 1}^\infty (\frac{n-k}{n})^{i-1} \cdot \frac{k}{n} \cdot i\\ &= \frac{k}{n-k}\sum_{i = 1}^\infty i \cdot (\frac{n-k}{n})^i\\ &= \frac{k}{n-k} \cdot \frac{\frac{n-k}{n}}{(1 - \frac{n-k}{n})^2}\;\;\textup{(By Taylor theorem)}\\ &= \frac{n}{k} \end{align*}$

d.
Just like the coupon collector's problem, in order to "collect" each of $n$ different indices, it must acquire approximately $n\ln n$ randomly obtained indices to succeed.

e.
$E[X] = \sum_{i = 1}^n i \cdot \frac{1}{n} = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2}$
The worst-case running time of DETERMINISTIC-SEARCH is n.

f.
The expected running time is $\frac{n+1}{k+1}$.(From wikipedia)
The worst case is that all element x are put to the end of array A. So the worst-case running time is $n - k + 1$.

g.
The expected running time of DETERMINISTIC-SEARCH is n. It is the very worst-case.

h.
(1) k = 0
It is the very worst-case, and the expected running time of SCRAMBLE-SEARCH is the sum of the cost of shuffling the array and the cost of running DETERMINISTIC-SEARCH, $n + n = 2n$
(2) k = 1
The worst-case is that x is moved to the end of A after shuffling. The expected running time is $n + \frac{n+1}{2} = \frac{3n + 1}{2}$
(3) generalized case
The worst case is that all x are moved to the end of A after shuffling. The expected running time is $n + \frac{n+1}{k+1}$

rightwayman's blog

2010年6月8日星期二

[Algo] Problem 5

沒有留言:

張貼留言

關於我自己

2010年6月8日 星期二

[Algo] Problem 5

沒有留言:

張貼留言

2010年6月8日星期二