
Assume uniform hashing, and let $n$ and $m$ be integers denoting the number of elements and the number of slots (buckets), respectively, with $\frac{n}{m}<1$. We are trying to find the number of probes made in an unsuccessful search for a free slot. Define the random variable $X$ to be the number of probes made in an unsuccessful search, and for $i = 1, 2, \cdots$ define $A_i$ to be the event that there is an $i$th probe and it is to an occupied slot. The event $\{X \ge i\}$ is then the intersection of the events $A_1, A_2, \cdots, A_{i-1}$, so by the chain rule its probability is:

$$\Pr[X\ge i] = \Pr[A_1 \cap A_2 \cap \cdots \cap A_{i-1}] = \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \cdots \times \Pr[A_{i-1} \mid A_1 \cap \cdots \cap A_{i-2}]$$

Following up on my earlier question, I would like to understand how the formula below is derived. If we have $n$ elements and $m$ slots to place those elements in, where $n$ and $m$ are integers such that $\frac{n}{m} < 1$, then

\begin{align} \Pr[X\ge i]&=\overset{\text{probe}\,\ge 1}{\frac{n}{m}}\times \overset{\text{probe}\,\ge 2}{\frac{n-1}{m-1}}\times\overset{\text{probe}\,\ge 3}{\frac{n-2}{m-2}}\times\cdots\times\overset{\text{probe}\,\ge i-1}{\frac{n-i+2}{m-i+2}}\\ \Pr[X \ge i] &= \frac{n}{m}\times \frac{n-1}{m-1} \times \cdots \times \frac{n-i+2}{m-i+2} \le \left(\frac{n}{m}\right)^{i-1} \label{tag1}\tag1 \end{align}

We decrease $m$ by $1$ in each factor because once a slot has been probed it cannot be probed again. As I understand it, $\Pr[X\ge i]$ is the probability that the probes keep hitting slots that are already occupied by other items. For example, I read $\Pr[X\ge 1]$ as the probability of having at least one collision given $m$ slots.
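As a sanity check on \ref{tag1}, here is a minimal Python sketch. It assumes that uniform hashing means the probe sequence is a uniformly random ordering of the $m$ slots with no repeats, and that $n$ of the slots are occupied. The names `prob_at_least` and `simulate_probes` are hypothetical helpers introduced only for this illustration. It evaluates the product in \ref{tag1} exactly and compares it with a simulation and with the bound $\left(\frac{n}{m}\right)^{i-1}$.

```python
from fractions import Fraction
import random


def prob_at_least(n, m, i):
    """Exact Pr[X >= i] from (1): the product of (n-k)/(m-k) for k = 0..i-2."""
    if i - 1 > n:
        # More than n consecutive probes cannot all hit occupied slots.
        return Fraction(0)
    p = Fraction(1)  # empty product: for i = 1 there are no factors
    for k in range(i - 1):
        p *= Fraction(n - k, m - k)
    return p


def simulate_probes(n, m, rng):
    """Count probes until the first free slot (assumed model: random probe order, no repeats)."""
    occupied = set(rng.sample(range(m), n))   # which n of the m slots are full
    probe_order = rng.sample(range(m), m)     # a uniformly random permutation of the slots
    for count, slot in enumerate(probe_order, start=1):
        if slot not in occupied:
            return count
    raise ValueError("no free slot; this sketch requires n < m")


n, m = 3, 10            # illustrative values only
rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 20_000
samples = [simulate_probes(n, m, rng) for _ in range(trials)]

for i in range(1, n + 3):
    exact = prob_at_least(n, m, i)
    bound = Fraction(n, m) ** (i - 1)
    empirical = sum(x >= i for x in samples) / trials
    print(f"i={i}: exact={float(exact):.4f}  empirical={empirical:.4f}  bound={float(bound):.4f}")
```

Under these assumptions, the product for a given $i$ has exactly $i-1$ factors (one per probe that must hit an occupied slot), which is worth keeping in mind when reading the problems below.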

Attempt: I see that for $i=1$ for example, we have

$$Pr[X \ge 1] = \frac{n}{m}$$

Problem 1: I am not sure why the last factor in \ref{tag1}, the one for the $(i-1)$th probe, is $\frac{n-i+2}{m-i+2}$ and not $\frac{n-i+1}{m-i+1}$. I would have expected $\Pr[X\ge (i-1)] = \frac{n-i+1}{m-i+1}$ rather than $\Pr[X\ge (i-1)] = \frac{n-i+2}{m-i+2}$.

Problem 2: Why do we have $\le \left(\frac{n}{m}\right)^{i-1}$ in \ref{tag1} and not $\le \left(\frac{n}{m}\right)^{i}$?

Problem 3: Why is $\Pr[X\ge i] = \Pr[A_1 \cap A_2 \cap \cdots \cap A_{i-1}]$ and not $\Pr[X\ge i] = \Pr[A_1 \cap A_2 \cap \cdots \cap A_{m}]$? Based on how $X$ was defined, I understand $A_m$ to be the event that we have already searched all the other slots and arrived at the last slot, which is also occupied, so I am not sure why the intersection stops at $A_{i-1}$.

Avv
  • How is $X$ defined? You did not address Ben Grossman's comment in your other question. – angryavian Sep 02 '21 at 19:06
  • For your second question, there are $i-1$ terms on the left-hand side of the inequality, each of which is less than or equal to $n/m$. – angryavian Sep 02 '21 at 19:15
  • @angryavian. I added more details to $X$ – Avv Sep 02 '21 at 19:16
  • Can you be more specific about how $X$ is defined? You start with all $m$ slots empty, and then probe for an empty slot (how do you choose which slot to probe? randomly? or in order from left to right? why does the process not ignore slots that you've already used?) and put in an item one at a time? And $X$ is the number of times you probed a slot that was already occupied? – angryavian Sep 02 '21 at 19:24
  • @angryavian. Exactly. We start with empty slots; however, here we want to calculate the worst case, meaning that all search attempts are made to occupied slots. – Avv Sep 02 '21 at 19:29
  • You still have not explained where the randomness is coming from. If you are considering the "worst case" scenario, then there is no randomness? How does the process choose which spot to probe? Is a slot chosen uniformly at random each time? Why doesn't the process avoid revisiting slots it has probed before? – angryavian Sep 02 '21 at 20:10
  • @angryavian. Thank you. We are assuming uniform hashing here, and we want to find out the expected number of probes in an unsuccessful search. That is the original question. – Avv Sep 02 '21 at 20:12
  • @angryavian. Is it clear now? I added more details. – Avv Sep 02 '21 at 22:08
  • You are searching for an empty slot? That is not clear, but it seems to make sense. You are trying to put $n$ items into $m$ slots. Are we looking for how long, on average, it takes to place those $n$ items? It is not clear what end result is desired. – robjohn Sep 02 '21 at 22:31
  • I still don't understand the setup, but there are several math/probability-related mistakes. First, above equation (1), you have addition, but equation (1) has multiplication. Second, $P(X \ge 1) = P(X=1) + P(X=2) + \cdots$ but in your section labeled "Attempt" you multiply these terms. – angryavian Sep 03 '21 at 01:00
  • @angryavian. Thanks. Corrected. – Avv Sep 03 '21 at 01:27
  • @robjohn. Exactly. We are looking to put an item in a slot (there are $m$ slots in total), but we will assume that each time we probe we get a collision (not an empty slot). So, $p(x\ge 1) =\frac{1}{m}+\frac{1}{m}+\frac{1}{m}+ \cdots + \frac{1}{m}=\frac{n}{m}$ means the probability of probing at least once to find an empty slot (though we will not find one, because we are trying to find how many times on average we need to probe). So this is an unsuccessful search analysis. – Avv Sep 03 '21 at 01:31
  • @angryavian. You are right. There were many mistakes. Is it good now? – Avv Sep 03 '21 at 15:28
