1

I'm building a Hierarchical Agglomerative Clustering algorithm and I'm trying to estimate the time the computer will take to build a hierarchy of clusters for a given set of samples. For $m$ samples, I have to calculate $m-1$ levels in a binary dendrogram. For each level with $n$ elements, I have to calculate $\sum_{i=1}^n (i-1)=\frac{(n-1)n}{2}$ distances. So for the total of $m$ samples I need to calculate the following number of distances: $$\sum_{i=2}^m\frac{(i-1)i}{2}$$ I need a function $f(m)$ that gives the value of the above summation and that can be calculated in a single step, instead of iterating and summing.

  1. What's the approach to follow?
  2. What is this called, in case I have to ask about similar things again?

EDIT:

By kind request, I changed the incorrect term "summatory" to "summation".

  • You want to calculate a sum. The process of doing so is called summation. There is no noun summatory in the English language that I am aware of. – Carsten S Dec 23 '13 at 10:03

4 Answers

4

You can split the sum into two subsums, to get a closed form:

$$\sum_{i=1}^n \frac{(i-1)i}{2}=\sum_{i=1}^n \frac{i^2}{2}-\sum_{i=1}^n \frac{i}{2}=\frac{n(n+1)(2n+1)}{12}-\frac{n(n+1)}{4}=\frac{n(n+1)(2n-2)}{12}=\frac{n^3-n}{6}$$

I don't know if there is a general name; I just subtracted the respective closed forms for $\sum \frac{i^2}{2}$ and for $\sum \frac{i}{2}$.

EDIT: I accidentally started the sum at $i=1$ instead of at $i=2$. You can subtract the $i=1$ term from the sum if you choose to, but notice that this term is $\frac{1^2-1}{2}=0$, so the result is unchanged.
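As a quick sanity check, here is a minimal Python sketch that compares the level-by-level sum from the question with the closed form $\frac{m^3-m}{6}$. The function names are made up for illustration; they are not from the question.

```python
def distances_by_iteration(m: int) -> int:
    """Add up (i-1)*i/2 for i = 2..m, one dendrogram level at a time."""
    return sum((i - 1) * i // 2 for i in range(2, m + 1))


def distances_closed_form(m: int) -> int:
    """Closed form (m^3 - m) / 6; the numerator is always divisible by 6."""
    return (m**3 - m) // 6


if __name__ == "__main__":
    for m in (2, 3, 10, 100):
        print(m, distances_by_iteration(m), distances_closed_form(m))
```

Both columns should agree for every $m$; the closed form just gets there without the loop.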

user99680
  • 6,708
  • This one works for $n=m-1$. Though it seems a better response in regards of HOW to calculate it's not totally right and @rasher response seems more elegant and simple. So let me set the other one as the right one. Thanks for your prompt response! – Daniel Cerecedo Dec 23 '13 at 09:29
  • @dcerecedo: You're right that rasher's is more elegant, but mine also adds up to $\frac {n^3-n}{6}$. – user99680 Dec 23 '13 at 09:35
  • There you go. Full explanation plus right elegant answer. You get the badge!! Thank all of you guys. – Daniel Cerecedo Dec 23 '13 at 22:15
3

You have mentioned the sum $\sum_{i=1}^n i = \sum_{i=1}^n {i\choose 1} = \frac{n(n+1)}{2} = {{n+1}\choose 2}$.

Your post is looking to compute $\sum_{i=2}^n \frac{i(i-1)}{2} = \sum_{i=2}^n {i\choose 2} = \frac{n(n+1)(n-1)}{6} = {{n+1}\choose 3}$.

We can even write out a trivial case: $\sum_{i=0}^n 1 = \sum_{i=0}^n {i\choose 0} = n+1 = {{n+1}\choose 1} $.

In general, we have the identity $\sum_{i=k}^n {i\choose k} = {{n+1}\choose {k+1}}$. If you look at this on Pascal's Triangle, you will see why it is sometimes called the "hockey stick identity".
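If you want to convince yourself numerically before reading the proofs, a small Python sketch like the following compares both sides of the identity. It assumes Python 3.8+ for `math.comb`; the helper names are only illustrative.

```python
from math import comb


def hockey_stick_lhs(n: int, k: int) -> int:
    """Left side: sum of C(i, k) for i = k..n."""
    return sum(comb(i, k) for i in range(k, n + 1))


def hockey_stick_rhs(n: int, k: int) -> int:
    """Right side: C(n+1, k+1)."""
    return comb(n + 1, k + 1)


if __name__ == "__main__":
    # k = 2 is the case from the question: sum of C(i, 2) for i = 2..n.
    for n in range(2, 8):
        print(n, hockey_stick_lhs(n, 2), hockey_stick_rhs(n, 2))
```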

(This formula is like a discrete version of the integral $\int_0^x t^n dt = \frac{x^{n+1}}{n+1}$. In fact, we can deduce this integral formula directly from the definition of the Riemann integral, and without using any mathematical facts at all besides the hockey stick identity!)

There are many ways to quickly see why this identity is true. Here are three major variants:

1) Directly from Pascal's Triangle: We can write ${i\choose k} = {{i+1}\choose {k+1}} - {i\choose{k+1}}$, which is just the summation identity of Pascal's Triangle. Then the sum telescopes: $$\sum_{i=k}^n {i\choose k} = \sum_{i=k}^n \left({{i+1}\choose {k+1}} - {i\choose{k+1}}\right) = {{n+1}\choose {k+1}} - {k\choose{k+1}} = {{n+1}\choose {k+1}}$$ Equivalently, we can add $0={k\choose{k+1}}$ to the sum $\sum_{i=k}^n {i\choose k}$ and watch the hockey stick collapse, one term at a time.
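To see the telescoping concretely, take $k=2$ and $n=4$ (values chosen only for illustration): $$\binom22+\binom32+\binom42=\left[\binom33-\binom23\right]+\left[\binom43-\binom33\right]+\left[\binom53-\binom43\right]=\binom53-\binom23=10-0=10,$$ which matches $1+3+6=10$.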

In the same spirit, we could also use induction, or show that $\sum_{i=k}^n {i\choose k}$ satisfies the same recursion formula as ${{n+1}\choose {k+1}}$. We could also get a different kind of telescoping series by writing ${i\choose k}$ as a sum of two binomial coefficients, rather than a difference.

2) Counting: Suppose that we have $n+1$ people of ages $1, 2, \ldots, n+1$, and we want to choose a team of $k+1$ people from among them. Clearly, there are ${{n+1}\choose {k+1}}$ ways to do this. But we could count it differently: first we pick the oldest person on the team, then pick the remaining $k$ members from all the people younger than them. If the oldest person has age $j$, there are ${{j-1}\choose k}$ ways to pick the rest. The oldest person on the team should have age at least $k+1$, so the number of ways to do this is ${k\choose k} + {{k+1}\choose k} + \ldots + {n\choose k}$.
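The counting argument can be checked by brute force for small cases. The sketch below (helper name is hypothetical) lists every team with `itertools.combinations` and groups the teams by the age of their oldest member.

```python
from collections import Counter
from itertools import combinations
from math import comb


def teams_by_oldest(n: int, k: int) -> Counter:
    """Count teams of k+1 people drawn from ages 1..n+1, keyed by oldest age."""
    ages = range(1, n + 2)
    return Counter(max(team) for team in combinations(ages, k + 1))


if __name__ == "__main__":
    n, k = 4, 2
    counts = teams_by_oldest(n, k)
    for age in sorted(counts):
        # Each group should have C(age - 1, k) teams.
        print(age, counts[age], comb(age - 1, k))
    print("total:", sum(counts.values()), "expected:", comb(n + 1, k + 1))
```

The group sizes are exactly the terms of the hockey stick sum, and their total is the single binomial coefficient on the right.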

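3) Generating Functions: ${{n+1}\choose {k+1}}$ is the coefficient of $x^{k+1}$ in $(1+x)^{n+1}$. But $\sum_{i=k}^n {i\choose k}$ is the coefficient of $x^k$ in $\sum_{i=k}^n (1+x)^i = (1+x)^k \cdot \frac{(1+x)^{n-k+1} - 1}{(1+x)-1} = \frac{(1+x)^{n+1}-(1+x)^k}{x}$. That is the coefficient of $x^{k+1}$ in $(1+x)^{n+1}-(1+x)^k$, and $(1+x)^k$ has no $x^{k+1}$ term, so equality follows immediately.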
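For the skeptical, the coefficient claim can be checked with plain coefficient lists. This is only a sketch: the helpers `poly_mul`, `poly_add` and `sum_of_powers` are written here for illustration and represent polynomials as lists with the lowest degree first.

```python
from math import comb


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_add(a, b):
    """Add two coefficient lists, padding the shorter one with zeros."""
    size = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(size)]


def sum_of_powers(n, k):
    """Coefficients of (1+x)^k + (1+x)^(k+1) + ... + (1+x)^n."""
    power = [1]
    for _ in range(k):
        power = poly_mul(power, [1, 1])  # build (1+x)^k
    total = [0]
    for _ in range(k, n + 1):
        total = poly_add(total, power)
        power = poly_mul(power, [1, 1])  # move on to the next power
    return total


if __name__ == "__main__":
    n, k = 4, 2
    print(sum_of_powers(n, k)[k], comb(n + 1, k + 1))
```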

For a different flavor, but the same basic idea, consider ${{n+1}\choose {k+1}}$ as the coefficient of $(k+1)!\,x^{n-k}$ in the $(k+1)^{\text{st}}$ derivative of $x^{n+1}$.

Andrew Dudzik
  • 30,074
2

$\binom{m+1}{3}=\frac{1}{6} (m-1) m (m+1)=\frac{1}{6} \left(m^3-m\right)$. It's just a binomial coefficient.

rasher
  • 517
  • Thanks, this one works. Can you explain the rationale behind your response? I'm also interested in HOW to do it. – Daniel Cerecedo Dec 23 '13 at 09:30
  • @dcerecedo: You can arrive at it via user99680's nice explanation below (I think we were answering pretty much at the same time, didn't see theirs before typing mine). In this case I recognized by eye, but you can always arrive at the solution decomposing the sum & rearranging/refining as needed. Edit- of course, you can plug things into places like WolframAlpha too - probably the way to go once you're satisfied you 'get' the mechanics... – rasher Dec 23 '13 at 09:49
1

Let's go back to where your sum came from. You want to calculate

$$\sum_{r=0}^m\frac{r(r-1)}2=\sum_{r=0}^m\sum_{s=1}^{r} (s-1)= \sum_{r=0}^m\sum_{s=0}^{r-1} s=\sum_{r=0}^m\sum_{s=0}^{r-1} \sum_{t=0}^{s-1}1. $$ Now, how many $1$s do you sum here? That is one for every triple $(r,s,t)$ with $0\le t<s<r\le m$. So this is one for every three-element subset of $\{0,1,\ldots,m\}$. Since the latter set has $m+1$ elements, there are $$\binom{m+1}3=\frac{(m+1)m(m-1)}{1\cdot2\cdot3}$$ of these.
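The triple sum translates almost literally into nested loops, so it is easy to check against the binomial coefficient. A minimal sketch (function name is hypothetical, `math.comb` needs Python 3.8+):

```python
from itertools import combinations
from math import comb


def count_triples(m: int) -> int:
    """Count triples (r, s, t) with 0 <= t < s < r <= m by summing 1s."""
    return sum(1 for r in range(m + 1) for s in range(r) for t in range(s))


if __name__ == "__main__":
    m = 4
    subsets = len(list(combinations(range(m + 1), 3)))
    print(count_triples(m), subsets, comb(m + 1, 3))
```

All three numbers count the same thing: three-element subsets of $\{0,1,\ldots,m\}$.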


In general, an argument like this shows that $$\sum_{r=0}^{m-1}\binom rk=\binom m{k+1}.$$ There are many ways to justify this identity; look them up. Also note that it can be used to calculate other sums of polynomials. For example $$n^2=2\binom n2+n=2\binom n2+\binom n1, $$ and so we get $$\sum_{n=0}^{m-1} n^2=\sum_{n=0}^{m-1} \left(2\binom n2+\binom n1\right) = 2\binom m3+\binom m2=\frac{2m(m-1)(m-2)+3m(m-1)}6 =\frac{m(m-1)(2m-1)}6, $$ which is the sum that user99680 used in his answer.
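Here is a small Python sketch of that last computation, comparing the direct sum of squares with the binomial form and the final closed form. The function names are mine, chosen for illustration.

```python
from math import comb


def sum_squares_direct(m: int) -> int:
    """Sum n^2 for n = 0..m-1 by iterating."""
    return sum(n * n for n in range(m))


def sum_squares_binomial(m: int) -> int:
    """Same sum via n^2 = 2*C(n,2) + C(n,1), summed term by term."""
    return 2 * comb(m, 3) + comb(m, 2)


def sum_squares_closed(m: int) -> int:
    """Closed form m(m-1)(2m-1)/6."""
    return m * (m - 1) * (2 * m - 1) // 6


if __name__ == "__main__":
    for m in (1, 5, 10):
        print(m, sum_squares_direct(m), sum_squares_binomial(m), sum_squares_closed(m))
```

The same recipe should work for any polynomial: write it in terms of $\binom n0, \binom n1, \binom n2, \ldots$ and sum each piece with the identity above.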

Carsten S
  • 8,726