4

Suppose a set of elements of finite size.

E.g.:

$X = \left\lbrace a,b,c,d,e,f,g \right\rbrace$

There are several ways to partition $X$.

E.g:

$P_1 = \left\lbrace \left\lbrace a,b \right\rbrace, \left\lbrace c \right\rbrace, \left\lbrace d,e,f,g \right\rbrace \right\rbrace$

$P_2 = \left\lbrace \left\lbrace c,f,g \right\rbrace, \left\lbrace a,e \right\rbrace, \left\lbrace b,d \right\rbrace \right\rbrace$

$P_3 = \left\lbrace \left\lbrace a,b,c \right\rbrace, \left\lbrace d,e,f,g \right\rbrace \right\rbrace$

How can one measure the similarity of those partitionings?

In this example, $P_1$ and $P_3$ are similar.

I think I can measure similarity of individual partitions (e.g. using [Jaccard similarity](https://en.wikipedia.org/wiki/Jaccard_index $\rightarrow$ $J(\left\lbrace a,b \right\rbrace, \left\lbrace a,b,c \right\rbrace)=\frac{2}{3}$), but I have no idea on how to use this information to find which partitionings are similar.

Any idea?

Hint:

Given a pair of partitionings $P_1$ and $P_2$, we can construct a weighted bipartite graph from elements of $P_1$ to elements of $P_2$, that is $G=\left\langle P_1 \cup P_3, P_1 \times P_3 \right\rangle$, where each weight is:

$w_{i,j} = J(p_{1,i},p_{2,j}) = \frac{\left\vert\ p_{1,i}\ \cap\ p_{2,j}\ \right\vert}{\left\vert\ p_{1,i}\ \cup\ p_{2,j}\ \right\vert}$

where $p_{1,i} \in P_1$ and $p_{2,j} \in P_2$

The two most similar partitionings are the solution to the maximum weighted bipartite matching problem.


Sorry if my vocabulary is not correct, my background is not in Mathematics.

Peter Taylor
  • 13,425
fferri
  • 227
  • How do you define similarity? – Git Gud Jul 02 '15 at 16:37
  • @GitGud I gave an example of that; I am asking if you can provide a reasonable definition as well – fferri Jul 02 '15 at 16:39
  • Why do you consider $P_1$ and $P_3$ to be similar? Is it because $P_1$ is a refinement of $P_3$? – Ludolila Jul 02 '15 at 17:11
  • @Ludolila no, not for that reason. Maybe because the partitions are pairwise (Jaccard-) similar (not completely correct, I hope you get the intuition) – fferri Jul 02 '15 at 17:14
  • @Ludolila $P_1$ and $P_3$ are similar because if you consider the weighted bipartite graph $\left\langle P_1 \cup P_3, P_1 \times P_3 \right\rangle$ where the weights are tha Jaccard coefficients, the maximal matching has also maximum weights. So, how to quantify this? – fferri Jul 02 '15 at 20:24

2 Answers2

2

Based on your suggestion, given two partitions $P_1$ and $P_2$, you could construct a complete bipartite graph, where $P_1$ and $P_2$ are the parts, and assign to every edge the Jaccard similarity weight.

Taking the average weight (for example) of the edges in a maximum matching (which you can find using the augmented path algorithm or max-flow) will give you a number in $[0,1]$.

Partitions also correspond with Young tableaux and conjugacy classes of permutations which could also be used to describe similarity (e.g. distance in the Cayley graph of $S_n$, with transpositions as generators, between representatives of each conjugacy class).

Ludolila
  • 3,034
2

This Wikipedia page (https://en.wikipedia.org/wiki/Adjusted_mutual_information#Mutual_information_of_two_partitions) discusses a way to answer your question using the notion of mutual information, and the papers by Vinh et al. pointed to in the “References” section discuss several other options.

macleginn
  • 359