
I have a set of sets $S=\{s_{1}, s_{2}, \ldots, s_{n}\}$ that I want to transform into a different set of sets $T=\{t_{1}, t_{2}, \ldots, t_{m}\}$, for arbitrary $n$ and $m$, where:

$\sum_{i=1}^{n}|s_{i}|=\sum_{j=1}^{m}|t_{j}|$,

$\forall i \ne j \ \ s_{i}\cap s_{j} = \emptyset , \ t_{i}\cap t_{j} = \emptyset$

the cardinalities $|t_{j}|$, $j=1,\ldots,m$, are given,

and the elements of every $t_j$ are drawn from $s_{1}\cup s_{2} \cup \ldots \cup s_n$.

I need to find a method/algorithm that produces $T$ with minimum dispersion of the elements of each $s_i$, i.e., I need to keep the elements of each $s_i$ together as much as possible in the $t_j$. Say, minimize $\sum_{i,j} [s_{i}\cap t_{j}\neq\emptyset]$, by which I mean the number of non-empty intersections between a set in $S$ and a set in $T$.
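To make the objective concrete, here is a minimal Python sketch that evaluates it for a given pair of partitions. The function name `dispersion` and the toy sets are illustrative assumptions, not part of the problem data.

```python
def dispersion(S, T):
    """Count the pairs (i, j) for which s_i and t_j share at least one element."""
    return sum(1 for s in S for t in T if s & t)


# Toy instance: three disjoint source sets, two target sets of size 3 each.
S = [{1, 2, 3}, {4, 5}, {6}]

# Keeps every source set intact inside a single target set.
T_together = [{1, 2, 3}, {4, 5, 6}]

# Same target sizes, but the source sets are scattered.
T_scattered = [{1, 4, 6}, {2, 3, 5}]

print(dispersion(S, T_together))   # 3
print(dispersion(S, T_scattered))  # 5
```

Both candidate targets respect the required sizes, but the first one has the smaller objective value, which is the behaviour I am after.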

I've tried to figure this out, but I am currently at a loss. Any pointers to literature or possible approaches are most welcome.

TIA, Luis

Luis P
  • I don't understand your measure. We can understand this problem as a similarity of set partitions, where the source partition is given and the cardinalities of the target partition are given. See this question for potential partition similarity measures: https://math.stackexchange.com/questions/1347161/how-to-measure-similarity-of-partitions-partitioning – Larry B. May 24 '18 at 20:31
  • Are you sure about the constraint to minimize? Consider the situation where $\left|s_{i}\cap s_{j}\right|\neq 0 \;\Leftrightarrow\; i=j$, i.e., where there are no duplicate elements.

    In this situation there are no preferred elements to assign to the $t_{j}$:

    Since $\sum_{i}\left|s_{i}\right|=\sum_{j}\left|t_{j}\right|$ it is possible to assign any element of $\bigcup_{i}s_{i}$ exactly once and thus any partition of $\bigcup_{i}s_{i}$ satisfying the constraints on $\left|t_{j}\right|$ will minimize the above sum.

    Conclusion: I think you may want to define a stricter constraint.

    – mol3574710n0fN074710n May 24 '18 at 20:33
  • Thanks guys. $s_i \cap s_j \neq \emptyset$ does not occur. I changed the question and tried to better clarify the objective function. – Luis P May 25 '18 at 14:29
  • I'm not sure whether maximum weighted bipartite matching solves my problem; I'll have a look. In the meantime, let me give you a practical example. I have a train $S$ with $n$ passenger cars from which I need to transfer all passengers to another train $T$ with $m$ passenger cars. The maximum occupancy varies from car to car, but the total maximum occupancy of the two trains is the same. How can I assign passengers from $S$ to $T$ so that, as much as possible, passengers travelling together in $S$ (defined as being in the same passenger car) end up together in $T$? – Luis P May 25 '18 at 14:49
  • @Larry B. I think I can see how the maximum weighted bipartite matching would give me a partition similarity metric. Same for many other methods of partition comparison, like counting pairs, Normalized Mutual Information or Variation of Information, that I am familiar with. The issue is that my problem space is large (hundreds of sets, millions of elements) and an exhaustive search of candidate partitionings is not computationally feasible. – Luis P May 26 '18 at 01:46
  • @LarryB. I can now see why you couldn't understand my measure... seems I can't get the problem properly stated :(. Hopefully it's ok now. – Luis P May 26 '18 at 15:45
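Since the $s_i$ are disjoint, the objective depends only on how many elements of each $s_i$ land in each $t_j$, so the problem can be handled with sizes alone rather than individual elements. As a rough illustration of the train example, the sketch below uses a greedy best-fit heuristic. This heuristic is not proposed anywhere in this thread and is not guaranteed to be optimal; the function name `greedy_assign` and the sample sizes are assumptions made for the example.

```python
def greedy_assign(source_sizes, target_caps):
    """Greedy heuristic (not guaranteed optimal).

    Returns flow, where flow[i][j] is the number of elements of s_i placed in t_j.
    """
    assert sum(source_sizes) == sum(target_caps)
    remaining = list(target_caps)
    flow = [[0] * len(target_caps) for _ in source_sizes]

    # Handle the largest source groups first.
    order = sorted(range(len(source_sizes)), key=lambda i: -source_sizes[i])
    for i in order:
        need = source_sizes[i]
        # Best fit: the tightest target that still holds the whole group.
        fits = [j for j, r in enumerate(remaining) if r >= need]
        if fits:
            j = min(fits, key=lambda k: remaining[k])
            flow[i][j] = need
            remaining[j] -= need
            continue
        # No target can hold the group: split it, using the roomiest targets first.
        while need > 0:
            j = max(range(len(remaining)), key=lambda k: remaining[k])
            moved = min(need, remaining[j])
            flow[i][j] += moved
            remaining[j] -= moved
            need -= moved
    return flow


def count_nonempty(flow):
    """Objective value: number of (i, j) pairs that share at least one element."""
    return sum(1 for row in flow for x in row if x > 0)


flow = greedy_assign([3, 2, 1], [3, 3])
print(flow, count_nonempty(flow))
```

Working only with sizes keeps the cost independent of the number of individual elements, which matters for instances with hundreds of sets and millions of elements.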

1 Answer


Say you have only one set $t$ and there are duplicate elements among the $s_{i}$, so that $\left|\cup_{i}s_{i}\right|<\sum_{i}\left|s_{i}\right|$ while you require $\left|t\right|\overset{!}{=}\sum_{i}\left|s_{i}\right|$. Your problem is not well posed in this case: by the standard definition of a set, $t$ cannot contain any duplicates, so it can hold at most $\left|\cup_{i}s_{i}\right|$ elements. You need to rethink your whole situation.
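As a small worked instance (the particular sets are an illustrative assumption), take $s_{1}=\{a,b\}$ and $s_{2}=\{b,c\}$, which share the element $b$. Then

$$\left|s_{1}\cup s_{2}\right| = \left|\{a,b,c\}\right| = 3 < 4 = \left|s_{1}\right|+\left|s_{2}\right|,$$

so a single set $t$ with $\left|t\right|=4$ would need four distinct elements, while only three are available.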