33

Given a couple of lists like a={1,2,3,4,6} and b={2,3,4,6,9} I can use the built-in Mathematica symbol Tuples [{a,b}] and get all the arrangements easilly enough. But I am struggling with how to eliminate the generation of replacement results, for example:

{2,2} and {4,2} and {4,4} are not desired but all the totally unique results like {1,2} and {1,3} are desired. So essentially I want to find all the combinations of numbers that do not have any ordering repeats such as {4,2} where I already have {2,4} and no number repeats such as {2,2} and {4,4}.

Any ideas would be appreaciated!

canadian_scholar
  • 3,754
  • 1
  • 27
  • 45
Sinistar
  • 1,029
  • 1
  • 10
  • 19

7 Answers7

31

Here is an alternative version of Mr. Wizard's uniqueTuples function, which is faster on the data I have tested.

The idea is to create a function f which has the following properties:

  • It returns an empty Sequence[] if two of its arguments are the same

  • For any other input it outputs a List of the arguments, but also sets a downvalue so that next time it is called with the same arguments, it returns an empty Sequence[]

  • It is Orderless so that "the same arguments" can be in any order

The two input lists are then processed by Outer, feeding each tuple (as a flattened sequence of arguments) to f.

For example,

  • The list elements provided by Outer are {1,2,3} and 2. We evaluate f[1,2,3,2] which returns Sequence[] because 2 is duplicated.

  • The next list elements provided by Outer are {4,5,6} and 7. We evaluate f[4,5,6,7] which returns {4,5,6,7} and sets f[4,5,6,7]=Sequence[].

  • The next list elements provided by Outer are {5,7,4} and 6. We evaluate f[5,7,4,6] which is the same as f[4,5,6,7] and therefore returns Sequence[].

So the output from these 3 calls to f is just {4,5,6,7} as the others are not considered unique.

The alternative uniqueTuples looks like this:

(* this is the re-written bit *)
uniqueTuples[a_List, b_List] := Module[{f},
  f[___, x_, x_, ___] = Sequence[];
  f[x_, y__] := (f[x, y] = Sequence[]; {x, y});
  SetAttributes[f, Orderless];
  Flatten[Outer[f @@ Flatten[{##}] &, a, b, 1], 1]]

(* this is the same as Mr. Wizard *)
uniqueTuples[{a_List, x__List}] := Fold[uniqueTuples, a, {x}] 

Testing on data = RandomInteger[40, {6, 11}] gave me a Timing of 47.7 seconds for Mr. Wizards original code, and 6.6 seconds using this. I have no idea how the timings and memory usage scale as you go to larger data sets.

Simon Woods
  • 84,945
  • 8
  • 175
  • 324
  • +1, I've always liked inner functions, plus a clever use of Sequence and Downvalues. – rcollyer Jul 06 '12 at 14:32
  • Simon, it's good have you aboard. Nicely done. – Mr.Wizard Jul 06 '12 at 15:00
  • 1
    I had missed this, nice one – Rojo Mar 03 '13 at 20:11
  • 1
    @Mr.Wizard Thank you and Simon very much for showing these two wonderful functions uniqueTuples. However, I could not reproduce the claim that Simon's function is considerably faster than Mr.Wizard's one. On my computer, it is even two times slower. This might very well be related to a possible bug in Module, as described here. – Fred Simons Jan 20 '15 at 10:10
  • @FredSimons Thanks for the notice. I confirm the performance you report in 10.0.2 under Windows using the data from Simon's answer above. My function takes only two second while his takes five. It seems that since version 7 (what I was using before) something used in my code has been greatly optimized. I'll try to track down precisely what it is. – Mr.Wizard Jan 20 '15 at 12:51
  • @FredSimons It seems that in version 7 DeleteDuplicates was very inefficient in this application. Using Union yields more than an order of magnitude improvement. In 10.0.2 Union is also faster but only fractionally. – Mr.Wizard Jan 20 '15 at 13:19
19

Edit 2015

While I found Simon Woods's code informative to the degree that I awarded it a bounty it seems that the only reason it performed better than mine was that DeleteDuplicates was slow. Fred Simons pointed out that my (original) code is no longer slow, and actually outperforms Simon Woods' function. Experimentation showed that using Union in version 7 would have improved the performance of my code by more than an order of magnitude. In version 10.0.2 the improvement is more subtle but still significant. Following this discovery I am rewriting my function. The output will be in a different order but it is otherwise identical. If you wish to compare the original code please see the edit history for the image before this edit was made.


I recommend this:

uniqueTuples[a_List, b_List] :=
  Union @@ Table[
    Sort @ Flatten[{i, #}] & /@ DeleteCases[b, Alternatives @@ i],
    {i, a}
  ]

uniqueTuples[a : {__List}] := Fold[uniqueTuples, a]

Usage:

uniqueTuples[ {{3, 7, 2, 6}, {7, 6, 2, 1}, {7, 2, 4, 6}} ]
{{1, 2, 3}, {1, 2, 4}, {1, 2, 6}, {1, 2, 7},
 {1, 3, 4}, {1, 3, 6}, {1, 3, 7}, {1, 4, 6},
 {1, 4, 7}, {1, 6, 7}, {2, 3, 4}, {2, 3, 6},
 {2, 3, 7}, {2, 4, 6}, {2, 4, 7}, {2, 6, 7},
 {3, 4, 6}, {3, 4, 7}, {3, 6, 7}, {4, 6, 7}}

Explanation

The idea is to append an element only if it is not in the existing list. This is done with DeleteCases.

  • Alternatives @@ (list) gives a pattern that matches any element in list.

  • Flatten[{i, #}] & is used to append to both a list or a single element.

  • This function is mapped (/@) to any remaining elements.

For multiple lists this process is repeated using the results of the previous lists as the new "a" list, using Fold. I am using the short syntax for Fold.

Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
  • That seems to work great with two lists, given my simple example question. In my real problem I am actually working with 6 lists with up to 25+ items in each list (and the item count may not be equal). I will try and study your code and see if I can work it out. The little manipulators like #,&,/,@ kind of seem kind of confusing, probably cause just because I am not clear on what they do in all cases. Thanks for the help! – Sinistar Feb 17 '12 at 17:02
  • Join @@ (Thread[{#, #, DeleteCases[cc, #]}] & /@ {aa, bb}) I tried this, expanding to three lists for a try, it keeps the 3rd list unique, but won't make the first two unique. I get almost the opposite! :) {1,1,4}, {2,2,8}, {3,3,12} etc. – Sinistar Feb 17 '12 at 17:17
  • @Sinistar please see my update – Mr.Wizard Feb 17 '12 at 17:21
  • Perhaps, I misunderstood the question, but for the example lists a and b your first function outputs a list that contains both {2, 3}, {2, 4}, {2, 6}, {3, 4}, {3, 6}, {4, 6} and their reversed versions. – kglr Feb 17 '12 at 17:37
  • @kguler indeed I misread the question; what exactly is an "ordering repeat" ? – Mr.Wizard Feb 17 '12 at 17:41
  • 2
    It's an Urn Ball problem. Remove a ball, and don't put it back in the choices (i.e. 'select without replacement') and ordering doesn't matter, Red, Blue, Black is the same as Blue, Red, Black. So with the list version, 1,2,4 is identical in this consideration to 4,2,1 (and lexicographically those are undesireable in this application). – Sinistar Feb 17 '12 at 18:03
  • @Sinistar if ordering doesn't matter I'll tweak my code a little. – Mr.Wizard Feb 17 '12 at 18:09
  • @Sinistar please test my current code ant tell me if the result matches your expectation. – Mr.Wizard Feb 17 '12 at 18:12
  • I will indeed! Thanks! And I do appreciate your help. The problem I am grappling with is hard enough for me and having to grapple with the intracacies of M! compounds the problem. I do love M! though, just a good bit different than anything I have worked with before. – Sinistar Feb 17 '12 at 19:25
  • @Mr.Wizard Very easy on memory! I started it a few minutes ago, I went ahead and threw all the monkeys and wrenches at your algorithm. I will let you know how long it took to run, but it doesnt look like it will be any faster, but has much less potential to crash the evaluation. I'm puzzled about 1 thing though, CPU is less 15% used, and less than 2GB of RAM. Where is the IO blocking at? The output should weigh in less than 10 million combinations. – Sinistar Feb 17 '12 at 20:15
  • Worked after about 40 hours. I havent been able to look the out put over yet, because once I wanted to view the output, M! gobbled up all my memory and it has been regurgitating the dataset now for about 12 hours so far. Looks like it worked as desired, but I only had a snippet of results to look at. Maybe I need to figure out how to spit the output to a text file as it gathers results. – Sinistar Feb 20 '12 at 14:33
4

Take 1

You can use the following function if you don't have too many lists with too many elements:

uniqueTuples1[lists_] := 
 Union[Select[Union /@ Tuples[lists], Length[#] == Length[lists] &]]

uniqueTuples1[{a,b}]

This works by generating all possible tuples, and filtering out those that we don't want. The disadvantage is that the number of all tuples is very large, and for many lists with many elements, this will quickly fill up all available memory.

Take 2

For two lists only, the following algorithm is more memory efficient:

Join[Subsets[Intersection[a, b], {2}], 
     Tuples[{Complement[a, b], b}], 
     Tuples[{Complement[b, a], Intersection[a, b]}]]

Unfortunately generalizing this to an arbitrary number of lists is not trivial.

Szabolcs
  • 234,956
  • 30
  • 623
  • 1,263
  • Yes, the nemesis of combinatorics is the exponential chaos – Sinistar Feb 17 '12 at 17:25
  • @Sinistar Please see my update. This solution does not use a lot of memory and is also reasonably fast. – Szabolcs Feb 17 '12 at 18:22
  • I will check it out, thanks so much! I have not finished working with the first one line solution I thought would work. The revision that was applied by another user runs about 6 times faster by removing some redundant function. It computed a 6 x 10 array in about 20 secs on my machine. That was over 65,000 results. So now I am throwing the full data set at it and see how long it takes. I like the simplicity of the code, but it may not be the most efficient offered. I'll try Mr. Wizard's too! – Sinistar Feb 17 '12 at 19:32
  • is the usage for this sets[lists] or sets[list]? – Sinistar Feb 17 '12 at 20:32
  • @Sinistar Taking you example of two lists, usage is sets[{a,b}] – Szabolcs Feb 17 '12 at 21:03
3
Union[Tuples[{a, b}] /. {(i_)..} :> Sequence[],SameTest -> (Length[Union[#1, #2]] == Length[#1] &)]  

not sure how fast it is compared to the other solutions though.

Spawn1701D
  • 1,871
  • 13
  • 14
2

DISCLAIMER (update): as you can see my simple code just filters tuples with strictly ascending ordering. That is why it works correctly only if there is no gaps between elements of initial lists. For example:

In[23]:= Select[Tuples@{{1, 2, 5}, {3, 4, 5}}, Less @@ # &]

Out[23]= {{1, 3}, {1, 4}, {1, 5}, {2, 3}, {2, 4}, {2, 5}}

As you can see, the {5,3} tuple is missing. So I must apologize for misleading all of you.


Here is my variant:

lists = {{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}, {5, 6, 7}, {6, 7, 8}};
Select[Tuples@lists, And @@ Less @@@ Partition[#, 2, 1] &]

{{1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 5, 7}, {1, 2, 3, 4, 5, 8}, 
{1, 2, 3, 4, 6, 7}, {1, 2, 3, 4, 6, 8}, {1, 2, 3, 4, 7, 8}, 
{1, 2, 3, 5, 6, 7}, {1, 2, 3, 5, 6, 8}, {1, 2, 3, 5, 7, 8}, 
{1, 2, 3, 6, 7, 8}, {1, 2, 4, 5, 6, 7}, {1, 2, 4, 5, 6, 8},
{1, 2, 4, 5, 7, 8}, {1, 2, 4, 6, 7, 8}, {1, 2, 5, 6, 7, 8},
{1, 3, 4, 5, 6, 7}, {1, 3, 4, 5, 6, 8}, {1, 3, 4, 5, 7, 8},
{1, 3, 4, 6, 7, 8}, {1, 3, 5, 6, 7, 8}, {1, 4, 5, 6, 7, 8}, 
{2, 3, 4, 5, 6, 7}, {2, 3, 4, 5, 6, 8}, {2, 3, 4, 5, 7, 8}, 
{2, 3, 4, 6, 7, 8}, {2, 3, 5, 6, 7, 8}, {2, 4, 5, 6, 7, 8},
{3, 4, 5, 6, 7, 8}}

EDIT

Partition and And are redundant:

lists = {{1, 2, 3, 5, 6, 7}, {2, 3, 4, 5, 6, 7}, {3, 4, 5, 6, 7, 
    8}, {4, 5, 6, 7, 8, 9}, {5, 6, 7, 8, 9, 10}, {6, 7, 8, 9, 10, 11, 12}};

a = Select[Tuples@lists, And @@ Less @@@ Partition[#, 2, 1] &]; // AbsoluteTiming
b = Select[Tuples@lists, Less @@ # &]; // AbsoluteTiming
a == b

---
{0.8125000, Null}
{0.1875000, Null}
True
faleichik
  • 12,651
  • 8
  • 43
  • 62
  • Yes, this is kind of where I started, generate all the output, then filter the results. I was having a hard time creating the filter. I will try this! – Sinistar Feb 17 '12 at 17:41
  • AND WE HAVE A WINNER! Beautiful! Thanks for everyone who offered advice! I'm glad others find this to be as interesting a question as me! – Sinistar Feb 17 '12 at 17:52
  • 1
    @Sinistar But this is still based on filtering, which means it generates all tuples first then filters out unnecessary ones. You won't be able to run it on say RandomInteger[100, {6,25}] simply because all tuples don't fit in memory. You said, "In my real problem I am actually working with 6 lists with up to 25+ items in each list". – Szabolcs Feb 17 '12 at 17:53
  • Yeah, it's been running for 5 minutes so far. We will see if it consumes all 16 GB of my RAM. I will try your code too, but this blasted through six lists with 5 elements in each list. Now, I am trying 6 lists with between 15 and 30 elements. :) – Sinistar Feb 17 '12 at 18:04
  • I also tried to speed-up this using Parallelize but eventually parallel version works slower... Anyway this is a brute-force implementation and Mr.Wizards' code seems to be much better. – faleichik Feb 17 '12 at 19:20
  • It ran for an hour used all the memory, but hadn't crashed yet (sometimes my notebooks will continue to run even with extreme memory usage, had one take over 24 hours once). I'm going to try again with this refined code that was offered, as it is about 6 times faster. – Sinistar Feb 17 '12 at 19:22
  • 4.333333333333333333333333 times faster, ;) – Sinistar Feb 17 '12 at 19:35
  • Okay, verdict must be revised. The code update by faleichik improved this significantly, so this nice compact one line of code is handy for arrays about 10x6, maybe a bit more. But with my full uneven array it really couldn't handle the problem unless I had a lot more physical RAM. 16GB and a 32GB swap file will allow it to run, but I gave up waiting after about an hour. Off to see the Wizard and try that one! – Sinistar Feb 17 '12 at 20:10
  • Considering my last update (the code does not work properly) should I vote to delete this post? – faleichik Feb 18 '12 at 08:31
  • I never got to see the output when I used the full dataset, it seemed to work for 10*6 array tupling, I didn't look at every result though. – Sinistar Feb 20 '12 at 15:26
  • The 5,3 tuple is supposed to be missing. We do not want an odering repeat, i.e. We already have 3,5 so 5,3 is redundant when order does not matter. – Sinistar Feb 20 '12 at 15:27
  • In the example we do NOT have 3,5 tuple because 3 is not in the first list. The code works correctly when every list don't contain elements less than elements of the preceding lists. If your lists satisfy this condition then it is probably ok. – faleichik Feb 20 '12 at 17:55
2

Method 1: (simple idea, but slow performance)

Here's another implementation that seems simpler (at least to me), and with a few runs on data = RandomInteger[40, {6, 11}], it took around 120 seconds of Timing.

data = RandomInteger[40, {6, 11}];
Y = Sort[#] & /@ Tuples[data];
Z = DeleteDuplicates /@ Y;
T = Select[Z, Length[#] == Length[data] &];
DeleteDuplicates[T];

I know I'm not being very creative with the variable names, but I didn't want to make it a "one-liner". I just think it is easier to read this way :)

Optional: You can add another Sort at the end if you want your tuples sorted. That obviously adds to the CPU time.

Method 2: (same idea, but improved performance by an order of magnitude)

Use Union instead of DeleteDuplicates. Only 12 seconds are needed to calculate the result!

data = RandomInteger[40, {6, 11}];
Y = Sort[#] & /@ Tuples[data];
Z = Union /@ Y;
T = Select[Z, Length[#] == Length[data] &];
Union@T;

Now that I read it more carefully, this is actually the same method as Szabolcs' answer above.

cartonn
  • 1,005
  • 6
  • 15
2

Focusing on the elimination of duplicates (vs. the generation of uniqueTuples), I have two common functions I use in my notebooks, which when used together without making any assumptions about the input (ie. pairwise sublists or the input even being a list), provide a simple solution.

(* Eliminate duplicates of any permutation in a list *)
noDups@in_:=in;
noDups@in_List:=DeleteDuplicates[DeleteDuplicates@in, 
   If[#2=={},True,MemberQ[Permutations@#2,#1]]&];

(* Eliminate Nulls or empty lists or list elements with duplicate entries *)
noNull@in_List:=Select[DeleteCases[in,Null|{},2], Length@First@in==Length@noDups@#&];

Unfortunately, the //Timing is about half as fast as Spawn1701D's general Union based function. So if we make the assumption that the input can't be a single list of atomic elements (or Nulls or empty lists), we can use the following (the GatherBy given by Mr.Wizard here) which has excellent //Timing comparable even to the fastest solutions:

noDups@in_List := GatherBy[in~DeleteCases~{}, Sort][[All, 1]];
noNull@in_List := Select[in, Length@First@in == Length@Union@# &];

data = RandomInteger[80, {3, 40}];
t = Tuples@data;

noDups@noNull@t //Timing
{0.593750, {{69, 26, 32}, {69, 26, 59},...

Comparing the //Timing for reducing (vs. generating unique) Tuples on t, we get 9.5 Seconds on this reducing algorithm (vs. 5.7 Seconds for Simon Woods generating algorithm) on:

data = RandomInteger[40, {6, 11}];

See the following for explicit comparison (**Note: the timing being the same between the two functions on this significantly smaller data set).

In[1]:= noDups@in_List := GatherBy[in~DeleteCases~{}, Sort][[All, 1]];
         noNull@in_List := Select[in, Length@First@in == Length@Union@# &];

In[3]:= uniqueTuples[a_List, b_List] := 
          Module[{f}, f[___, x_, x_, ___] = Sequence[];
            f[x_, y__] := (f[x, y] = Sequence[]; {x, y});
            SetAttributes[f, Orderless];
            Flatten[Outer[f @@ Flatten[{##}] &, a, b, 1], 1]]
         uniqueTuples[{a_List, x__List}] := Fold[uniqueTuples, a, {x}]

In[5]:= a = {1, 2, 3, 4, 6};
         b = {2, 3, 4, 6, 9};
         t = Tuples[{a, b}];

In[8]:= r1 = noDups@noNull@t
Out[8]= {{1, 2}, {1, 3}, {1, 4}, {1, 6}, {1, 9}, {2, 3}, {2, 4}, {2, 6}, {2, 9}, {3, 4}, {3, 6}, {3, 9}, {4, 6}, {4, 9}, {6, 9}}

In[9]:= r2 = uniqueTuples[{a, b}]
Out[9]= {{1, 2}, {1, 3}, {1, 4}, {1, 6}, {1, 9}, {2, 3}, {2, 4}, {2, 6}, {2, 9}, {3, 4}, {3, 6}, {3, 9}, {4, 6}, {4, 9}, {6, 9}}

In[10]:= r1 == r2
Out[10]= True

In[11]:= data = RandomInteger[40, {5, 11}];
In[12]:= t = Tuples@data;

In[13]:= r1 = noDups@noNull@t // Timing
Out[13]= {1.578125,{{34,38,20,0,23},{34,38,20,0,8}{34,38,20,0,2},{34,38,20,0,7},{34,38,20,0,25},{34,38,20,0,35},{34,38,20,0,14},<<54534>>,{5,10,38,11,2},{5,10,38,11,7},{5,10,38,11,25},{5,10,38,11,35},{5,10,38,11,14},{5,10,38,11,28},{5,10,38,11,17}}}

In[14]:= r2 = uniqueTuples@data // Timing
Out[14]= {1.578125,{{0,20,23,34,38},{0,8,20,34,38},{0,2,20,34,38},{0,7,20,34,38},{0,20,25,34,38},{0,20,34,35,38},{0,14,20,34,38},<<54534>>,{2,5,10,11,38},{5,7,10,11,38},{5,10,11,25,38},{5,10,11,35,38},{5,10,11,14,38},{5,10,11,28,38},{5,10,11,17,38}}}

In[15]:= Sort[Sort@# & /@ r1[[2]]] == Sort[Sort@# & /@ r2[[2]]]
Out[15]= True
Greg Moxness
  • 377
  • 1
  • 14
  • I'm confused; is this supposed to produce the same output as uniqueTuples? I'm not seeing that. – Mr.Wizard Mar 17 '13 at 11:47
  • Not sure why, except you need to sort the results for a True comparison. See last edits to my answer with explicit True comparison results. – Greg Moxness Mar 17 '13 at 19:35
  • Not sure why the timing difference in this comparison was the exactly the same to 7 digits (when the original test on RandomInteger[40,{6,11}] was almost double. Could it be memory paging or ?... – Greg Moxness Mar 17 '13 at 21:43