How to use Union on list of lists without sorting them first?

Question

If I do

ClearAll[a, d]
lsts = {{a, d}, {a, d}};
Union[lsts]

I get the expected answer

{{a, d}}

but if I do

lsts = {{a, d}, {d, a}};
Union[lsts]

I get

{{a, d}, {d, a}}

Since I am using Union, I thought the order of the elements within each list would not matter. To get around this, I now always apply Sort first, like this

lsts = {{a, d}, {d, a}};
Union[Sort /@ lsts]

and now I get the expected answer

{{a, d}}

Question: Is this the right way to approach this, or do you recommend a better way?

why is {{a,d}} the expected answer? I interpret lsts as a list of sets, which I want to join. So shouldn't it be {a,d}? — niklasfi, Jan 17 '12 at 22:58
sorry but what is the difference between this and Union @@ lsts? — Silvia, Jan 17 '12 at 23:00

score 14 · Accepted Answer · answered Jan 17 '12 at 23:02

Sorting of the sub-lists seems unavoidable, since this is what brings them to a "canonical form" in this problem. If you don't care about the order of your resulting sub-lists, you could use DeleteDuplicates in place of Union, though. This should be faster for large lists.
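As a minimal sketch of the difference, assuming the symbols a, b, c, d have no values assigned (the example list is made up for illustration): both calls canonicalize each sub-list with Sort first, but only Union also sorts the outer list.

```mathematica
lsts = {{c, b}, {d, a}, {a, d}};

(* Canonicalize each sub-list, then remove duplicates. *)
Union[Sort /@ lsts]
(* {{a, d}, {b, c}}   outer list is sorted as well *)

DeleteDuplicates[Sort /@ lsts]
(* {{b, c}, {a, d}}   order of first appearance is kept *)
```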

answered Jan 17 '12 at 23:02

Leonid Shifrin


Good call, I didn't think of that. And in fact DeleteDuplicates allows a custom equality test, so you could only sort the list in the test if you wanted your results in unsorted form. – David Z Jan 17 '12 at 23:06
@David Be careful with these equality tests, they often lead to slowdowns which are not immediately obvious. For Union such custom-function-induced slowdowns will be much more severe though, see e.g. this discussion: http://www.mathprogramming-intro.org/book/node290.html – Leonid Shifrin Jan 17 '12 at 23:08
True, I should have mentioned that a custom equality test could make the command significantly slower. – David Z Jan 17 '12 at 23:10
Hmmm ... the example on mathprogramming-intro.org isn't even an equivalence relation. I'd expect all sorts of strange things to happen in such a case, and especially I'd not expect timings to be in any way representative for the timing you get with functions having the correct semantics. – celtschk Jan 18 '12 at 12:00
@celtschk My bad - the first example indeed doesn't hold water. You are actually the first person to point it out, thanks. The other examples further down that page are OK though, I believe, and still illustrate my main point. I also discussed this problem here: http://forums.wolfram.com/mathgroup/archive/2009/Jul/msg00057.html, and in that thread there are other answers with good points. – Leonid Shifrin Jan 18 '12 at 12:12
Actually, the best example in that post is the one by J. Siehler where he uses Equal: Here as far as I can tell the exact same operation is done as without SameTest, using directly the built-in function, therefore the difference cannot be in any inefficiency of the test itself. But the run time is a factor of 200 larger. OK, thinking again, without the test it might infer equality by the ordering relation (i.e. consider elements equivalent if neither is smaller than the other), just like the C++ STL routines do. – celtschk Jan 18 '12 at 14:10
@celtschk Whenever any user-defined function is supplied (even when it happens to be a built-in), Union switches to a quadratic-time algorithm based on pairwise comparisons. This is because Union accepts a sameness function, not a comparison function (the existence of the latter is a stronger requirement). When no test is specified, it so happens that SameQ (the default) has an accompanying comparison function based on canonical sort, so Union then uses n*log n sorting. I actually explained this in considerable detail in my post in that thread. – Leonid Shifrin Jan 18 '12 at 14:29
Ah, thanks, I didn't yet have the time to read that post in detail. – celtschk Jan 18 '12 at 15:01

score 12 · Answer 2 · answered Jan 17 '12 at 23:33

You can provide a custom SameTest to Union, where you can take advantage of your knowledge of what should qualify as equal, for example:

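In[1]:= Union[{{a,d},{d,a}}, SameTest -> (Complement[##] === {}&)]
Out[1]= {{a, d}}

Note that Complement[##] === {} only checks that every element of the first list also occurs in the second, so the test is not symmetric for lists with different elements, for example {a} and {a, d}.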

answered Jan 17 '12 at 23:33

Thies Heidecke


score 11 · Answer 3 · answered Jan 17 '12 at 23:00

It might be that you're slightly misunderstanding what Union does. It finds the union of the elements of the list that is passed to it, but it doesn't dig into lists within that list. So when you write Union[{{a,d},{a,d}}], the function sees a list with two elements, {a,d} (that's element 1) and {a,d} (that's element 2). They are the same, so it removes the duplicate and returns just {{a,d}}. But when you write Union[{{a,d},{d,a}}], it sees a list with two different elements: {a,d} (that's element 1) and {d,a} (that's element 2). The fact that those two lists contain the same items is irrelevant; they're not equal, according to an ordered element-by-element comparison, so Union has no duplicates to remove.

Now, it seems like what you're trying to do is get all lists which are distinct in terms of their content, irrespective of order - in other words, you're treating the lists as mathematical sets. I think Union[Sort/@lsts] should be a fine way to go, because that's the standard method of comparing sets for equality when you don't have an actual unordered set type. (If Mathematica does, I don't know about it.)

If he wants to treat the lists as mathematical sets (i.e. also consider duplicates as irrelevant), then using Union also for sorting would be a better idea (because it removes duplicates). That is, for the list lsts={{a, b}, {a, b, b}}, Union[Sort/@lsts] will give lsts back, while Union[Union/@lsts] will give just {{a,b}}. — celtschk, Jan 18 '12 at 07:23

score 7 · Answer 4 · answered Jan 17 '12 at 23:12

You could do the following:

lists = {{c, b, a}, {c, a, b}};
Union[lists, SameTest -> (Sort[#1] === Sort[#2] &)]

Note that the result is {{c, a, b}}, whose sub-list is left unsorted. With a custom SameTest, however, Union can no longer rely on canonical sorting alone to find duplicates and has to compare elements pairwise. As a result the time complexity is quadratic, which will slow down your code considerably for very long lists. Thus, I'd advise against this approach. Sorting the sub-lists first, as you've done, is preferable.
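A rough way to check the slowdown on your own machine is sketched below. The list size, the value range and the seed are arbitrary choices made for illustration, and no particular timings are claimed; if Union does fall back to pairwise comparisons as described in this thread, the second timing should grow much faster with the list length than the first.

```mathematica
SeedRandom[1];                              (* arbitrary seed, for repeatability *)
lists = RandomInteger[{1, 20}, {2000, 3}];  (* 2000 random triples, many repeats *)

(* Sort the sub-lists first, then use the default (SameQ) Union. *)
tSortFirst = First @ AbsoluteTiming[
    Union[Sort /@ lists];
  ];

(* Leave the sub-lists alone and sort inside a custom SameTest. *)
tSameTest = First @ AbsoluteTiming[
    Union[lists, SameTest -> (Sort[#1] === Sort[#2] &)];
  ];

{tSortFirst, tSameTest}   (* timings in seconds *)
```

Repeating the measurement with a few different list lengths gives a better picture than a single run.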

score 4 · Answer 5 · answered Jan 17 '12 at 23:31

Your problem is reminiscent of the thing one has to do when dealing with noncommutative monomials.

To add to previous answers:

Depending on the typical content of your lists (especially if you have a lot of strictly identical elements), it might be beneficial to apply Union twice, in this way

Union[ Sort/@(Union[ lst ]) ]

or

Union[ Union[ lst], SameTest -> (Equal[Sort[#1],Sort[#2]] &) ]

if you want to retain some of the diversity of the original instead of having everything mapped to a canonically sorted form.

The problem is more complex when you consider more deeply structured lists of course. You might end up with a very costly comparator function.

An object-oriented approach would be to define, for each object symbol, a comparator function that is called automatically as the SameTest when a modified Union is called with arguments of a given head.

In testing (which I admit was not comprehensive, but it did cover very small through very large lists), Union[Sort /@ lst] was consistently the fastest approach. I tried various combinations of DeleteDuplicates, Tally, Orderless functions, sequences of tests (e.g., test on Max and then sort), and even Hash. — whuber, Jan 18 '12 at 00:53

score 3 · Answer 6 · edited Apr 13 '17 at 12:56

To get rid of items that are duplicates under Sort you may use this:

GatherBy[lsts, Sort][[All, 1]]

Afterward you may sort or manipulate that list as you see fit.
Be warned that there is apparently a bug in Mathematica 7 with this specific code.

New in Mathematica 10:

DeleteDuplicatesBy[lsts, Sort]
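A small illustration of both forms, assuming a, b, c, d are unassigned symbols and using a made-up input list. Unlike Union[Sort /@ lsts], both keep the first representative of each group in its original, unsorted form:

```mathematica
lsts = {{d, a}, {a, d}, {c, b}};

(* Group sub-lists that agree after sorting, then take the first of each group. *)
GatherBy[lsts, Sort][[All, 1]]
(* {{d, a}, {c, b}} *)

(* Same idea in one call (Mathematica 10 or later). *)
DeleteDuplicatesBy[lsts, Sort]
(* {{d, a}, {c, b}} *)
```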

edited Apr 13 '17 at 12:56

Community

answered Jan 23 '12 at 14:08

Mr.Wizard


score 1 · Answer 7 · edited Jan 18 '12 at 05:36

I am not sure you are doing it right. Union expects a set per argument. As you only give it one argument, you are basically doing the union over one set A, which is incidentally A. What you want to do is Union @@ lsts, which is Apply[Union, lsts].

edited Jan 18 '12 at 05:36

rm -rf


answered Jan 17 '12 at 23:03

niklasfi

