I am a novice in the world of algorithms and ignorant of the taxonomy used, so please pardon me. I have two large sets of numbers, A and B, whose elements all lie strictly between 0 and 9999999999. The cardinality of each set is more than a million. What is the best algorithm to compute the set differences $A \setminus B$ and $B \setminus A$? The sets are sorted (or should I say ordered?). With my pea-sized brain, I arrived only at comparing every element of A against every element of B, one after another. Can someone point me to the best way? I would also appreciate and be thankful for "go and read this textbook first" or "pick up such-and-such a textbook first" kinds of answers.
3 Answers
Python sets are hash tables that use open addressing. In other words, every value in a set can be looked up quickly (in constant time on average) because it was inserted at a position derived from its hash, which serves to find it, not to order it relative to other values. So Python supports operations like:
a = set([1,2,3,4])
b = set([3,4,5,6]) #etc..
a&b
#gives you {3, 4}
a|b
#gives you {1,2,3,4,5,6}
a^b
#gives you {1,2,5,6}
a-b
#gives you {1, 2}
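For the question as asked, both differences are needed and the inputs are sorted lists. A minimal sketch, assuming the data fits in memory; the lists `A` and `B` below are small made-up stand-ins for the real data. Set difference does not preserve order, so the results are sorted again afterwards.

```python
# Hypothetical sample data standing in for the two large sorted inputs.
A = [1, 2, 3, 4, 7]
B = [3, 4, 5, 6]

set_a, set_b = set(A), set(B)

# Each difference takes time proportional to the left operand on average;
# sorted() restores the ascending order that the sets discard.
a_minus_b = sorted(set_a - set_b)
b_minus_a = sorted(set_b - set_a)

print(a_minus_b)  # [1, 2, 7]
print(b_minus_a)  # [5, 6]
```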
If you're using C++, the standard library supports this directly: std::set_difference, declared in the <algorithm> header.
http://www.cplusplus.com/reference/algorithm/set_difference/
That documentation even includes equivalent "pseudocode" (really, just more C++) that you can use to port the idea to other languages. The algorithm is close to the "merging" part of mergesort. Note that std::set_difference operates on sorted ranges, not on std::sets. This is a good thing: it means sorted std::vectors are adequate.
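As an illustration of porting the idea, here is a rough Python sketch of the same one-directional merge. The function name `sorted_difference` is made up for this example and is not part of any library; it assumes both inputs are sorted in ascending order.

```python
def sorted_difference(xs, ys):
    """Yield the items of sorted iterable xs that are absent from sorted iterable ys."""
    ys_iter = iter(ys)
    sentinel = object()  # marks that ys is exhausted
    y = next(ys_iter, sentinel)
    for x in xs:
        # Skip every element of ys that is smaller than x.
        while y is not sentinel and y < x:
            y = next(ys_iter, sentinel)
        if y is sentinel or x < y:
            yield x  # x cannot occur later in ys
        else:
            y = next(ys_iter, sentinel)  # x == y: drop the matching pair


print(list(sorted_difference([1, 2, 3, 4, 7], [3, 4, 5, 6])))  # [1, 2, 7]
print(list(sorted_difference([3, 4, 5, 6], [1, 2, 3, 4, 7])))  # [5, 6]
```

Calling it once in each direction gives both differences, at the cost of two passes over the data.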
One cannot find the set differences in less than linear time, since every entry of either set has to be examined: an entry appearing anywhere in either set might belong to $A \setminus B$ or to $B \setminus A$ (or to neither).
An algorithm which achieves this limit (on sorted lists of data) is as follows:
Given input sets A, B, each a strictly ascending list.
Initialize A\B and B\A as empty lists.
Until A or B is empty, compare the heads of both lists:
    If the heads are equal, remove them both.
    If the head of A precedes the head of B, remove the head from A
        and include it in A\B.
    If the head of B precedes the head of A, remove the head from B
        and include it in B\A.
Now A or B is empty.
Transfer the entries left in A to A\B, or the entries left in B to B\A.
In Prolog, assuming sorted lists A,B of distinct numeric values:
/* set_differ(A,B,AminusB,BminusA) */
set_differ([ ],B,[ ],B) :- !.
set_differ(A,[ ],A,[ ]) :- !.
set_differ([H|A],[H|B],AminusB,BminusA) :-
    !,
    set_differ(A,B,AminusB,BminusA).
set_differ([Ha|A],[Hb|B],AminusB,BminusA) :-
    ( Ha < Hb )
     -> ( AminusB = [Ha|A_B], set_differ(A,[Hb|B],A_B,BminusA) )
     ;  ( BminusA = [Hb|B_A], set_differ([Ha|A],B,AminusB,B_A) ) .
Every recursive call is in tail position, so on a Prolog system with last-call optimization this is memory efficient.
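The same merge can also be sketched in Python with two indices instead of list heads. This is only an illustrative translation under the same assumption (both inputs strictly ascending); the function name mirrors the Prolog predicate but the return convention is my own.

```python
def set_differ(a, b):
    """Return (a_minus_b, b_minus_a) for strictly ascending lists a and b."""
    a_minus_b, b_minus_a = [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Common element: belongs to neither difference.
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a[i] is smaller than everything left in b, so it is not in b.
            a_minus_b.append(a[i])
            i += 1
        else:
            # b[j] is smaller than everything left in a, so it is not in a.
            b_minus_a.append(b[j])
            j += 1
    # At most one list still has entries; they all belong to its difference.
    a_minus_b.extend(a[i:])
    b_minus_a.extend(b[j:])
    return a_minus_b, b_minus_a


print(set_differ([1, 2, 3, 4, 7], [3, 4, 5, 6]))  # ([1, 2, 7], [5, 6])
```

Each iteration advances at least one index, so the loop performs at most len(a) + len(b) comparisons and produces both differences in a single pass.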
A and B are not necessarily of equal length? You seem to be looking for something akin to what Matlab's setdiff does. – horchler Nov 13 '13 at 16:33