I have a giant list l1 and some smaller lists l21, l22, ..., l2n. l1 is the superset and has $\sim 10^7$ elements. Each l2? is a subset of l1 (if my other code does its job correctly). All lists contain only numbers (again, if my code does its job correctly). I need to take the complement of l1 with respect to each l2?. The number of subsets (i.e. n) is several hundred. What is the most memory- and time-efficient way of doing this? I run this on a 4 GB machine and usually have 1.* GB of free memory when the program reaches that stage. I could save the numbers to a file and use a command-line tool like grep, but the original lists are all prepared in a Mathematica program. If there is a good solution in Mathematica, I'd like to avoid leaving it.
user13253
1 Answer
This depends. How many numbers are in the subsets? What range do your numbers fall in? What do you want to do with each complement? Can you give a small example that uses RandomInteger to create sample data? In general, you could first try to calculate one complement with something like this
l1 = RandomInteger[{0, 10^6}, {10^7}];
l21 = RandomInteger[{0, 10^6}, {10^5}];
compl = Complement[l1, l21];
and see whether your memory is sufficient.
To see how much memory is in use, try MemoryInUse[]. ByteCount[expr] reports how much memory a variable (or any expression) occupies. After the above commands, I have used up
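Since the question involves several hundred subsets, the same test can be extended to a loop that handles one subset at a time, so that only one complement is held in memory at once. The following is a minimal sketch under assumptions: the sample sizes are scaled down, and the name subsets is a hypothetical stand-in for l21, ..., l2n. Note that Complement returns a sorted list with duplicates removed.

```mathematica
(* Hypothetical sample data; sizes and ranges are assumptions, scaled down *)
SeedRandom[1];
l1 = Union[RandomInteger[{0, 10^6}, 10^5]];      (* superset: sorted, no duplicates *)
subsets = Table[RandomSample[l1, 10^3], {5}];    (* stand-ins for l21, ..., l2n *)

(* Process one subset at a time so only one complement exists at any moment *)
Do[
  compl = Complement[l1, s];
  (* consume compl here, e.g. export it or compute a summary *)
  Print[Length[compl]],
  {s, subsets}
]
```

Whether this fits in the available memory depends on what is done with each complement; if all complements had to be kept at once, the total would grow with n.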
MemoryInUse[]/2^20.
(* Out[9]= 101.952 *)
about 100 MB here.
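To separate the cost of a single list from the overall session, the functions mentioned above can be combined as in the sketch below. The list size is a scaled-down assumption, and the variable names packedQ, sizeMB and peakMB are hypothetical. RandomInteger produces a packed array, which is the compact storage form for lists of machine integers; a list built in some other way may not be packed and can take considerably more memory.

```mathematica
l1 = RandomInteger[{0, 10^6}, 10^6];    (* scaled-down stand-in for the real list *)
packedQ = Developer`PackedArrayQ[l1]    (* True if the list is stored as a packed array *)
sizeMB = ByteCount[l1]/2.^20            (* memory taken by this one list, in MB *)
peakMB = MaxMemoryUsed[]/2.^20          (* peak memory of the kernel session so far, in MB *)
```

If packedQ turns out to be False for the real data, Developer`ToPackedArray[l1] may reduce the footprint, provided all elements are machine-sized numbers of the same type.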
halirutan
[...] Complement[] already? – J. M.'s missing motivation Oct 27 '12 at 02:33
[...] BitSet, complement performed using BitAnd[l1, BitNot[l2?]], and values collected through BitGet. This is elegant, but slow. For a faster kludge, see [http://mathematica.stackexchange.com/a/13708/3056]. Both methods consume roughly one bit (not byte) per integer (present or not in the list) in list range. (I believe Complement should be sufficient though.) – kirma Oct 27 '12 at 13:45
[...] Complement in the large scheme of things. If you had billions of integers, optimization could make sense, but otherwise... – kirma Oct 27 '12 at 13:54
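The bit-vector idea from the comments can be sketched as follows. This is one possible reading of that comment, not the commenter's actual code: the helper names toBits and fromBits are hypothetical, and the sketch assumes non-negative integers. Each list becomes a single large integer in which bit k is set exactly when k is in the list.

```mathematica
(* Encode a list of non-negative integers as one integer: bit k set iff k is in the list *)
toBits[list_List] := Fold[BitSet, 0, list]

(* Decode: collect the positions of all set bits using BitGet *)
fromBits[b_Integer] := Select[Range[0, BitLength[b] - 1], BitGet[b, #] == 1 &]

a = {1, 3, 5, 8, 13};   (* plays the role of l1 *)
b = {3, 8};             (* plays the role of one l2? *)

(* Complement as a bitwise operation: keep bits of a that are not set in b *)
result = fromBits[BitAnd[toBits[a], BitNot[toBits[b]]]]
(* {1, 5, 13} *)
```

As the comment itself notes, this is compact but slow for large inputs, so the built-in Complement is likely the better first choice at the sizes described in the question.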