I want to extract sorted data from a huge database based upon two (or more) keys in a very timely manner. Here is a reproducible toy example for two keys only:
n = 10^5;
keys = RandomInteger[{1, 100}, n];
vals = RandomReal[{0, 1}, n];
data = Transpose[{keys, vals}];
The fastest "traditional" way I've found:
result1 = Sort @ Cases[data, {25 | 73, r_} :> r];
Much faster is a V10 solution using an Association:
assoc = Merge[Association /@ Rule @@@ data, Identity];
(I use Merge to allow for duplicate keys; the time cost of building assoc is not important to me.)
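To illustrate what Merge with Identity does when a key occurs more than once, here is a tiny sketch on made-up data (the name toy and the symbols a, b, c are placeholders for illustration only, not part of the example above):

```mathematica
(* three single-rule associations; key 1 occurs twice *)
toy = {<|1 -> a|>, <|2 -> b|>, <|1 -> c|>};

(* Merge collects all values that share a key into a list;
   Identity leaves that list unchanged *)
Merge[toy, Identity]
(* <|1 -> {a, c}, 2 -> {b}|> *)
```

So assoc[25] is the list of all values whose key is 25, which is why the two lists can simply be joined and sorted.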
result2 = Sort[assoc[25] ~ Join ~ assoc[73]];
result1 == result2
True
Speed comparison:
Do[Sort @ Cases[data, {25 | 73, r_} :> r], {100}]; // AbsoluteTiming // First
2.017115
Do[Sort[assoc[25] ~ Join ~ assoc[73]], {100}]; // AbsoluteTiming // First
0.030002
That is certainly a reason to upgrade, but two questions remain:
(a) Can this code be improved?
(b) With n = 10^6, result1 still works, but result2 runs forever and has to be aborted.
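One variant that might be worth trying for (b) is to build the association with GroupBy (also new in V10) instead of merging one single-rule association per row. This is only a sketch: the names assoc2 and result3 are hypothetical, it assumes data is the list of {key, value} pairs defined above, and I have not timed it at n = 10^6.

```mathematica
(* group the rows by their first element (the key) and
   keep only the last element (the value) of each row *)
assoc2 = GroupBy[data, First -> Last];

(* same lookup as before: join the value lists for both keys, then sort *)
result3 = Sort[Join[assoc2[25], assoc2[73]]];

(* should agree with the Cases-based result *)
result3 == result1
```

The lookup step is unchanged; only the construction of the association differs, since it is done in a single pass over data rather than by merging n small associations.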