I have data that is given as a list of ordered pairs mixed with scalars. The pairs can contain infinite bounds. My goal is to convert the data into an index used in future computations.
data = {{1, ∞}, {-∞, 2}, 3, {2, 2}, {2, 3}};
This gives me all of the unique values present in data.
udata = Sort[DeleteDuplicates[Flatten@data], Less]
==> {-∞, 1, 2, 3, ∞}
Now I use Dispatch to create replacement rules based on the unique values.
dsptch = Dispatch[Thread[udata -> Range[Length[udata]]]];
Finally, I replace the values with their indices and expand each scalar a into a pair {a, a}. The result is a matrix of indices, which is what I'm after.
Replace[data /. dsptch, a_Integer :> {a, a}, 1]
==> {{2, 5}, {1, 3}, {4, 4}, {3, 3}, {3, 4}}
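Any faster candidate has to reproduce this matrix exactly, so a round-trip check may be handy when comparing solutions. The following is only a sketch under a few assumptions: `indices`, `restored` and `expected` are names introduced here for illustration, `Infinity` is written out in place of ∞, and `indices` is copied from the output shown above.

```mathematica
(* Same toy data as above, with Infinity spelled out instead of the ∞ character. *)
data = {{1, Infinity}, {-Infinity, 2}, 3, {2, 2}, {2, 3}};
udata = Sort[DeleteDuplicates[Flatten[data]], Less];

(* The index matrix shown above; a candidate solution's output would go here. *)
indices = {{2, 5}, {1, 3}, {4, 4}, {3, 3}, {3, 4}};

(* Map every row of indices back to the values it stands for. *)
restored = udata[[#]] & /@ indices;

(* The original data with each top-level scalar a expanded to {a, a}. *)
expected = Replace[data, a_?NumberQ :> {a, a}, 1];

restored === expected
```

If the last line returns True, the index matrix encodes the data without loss.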
NOTES:
The number of unique values is generally small compared to the length of data, but this doesn't have to be the case.
Any real numbers are possible. The data I've shown simply gives a sense of the structural possibilities.
Question: Is there a way to create the final matrix of indices that is much faster than what I'm doing here?
Edit: To test how potential solutions scale, I recommend using the following data. It is fairly representative of a true-to-life case.
inf = {#, ∞} & /@ RandomChoice[Range[1000], 3*10^5];
neginf = {-∞, #} & /@ RandomChoice[Range[1000], 10^5];
int = Sort /@ RandomChoice[Range[1000], {10^5, 2}];
num = RandomChoice[Range[1000], 5*10^5];
testData = RandomSample[Join[inf, neginf, int, num]];
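To see where the time goes on data of this size, the three steps of the approach can be timed separately. This is a minimal sketch, not a benchmark suite: `timeStages` and its local names are hypothetical and not part of the original post, and it assumes testData has been defined as above.

```mathematica
(* timeStages is a hypothetical helper that times each step of the approach on its own. *)
timeStages[d_List] := Module[{tU, tD, tR, udata, dsptch, result},
  (* Step 1: collect and sort the unique values. *)
  {tU, udata} = AbsoluteTiming[Sort[DeleteDuplicates[Flatten[d]], Less]];
  (* Step 2: build the dispatch table value -> index. *)
  {tD, dsptch} = AbsoluteTiming[Dispatch[Thread[udata -> Range[Length[udata]]]]];
  (* Step 3: substitute indices and expand scalars to pairs. *)
  {tR, result} = AbsoluteTiming[Replace[d /. dsptch, a_Integer :> {a, a}, 1]];
  {"unique" -> tU, "dispatch" -> tD, "replace" -> tR,
   "dimensions" -> Dimensions[result]}
]

timeStages[testData]
```

The timings are in seconds and depend on the machine. The "dimensions" entry should be {1000000, 2}, since testData has 10^6 entries.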
Sort@DeleteDuplicates@Flatten is practically unbeatable. I tried. – rcollyer Apr 21 '12 at 22:19
Sort...Flatten was going to be next to impossible, I tried using Reap and Sow to simultaneously collect the unique terms and substitute in a function that would later return the index. Twice as slow as your method. Tried using an implementation of a binary tree, it can't handle $10^6$ terms, e.g. $10^5$ terms on par with your implementation running $10^6$ terms. So, I don't know exactly how to optimize the bottleneck any further. – rcollyer Apr 22 '12 at 02:42
If[Length[#] == {}, {#, #}, #] & /@ ArrayComponents[testData]; – Cameron Murray Mar 23 '13 at 02:58