2

This is a kind of a followup to my previous question. I am starting with a long list of plane coordinates, like this one:

lst = {{0.399933, 0.0904781}, {0.449966, 0.545239}, {0.724983, 0.27262},
       {0.362492, 0.13631}, {0.681246, 0.0681549}, {0.840623, 0.0340774},
       {0.920311, 0.0170387}, {0.960156, 0.00851936}, {0.480078, 0.00425968},
       {0.240039, 0.00212984}, {0.370019, 0.501065}, {0.18501, 0.250532},
       {0.0925049, 0.125266}, {0.546252, 0.0626331}, {0.273126, 0.0313166},
       {0.636563, 0.0156583}, {0.568282, 0.507829}, {0.284141, 0.253915},
       {0.39207, 0.626957}, {0, 0}};

Then I apply Tally[] as follows:

lst2 = Tally[Map[Round[#/(0.5)^2] &, lst]];

(The particular values 0.5 and 2 above are just examples)

The result is a "ragged" list of the kind that compilers do not like:

{{{2, 0}, 3}, {{2, 2}, 2}, {{3, 1}, 1}, {{1, 1}, 3}, {{3, 0}, 3}, {{4, 0}, 2},
 {{1, 0}, 2}, {{1, 2}, 1}, {{0, 1}, 1}, {{2, 3}, 1}, {{0, 0}, 1}}  

My question is: What would be the most efficient way (compilable) to get the same result but in two separate uniform (tensor) lists rather than in one ragged list?

arkajad
  • 571
  • 1
  • 4
  • 13
  • Do you mean lst2[[All,1]] and lst2[[All,2]] or Flatten/@lst2 or something else? – Vitaliy Kaurov Sep 30 '12 at 07:39
  • ... or Transpose[lst2]? – kglr Sep 30 '12 at 08:08
  • lst2[[All,1]] and lst2[[All,2]] do the job, but aren't they creating two new lists? That means copying from a long (perhaps 100 mln long) already created list ... – arkajad Sep 30 '12 at 08:47
  • In addition: my Tally code does not want to compile. – arkajad Sep 30 '12 at 09:01
  • 1
    "In addition: my Tally[] code does not want to compile." - because Tally[] is not compilable. – J. M.'s missing motivation Sep 30 '12 at 09:40
  • No need to use Map: lst2=Tally[Round[lst/(0.5)^2]]. Also you could use BinCounts if you only want something like lst2[[All,2]] returned (although it isn't clear why lst2[[All,1]] and lst2[[All,2]] aren't satisfactory. – Mike Honeychurch Sep 30 '12 at 09:52
  • BinCounts would return me only lst2[[All,2]] without synchronizing it with lst2[[All,1]]. lst1 gives me the cell coordinates in which the count resides. I need them for next steps. Copying list, as far as I understand, takes time. And I will have to run it thousand times with different parameters delta^j (instead of (0.5)^2. Thanks for the hint with Map being unnecessary. Yet Tally would still not compile. I do not know, perhaps it is already optimized to the max? – arkajad Sep 30 '12 at 10:16
  • 1
    Compilation is not a panacea; it will not (by itself) eliminate copies, nor will it (ordinarily) speed up simple functions like Tally. The main reason one may want to do this is if the results are needed for further processing in compiled code and one seeks to avoid an (expensive) call out of the VM, so consider carefully whether you fall into this category. If so, recall that Sort is compilable, which means that this can be done easily in $O(n \log{n})$ time. It can be done in at best $O(n)$ time, but that will be more difficult to achieve. How efficient does this really need to be? – Oleksandr R. Sep 30 '12 at 11:33
  • Thanks, Oleksandr. Well, for a list of 100,000 my uncompiled code runs ca 10 sec to give me one point. Then I need to change the resolution level j etc. and run the same function again with the same initial coordinate data. I will need ca 1000 points just to see if what I get resembles that what I need. It may be the case that I will have to switch from Mathematica to C, where I can use those dangerous pointers? – arkajad Sep 30 '12 at 11:57

1 Answers1

6

Here is a simple construct that answers the question as asked, although I'm still unclear on what you intend to do with the results and so whether compilation is actually the right approach. In particular, after creating these two arrays separately one cannot then pass them back out of the VM without combining them in some way to form a full-rank tensor, so unless you are working entirely within compiled code there would not appear to be any benefit in doing this. As such, I provide a closure that needs to be inlined into other compiled code in order to function:

tally = Compile[{{lst, _Real, 2}, {delta, _Real, 0}, {j, _Integer, 0}},
   Block[{
     sorted = Sort@Round[lst/(delta^j)],
     itemBag = Internal`Bag@Most[{0}], 
     countBag = Internal`Bag@Most[{0}],
     lastItem, count = 0
    },
    lastItem = First[sorted];
    Do[
     If[i == lastItem, ++count; Continue[]];
     Internal`StuffBag[itemBag, lastItem, 1];
     Internal`StuffBag[countBag, count];
     lastItem = i; count = 1, {i, sorted}
    ];
    Internal`StuffBag[itemBag, lastItem, 1];
    Internal`StuffBag[countBag, count];
    items = Partition[Internal`BagPart[itemBag, All], Length@First[sorted]];
    counts = Internal`BagPart[countBag, All];
   ]
  ];

This should not hold any mystery if you understand the use of Internal`Bags inside compiled code, which halirutan and I discussed here, so I won't describe its working unless you find anything unclear.

Let's also define a wrapper that returns the result in the same format as Tally and demonstrates the inlining:

wrapper = With[{tally = tally},
   Compile[{{lst, _Real, 2}, {delta, _Real, 0}, {j, _Integer, 0}},
    Block[{items, counts},
     tally[lst, delta, j];
     result = Transpose[{items, counts}];
    ],
    "CompilationOptions" -> {"InlineCompiledFunctions" -> True},
   ]
  ];

Notice the call out of the VM to set result. This is injurious to performance and prevents parallelization over several inputs, so the above should be taken as an example only.

Since I don't know anything about your use case except that you want to use many different values of delta and j, I'll just present a simple benchmark for comparison with Tally. Note that Tally does not return its results sorted, so most likely it runs in $O(n)$ time, whereas due to the use of Sort (which makes the code much simpler--we do not have to implement our own hash tables, for example) this version runs in $O(n \log{n})$ time.

Let data = RandomChoice[RandomReal[{0, 1}, {100, 2}], 1*^6]. Then,

Timing[wrapper[data, 0.5, 2]; result]
(* -> {0.422, {
     {{0, 1}, 59656}, {{0, 2}, 9961}, {{0, 3}, 49722},
     {{0, 4}, 50165}, {{1, 0}, 29883}, {{1, 1}, 40065},
     {{1, 2}, 70153}, {{1, 3}, 80202}, {{1, 4}, 30174},
     {{2, 0}, 9814}, {{2, 1}, 30040}, {{2, 2}, 89744},
     {{2, 3}, 50014}, {{2, 4}, 40035}, {{3, 1}, 50041},
     {{3, 2}, 69897}, {{3, 3}, 60297}, {{3, 4}, 20018},
     {{4, 0}, 30129}, {{4, 1}, 39903}, {{4, 2}, 30106},
     {{4, 3}, 50037}, {{4, 4}, 9944}}
    } *)

Timing[Sort@Tally[Round[data/(0.5^2)]]]
(* -> {0.907, {
     {{0, 1}, 59656}, {{0, 2}, 9961}, {{0, 3}, 49722},
     {{0, 4}, 50165}, {{1, 0}, 29883}, {{1, 1}, 40065},
     {{1, 2}, 70153}, {{1, 3}, 80202}, {{1, 4}, 30174},
     {{2, 0}, 9814}, {{2, 1}, 30040}, {{2, 2}, 89744},
     {{2, 3}, 50014}, {{2, 4}, 40035}, {{3, 1}, 50041},
     {{3, 2}, 69897}, {{3, 3}, 60297}, {{3, 4}, 20018},
     {{4, 0}, 30129}, {{4, 1}, 39903}, {{4, 2}, 30106},
     {{4, 3}, 50037}, {{4, 4}, 9944}}
    } *)

So, the compiled version is actually better than I expected: for this input it is about twice as fast as Tally. Now let's assume that your usage of the results is implemented in compiled code as well:

parallelWrapper = With[{tally = tally},
   Compile[{{lst, _Real, 2}, {delta, _Real, 0}, {j, _Integer, 0}},
    Block[{items, counts},
     tally[lst, delta, j];
     (* Put your code here *)
    ],
    "CompilationOptions" -> {"InlineCompiledFunctions" -> True},
    "RuntimeAttributes" -> {Listable}, Parallelization -> True
   ]
  ];

AbsoluteTiming[
 wrapper[data, {0.5, 1.0, 2.0, 1.5}, {2, 1, 3, 2}]
]
(* -> {0.4843750, Null} *)

This version returns no result because the tallies are not used for anything, but we observe that one can perform the operation in parallel without any difficulties. (I have a quad-core CPU, so the timing is essentially the same as for the serial case when four combinations of delta and j are run at the same time.)

Oleksandr R.
  • 23,023
  • 4
  • 87
  • 125
  • +1. It's good that I did not start answering, since this would put to shame whatever I'd probably come up with :-). – Leonid Shifrin Sep 30 '12 at 13:53
  • Thanks once more Oleksandr! It will take a while for me to chew your answer through (I am really a newbie that have just ordered the Mathematica book and Trott's Programming book, and waiting for them to arrive). I think I know what I want to do in algorithmic terms. At each step (delta,j) I will also have to sum all tally[[2]]^q (given q) and take Log of the sum. But I will need all of the tally for each next step. I hope I will be able to learn how to do it efficiently from your answer. Thank you! – arkajad Sep 30 '12 at 14:06