Here is a simple construct that answers the question as asked, although I'm still unclear on what you intend to do with the results, and hence whether compilation is actually the right approach. In particular, after creating these two arrays separately, one cannot pass them back out of the VM without combining them in some way to form a full-rank tensor; so unless you are working entirely within compiled code, there would not appear to be any benefit in doing this. As such, I provide a closure that needs to be inlined into other compiled code in order to function:
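(As an aside, the binning step at the heart of this is Round[lst/(delta^j)]: each coordinate is divided by the cell size delta^j and rounded to the nearest integer, so points falling in the same cell receive identical integer keys. A minimal Python sketch of that quantization, with function names of my own choosing; note that both Python's round and Mathematica's Round use round-half-to-even:)

```python
def quantize(points, delta, j):
    """Map 2D points to integer grid cells of side delta**j,
    mirroring Mathematica's Round[lst/(delta^j)]."""
    cell = delta ** j
    return [tuple(round(c / cell) for c in p) for p in points]

# Two nearby points land in different cells once they straddle a boundary:
print(quantize([(0.24, 0.9), (0.26, 0.9)], 0.5, 1))
# → [(0, 2), (1, 2)]
```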
tally = Compile[{{lst, _Real, 2}, {delta, _Real, 0}, {j, _Integer, 0}},
  Block[{
     sorted = Sort@Round[lst/(delta^j)],
     itemBag = Internal`Bag@Most[{0}],
     countBag = Internal`Bag@Most[{0}],
     lastItem, count = 0
    },
   lastItem = First[sorted];
   Do[
    If[i == lastItem, ++count; Continue[]];
    Internal`StuffBag[itemBag, lastItem, 1];
    Internal`StuffBag[countBag, count];
    lastItem = i; count = 1,
    {i, sorted}
   ];
   Internal`StuffBag[itemBag, lastItem, 1];
   Internal`StuffBag[countBag, count];
   items = Partition[Internal`BagPart[itemBag, All], Length@First[sorted]];
   counts = Internal`BagPart[countBag, All];
  ]
 ];
This should not hold any mystery if you understand the use of Internal`Bags inside compiled code, which halirutan and I discussed here, so I won't describe how it works unless you find anything unclear.
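(If the bag mechanics are unfamiliar, the underlying algorithm is simply: sort the quantized rows, then walk the sorted list once, counting the length of each run of identical rows. Here is the same logic as a Python sketch, purely for illustration; the items/counts pair corresponds to the two bags above:)

```python
def tally_sorted(rows):
    """Run-length count over sorted rows: returns (items, counts),
    analogous to the compiled tally's itemBag and countBag."""
    sorted_rows = sorted(rows)
    items, counts = [], []
    last, count = sorted_rows[0], 0
    for r in sorted_rows:
        if r == last:          # same run: keep counting
            count += 1
            continue
        items.append(last)     # run ended: flush item and its count
        counts.append(count)
        last, count = r, 1
    items.append(last)         # flush the final run
    counts.append(count)
    return items, counts

print(tally_sorted([(1, 2), (0, 1), (1, 2), (0, 1), (1, 2)]))
# → ([(0, 1), (1, 2)], [2, 3])
```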
Let's also define a wrapper that returns the result in the same format as Tally and demonstrates the inlining:
wrapper = With[{tally = tally},
  Compile[{{lst, _Real, 2}, {delta, _Real, 0}, {j, _Integer, 0}},
   Block[{items, counts},
    tally[lst, delta, j];
    result = Transpose[{items, counts}];
   ],
   CompilationOptions -> {"InlineCompiledFunctions" -> True}
  ]
 ];
Notice the call out of the VM to set the global variable result. This is injurious to performance and prevents parallelization over several inputs, so the above should be taken as an example only.
Since I don't know anything about your use case except that you want to use many different values of delta and j, I'll just present a simple benchmark for comparison with Tally. Note that Tally does not return its results sorted, so it most likely runs in $O(n)$ time, whereas this version, due to the use of Sort (which makes the code much simpler--we do not have to implement our own hash tables, for example), runs in $O(n \log{n})$ time.
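(For comparison, the $O(n)$ hash-table approach that Sort lets us avoid implementing looks like this in Python, where dictionaries provide the hash table for free; inside Compile one would have to build it by hand. This is my own sketch, not part of the answer's code:)

```python
from collections import Counter

def tally_hashed(rows):
    """O(n) tally via a hash table. Counting order is insertion
    order, so sort afterwards if Sort-compatible output is wanted."""
    counts = Counter(map(tuple, rows))
    return sorted(counts.items())

print(tally_hashed([[1, 2], [0, 1], [1, 2]]))
# → [((0, 1), 1), ((1, 2), 2)]
```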
Let data = RandomChoice[RandomReal[{0, 1}, {100, 2}], 1*^6]. Then,
Timing[wrapper[data, 0.5, 2]; result]
(* -> {0.422, {
{{0, 1}, 59656}, {{0, 2}, 9961}, {{0, 3}, 49722},
{{0, 4}, 50165}, {{1, 0}, 29883}, {{1, 1}, 40065},
{{1, 2}, 70153}, {{1, 3}, 80202}, {{1, 4}, 30174},
{{2, 0}, 9814}, {{2, 1}, 30040}, {{2, 2}, 89744},
{{2, 3}, 50014}, {{2, 4}, 40035}, {{3, 1}, 50041},
{{3, 2}, 69897}, {{3, 3}, 60297}, {{3, 4}, 20018},
{{4, 0}, 30129}, {{4, 1}, 39903}, {{4, 2}, 30106},
{{4, 3}, 50037}, {{4, 4}, 9944}}
} *)
Timing[Sort@Tally[Round[data/(0.5^2)]]]
(* -> {0.907, {
{{0, 1}, 59656}, {{0, 2}, 9961}, {{0, 3}, 49722},
{{0, 4}, 50165}, {{1, 0}, 29883}, {{1, 1}, 40065},
{{1, 2}, 70153}, {{1, 3}, 80202}, {{1, 4}, 30174},
{{2, 0}, 9814}, {{2, 1}, 30040}, {{2, 2}, 89744},
{{2, 3}, 50014}, {{2, 4}, 40035}, {{3, 1}, 50041},
{{3, 2}, 69897}, {{3, 3}, 60297}, {{3, 4}, 20018},
{{4, 0}, 30129}, {{4, 1}, 39903}, {{4, 2}, 30106},
{{4, 3}, 50037}, {{4, 4}, 9944}}
} *)
So, the compiled version is actually better than I expected: for this input it is about twice as fast as Tally. Now let's assume that your usage of the results is implemented in compiled code as well:
parallelWrapper = With[{tally = tally},
  Compile[{{lst, _Real, 2}, {delta, _Real, 0}, {j, _Integer, 0}},
   Block[{items, counts},
    tally[lst, delta, j];
    (* Put your code here *)
   ],
   CompilationOptions -> {"InlineCompiledFunctions" -> True},
   RuntimeAttributes -> {Listable}, Parallelization -> True
  ]
 ];
AbsoluteTiming[
 parallelWrapper[data, {0.5, 1.0, 2.0, 1.5}, {2, 1, 3, 2}]
]
(* -> {0.4843750, Null} *)
This version returns no result because the tallies are not used for anything, but we observe that one can perform the operation in parallel without any difficulties. (I have a quad-core CPU, so the timing is essentially the same as for the serial case when four combinations of delta and j are run at the same time.)
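(The Listable/Parallelization pattern amounts to mapping the same tally over several (delta, j) pairs at once. A rough Python analogue, with threads standing in for the compiled runtime's parallelization; the helper names are mine and the per-pair function just combines the quantize and run-length steps:)

```python
from concurrent.futures import ThreadPoolExecutor

def tally_for(points, delta, j):
    """Quantize points into cells of side delta**j, then tally
    the sorted keys by run-length counting."""
    cell = delta ** j
    keys = sorted(tuple(round(c / cell) for c in p) for p in points)
    items, counts = [], []
    for k in keys:
        if items and items[-1] == k:
            counts[-1] += 1
        else:
            items.append(k)
            counts.append(1)
    return list(zip(items, counts))

points = [(0.24, 0.9), (0.26, 0.9), (0.74, 0.4)]
params = [(0.5, 1), (0.5, 2), (1.0, 1), (1.5, 2)]
# One tally per (delta, j) pair, run concurrently:
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda dj: tally_for(points, *dj), params))
print(results[0])
# → [((0, 2), 1), ((1, 1), 1), ((1, 2), 1)]
```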
Comments on the question:

lst2[[All,1]] and lst2[[All,2]] or Flatten /@ lst2 or something else? – Vitaliy Kaurov Sep 30 '12 at 07:39
Transpose[lst2]? – kglr Sep 30 '12 at 08:08
Tally code does not want to compile. – arkajad Sep 30 '12 at 09:01
"Tally[] code does not want to compile." - because Tally[] is not compilable. – J. M.'s missing motivation Sep 30 '12 at 09:40
Map: lst2 = Tally[Round[lst/(0.5)^2]]. Also you could use BinCounts if you only want something like lst2[[All,2]] returned (although it isn't clear why lst2[[All,1]] and lst2[[All,2]] aren't satisfactory). – Mike Honeychurch Sep 30 '12 at 09:52
Tally would still not compile. I do not know, perhaps it is already optimized to the max? – arkajad Sep 30 '12 at 10:16
… Tally. The main reason one may want to do this is if the results are needed for further processing in compiled code and one seeks to avoid an (expensive) call out of the VM, so consider carefully whether you fall into this category. If so, recall that Sort is compilable, which means that this can be done easily in $O(n \log{n})$ time. It can be done in at best $O(n)$ time, but that will be more difficult to achieve. How efficient does this really need to be? – Oleksandr R. Sep 30 '12 at 11:33
… j etc. and run the same function again with the same initial coordinate data. I will need ca 1000 points just to see if what I get resembles that what I need. It may be the case that I will have to switch from Mathematica to C, where I can use those dangerous pointers? – arkajad Sep 30 '12 at 11:57