Total by a criteria

Question

I am developing a weighted KNN algorithm. In one step, I need to sum the weights for each class. For example:

weightsPerclass = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};

Here the first row contains the weights and the second row contains the class labels.

I need to sum the weights for each class and return

{{1, 2, 3, 5}, {2.6, .2, .4, 0}}

We can use a loop, but I need a more compact solution. Any suggestions?

score 9 · Answer 1 · edited May 23 '17 at 12:35

Additional methods using WeightedData, EmpiricalDistribution, GatherBy and SparseArray:

{weights,classes} = weightsPerclass = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};

WeightedData

wd = WeightedData[classes, weights];

The property "EmpiricalPDF" almost gives what we need

wd["EmpiricalPDF"]
(* {{1,2,3,5},{0.8125,0.0625,0.125,0.}} *)

except that it has to be de-normalized by Total[weights]:

{First@# , Total[weights] Last@#} &@wd["EmpiricalPDF"]
(* {{1,2,3,5},{2.6,0.2,0.4,0.}} *)

Defining a function you can get the result in one step:

wdF = With[{p = WeightedData[#2, #1]["EmpiricalPDF"], w = Total @ #}, {1, w} p] &;

wdF[weights, classes]
(* {{1,2,3,5},{2.6,0.2,0.4,0.}} *)

EmpiricalDistribution

ed = EmpiricalDistribution[Rule @@ weightsPerclass];
ed["Domain"]
(* {1,2,3,5} *)

Total[weights] ed["Weights"]
(* {2.6,0.2,0.4,0.} *)

{%%, %}
(* {{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0.}} *)

You can also define a function that returns the desired output in one step:

ClearAll[wedF]
wedF = With[{d = EmpiricalDistribution[# -> #2], w = Total@#}, {d["Domain"], w d["Weights"]}] &;

wedF[weights, classes]
(* {{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0.}} *)

GatherBy

ClearAll[gthrBy];
gthrBy = Transpose[{Last@First@#, Total[First /@ #]} & /@ GatherBy[Transpose@#, Last]] &;

gthrBy@weightsPerclass
(* {{1,2,3,5},{2.6,0.2,0.4,0}} *)

SparseArray

System`SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" ->Total}];
sa = SparseArray[classes -> weights];
System`SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> First}];
{Flatten[sa["NonzeroPositions"]], sa["NonzeroValues"]}
(* {{1,2,3,5},{2.6,0.2,0.4,0}} *)

See also: Optimizing 2D binning code and Fast 2D binning algorithm

GroupBy - Version 10

Through[{Keys,Values}[GroupBy[Transpose@weightsPerclass,Last -> First, Total]]]
(* {{1,2,3,5}, {2.6,0.2,0.4,0}} *)

@kguler Why does the empirical PDF not give the result directly? Why do we have to multiply the results by Total[weights]? Could you explain this point, please? – BetterEnglish, Oct 09 '14 at 15:42
@Developer2000, PDF values are normalized so that they add up to 1 over the domain; i.e., pdf[i] is totalWeight[i] divided by the sum of totalWeight[j] over all j in the domain. To get the raw weights, we need to multiply the PDF values by the total weight of all values in the domain. – kglr, Oct 09 '14 at 16:12
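
The normalization described in the comment above can be checked by hand. The following is only a minimal sketch: it rebuilds the per-class totals with GroupBy instead of WeightedData, and the names raw and pdf are introduced here for illustration and are not part of the answer.

```mathematica
weights = {1, 0.2, .3, .4, .1, .3, .9, 0};
classes = {1, 2, 3, 1, 3, 1, 1, 5};

(* raw per-class totals, keyed by class label *)
raw = GroupBy[Transpose[{classes, weights}], First -> Last, Total]
(* <|1 -> 2.6, 2 -> 0.2, 3 -> 0.4, 5 -> 0|> *)

(* dividing by the grand total gives values that sum to 1, as a PDF does *)
pdf = raw/Total[weights];
Total[pdf]
(* 1., up to floating-point rounding *)

(* multiplying by the grand total again recovers the raw totals *)
Total[weights] pdf
```

The last line is the same de-normalization step that the answer applies to wd["EmpiricalPDF"].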

score 5 · Accepted Answer · answered Oct 09 '14 at 05:32

There are many ways to do this; here is one:

w = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};
class = Union[w[[2]]];
Map[Total, Cases[Transpose[w], {x_, y_} /; y == # :> x] & /@ class];
{class, %}

answered Oct 09 '14 at 05:32

Nasser

FYI: There is a performance problem with this approach. The entire list must be rescanned for every element in class, which will cause considerable slow-down as the length of that list increases. This is the problem I was referring to in (25591). – Mr.Wizard Oct 09 '14 at 12:50

score 5 · Answer 3 · edited Oct 09 '14 at 12:54

Using Reap and Sow:

Transpose[
 Last@Reap[
   MapThread[Sow, weightsPerclass], _, {#1, Total@#2} &]]

{{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0}}

edited Oct 09 '14 at 12:54

Mr.Wizard

answered Oct 09 '14 at 10:01

ubpdqn

score 5 · Answer 4 · answered Oct 09 '14 at 12:47

If you have Association functionality there is a rather nice approach using Merge:

w = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};

Thread[#2 -> # & @@ w] ~Merge~ Total

<|1 -> 2.6, 2 -> 0.2, 3 -> 0.4, 5 -> 0|>

The output is itself an association. It may be desirable to keep the format. If not:

List @@@ Normal[%]\[Transpose]

{{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0}}

If you cannot use associations, it is better to use GatherBy:

{#[[1, 2]], Tr @ #[[All, 1]]} & /@ GatherBy[w\[Transpose], Last]\[Transpose]

{{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0}}

Unlike Nasser's code, this does not rescan the list for every unique element of the second part, which would cause considerable slow-down as the number of unique elements increases.
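
A rough way to see the difference is to time both approaches on larger input. This is only a sketch under stated assumptions: the sizes n and k, the random test data big, and the wrapper names scanTotals and gatherTotals are hypothetical and chosen here for illustration. Actual timings depend on the machine and version, so none are quoted.

```mathematica
(* hypothetical test data: n weights spread over up to k classes *)
n = 10000; k = 500;
SeedRandom[1];
big = {RandomReal[1, n], RandomInteger[{1, k}, n]};

(* rescans the whole list once per distinct class *)
scanTotals[w_] := With[{cls = Union[w[[2]]]},
   {cls, Total /@ (Cases[Transpose[w], {x_, y_} /; y == # :> x] & /@ cls)}];

(* single pass: gather rows by class, then total each group *)
gatherTotals[w_] :=
  Transpose[{#[[1, 2]], Total[#[[All, 1]]]} & /@
    GatherBy[Transpose[w], Last]];

(* wall-clock time of each approach, in seconds *)
First@AbsoluteTiming[scanTotals[big]]
First@AbsoluteTiming[gatherTotals[big]]

(* sanity check: same totals once the gathered result is sorted by class *)
Sort[Transpose[gatherTotals[big]]] == Transpose[scanTotals[big]]
```

Note that GatherBy returns classes in order of first appearance, while Union sorts them, which is why the comparison sorts the gathered result first.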

score 3 · Answer 5 · answered Oct 09 '14 at 13:44

Late to the party, but here's a way using PositionIndex (v10):

With[
 {pi = PositionIndex[weightsPerclass[[2]]]},
 {Keys@pi, Plus @@ (weightsPerclass[[1, #]]) & /@ Values@pi}
 ]

{{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0}}
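
To make the intermediate step visible, here is a small sketch of what PositionIndex produces for the class labels and how those positions pick out weights. The variable names classes and weights are introduced here for readability and simply stand for the two rows of weightsPerclass.

```mathematica
weights = {1, 0.2, .3, .4, .1, .3, .9, 0};
classes = {1, 2, 3, 1, 3, 1, 1, 5};

(* each distinct class mapped to the positions where it occurs *)
pi = PositionIndex[classes]
(* <|1 -> {1, 4, 6, 7}, 2 -> {2}, 3 -> {3, 5}, 5 -> {8}|> *)

(* Part with a list of positions extracts the weights of one class *)
weights[[pi[3]]]
(* {0.3, 0.1} *)

(* mapping over the association totals the weights of every class *)
Total[weights[[#]]] & /@ pi
(* <|1 -> 2.6, 2 -> 0.2, 3 -> 0.4, 5 -> 0|> *)
```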

answered Oct 09 '14 at 13:44

kale

score 2 · Answer 6 · answered Oct 09 '14 at 05:24

Step by step:

data = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};

weights = data[[1]]

{1, 0.2, 0.3, 0.4, 0.1, 0.3, 0.9, 0}

union = Union @ data[[2]]

{1, 2, 3, 5}

pos = Flatten /@ Map[Position[data[[2]], #] &, union]

{{1, 4, 6, 7}, {2}, {3, 5}, {8}}

total = Total /@ Map[weights[[pos[[#]]]] &, Range @ Length @ pos]

{2.6, 0.2, 0.4, 0}

{union, total}

{{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0}}

answered Oct 09 '14 at 05:24

eldo

Please see my comment under Nasser's answer. Mapping Position (or the equivalent) should be avoided whenever possible if performance is a concern. – Mr.Wizard Oct 09 '14 at 12:52
