
I am developing a weighted KNN algorithm. In one step, I need to sum the weights of each class. For example:

weightsPerclass = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};

Here the first row contains the weights and the second row contains the class labels.

I need to compute the sum for each class and return

{{1, 2, 3, 5}, {2.6, .2, .4, 0}}

I could use a loop, but I would like a more compact solution. Any suggestions?

Öskå
BetterEnglish
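For reference, the loop the question wants to avoid can be written as a short `Do` over an association accumulator. This is only a sketch for checking the compact answers against; it assumes version 10 or later (for `Association` and `Lookup`), and `sums` is a hypothetical name.

```mathematica
(* Sketch: loop-based per-class weight sums; assumes Mathematica 10+ for associations *)
{weights, classes} = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};

sums = <||>;  (* hypothetical accumulator: class label -> running total *)
Do[
  (* add the i-th weight to its class, starting from 0 for a class not seen yet *)
  sums[classes[[i]]] = Lookup[sums, classes[[i]], 0] + weights[[i]],
  {i, Length[classes]}
];

{Keys[sums], Values[sums]}
(* {{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0}} *)
```

The keys come out in order of first appearance, which here happens to be sorted.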

6 Answers


Additional methods using WeightedData, EmpiricalDistribution, GatherBy and SparseArray:

{weights,classes} = weightsPerclass = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};

WeightedData

wd = WeightedData[classes, weights];

The property "EmpiricalPDF" almost gives what we need:

wd["EmpiricalPDF"]
(* {{1,2,3,5},{0.8125,0.0625,0.125,0.}} *)

except that it has to be de-normalized by Total[weights]:

{First@# , Total[weights] Last@#} &@wd["EmpiricalPDF"]
(* {{1,2,3,5},{2.6,0.2,0.4,0.}} *)

By defining a function, you can get the result in one step:

wdF = With[{p = WeightedData[#2, #1]["EmpiricalPDF"], w = Total @ #}, {1, w} p] &;

wdF[weights, classes]
(* {{1,2,3,5},{2.6,0.2,0.4,0.}} *)

EmpiricalDistribution

ed = EmpiricalDistribution[Rule @@ weightsPerclass];
ed["Domain"]
(* {1,2,3,5} *)

Total[weights] ed["Weights"]
(* {2.6,0.2,0.4,0.} *)

{%%, %}
(* {{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0.}} *)

You can also define a function that returns the desired output in one step:

ClearAll[wedF]
wedF = With[{d = EmpiricalDistribution[# -> #2], w = Total@#}, {d["Domain"], w d["Weights"]}] &;

wedF[weights, classes]
(* {{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0.}} *)

GatherBy

ClearAll[gthrBy];
gthrBy = Transpose[{Last@First@#, Total[First /@ #]} & /@ GatherBy[Transpose@#, Last]] &;

gthrBy@weightsPerclass
(* {{1,2,3,5},{2.6,0.2,0.4,0}} *)
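To see what `gthrBy` works on, it may help to look at the intermediate `GatherBy` result: transposing gives `{weight, class}` pairs, and gathering by `Last` puts the pairs of each class together, in order of first appearance. A small sketch; `pairs` and `groups` are illustrative names.

```mathematica
weightsPerclass = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};

(* pairs {weight, class} *)
pairs = Transpose[weightsPerclass];

(* groups of pairs sharing the same class label *)
groups = GatherBy[pairs, Last]
(* {{{1, 1}, {0.4, 1}, {0.3, 1}, {0.9, 1}}, {{0.2, 2}}, {{0.3, 3}, {0.1, 3}}, {{0, 5}}} *)

(* each group contributes its label and the total of its weights *)
{Last@First@#, Total[First /@ #]} & /@ groups
(* {{1, 2.6}, {2, 0.2}, {3, 0.4}, {5, 0}} *)
```

The final `Transpose` in `gthrBy` then turns these `{label, total}` pairs into the two-row form the question asks for.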

SparseArray

System`SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" ->Total}];
sa = SparseArray[classes -> weights];
System`SetSystemOptions["SparseArrayOptions" -> {"TreatRepeatedEntries" -> First}];
{Flatten[sa["NonzeroPositions"]], sa["NonzeroValues"]}
(* {{1,2,3,5},{2.6,0.2,0.4,0}} *)

See also: Optimizing 2D binning code and Fast 2D binning algorithm
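If changing the global "TreatRepeatedEntries" option is undesirable, one alternative (a different technique from the one above, sketched under the assumption that class labels are positive integers) is to build a sparse 0/1 indicator matrix with one row per sample and one column per class, and take its dot product with the weights. `indicator`, `sums` and `labels` are hypothetical names.

```mathematica
{weights, classes} = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};
n = Length[classes];

(* indicator[[i, c]] == 1 when sample i belongs to class c; assumes positive integer labels *)
indicator = SparseArray[Thread[Transpose[{Range[n], classes}] -> 1], {n, Max[classes]}];

(* weights . indicator sums the weights column by column: one total per class 1..Max[classes] *)
sums = N@Normal[weights . indicator];

(* keep only the labels that actually occur *)
labels = Union[classes];
{labels, sums[[labels]]}
(* {{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0.}} *)
```

Unlike the "NonzeroValues" route, a class whose total is zero (class 5 here) is still reported. Non-integer labels would first have to be mapped to column indices.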

GroupBy - Version 10

Through[{Keys,Values}[GroupBy[Transpose@weightsPerclass,Last -> First, Total]]]
 (* {{1,2,3,5}, {2.6,0.2,0.4,0}} *)
kglr
  • You're a madman. – kale Oct 09 '14 at 13:38
  • @kguler, thanks – BetterEnglish Oct 09 '14 at 14:51
  • @kguler Why doesn't the empirical PDF give the desired result directly? Why do we have to multiply the results by Total[weights]? Could you explain this point, please? – BetterEnglish Oct 09 '14 at 15:42
  • @Developer2000, PDF values are normalized so that they add up to 1 over the domain, i.e. pdf[i] = totalWeight[i] / (sum of totalWeight[j] over all j in Domain). To get the raw weights, we need to multiply the pdf values by the total weight of all values in the domain. – kglr Oct 09 '14 at 16:12
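The normalization described in the comment above can be checked numerically: the raw per-class totals divided by their grand total (3.2 for this data) reproduce the "EmpiricalPDF" values quoted in the answer, and multiplying back by `Total[weights]` undoes it. A small sketch using `GroupBy` (version 10+) for the raw totals; `raw` and `pdf` are illustrative names.

```mathematica
{weights, classes} = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};

(* raw per-class totals as an association: class -> summed weight *)
raw = GroupBy[Transpose[{weights, classes}], Last -> First, Total];

(* dividing by the grand total gives values that add up to 1 *)
pdf = raw / Total[weights];
Values[pdf]
(* {0.8125, 0.0625, 0.125, 0.} *)

(* de-normalizing recovers the raw totals *)
Values[Total[weights] pdf]
(* {2.6, 0.2, 0.4, 0.} *)
```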

There are many ways to do this.

w = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};
class = Union[w[[2]]];
Map[Total, Cases[Transpose[w], {x_, y_} /; y == # :> x] & /@ class];
{class, %}

(* {{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0}} *)

Nasser
  • FYI: There is a performance problem with this approach. The entire list must be rescanned for every element in class, which will cause considerable slowdown as the length of that list increases. This is the problem I was referring to in (25591). – Mr.Wizard Oct 09 '14 at 12:50
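The rescanning problem pointed out in the comment above can be observed with a rough timing sketch: the `Cases`-per-class approach scans all n pairs once per distinct class (about n·k comparisons), while `GroupBy` makes a single pass. The sizes and the names `sample`, `byCases` and `byGroup` are arbitrary assumptions; absolute timings depend on machine and version, so none are quoted.

```mathematica
(* rough comparison; sizes are arbitrary and timings are machine-dependent *)
SeedRandom[1];
n = 10000; k = 200;
sample = {RandomReal[1, n], RandomInteger[{1, k}, n]};

(* rescans all pairs once per distinct class: about n k pattern tests *)
byCases[{weights_, classes_}] :=
  With[{pairs = Transpose[{weights, classes}], labels = Union[classes]},
    {labels, Total[Cases[pairs, {x_, y_} /; y == # :> x]] & /@ labels}];

(* one pass over the data, grouping by class, then sorting the keys *)
byGroup[data_] := KeySort@GroupBy[Transpose[data], Last -> First, Total];

{t1, r1} = AbsoluteTiming[byCases[sample]];
{t2, r2} = AbsoluteTiming[byGroup[sample]];
{t1, t2}

(* both approaches should agree on labels and totals *)
r1[[1]] == Keys[r2] && Max@Abs[r1[[2]] - Values[r2]] < 10^-9
```

Doubling k should roughly double `t1` while leaving `t2` about the same.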

Using Reap and Sow:

Transpose[
 Last@Reap[
   MapThread[Sow, weightsPerclass], _, {#1, Total@#2} &]]
{{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0}}
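The mechanism behind this one-liner: `MapThread[Sow, weightsPerclass]` evaluates `Sow[weight, class]` for each pair, so every weight is sown with its class as the tag, and `Reap[expr, _, f]` then calls `f[tag, {values}]` once per distinct tag, in the order the tags were first used. A toy example with hypothetical symbolic tags `a` and `b`:

```mathematica
(* Sow[value, tag]; Reap collects the values per tag and applies f[tag, values] *)
Reap[Sow[1, a]; Sow[2, b]; Sow[3, a], _, {#1, Total@#2} &]
(* {3, {{a, 4}, {b, 2}}} *)
```

The first element is just the value of the reaped expression (the last `Sow` returns 3), which is why the answer takes `Last` before transposing.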
Mr.Wizard
ubpdqn

If you have Association functionality, there is a rather nice approach using Merge:

w = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};

Thread[#2 -> # & @@ w] ~Merge~ Total
<|1 -> 2.6, 2 -> 0.2, 3 -> 0.4, 5 -> 0|>

The output is itself an association. It may be desirable to keep the format. If not:

List @@@ Normal[%]\[Transpose]
{{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0}}
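If the association form is kept, `Lookup` with a default value returns the total for every class of interest, including classes with no neighbours at all, and the heaviest class can be read off with `Ordering`. A sketch assuming the full label set is 1 to 5 (class 4 does not occur in the example data, so it gets the default 0); `sums` and `allLabels` are illustrative names.

```mathematica
w = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};
sums = Thread[#2 -> # & @@ w] ~Merge~ Total;

(* assumed label set; classes that never occur default to 0 *)
allLabels = Range[5];
Lookup[sums, allLabels, 0]
(* {2.6, 0.2, 0.4, 0, 0} *)

(* class with the largest summed weight *)
Keys[sums][[First@Ordering[Values[sums], -1]]]
(* 1 *)
```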

If you cannot use associations, it is better to use GatherBy:

{#[[1, 2]], Tr @ #[[All, 1]]} & /@ GatherBy[w\[Transpose], Last]\[Transpose]
{{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0}}

Unlike Nasser's code, this does not rescan the list for every unique element in the second part, which would cause considerable slowdown as that number increases.

Mr.Wizard

Late to the party, but here's a way using PositionIndex (v10):

With[
 {pi = PositionIndex[weightsPerclass[[2]]]},
 {Keys@pi, Plus @@ (weightsPerclass[[1, #]]) & /@ Values@pi}
 ]

{{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0}}
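Because `f /@ assoc` maps over the values of an association, the same idea can also be written without splitting keys and values, keeping the class labels as keys (version 10+). A sketch:

```mathematica
{weights, classes} = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};

PositionIndex[classes]
(* <|1 -> {1, 4, 6, 7}, 2 -> {2}, 3 -> {3, 5}, 5 -> {8}|> *)

(* each list of positions becomes the total of the weights at those positions *)
Total[weights[[#]]] & /@ PositionIndex[classes]
(* <|1 -> 2.6, 2 -> 0.2, 3 -> 0.4, 5 -> 0|> *)
```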

kale

Step by step:

data = {{1, 0.2, .3, .4, .1, .3, .9, 0}, {1, 2, 3, 1, 3, 1, 1, 5}};

union = Union @ data[[2]]

{1, 2, 3, 5}

pos = Flatten /@ Map[Position[data[[2]], #] &, union]

{{1, 4, 6, 7}, {2}, {3, 5}, {8}}

total = Total /@ Map[data[[1, #]] &, pos]

{2.6, 0.2, 0.4, 0}

{union, total}

{{1, 2, 3, 5}, {2.6, 0.2, 0.4, 0}}

eldo
  • Please see my comment under Nasser's answer. Mapping Position (or the equivalent) should be avoided whenever possible if performance is a concern. – Mr.Wizard Oct 09 '14 at 12:52