
I have two lists of equal length, one containing values and the other indices. I want to sum the values that share the same index. For example:

values  = {2, 6, 3, 8, 3, 1, 3, 7, 1, 3, 5}

indices = {1, 3, 1, 2, 3, 1, 1, 2, 3, 2, 1}

should give

result = {2 + 3 + 1 + 3 + 5, 8 + 7 + 3, 6 + 3 + 1}

I need to do this for very large lists, so it should be efficient.

Any ideas?

m_goldberg
Beginner
  • GatherBy[Transpose[{values, indices}], Last][[All, All, 1]] – user1066 Feb 20 '17 at 14:11
  • What's "very large lists"? Thousands of elements? Millions of elements? Billions? And how many distinct indices would be expected? 50% of elements? 10%? 1%? There will be very different ways of doing this depending on such things. – ciao Feb 20 '17 at 14:20
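Since the best method depends on the list length and the number of distinct indices, here is a minimal benchmarking sketch for trying approaches on synthetic data. `makeData`, `pickTime` and `gatherTime` are hypothetical names, and the sizes (10^6 elements, indices 1 to 100, values 1 to 10) are illustrative assumptions only; changing `k` shows how each method responds to more or fewer distinct indices.

```mathematica
(* Hypothetical generator: n integer values in 1..10 and n indices in 1..k.
   The sizes used below are assumptions chosen only for illustration. *)
makeData[n_Integer, k_Integer] := {RandomInteger[{1, 10}, n], RandomInteger[{1, k}, n]}

{vals, idx} = makeData[10^6, 100];

(* RepeatedTiming returns {seconds, result}; First keeps only the time *)
pickTime = First@RepeatedTiming[Table[Total[Pick[vals, idx, j]], {j, Union[idx]}]];
gatherTime = First@RepeatedTiming[Total /@ GatherBy[Transpose[{vals, idx}], Last][[All, All, 1]]];

{pickTime, gatherTime}
```

The two timed expressions return their totals in different orders (sorted by index for the `Pick` version, first appearance for `GatherBy`), so compare the timings rather than the raw result lists.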

7 Answers


Pick is usually fast, and parallel processing may help, depending on how many kernels your computer can run.

ParallelTable[Total[Pick[values, indices, k]], {k, Union[indices]}]
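For comparison, a sequential sketch of the same idea on the question's data, plus an arithmetic variant that is not part of this answer (a swapped-in technique): `1 - Unitize[indices - k]` is a 0/1 mask that is 1 exactly where `indices == k`, so a dot product with `values` sums the matching entries. Both versions scan the whole list once per distinct index, so the work grows roughly with the list length times the number of distinct indices, which is why the number of distinct indices matters here.

```mathematica
values = {2, 6, 3, 8, 3, 1, 3, 7, 1, 3, 5};
indices = {1, 3, 1, 2, 3, 1, 1, 2, 3, 2, 1};

(* Sequential version of the Pick answer: one pass per distinct index *)
Table[Total[Pick[values, indices, k]], {k, Union[indices]}]
(* {14, 18, 10} *)

(* Swapped-in arithmetic variant: the mask is 1 where indices == k, else 0 *)
Table[values . (1 - Unitize[indices - k]), {k, Union[indices]}]
(* {14, 18, 10} *)
```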
KennyColnago

Another possibility, which is quick for large lists:

GroupBy[Transpose[{values, indices}], Last -> First, Total]

This returns an association, which can be converted back to a list ordered by index at negligible cost with the frustratingly verbose

Normal@SparseArray@Normal@GroupBy[...]
Quantum_Oli
  • +1 Values@GroupBy[...] will also do the trick. – WReach Feb 20 '17 at 16:57
  • It will, but it won't guarantee that the values will be in the order dictated by indices. SparseArray is useful as it takes care of that. – Quantum_Oli Feb 20 '17 at 17:06
  • Yes, you are right... the implicit ordering of various association-related operations is unreliable as it has changed over the past few releases. Values@KeySort@GroupBy[...] is another possibility. – WReach Feb 20 '17 at 17:15
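A sketch putting the answer and the comments together on the question's data (`assoc` and `gapAssoc` are illustrative names). One difference between the two conversions is worth knowing: the `SparseArray` route uses the keys as positions, so an index that never occurs (a gap in 1..max) becomes an explicit 0, whereas `Values@KeySort` simply omits it. The gap example uses made-up data chosen only to show this.

```mathematica
values = {2, 6, 3, 8, 3, 1, 3, 7, 1, 3, 5};
indices = {1, 3, 1, 2, 3, 1, 1, 2, 3, 2, 1};

(* index -> total; the key order of the association is not relied on below *)
assoc = GroupBy[Transpose[{values, indices}], Last -> First, Total];

Values@KeySort@assoc
(* {14, 18, 10} *)

Normal[SparseArray[Normal[assoc]]]
(* {14, 18, 10} *)

(* Made-up data in which index 2 never occurs *)
gapAssoc = GroupBy[Transpose[{{4, 5, 6}, {1, 3, 1}}], Last -> First, Total];

Normal[SparseArray[Normal[gapAssoc]]]  (* missing index filled with 0 *)
(* {10, 0, 5} *)

Values@KeySort@gapAssoc                (* missing index simply absent *)
(* {10, 5} *)
```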

Possible duplicate of "How to efficiently find positions of duplicates?" or "Gather list elements by labels"

e.g.

positionDuplicates[list_] := GatherBy[Range@Length[list], list[[#]] &]

values[[#]] & /@ positionDuplicates[indices]

{{2, 3, 1, 3, 5}, {6, 3, 1}, {8, 7, 3}}

Total[%, {2}]

{14, 10, 18}
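`GatherBy` orders the groups by first appearance, which is why the totals above come out as `{14, 10, 18}` rather than in index order. A sketch, not part of the original answer (`groups`, `keys` and `totals` are illustrative names), that reads each group's index from its first position and reorders the totals to match the question's layout:

```mathematica
values = {2, 6, 3, 8, 3, 1, 3, 7, 1, 3, 5};
indices = {1, 3, 1, 2, 3, 1, 1, 2, 3, 2, 1};

positionDuplicates[list_] := GatherBy[Range@Length[list], list[[#]] &]

groups = positionDuplicates[indices];
(* every position in a group has the same index, so the first one suffices *)
keys = indices[[groups[[All, 1]]]];       (* {1, 3, 2} *)
totals = Total[values[[#]]] & /@ groups;  (* {14, 10, 18} *)

(* reorder the totals by index *)
totals[[Ordering[keys]]]
(* {14, 18, 10} *)
```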

Mr.Wizard

A Reap/Sow variant:

Reap[MapThread[Sow[#1, #2] &, {values, indices}], _, {#2, Total@#2} &][[-1]]

yields:

{{{2, 3, 1, 3, 5}, 14}, {{6, 3, 1}, 10}, {{8, 7, 3}, 18}}
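In `Reap[expr, patt, f]`, `f` is called as `f[tag, sownValues]`, so `#1` is the index and `#2` the list of values sown with it, and tags appear in the order they were first sown. A sketch (`pairs` is an illustrative name) that records the index instead of the value list and then sorts by it:

```mathematica
values = {2, 6, 3, 8, 3, 1, 3, 7, 1, 3, 5};
indices = {1, 3, 1, 2, 3, 1, 1, 2, 3, 2, 1};

(* MapThread[Sow, ...] evaluates Sow[value, index] for each pair *)
pairs = Reap[MapThread[Sow, {values, indices}], _, {#1, Total@#2} &][[2]]
(* {{1, 14}, {3, 10}, {2, 18}} *)

(* sort by index, then keep only the totals *)
SortBy[pairs, First][[All, 2]]
(* {14, 18, 10} *)
```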
ubpdqn
values = {2, 6, 3, 8, 3, 1, 3, 7, 1, 3, 5};

indices = {1, 3, 1, 2, 3, 1, 1, 2, 3, 2, 1};

Requested result

result = {2 + 3 + 1 + 3 + 5, 8 + 7 + 3, 6 + 3 + 1}

{14, 18, 10}

Using Merge

KeySort @ Merge[Total] @ Thread[indices -> values]

<|1 -> 14, 2 -> 18, 3 -> 10|>

Values[%]

{14, 18, 10}
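If the result is needed for a fixed range of indices, including ones that never occur, `Lookup` with a default value avoids `Missing[...]` entries. A sketch assuming the wanted range is 1 to 4 (index 4 does not appear in the data; `totals` is an illustrative name):

```mathematica
values = {2, 6, 3, 8, 3, 1, 3, 7, 1, 3, 5};
indices = {1, 3, 1, 2, 3, 1, 1, 2, 3, 2, 1};

totals = Merge[Thread[indices -> values], Total];

(* indices absent from the data get the default 0 *)
Lookup[totals, Range[4], 0]
(* {14, 18, 10, 0} *)
```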

eldo
groupMap = Extract[#, Map[List]@Values@PositionIndex@#2, #3] &;

Examples:

values = {2, 6, 3, 8, 3, 1, 3, 7, 1, 3, 5};

indices = {1, 3, 1, 2, 3, 1, 1, 2, 3, 2, 1};

groupMap[values, indices, Total]

{14, 10, 18}
groupMap[Array[x, Length@indices], indices, Apply[Times]]

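How `groupMap` works, step by step on the question's data (`specs` is an illustrative name). `PositionIndex` lists its keys in order of first occurrence, so the totals come out as `{14, 10, 18}`; the symbolic example applies `Times` to each group of `x[i]` in the same way.

```mathematica
values = {2, 6, 3, 8, 3, 1, 3, 7, 1, 3, 5};
indices = {1, 3, 1, 2, 3, 1, 1, 2, 3, 2, 1};

(* Step 1: positions of each distinct index, keyed by index *)
PositionIndex[indices]
(* <|1 -> {1, 3, 6, 7, 11}, 3 -> {2, 5, 9}, 2 -> {4, 8, 10}|> *)

(* Step 2: wrap each position list so Extract treats it as one part
   specification, i.e. values[[{1, 3, 6, 7, 11}]] *)
specs = Map[List]@Values@PositionIndex@indices
(* {{{1, 3, 6, 7, 11}}, {{2, 5, 9}}, {{4, 8, 10}}} *)

(* Step 3: Extract[expr, specs, h] wraps each extracted sublist in h *)
Extract[values, specs, Total]
(* {14, 10, 18} *)

groupMap = Extract[#, Map[List]@Values@PositionIndex@#2, #3] &;
groupMap[Array[x, Length@indices], indices, Apply[Times]]
(* {x[1] x[3] x[6] x[7] x[11], x[2] x[5] x[9], x[4] x[8] x[10]} *)
```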

kglr

Using SplitBy:

values = {2, 6, 3, 8, 3, 1, 3, 7, 1, 3, 5};
indices = {1, 3, 1, 2, 3, 1, 1, 2, 3, 2, 1};

Total /@ Values@SplitBy[Sort@Thread[indices -> values], First]

{14, 18, 10}
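A related sketch, using a swapped-in technique that is not part of this answer: sort only once via `Ordering`, then get every group total from a single running sum, since the total of a group is the difference between the running sums at the ends of consecutive groups. It stays on flat numeric lists; whether that makes it faster than the other answers on very large data is an assumption worth timing. `ord`, `counts`, `acc` and `totals` are illustrative names.

```mathematica
values = {2, 6, 3, 8, 3, 1, 3, 7, 1, 3, 5};
indices = {1, 3, 1, 2, 3, 1, 1, 2, 3, 2, 1};

(* permutation that sorts the indices *)
ord = Ordering[indices];

(* group sizes in index order: {5, 3, 3} *)
counts = Tally[indices[[ord]]][[All, 2]];

(* running sum of the values reordered by index *)
acc = Accumulate[values[[ord]]];

(* running sums at group ends {14, 32, 42}, differenced against 0 *)
totals = Differences[Prepend[acc[[Accumulate[counts]]], 0]]
(* {14, 18, 10} *)
```

With machine-precision reals, differencing large running sums can lose some precision; for integer values the result is exact.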

E. Chan-López