9

I want to find the n largest values in a matrix and replace all others with zero.

The solution I found uses ReplaceAll and becomes very slow as the size of matrices grows:

FindLargestValues[m_?MatrixQ, n_Integer] :=
  With[{v = (Union @ Flatten @ m)[[-n]]},
    m /. x_Real /; x < v :> 0]

Example:

(small = RandomReal[{1, 10}, {5, 5}]) // MatrixForm

enter image description here

FindLargestValues[small, 10] // MatrixForm

enter image description here

Timing example:

large = RandomReal[{1, 10}, {50, 50}];

Do[FindLargestValues[large, 50], {1000}]; // Timing // First

2.574016

Is there a faster way to do this?

eldo
  • 67,911
  • 5
  • 60
  • 168
  • Somewhat related: http://mathematica.stackexchange.com/q/41478/131 and http://mathematica.stackexchange.com/q/39113/131 – Yves Klett Sep 09 '14 at 14:18

4 Answers4

11

Assuming that the values of your matrix are all distinct, or that you don't count repetitions in n, you can do this:

ClearAll[largest];
largest[mat_, n_] := Clip[mat,{RankedMax[#, n], Max[#]}, {0, 0}] &[Flatten@mat]

So that

large = RandomReal[{1, 10}, {50, 50}];
Do[largest[large, 50], {1000}]; // Timing // First

(* 0.076633 *)
Leonid Shifrin
  • 114,335
  • 15
  • 329
  • 420
9

I can't test the timing right now, but maybe it's worth mentioning

Threshold[large, {"LargestValues", 50}]
Simon Woods
  • 84,945
  • 8
  • 175
  • 324
4

Here is a relative minor variation on eldo's method that speeds it up considerably, but still falls well short of Lenonid Shifrin's better algorithm.

 keepMax[matrix_, n_] :=
   With[{threshold = (Union@Flatten@matrix)[[-n]]},
     Map[If[# < threshold, 0, #] &, matrix, {2}]]

Absolute timings

SeedRandom[42]; testData = RandomInteger[{1, 99}, {50, 50}];
eldo = (Do[FindLargestValues[testData, 50], {1000}]; // Timing // First)
1.218814
shif = (Do[largest[testData, 50], {1000}]; // Timing // First)
0.088629
mg = (Do[keepMax[testData, 50], {1000}]; // Timing // First)
0.446820

Comparative timings

{eldo/mg, mg/shif, eldo/shif}a 
{2.72775, 5.0415, 13.752}

Further, it would seem that my computer is faster than eldo's but a little slower than Shifrin's.

m_goldberg
  • 107,779
  • 16
  • 103
  • 257
1

I wondered if compiling was worth a shot...

You can go a little faster than @SimonWoods' answer of Threshold[large, {"LargestValues", 50}].

large = RandomReal[{1, 10}, {50, 50}];

FindLargestValues = Compile[{{matrix, _Real, 2}, {n, _Integer}},
   Module[{minvalue},
    minvalue = (Union@Flatten@matrix)[[-n]];
    Chop[matrix, minvalue]
    ],
   CompilationTarget -> "C"
   ];

Do[FindLargestValues[large, 50], {1000}]; // AbsoluteTiming // First
(* 0.151 seconds *)

Do[Threshold[large, {"LargestValues", 50}], {1000}]; // 
  AbsoluteTiming // First
(* 0.270 seconds *)

But still @LeonidShifrin's answer wins out (and contains functions that cannot be compiled anyway).

largest[mat_, n_] := 
 Clip[mat, {RankedMax[#, n], Max[#]}, {0, 0}] &[Flatten@mat]

Do[largest[large, 50], {1000}]; // AbsoluteTiming // First
(* 0.044 seconds *)
dr.blochwave
  • 8,768
  • 3
  • 42
  • 76