
I've noticed that, while ListConvolve is very fast, it becomes much slower when you use anything but the standard Times, Plus as your functions (cf. the ListConvolve documentation to see this in action; the full form is ListConvolve[ker, list, klist, padding, g, h]).
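To see concretely what the extra functions do, here's a tiny symbolic toy example (the {-1, 1} alignment is just a convenient no-overhang choice): with List as the second function, each output entry keeps the individual kernel-times-element products instead of summing them.

ListConvolve[{a, b}, {x, y, z, w}, {-1, 1}, 0, Times, Plus]   (* default heads: each entry is a sum of products *)
ListConvolve[{a, b}, {x, y, z, w}, {-1, 1}, 0, Times, List]   (* same products, but each entry is now a length-2 list *)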

As an example, I predefine a 500x500 array and call it array500. I then define two functions, LC and LC2, as follows:

LC[ar_] := 
  ar ListConvolve[-{{1, 1, 1}, {1, 0, 1}, {1, 1, 1}}, ar, {2, -2}, 0, 
    Times, Plus];

LC2[ar_] :=
  ar ListConvolve[-{{1, 1, 1}, {1, 0, 1}, {1, 1, 1}}, ar, {2, -2}, 0, 
    Times, List];

(N.b. LC[ar] is equivalent to ar ListConvolve[-{{1, 1, 1}, {1, 0, 1}, {1, 1, 1}}, ar, {2, -2}, 0];, since Times, Plus are the defaults.)
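For reference, array500 itself is nothing special; something along these lines (random machine reals; any packed 500x500 numerical array should behave similarly) is what I'm timing:

array500 = RandomReal[1, {500, 500}];  (* stand-in: a packed 500x500 array of machine reals *)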

When I run LC on array500, it's understandably very speedy: RepeatedTiming@LC@array500 returns a mere 0.016 s. However, RepeatedTiming@LC2@array500 takes significantly longer, at 0.974 s. That's a 61x slowdown! I'm confused as to why it's so different, especially because LC2 never even has to add the elements together. If I change the second function from List to Times, it still takes about the same (long) time.

Is anybody able to clarify why this huge speed difference exists, and whether there's any way to work around it?

EDIT: I realized that the multiplication out front by ar means I'm doing n^2 extra products for an n x n array with LC and 9n^2 with LC2, but this doesn't seem to matter much (as I expected it wouldn't, but I wanted to check!): removing the multiplication, there's still a ~60-70x slowdown from using Times, List instead of Times, Plus.
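For concreteness, the stripped-down check is just this (LC3 and LC4 are throwaway names for the versions without the outer multiplication):

LC3[ar_] := ListConvolve[-{{1, 1, 1}, {1, 0, 1}, {1, 1, 1}}, ar, {2, -2}, 0];
LC4[ar_] := ListConvolve[-{{1, 1, 1}, {1, 0, 1}, {1, 1, 1}}, ar, {2, -2}, 0, Times, List];
RepeatedTiming[LC3[array500];]
RepeatedTiming[LC4[array500];]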

Ben Kalziqi
  • Unless there has been a recent upgrade, there is not much you can do. ListConvolve and ListCorrelate could be used in some creative ways if the generalised versions were anywhere near as fast as the default, but it's never been the case, unfortunately. – Mike Honeychurch Feb 08 '16 at 22:29
  • @MikeHoneychurch that's a real shame! I figured something like that was going on, but that I might as well check just in case. – Ben Kalziqi Feb 08 '16 at 22:47

1 Answer


(1) Convolution with Plus and Times can be done via FFT.

(2) The overall running time cannot be less (asymptotically) than the size of the result.

Point (1) might help to explain why the standard ListConvolve is fast under most circumstances.
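As a quick illustration of point (1): with Mathematica's default FourierParameters, the cyclic Times/Plus convolution can be reproduced with Fourier and InverseFourier. (If I have the alignment convention off by one, the two results differ only by a cyclic rotation, but the point stands either way.)

a = RandomReal[1, 8];
b = RandomReal[1, 8];
Chop[ListConvolve[a, b, 1] - Sqrt[8] InverseFourier[Fourier[a] Fourier[b]]]

(* expected: a list of eight (numerical) zeros *)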

Point (2), applied to the examples in this post, should help to explain why LC2 is likely to be slow. To make this clear, we can check speed and result size on a few examples.

LC[ar_] := 
 ListConvolve[-{{1, 1, 1}, {1, 0, 1}, {1, 1, 1}}, ar, {2, -2}, 0]
LC2[ar_] := 
 ListConvolve[-{{1, 1, 1}, {1, 0, 1}, {1, 1, 1}}, ar, {2, -2}, 0, 
  Times, List]

Here is our baseline test.

n = 500;
mat = RandomReal[1, {n, n}];
Timing[ByteCount[LC[mat]]]
Timing[ByteCount[LC2[mat]]]

(* Out[50]= {0.011331, 2000152}

Out[51]= {0.532487, 102056104} *)

Notice that the second is around 50 times larger (and 50 times slower) than the first. Now we double the example dimension.

n = 1000;
mat = RandomReal[1, {n, n}];
Timing[ByteCount[LC[mat]]]
Timing[ByteCount[LC2[mat]]]

(* Out[54]= {0.033061, 8000152}

Out[55]= {2.091857, 408208200} *)

Notice that the sizes and timings both go up by a factor of 4 or so. No surprise there, I think. We'll double the dimension again.

n = 2000;
mat = RandomReal[1, {n, n}];
Timing[ByteCount[LC[mat]]]
Timing[ByteCount[LC2[mat]]]

(* Out[72]= {0.170051, 32000152}

Out[73]= {8.448733, 1632800392} *)

Again timings and sizes went up pretty much uniformly by a factor of 4.

What this means is that point (2) above is the one at play. The timings are more or less linear in the output size in both cases: doubling the input dimension quadruples the output size, and the timings go up by the same factor. Again, not much of a surprise, because the convolution kernel is of fixed, small size.
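A quick arithmetic check on the LC2 numbers above makes this explicit: the successive ratios of both the byte counts and the timings are all close to 4.

sizes = {102056104, 408208200, 1632800392};  (* LC2 ByteCount at n = 500, 1000, 2000 *)
times = {0.532487, 2.091857, 8.448733};      (* LC2 Timing at n = 500, 1000, 2000 *)
N[Ratios[sizes]]  (* ~ {4.00, 4.00} *)
Ratios[times]     (* ~ {3.93, 4.04} *)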

Daniel Lichtblau
  • What are the chances of getting the generalised versions of ListConvolve and ListCorrelate to match the speed of the default? Can this be added to the development pipeline? – Mike Honeychurch Feb 08 '16 at 23:41
  • I'm probably missing something, but if this is the case, why is Times, Times roughly the same factor slower when compared to Times, Plus? In any case, the way the input is passed to the second general function definitely doesn't make it easy to even try to avoid this problem, if this is the real problem; at least not at my current proficiency level! – Ben Kalziqi Feb 08 '16 at 23:46
  • @MikeHoneychurch2 How might that be done? – Daniel Lichtblau Feb 08 '16 at 23:49
  • Right, there is more to it than the output size. I will surmise some use is made of fastDot code in the default (fast) case, whereas other cases in effect create lists and apply operators to those after the fact. – Daniel Lichtblau Feb 08 '16 at 23:54
  • "How might that be done?" -- you mean how can it get added to the development pipeline? While I acknowledge it is not as useful as being able to tweet programs, I'm sure someone there might accept that it warrants investigation. – Mike Honeychurch Feb 09 '16 at 01:58
  • @MikeHoneychurch2 No, I mean in terms of coding. I have no idea how to attain such an improvement algorithmically, at least not in any general way. For some special cases where the operations are restricted to List and arithmetic, maybe some improvement could be made. Though I doubt it would compete with the speed of MKL code, as is used for the default case. – Daniel Lichtblau Feb 09 '16 at 02:24
  • In the first instance I'd work on e.g. ListCorrelate[{x, y, z}, {1, 2, 3, 4, 5}, {-1, 1}, {}, f, g], maybe where f and g are listable functions and where the kernel and list are always lists. But I, like presumably others around here, am not privy to the underlying code. BTW are you the developer for these two functions? – Mike Honeychurch Feb 09 '16 at 03:11
  • @MikeHoneychurch2 I am not the developer, although I have done some work on special cases, mostly involving integer inputs (and the usual convolution). After looking some more at where the non-default-heads case goes, I will revise my pessimism somewhat and speculate that possibly the bottleneck has to do with unpacking rather than asymptotically faster underlying methods. For a proof of concept you might try to code a handler using Compile, wherein the dimensions and non-default heads are fixed, e.g. to do LC2 above but maybe with default padding (see the sketch after these comments). – Daniel Lichtblau Feb 09 '16 at 18:15
  • @DanielLichtblau I'll give that a shot, or at least give a shot at giving it a shot. By the way, do you have any resources where I might read more about packed/unpacked arrays? I see that terminology used very very frequently here, but I don't know anything about it! – Ben Kalziqi Feb 09 '16 at 21:42
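Here is a rough sketch of the Compile idea from the comments above, with everything specialized as suggested: the 3x3 kernel -{{1, 1, 1}, {1, 0, 1}, {1, 1, 1}} and the zero padding of LC2 are hard-coded, and each output position gets its nine kernel-times-element products as a flat list. The names lc2C and lc2Sketch are placeholders, and the ordering of the nine products within each sublist is not guaranteed to match ListConvolve's.

lc2C = Compile[{{padded, _Real, 2}},
   Module[{n = Length[padded] - 2, m = Length[padded[[1]]] - 2},
    Table[
     (* products of the hard-coded kernel with the 3x3 neighbourhood:
        the -1 entries become negations and the 0 centre contributes 0. *)
     {-padded[[i, j]], -padded[[i, j + 1]], -padded[[i, j + 2]],
      -padded[[i + 1, j]], 0., -padded[[i + 1, j + 2]],
      -padded[[i + 2, j]], -padded[[i + 2, j + 1]], -padded[[i + 2, j + 2]]},
     {i, n}, {j, m}]],
   RuntimeOptions -> "Speed"];

lc2Sketch[ar_] := lc2C[ArrayPad[ar, 1]]  (* zero-pad outside Compile to keep the compiled body simple *)

lc2Sketch[array500] returns a 500x500x9 packed array of machine reals, which carries the same information as LC2's output; whether it actually beats the built-in for speed would need benchmarking.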