4

I have the following small program:

(*Parameters*)
x0 = 4.;
g = 10.;
k = 0.5;
(*Define function*)    
fun[nTot_?IntegerQ, dt_?NumberQ] :=
 (fac = 1. - dt*k/g;
  list = Reap[Fold[Sow[fac*#] &, x0, Range[nTot]]][[2, 1]];
  list2 = ({(# - 1.)*dt, list[[#]]} & /@ Range[Length[list]]);)
(*Parameters*)
nTot = 10^7; dt = 10.^-3;
(*Computefunction*)
fun[nTot, dt]

which takes on my machine about 13 seconds. I want to speed it up and thought about Mathematica's Compile capabilities. Is there any way to Compile the function fun to C? For example here it is done for a Do loop, but I cannot apply this to my code. Any suggestions?

[Edit]

Thank you again for your answer. To illustrate what I mean see the following change of my initial program:

f[nTot_?IntegerQ, nMem_?IntegerQ, 
  dt_?NumberQ] :=
 (fun[t_] = -163.6 E^(-2. t) + 40. E^(-1. t);
  fac1 = 1. - dt*(41.8/0.1 + 1.);
  fac2 = dt^2/0.1;
  listExp = 
   Developer`ToPackedArray[
    fun[#*dt] & /@ Range[0, nMem - 1] // Reverse];
  list = Prepend[
    Reap[Fold[Append[Rest@#, Sow[fac1*Last[#] - fac2*listExp.#]] &, 
       ConstantArray[1., nMem], Range[nTot]]][[2, 1]], 1.];
  list2 = ({(# - 1.)*dt, list[[#]]} & /@ Range[Length[list]]);)
(*Define parameters*)
nTot = 10^4; nMem = 10^5; dt = 10.^-4;
(*Compute function*)
f[nTot, nMem, dt]

which takes 2.3 secs on my machine. And now the (equivalent) compiled version which takes 8.4 secs on my machine:

fun[t_] = -163.6 E^(-2.* t) + 40. E^(-1.* t);
cfun = Compile[{{x0, _Real}, {nTot, _Integer}, {dt, _Real}, {nMem, \
_Integer}}, 
   Block[{}, x = ConstantArray[x0, nMem]; 
    fac1 = 1. - dt*(41.8/0.1 + 1.);
    fac2 = dt^2/0.1; 
    listExp = Reverse[fun[#*dt] & /@ Range[0, nMem - 1]];
    Table[{i*dt, 
      Last[x = Append[Rest@x, fac1*Last[x] - fac2*(listExp.x)]]}, {i, 
      1, nTot}]], CompilationTarget -> "C"];
clist2 = Prepend[cfun[1., 10^4, 10.^-4, 10^5], {0., 1.}];

Why is the compiled version now so slow? Is there a way to speed it up? Or did I do some simple mistakes? Help is highly appreciated again!

[Edit2]:

Ok, a small modification in the compiled version speeds it up to around 3 secs on my machine. But still it is slower compared to the solution with Fold:

fac1 = 1. - dt*(41.8/0.1 + 1.);
fac2 = dt^2/0.1;
fun[t_] = -163.6 E^(-2.* t) + 40. E^(-1.* t);
cfun = Compile[{{x0, _Real}, {nTot, _Integer}, {dt, _Real}, {nMem, \
_Integer}}, Block[{fac2 = fac2, fac1 = fac1, x, listExp},
    x = Table[x0, {i, 1, 10^5}];
    listExp = Reverse[fun[#*dt] & /@ Range[0, nMem - 1]];
    Table[{i*dt, 
      Last[x = Append[Rest@x, fac1*Last[x] - fac2*listExp.x]]}, {i, 1,
       nTot}]], {{listExp, _Real, 1}}, 
   CompilationOptions -> {"InlineExternalDefinitions" -> True, 
     "InlineCompiledFunctions" -> True, 
     "ExpressionOptimization" -> True}, 
   RuntimeAttributes -> {Listable}, Parallelization -> True];
clist2 = Prepend[cfun[1, 10^4, 10^-4, 10^5], {0, 1}];

[Edit3]

Getting rid of Append and using Part instead is a little bit helpful:

dt = 10.^-4;
nMem = 10^5;
nTot = 10^5;
fac1 = 1. - dt*(41.8/0.1 + 1.);
fac2 = dt^2/0.1;
x0 = 1.;
x = Array[x0 &, nTot + nMem];
fun[t_] = -163.6 E^(-2.* t) + 40. E^(-1.* t);
listExp = 
  Developer`ToPackedArray[Reverse[fun[#*dt] & /@ Range[0, nMem - 1]]];
list = Table[{i*dt, 
    x[[i + nMem]] = 
     fac1*x[[i + nMem - 1]] - 
      fac2*listExp.x[[i ;; nMem + i - 1]]}, {i, 1, nTot}];

and a compiled version:

fac1 = 1. - dt*(41.8/0.1 + 1.);
fac2 = dt^2/0.1;
fun[t_] = -163.6 E^(-2.* t) + 40. E^(-1.* t);
cfun = Compile[{{x0, _Real}, {nTot, _Integer}, {dt, _Real}, {nMem, \
_Integer}}, Block[{fac2 = fac2, fac1 = fac1, x, listExp},
    x = Array[x0 &, nTot + nMem];
    listExp = Reverse[fun[#*dt] & /@ Range[0, nMem - 1]];
    Table[{i*dt, 
      x[[i + nMem]] = 
       fac1*x[[i + nMem - 1]] - 
        fac2*listExp.x[[i ;; nMem + i - 1]]}, {i, 1, 
      nTot}]], {{listExp, _Real, 1}}, 
   CompilationOptions -> {"InlineExternalDefinitions" -> True, 
     "InlineCompiledFunctions" -> True, 
     "ExpressionOptimization" -> True}, 
   RuntimeAttributes -> {Listable}, Parallelization -> True];
clist2 = cfun[1, 10^5, 10^-4, 10^5];

[Edit4]

Managed to get rid of the MainEvaluate code by using pure function definition:

Clear["Global`*"]
fac1 = 1. - dt*(41.8/0.1 + 1.);
fac2 = dt^2/0.1;
fun = -163.6 E^(-2.* #) + 40. E^(-1.* #) &;
cfun = Compile[{{x0, _Real}, {nTot, _Integer}, {dt, _Real}, {nMem, \
_Integer}}, Block[{fac2 = fac2, fac1 = fac1, x, listExp},
    x = Array[x0 &, nTot + nMem];
    listExp = Reverse[fun[#*dt] & /@ Range[0, nMem - 1]];
    Table[{i*dt, 
      x[[i + nMem]] = 
       fac1*x[[i + nMem - 1]] - 
        fac2*listExp.x[[i ;; nMem + i - 1]]}, {i, 1, 
      nTot}]], {{listExp, _Real, 1}}, 
   CompilationOptions -> {"InlineExternalDefinitions" -> True, 
     "InlineCompiledFunctions" -> True, 
     "ExpressionOptimization" -> True}];
clist3 = cfun[1., 10^4, 10.^-4.5, 4*10^5];

Any further recommendations how to speed it up?

NeverMind
  • 1,201
  • 7
  • 9
  • 1
    You want to avoid MainEvaluate. See the output of Needs["CompiledFunctionTools`"]; CompilePrint@cfun. – Michael E2 Jan 15 '18 at 14:58
  • Developer`ToPackedArray is basically an expensive no-op inside Compile, because the only arrays Compile deals with are packed arrays. – Michael E2 Jan 15 '18 at 15:00
  • Can you elaborate a little bit more on how to avoid MainEvaluate in my specific case? I inserted these commands but what does the output exactly tells me? – NeverMind Jan 15 '18 at 15:06
  • Ok, so DeveloperToPackedArrayis basically a useless function withinCompile` if I get you right. But why is there an effect in my program? – NeverMind Jan 15 '18 at 15:07
  • Yep, ToPackedArray is useless inside Compile. MainEvaluate is a call out of the compiler (actually WVM) back to the main kernel to evaluate something the compiler cannot do. For more explanation, explore some of the Q&A in the link. – Michael E2 Jan 15 '18 at 15:17
  • For some reason ConstantArray is not compilable. Replace ConstantArray[x0, nMem] by Table[x0,{i,1,nMem}]. 2. Don't use Append in time critical loops. It involves a copy operation which is very slow. Try to build a suitable array with zeros first and then fill it with entries in the loop. If you cannot predict the length of the output, you may have a look at [Internal`Bag](https://mathematica.stackexchange.com/questions/845/internalbag-inside-compile).
  • – Henrik Schumacher Jan 15 '18 at 15:19
  • @Micheal E2: I read about pre-defining constants and use them as a=a in the With[] environment. But how can I do this with a vector like x or listExp? I don't see another way to avoid MainEvaluate. @Henrik Schumacher: Your first suggestion doesn't seem to have a huge effect. Your second tip is worth a try, but I think I will end up with Part then, and I don't know whether it is really faster to keep the very long list. I actually can predict the length of the output list since it shall be pre-defined by nTot. – NeverMind Jan 15 '18 at 15:37
  • External constants can be injected with With or passed as arguments. For variables like x, localize them inside Block or Module (e.g. Block[{x, listExp}, <code>], as Henrik did in his answer). – Michael E2 Jan 15 '18 at 15:49
  • Alright, there is one MainEvaluate left, related to R8 = MainEvaluate[ Hold[fun][ R6]]. How to treat this function fun[t]. Cannot be localized within Block, can it? – NeverMind Jan 15 '18 at 16:21
  • @Henrik Schumacher: In my [Edit3] I got rid of Append and used indexing of a pre-defined list instead. I used Array[] here to create this list since according to Compile\CompilerFunctions[]` it is a compilable function. Still, I think one can improve the performance. If you any further idea let me please know. – NeverMind Jan 15 '18 at 18:53
  • @Michael E2: Managed now to get rid of the MainEvaluate by using a pure function definition. See my [Edit4] in the OP. However, it still is very slow unfortunately. So far your tips and the ones from Henrik Schumacher have been very helpful, so I wonder whether there is more I can do to improve the overall performance of the code. – NeverMind Jan 15 '18 at 23:47