I have the following small program:
(*Parameters*)
x0 = 4.;
g = 10.;
k = 0.5;
(*Define function*)
fun[nTot_?IntegerQ, dt_?NumberQ] :=
(fac = 1. - dt*k/g;
list = Reap[Fold[Sow[fac*#] &, x0, Range[nTot]]][[2, 1]];
list2 = ({(# - 1.)*dt, list[[#]]} & /@ Range[Length[list]]);)
(*Parameters*)
nTot = 10^7; dt = 10.^-3;
(*Computefunction*)
fun[nTot, dt]
which takes on my machine about 13 seconds. I want to speed it up and thought about Mathematica's Compile capabilities. Is there any way to Compile the function fun to C? For example here it is done for a Do loop, but I cannot apply this to my code. Any suggestions?
[Edit]
Thank you again for your answer. To illustrate what I mean see the following change of my initial program:
f[nTot_?IntegerQ, nMem_?IntegerQ,
dt_?NumberQ] :=
(fun[t_] = -163.6 E^(-2. t) + 40. E^(-1. t);
fac1 = 1. - dt*(41.8/0.1 + 1.);
fac2 = dt^2/0.1;
listExp =
Developer`ToPackedArray[
fun[#*dt] & /@ Range[0, nMem - 1] // Reverse];
list = Prepend[
Reap[Fold[Append[Rest@#, Sow[fac1*Last[#] - fac2*listExp.#]] &,
ConstantArray[1., nMem], Range[nTot]]][[2, 1]], 1.];
list2 = ({(# - 1.)*dt, list[[#]]} & /@ Range[Length[list]]);)
(*Define parameters*)
nTot = 10^4; nMem = 10^5; dt = 10.^-4;
(*Compute function*)
f[nTot, nMem, dt]
which takes 2.3 secs on my machine. And now the (equivalent) compiled version which takes 8.4 secs on my machine:
fun[t_] = -163.6 E^(-2.* t) + 40. E^(-1.* t);
cfun = Compile[{{x0, _Real}, {nTot, _Integer}, {dt, _Real}, {nMem, \
_Integer}},
Block[{}, x = ConstantArray[x0, nMem];
fac1 = 1. - dt*(41.8/0.1 + 1.);
fac2 = dt^2/0.1;
listExp = Reverse[fun[#*dt] & /@ Range[0, nMem - 1]];
Table[{i*dt,
Last[x = Append[Rest@x, fac1*Last[x] - fac2*(listExp.x)]]}, {i,
1, nTot}]], CompilationTarget -> "C"];
clist2 = Prepend[cfun[1., 10^4, 10.^-4, 10^5], {0., 1.}];
Why is the compiled version now so slow? Is there a way to speed it up? Or did I do some simple mistakes? Help is highly appreciated again!
[Edit2]:
Ok, a small modification in the compiled version speeds it up to around 3 secs on my machine. But still it is slower compared to the solution with Fold:
fac1 = 1. - dt*(41.8/0.1 + 1.);
fac2 = dt^2/0.1;
fun[t_] = -163.6 E^(-2.* t) + 40. E^(-1.* t);
cfun = Compile[{{x0, _Real}, {nTot, _Integer}, {dt, _Real}, {nMem, \
_Integer}}, Block[{fac2 = fac2, fac1 = fac1, x, listExp},
x = Table[x0, {i, 1, 10^5}];
listExp = Reverse[fun[#*dt] & /@ Range[0, nMem - 1]];
Table[{i*dt,
Last[x = Append[Rest@x, fac1*Last[x] - fac2*listExp.x]]}, {i, 1,
nTot}]], {{listExp, _Real, 1}},
CompilationOptions -> {"InlineExternalDefinitions" -> True,
"InlineCompiledFunctions" -> True,
"ExpressionOptimization" -> True},
RuntimeAttributes -> {Listable}, Parallelization -> True];
clist2 = Prepend[cfun[1, 10^4, 10^-4, 10^5], {0, 1}];
[Edit3]
Getting rid of Append and using Part instead is a little bit helpful:
dt = 10.^-4;
nMem = 10^5;
nTot = 10^5;
fac1 = 1. - dt*(41.8/0.1 + 1.);
fac2 = dt^2/0.1;
x0 = 1.;
x = Array[x0 &, nTot + nMem];
fun[t_] = -163.6 E^(-2.* t) + 40. E^(-1.* t);
listExp =
Developer`ToPackedArray[Reverse[fun[#*dt] & /@ Range[0, nMem - 1]]];
list = Table[{i*dt,
x[[i + nMem]] =
fac1*x[[i + nMem - 1]] -
fac2*listExp.x[[i ;; nMem + i - 1]]}, {i, 1, nTot}];
and a compiled version:
fac1 = 1. - dt*(41.8/0.1 + 1.);
fac2 = dt^2/0.1;
fun[t_] = -163.6 E^(-2.* t) + 40. E^(-1.* t);
cfun = Compile[{{x0, _Real}, {nTot, _Integer}, {dt, _Real}, {nMem, \
_Integer}}, Block[{fac2 = fac2, fac1 = fac1, x, listExp},
x = Array[x0 &, nTot + nMem];
listExp = Reverse[fun[#*dt] & /@ Range[0, nMem - 1]];
Table[{i*dt,
x[[i + nMem]] =
fac1*x[[i + nMem - 1]] -
fac2*listExp.x[[i ;; nMem + i - 1]]}, {i, 1,
nTot}]], {{listExp, _Real, 1}},
CompilationOptions -> {"InlineExternalDefinitions" -> True,
"InlineCompiledFunctions" -> True,
"ExpressionOptimization" -> True},
RuntimeAttributes -> {Listable}, Parallelization -> True];
clist2 = cfun[1, 10^5, 10^-4, 10^5];
[Edit4]
Managed to get rid of the MainEvaluate code by using pure function definition:
Clear["Global`*"]
fac1 = 1. - dt*(41.8/0.1 + 1.);
fac2 = dt^2/0.1;
fun = -163.6 E^(-2.* #) + 40. E^(-1.* #) &;
cfun = Compile[{{x0, _Real}, {nTot, _Integer}, {dt, _Real}, {nMem, \
_Integer}}, Block[{fac2 = fac2, fac1 = fac1, x, listExp},
x = Array[x0 &, nTot + nMem];
listExp = Reverse[fun[#*dt] & /@ Range[0, nMem - 1]];
Table[{i*dt,
x[[i + nMem]] =
fac1*x[[i + nMem - 1]] -
fac2*listExp.x[[i ;; nMem + i - 1]]}, {i, 1,
nTot}]], {{listExp, _Real, 1}},
CompilationOptions -> {"InlineExternalDefinitions" -> True,
"InlineCompiledFunctions" -> True,
"ExpressionOptimization" -> True}];
clist3 = cfun[1., 10^4, 10.^-4.5, 4*10^5];
Any further recommendations how to speed it up?
MainEvaluate. See the output ofNeeds["CompiledFunctionTools`"]; CompilePrint@cfun. – Michael E2 Jan 15 '18 at 14:58Developer`ToPackedArrayis basically an expensive no-op insideCompile, because the only arraysCompiledeals with are packed arrays. – Michael E2 Jan 15 '18 at 15:00MainEvaluatein my specific case? I inserted these commands but what does the output exactly tells me? – NeverMind Jan 15 '18 at 15:06DeveloperToPackedArrayis basically a useless function withinCompile` if I get you right. But why is there an effect in my program? – NeverMind Jan 15 '18 at 15:07ToPackedArrayis useless insideCompile.MainEvaluateis a call out of the compiler (actually WVM) back to the main kernel to evaluate something the compiler cannot do. For more explanation, explore some of the Q&A in the link. – Michael E2 Jan 15 '18 at 15:17ConstantArrayis not compilable. ReplaceConstantArray[x0, nMem]byTable[x0,{i,1,nMem}]. 2. Don't useAppendin time critical loops. It involves a copy operation which is very slow. Try to build a suitable array with zeros first and then fill it with entries in the loop. If you cannot predict the length of the output, you may have a look at [Internal`Bag](https://mathematica.stackexchange.com/questions/845/internalbag-inside-compile).a=ain theWith[]environment. But how can I do this with a vector likexorlistExp? I don't see another way to avoidMainEvaluate. @Henrik Schumacher: Your first suggestion doesn't seem to have a huge effect. Your second tip is worth a try, but I think I will end up withPartthen, and I don't know whether it is really faster to keep the very long list. I actually can predict the length of the output list since it shall be pre-defined bynTot. – NeverMind Jan 15 '18 at 15:37Withor passed as arguments. For variables likex, localize them insideBlockorModule(e.g.Block[{x, listExp}, <code>], as Henrik did in his answer). – Michael E2 Jan 15 '18 at 15:49MainEvaluateleft, related toR8 = MainEvaluate[ Hold[fun][ R6]]. How to treat this functionfun[t]. Cannot be localized withinBlock, can it? – NeverMind Jan 15 '18 at 16:21Appendand used indexing of a pre-defined list instead. I usedArray[]here to create this list since according toCompile\CompilerFunctions[]` it is a compilable function. Still, I think one can improve the performance. If you any further idea let me please know. – NeverMind Jan 15 '18 at 18:53MainEvaluateby using a pure function definition. See my [Edit4] in the OP. However, it still is very slow unfortunately. So far your tips and the ones from Henrik Schumacher have been very helpful, so I wonder whether there is more I can do to improve the overall performance of the code. – NeverMind Jan 15 '18 at 23:47