In order to tune the performance of noticeably slow code of mine, I had a look at many different excellent posts on this topic here on M.SE, e.g. here to name the most interesting one.
Now I ask myself if I understood everything correctly or (perhaps) if there is even more tuning potential (for now and for future use cases). As the Latin proverb says "Verba docent, exempla trahunt" (= words teach but only examples really help) I am going to both describe what I learned and did according to that and to copy the optimized code here. Note, that sol is the solution of NDSolve with a vector of interpolation functions, W is a list of indices of the form i or {i,j,k} that are paired to be used inside ExpVal (for mathematical reasons {i, {k,l,m}} = {{k,l,m}, i} holds inside ExpVal so we can calculate only one half of the pairings).
I greatly acknowledge hints regarding further optimization potential or criticism regarding the code.
Optimization ideas
Vectorization: Try to use machine numbers as early as possible - I tried to do so by creating a vector of pure numbers inside n0[t] and work with it afterwards. Moreover, I have to pair each entry of h with each entry of h's conjugate so I used Abs for pairings of the same indices in h as well as hAdj.
Memoisation: Trade memory for performance - I tried to do so by including memoisation into the calculation of ExpVal.
Compilation: As the calculation of ExpVal happens extremely often, this part of the code is a major bottleneck. I compiled it to gain further performance.
Scoping constructs: With is faster as it doesn't create new local variables but only replaces given occurences so I used it instead of Module.
Detailed code description
On request, I will describe the code in more detail here. What the code does is to numerically simulate nonequilibrium physics via iterated equations of motion. Assume we have an operator $A(t) = \sum_i h_i(t) W_i$ whose time-dependence is completely mapped to the prefactors $h_i(t)$. Those prefactors are calculated via NDSolve whose solution is saved to sol. The $W_i$ are (quantum mechanical) basis operators consisting of one (i) or three elementary fermion operators ({k,l,m}). The typical BasisSize equals about 5000 different operators $W_i$.
The code has to calculate each possibility to pair two of the $W$ (say $W_i$ and $W_j$) and calculate their expectation value ExpVal. Those results are multiplied by the individual prefactors ($h_i$ and $h_j^*$) and summed up for the final result.
ev = Compile[{{i, _Integer}, {j, _Integer}, {kF, _Real}}, 2*If[i == j, kF/\[Pi], 1/\[Pi]*Sin[kF (i - j)]/(i - j)], RuntimeAttributes -> {Listable}, Parallelization -> True, RuntimeOptions -> "Speed", CompilationTarget -> "C"];
ExpVal[i_, j_, kF_] := ExpVal[i, j, kF] = ev[i, j, kF];
ExpVal[i_, b_List, kF_] := With[{m = b[[1]], n = b[[2]], o = b[[3]]}, 2 ev[i, m, kF] (2 ev[o, n, kF] - If[o == n, 1, 0])];
ExpVal[a_List, b_List, kF_] := With[{i = a[[1]], j = a[[2]], k = a[[3]], m = b[[1]], n = b[[2]], o = b[[3]]}, 8 ev[i, m, kF] (ev[j, k, kF] ev[o, n, kF] + ev[j, n, kF] (If[o == k, 1, 0] - ev[o, k, kF]))- If[j == k, 4 ev[i, m, kF] ev[o, n, kF], 0] - If[n == o, 4 ev[i, m, kF] ev[j, k, kF], 0] + If[j == k && n == o, 2 ev[i, m, kF], 0]];
hi[x_] = h[x] /. sol;
expAbs = Table[ExpVal[W[[i]], W[[i]], kF], {i, BasisSize}];
expPairs = Table[ExpVal[W[[i]], W[[j]], kF], {i, BasisSize - 1}, {j, i + 1, BasisSize}];
n0[t_] := Module[{x = t, h, hAdj, hAbs, hPairs, \[Mu], \[Xi]},
h = Flatten[hi[x]];
hAdj = Conjugate[h];
hAbs = Abs[h]^2;
hPairs = Table[h[[i]]*hAdj[[j]], {i, BasisSize - 1}, {j, i + 1, BasisSize}];
\[Mu] = Plus @@ Flatten[hAbs*expAbs];
\[Xi] = Plus @@ Flatten[hPairs*expPairs];
Re[.5 (\[Mu] + \[Xi] + Conjugate[\[Xi]])]];
sol,Wand some values forkFandBasisSize. But first question: can you explain whyW[[i]]is sometimes one and sometimes three indices? This will preventWfrom being a packed array. If this can be modified,ExpValcould probably be made completely vectorized, i.e. we could doExpVal[W,W,kF]. Finally, can you explain in words what the code is doing? – Marius Ladegård Meyer Dec 14 '16 at 19:23kFto be $\pi/2$ but it doesn't matter, in principle. The rest of your questions is answered in the initial question under Detailed code description now. The fact that sometimes one and sometimes three indices participate is due to the commutator in the Heisenberg equations of motion and can't be change physically seen (but maybe in the code there is a possibility I don't see). – pbx Dec 14 '16 at 21:29Lengthsin order to vectorize everything. But I am not sure if this would help gaining performance. – pbx Dec 15 '16 at 08:51