
I have a module that I need to call 1-10 million times in my program. Currently, it takes several hours to run, so I am hoping I can cut down some of the runtime with your help.

r = RandomReal[NormalDistribution[0., 1./2.], 6];

es = Eigensystem[H0[ω0, r[[1]], r[[2]], r[[3]], r[[4]], r[[5]], r[[6]]]];

ε = es[[1]];
v = es[[2]];
vS = Conjugate[v];

(*elements of v and vS are called later; v[[1]], vS[[1]] etc...*)

H0 is a compiled function, which sped things up a little. In copy-paste form, it is:

H0 = {{0, (ωz1 - ωz2)/2,
   (-ωx1 + ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2]),
   (ωx1 - ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2])},
  {(ωz1 - ωz2)/2, 0,
   (ωx1 + ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2]),
   (ωx1 + ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2])},
  {(-ωx1 + ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2]),
   (ωx1 + ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2]),
   ω0 + (ωz1 + ωz2)/2, 0},
  {(ωx1 - ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2]),
   (ωx1 + ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2]),
   0, -ω0 + 1/2 (-ωz1 - ωz2)}}
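As a quick sanity check (my addition, not part of the original question): this matrix should be Hermitian, which can be verified numerically by substituting arbitrary test values for the parameters. The values below are made up purely for the check.

```mathematica
(* Substitute arbitrary numeric test values and measure the deviation
   from Hermiticity; for a Hermitian matrix this should be exactly 0. *)
vals = Thread[{ω0, ωx1, ωx2, ωy1, ωy2, ωz1, ωz2} -> RandomReal[{-1, 1}, 7]];
mat = H0 /. vals;
Max[Abs[mat - ConjugateTranspose[mat]]]
(* 0. *)
```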

Is there anything else that can be optimized here?

BeauGeste
  • Without you saying more on H0[], there's really nothing else to say, except that the computation of an eigensystem can be simplified a bit if the matrix given to it has some structure (e.g. symmetry, sparseness). – J. M.'s missing motivation Oct 22 '12 at 02:24
  • Just to check - is whatever loop you have running 10 mil times parallellized? – VF1 Oct 22 '12 at 02:43
  • @J. M. H0[] is 4x4 Hermitian. Four elements of the matrix are 0. – BeauGeste Oct 22 '12 at 03:06
  • VF1 Well, the loop runs millions of times because I'm doing an average. I run the loop for several different parameters. I use ParallelTable for the computation of the loop for each of the parameters. The actual loop (e.g. Sum) is not in parallel though. It's a choice between using ParallelSum or ParallelTable. Though I'm not sure which one will be faster in this particular case, in prior instances ParallelTable has done a better job for me. – BeauGeste Oct 22 '12 at 03:10
  • "Four elements of the matrix are 0." - which ones? You'll have to be more explicit than that for answerers to get anywhere... – J. M.'s missing motivation Oct 22 '12 at 03:53
  • J. M. I have added an image of the matrix. – BeauGeste Oct 22 '12 at 04:28
  • @BeauGeste Are you calculating the eigensystem for each choice of random parameters (rather than doing it once symbolically and then substituting specific values)? – b.gates.you.know.what Oct 22 '12 at 07:43
  • Exactly! Solve the system symbolically (if possible), compile the resulting equations, and this will be as quick as possible. Including copy-pastable code for the matrix will perhaps prompt somebody to post this as an answer. – Ajasja Oct 22 '12 at 14:46
  • b.gatessucks Ajasja I have not been doing it symbolically. The eigenvectors are huge. Plus they need to be normalized which makes the expressions even larger. – BeauGeste Oct 23 '12 at 00:11
  • I compared using symbolic compiled eigenvectors (non-normalized) versus the numerical approach. The numerical approach was identical or slightly faster. – BeauGeste Oct 23 '12 at 00:51
  • The title of this question is not really helpful. Would you mind choosing a more informative one (e.g. mentioning eigensystems) now the problem is clearer? – Yves Klett Oct 23 '12 at 06:48

3 Answers


This is too long for a comment, and honestly, a real answer would require more information in your question. Could you give a working example, so that we can see what takes long and how you implemented it?

If you are calling Eigensystem for many different input values which are known, there is still some room for speed-up. Since your expressions are very lengthy, the initialization is given in a separate section below.

First, we measure how long it takes to calculate the eigenvectors of H0 for 1 million random parameter sets:

data = RandomReal[{-1, 1}, {1000000, 7}];
First@AbsoluteTiming[Eigenvectors[H0[#]] & /@ data]

This took 44.4 sec here. The next thing you can try is to distribute H0 over parallel kernels and use ParallelMap:

DistributeDefinitions[H0];
First@AbsoluteTiming[ParallelMap[Eigenvectors[H0[#]] &, data]]

This took 25.3 sec with 4 subkernels. Now let's test the compiled code, first applying it non-parallel:

First@AbsoluteTiming[evectors @@@ data]  

This took 2.4 sec, which is almost 20 times faster than the initial version. Let's see what we get if we call it in parallel:

First@AbsoluteTiming[evectors @@ Transpose[data]]

This took only 0.24 sec. If this scales well, it means 10 million samples can be run in about 2.5 seconds. And indeed, a test with $10^7$ runs required 2.75 sec.

Now you might ask: why is evectors @@@ data a serial call, while evectors @@ Transpose[data] is parallel? It's because of the Listable attribute in the Compile call, combined with Parallelization -> True. Both forms give the same result:

evectors @@@ data == evectors @@ Transpose[data]

(* Out[21]= True *)
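To see why the Listable attribute causes this threading, here is a toy example of my own (not from the original answer) using a trivial compiled function:

```mathematica
(* A Listable compiled function threads elementwise over list arguments.
   f @@@ toy makes one scalar call per row, while f @@ Transpose[toy]
   passes whole columns in a single (parallelizable) listable call. *)
f = Compile[{{x, _Real, 0}, {y, _Real, 0}}, x^2 + y,
   RuntimeAttributes -> {Listable}, Parallelization -> True];
toy = RandomReal[1, {5, 2}];
f @@@ toy == f @@ Transpose[toy]
(* True *)
```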

Initialization

Compiled parallel "C" versions of Eigenvalues and Eigenvectors

{evalues, evectors} =
  Compile[{{ω0, _Complex, 0}, {ωx1, _Complex, 0}, {ωx2, _Complex, 0},
       {ωy1, _Complex, 0}, {ωy2, _Complex, 0}, {ωz1, _Complex, 0},
       {ωz2, _Complex, 0}}, #, Parallelization -> True,
      CompilationTarget -> "C", RuntimeAttributes -> {Listable}] & /@
   Eigensystem[{{0, (ωz1 - ωz2)/2,
      (-ωx1 + ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2]),
      (ωx1 - ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2])},
     {(ωz1 - ωz2)/2, 0,
      (ωx1 + ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2]),
      (ωx1 + ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2])},
     {(-ωx1 + ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2]),
      (ωx1 + ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2]),
      ω0 + (ωz1 + ωz2)/2, 0},
     {(ωx1 - ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2]),
      (ωx1 + ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2]),
      0, -ω0 + 1/2 (-ωz1 - ωz2)}}];

Furthermore, I tried to mimic your approach by defining H0 like this:

H0[{ω0_, ωx1_, ωx2_, ωy1_, ωy2_, ωz1_, ωz2_}] =
  N[{{0, (ωz1 - ωz2)/2,
     (-ωx1 + ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2]),
     (ωx1 - ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2])},
    {(ωz1 - ωz2)/2, 0,
     (ωx1 + ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2]),
     (ωx1 + ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2])},
    {(-ωx1 + ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2]),
     (ωx1 + ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2]),
     ω0 + (ωz1 + ωz2)/2, 0},
    {(ωx1 - ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2]),
     (ωx1 + ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2]),
     0, -ω0 + 1/2 (-ωz1 - ωz2)}}];
halirutan
  • thanks for your analysis. The times look very promising. I need the eigenvectors to be normalized. Is it possible to get normalized vectors out of Eigensystem? – BeauGeste Oct 23 '12 at 03:59
  • @Beau, look up Normalize[] and Orthogonalize[]. – J. M.'s missing motivation Oct 23 '12 at 04:05
  • @halirutan This code is still of great help to me (for fast processing). Just one query (which will save me from asking a similar question on MMSE): in the initialization {evalues, evectors}, you wrote the whole matrix again inside Eigensystem; is that required? In my case the matrix is insanely large and its size changes (via a size parameter), so do I need to copy my matrix again? The error I am getting: o,compiledFunction1,{{Real,0,Constant},{Real,0,Constant},{Real,0,Constant}},Real]]] are not the same shape – L.K. Jan 30 '17 at 10:22
  • The main reason for the speed up here is that the OP wanted eigenvalues and eigenvectors of the same matrix for different matrix elements. So we calculated the Eigensystem only one time and built a compiled function from it. When your matrix changes on each call, this method won't work for you. Do you call {evalues, evectors} for a specific matrix several times or only exactly once? – halirutan Jan 30 '17 at 11:48
  • @halirutan Thanks. I am calling for a particular size only once. E.g., if I have an 11×11 matrix, then I evaluate evectors and evalues for that size only once and store them in some matrix, so to say. Will this work? – L.K. Jan 30 '17 at 12:29
  • @halirutan In the initialization, does it matter whether {evalues, evectors} is written with := (SetDelayed) instead of =? Using the first I get a result (with an error), but using the second it just shows as running (longer than the former initialization takes). Your help will be of great use to me (I am really stuck, having tried for the past two days). Thanks – L.K. Jan 30 '17 at 13:19
  • If you only calculate each matrix once, then don't compile it. It will slow down everything. Just use Eigensystem to calculate what you need. – halirutan Jan 30 '17 at 14:13
  • @halirutan I am seeing your helpful replies very late (as you didn't add @L.K. in the comments). This will be my last nudge: after the initialization you have added different approaches (differing in time consumption). I only need to care about the fastest one, so after the initialization I can just use evectors @@ Transpose[data]. Forgive me for asking obvious things. – L.K. Jan 30 '17 at 17:05
  • @L.K. Why don't you stick around in the [chat] for a while? I'm usually online there and we can discuss where exactly you have problems. The other choice is to ask a separate question with a small example pointing out how your situation is different. – halirutan Jan 31 '17 at 02:49

You may want to consider simplifying your matrix by substituting the ω pairs that occur often with a dummy. This will simplify your matrix, as there are only very few variables left. All you need to do is design a pattern to replace the most common ω pairs (in that order). After the Eigensystem calculation, you can substitute them back.

I did something similar where several components occurred frequently but were nested in divisions on several levels. Standard Simplify did not work, so I wrote my own, which substantially improved the performance of the simplified matrix.

The pattern I used was the simple Power[x_, y__]. How you identify which combinations to substitute depends on what underlies your system; I am not really up to speed on Fourier transformations, but that seems to underlie what you are analysing here.

The example below gives you transformationRules for the matrix and a reverseTransformation for after the Eigensystem is calculated.

dummycount = 1;
Position[H0, (*pattern*)];
H0[[# /. List -> Sequence]] & /@ Position[H0, (*pattern*)];
Sort[Cases[Tally[%], Except[{_, 1}]], #1[[2]] > #2[[2]] &];
transformationRules = Rule[#[[1]], dummy[dummycount++]] & /@ %;
reverseTransformation = transformationRules /. Rule[x_, y_] -> Rule[y, x];
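To make the idea concrete, here is a toy sketch of mine (with a made-up symmetric 2×2 matrix, not the OP's H0): a repeated combination is replaced by a dummy before Eigensystem and substituted back afterwards.

```mathematica
(* Toy example: (a + b)/2 occurs twice, so abbreviate it as dummy[1]
   before the symbolic Eigensystem call, then substitute it back. *)
m = {{0, (a + b)/2}, {(a + b)/2, c}};
fwd  = {(a + b)/2 -> dummy[1]};
back = Reverse /@ fwd;           (* dummy[1] -> (a + b)/2 *)
es = Eigensystem[m /. fwd];      (* smaller symbolic problem *)
es /. back                       (* restore the original variables *)
```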

@halirutan: Thanks for the excellent example; I achieved several orders of magnitude speed-up in a program I was working on. However, when I tested the execution of the compiled code without CompilationTarget -> "C", something changed:

evectors@@@data==Eigenvectors[H0[#]]&/@data
(* False *)
Eigenvectors[H0[#]]&/@data==ParallelMap[Eigenvectors[H0[#]]&,data]
(* True *)
ParallelMap[Eigenvectors[H0[#]]&,data]==evectors@@Transpose[data]
(* False *)
evectors@@@data==evectors@@Transpose[data]
(* True *)

It seems to me the compiled functions behave differently from the non-compiled ones, but I cannot discover where the difference is. I did create a trivial (2-dimensional) example which simply swapped the output and hence was easily corrected by

Reverse/@(evectors@@Transpose[data]) 

but for higher dimensions it is unclear what causes it to change its output.

Also, removing the RuntimeAttributes -> {Listable} attribute did not change anything. The problem seems to be that the parameters are passed on to Eigensystem in a listable manner, and Eigensystem does not work in Listable mode: Eigensystem::matsq : "Argument {…} is not a non-empty square matrix."

However, rewriting the code in what I believe to be a functionally identical way:

cpfnc = Compile[{ω0, ωx1, ωx2, ωy1, ωy2, ωz1, ωz2},
   Module[{eval, evec},
    {eval, evec} = Eigensystem[{{0, (ωz1 - ωz2)/2,
        (-ωx1 + ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2]),
        (ωx1 - ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2])},
       {(ωz1 - ωz2)/2, 0,
        (ωx1 + ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2]),
        (ωx1 + ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2])},
       {(-ωx1 + ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2]),
        (ωx1 + ωx2)/(2 Sqrt[2]) - (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2]),
        ω0 + (ωz1 + ωz2)/2, 0},
       {(ωx1 - ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) - (I ωy2)/(2 Sqrt[2]),
        (ωx1 + ωx2)/(2 Sqrt[2]) + (I ωy1)/(2 Sqrt[2]) + (I ωy2)/(2 Sqrt[2]),
        0, -ω0 + 1/2 (-ωz1 - ωz2)}}];
    {eval, evec}],
   RuntimeAttributes -> {Listable}, Parallelization -> True];

does give the correct output; however, Listable again causes problems for Eigensystem[] once parallel (list) input is provided.

Love to have your views!

Sander
  • I can't test this in version 7, but the difference in behavior that you observe may be caused by RuntimeAttributes -> {Listable}. Have you considered that? – Mr.Wizard May 13 '14 at 12:01
  • You are right; the parameters are passed on assuming Eigensystem has the attribute Listable. I have updated my answer. – Sander May 15 '14 at 03:34

If Length[r]==6 the following will be faster.

es = Apply[Eigensystem[H0[ω0, ##]] &, r];

I may not have the syntax exactly right because I am writing this on my iPhone. My point is: use one call to Apply instead of six calls to Part.

If Length[r] > 6, then change ## to #1, #2, #3, #4, #5, #6.
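For concreteness, a minimal sketch of the intended call (my addition, assuming H0 and ω0 are defined as in the question):

```mathematica
(* One Apply splices all six elements of r into H0 at once,
   replacing the six separate Part extractions r[[1]], ..., r[[6]]. *)
r = RandomReal[NormalDistribution[0., 1./2.], 6];
es = Eigensystem[H0[ω0, ##]] & @@ r;
```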

Ted Ersek