How can I speed up my code?

Question

For[n = 2500, n < 10000, n += 500, 
  sum = 0;
  pp = N[1/HarmonicNumber[n, 2]];
  A = Reverse[IdentityMatrix[n]];
  H = IdentityMatrix[n];
  For[i = 0, i < n, i++, 
    For[j = 1, j < n + 1 - i, j++, 
      H[[j, j + i]] = Sqrt[pp]/(N[Sqrt[n - i]*(i + 1)])]];
  T := A.Orthogonalize[N[H].A, Dot];
  Z = T.N[H].A.T\[Transpose].A;
  sum = Total[(Flatten[UpperTriangularize[Z,1]])^2];
  Print[sum]
]

I have run this program for a long time to get the result for n = 2500 (about 12 hours) and no more result till now (more than 3 days). I want to know how to make it more efficient. Any help or suggestion will be appreciated!

An obvious suggestion: produce a short version of your routine that takes just 30 seconds. Next, work out where most time is spent. Optimize that piece. Run it for a bit longer, see what's changed. Then, Rinse and repeat. — cormullion, Aug 08 '13 at 11:59
for example Sum[1/i^2, {i, 1, n}] == HarmonicNumber[n, 2] and is always better to work with built in functions. I have also bad experience with For I would use Do if there is no way to implement Nest, Map etc. — Kuba, Aug 08 '13 at 12:02
Thank you all! Does N[Sqrt[..]] will help? And how to learn about the built in functions? — Eden Harder, Aug 08 '13 at 12:11
@user8934 sometimes it helps to keep symbolic expression, if it is going to reduce something but usualy for "real" data it is going to slow down calculations. You can learn by analyzing your code and testing what can be replaced by checking this in documentation center. Also there are a lot of useful Q&A here. avoiding procedural loops — Kuba, Aug 08 '13 at 12:20
I think A = Reverse[IdentityMatrix[n]] should provide the same result as your two lines initializing A, but it should a little faster/cleaner. — Jacob Akkerboom, Aug 08 '13 at 12:26
You can use UpperTriangularize[Z,1] to zero everything on the main and lower diagonals. And sum it like: Total[(Flatten[UpperTriangularize[Z,1]])^2] — ssch, Aug 08 '13 at 12:37
@Kuba there are exceptions (1, 2) to that rule. But, you are correct, in general built-in is faster, just not always. — rcollyer, Aug 09 '13 at 13:38

score 14 · Accepted Answer · edited Apr 13 '17 at 12:55

14

There is a lot going on with your code, as already pointed out by others. But, I would like to point out a few more things, and this was much, much to long for a comment.

First, I would use SparseArray to generate H, instead of the pre-allocate, fill-in method you are using, as follows:

H = SparseArray[
 {l_, m_} /; m >= l :> Sqrt[pp]/(Sqrt[n + l - m] (m - l + 1)),
 {n, n}
];

Edit: looking at the timing for this, and as pointed out in the comments by ssch, the calculation of H can be sped up significantly by using Array, instead:

diags = Sqrt[1./HarmonicNumber[n, 2]]/(Sqrt[n - #] (# + 1)& @ 
     Range[0., n - 1]); 
H = Array[If[#2 < #1, 0., diags[[#2 - #1 + 1]]] &, {n, n}];

For large matrices, this scales significantly better for large n. For instance, at n = 500 the SparseArray method takes 3.3 secs v. 0.028 secs for the Array method.

End Edit

This generates an upper-triangular matrix, with m - l taking the place of i in your loops. Also, I got rid of the extra N as it is not necessary because pp is already a machine number from your use of N in its definition. Similarly, all N[H] can be replaced with H, further down in the code. While it may not provide much of a speed up, do not overuse a function if you don't have to.

Second, in terms of overuse, the way you have defined T means it will be regenerated every time it is used. Orthogonalize is not a fast operation, so don't make more work for yourself. I would define T as

T = A.Orthogonalize[H.A];

where I am using Set (=), instead of SetDelayed (:=), and I have removed the second argument to Orthogonalize. By default, Orthogonalize effectively uses Dot, but by specifying the argument, you may be forgoing potential speed ups. If this is true in this case, I do not know, I have not tested it. But, it holds true in a lot of other cases, so be cautious about overriding defaults.

Third, A.M.A == M. This simplifies the definition of Z to

Z = T.H.Transpose[T]

Edit: That is not true, in general. But, $A^T = A$ and $AA = I$, so

$$A(ABA)^TA = AAB^TAA = B^T,$$

and combined with

Othogonalize[H.A] == Orthogonalize[H].A

as discussed by Daniel, both T and Z can be rewritten:

T = Orthogonalize[H];
Z = A.T.A.H.Transpose[T];

But, as A is just a permutation, it can be best effected by using Part, and using Daniel's notation:

negrange = -Range[n];
Z = T[[negrange, negrange]].H.Transpose[T];

Fourth, timing shows that the longest running part of the code is Orthogonalize. On my machine, at n = 2500 this step along can take over a minute. An alternative is to use QR decomposition which generates an upper triangular matrix, $R$, and an orthogonal matrix, $Q$, that forms a basis for the column space of the matrix being decomposed. So, to get an orthogonal basis out of the row space, you take the transpose. The definition of T becomes

T = QRDecomposition[Transpose[H]][[1]];

The function below has been edited with these changes.

End Edit

Lastly, I would put the entire contents of the outer For into a separate function, sans the Print statement. Print does not provide results in a form usable for further processing, but as you were using For, its use is understandable. Additionally, putting it into a separate function allows you to use any number of methods for generating your data. With the changes I have suggested above, this looks like

myFunction[n_Integer?Positive] :=
Block[{negrange, diags, H, T, Z},
  diags = Sqrt[1./HarmonicNumber[n, 2]]/(Sqrt[n - #] (# + 1)& @ 
     Range[0., n - 1]); 
  H = Array[If[#2 < #1, 0., diags[[#2 - #1 + 1]]] &, {n, n}]; 
  T = QRDecomposition[Transpose[H]][[1]];
  negrange = -Range[n];
  Z = T[[negrange, negrange]].H.Transpose[T];
  Total[(Flatten[UpperTriangularize[Z,1]])^2]
]

Edit: Timings on my computer:

n     Timing (s)
2500  15.05
3000  24.31
3500  37.61
4000  54.56

So, while not fast, it scales significantly better than the previous versions.

End Edit

As this is likely to be a long running process, I would run it as follows:

Internal`WithLocalSettings[
  strm = OpenWrite["output.log"];
  AppendTo[ $Output, strm ],

  Scan[Print[#, "\t", myFunction[#]]&, Range[2500, 10000, 500]],

  $Output = Drop[$Output, -1];
  Close[strm]
]

where a Print will output to both a log-file and the screen. This allows you to save the output as you go. If you don't need this, then just use Table.

Edit - Last words:

The key here is simply to profile your code. Examine each step and find out which is slow. As shown above, while I like the simplicity of using SparseArray, it scales badly compared to using Array, so for efficiency sake SparseArray is out. The same thing goes for Orthogonalize v. QRDecomposition. As you might not be familiar with the properties of QRDecomposition, that is when you ask: Orthogonalize scales badly, are there alternatives?

The other advice is to reduce redundancy. Only do something once, if you can get away with it. There are times where you won't be able to, like you run out of memory, but if you can, it often will save time.

edited Apr 13 '17 at 12:55

Community

1

answered Aug 08 '13 at 14:04

rcollyer

33,976
7
92
191

H can be sped up significantly by first calculating the diagonals and then using Array to construct the matrix by picking appropriate diagonal (or 0) diags = Sqrt[1./HarmonicNumber[n, 2]]/(Sqrt[n - #] (# + 1)&@Range[0., n - 1]); H = Array[If[#2 < #1, 0., diags[[#2 - #1 + 1]]] &, {n, n}]; – ssch Aug 08 '13 at 15:20
@ssch interesting timing results. At n = 500, H takes 3.3 v. 0.027 and T takes 0.37 v. 0.25. Array is the clear winner. (I was looking at that, but separating out the diagonals eluded me.) I did not try the SparseArray version n = 2000, as I don't like locking up my computer, but the Array version comes in at 0.36 for H, 7.15 for T, and 3.58 for Z (up from 0.13 at n = 500). At n = 2500, T balloons out to 53.5 and Z takes 7.31. Clearly, not scaling well. – rcollyer Aug 08 '13 at 15:50
Interesting. A can be replaced with Reverse, that is: T =Reverse[Orthogonalize[(Reverse /@ H)]]; And Householder orthogonalization seems way faster: T = Reverse[Orthogonalize[(Reverse /@ H), Method -> "Householder"]]; – ssch Aug 08 '13 at 16:42
@ssch the speed up on my machine comes from replacing A with Reverse, but Method -> "Householder" takes over a second longer. – rcollyer Aug 08 '13 at 16:53
Thanks everyone! A.M.A is not same as M, for example, M={{a,b},{c,d}}, then A.M.A={{d,c},{b,a}} – Eden Harder Aug 09 '13 at 00:10
@user8934 you are correct. I addressed that above in my edit, alongside a massive speed-up for Orthogonalize. – rcollyer Aug 09 '13 at 13:18
1

Is that really sped up from OP's 12 hours to 15 seconds?! That's what I call a good answer! – cormullion Aug 09 '13 at 13:29
@cormullion the trick is not use Orthogonalize which scales very badly. I don't know how fast it will run on the OPs computer, but it should be significant. – rcollyer Aug 09 '13 at 13:31
Thanks a lot! But it only output'n myFunction[n]' in screen and 'n"\t"myFunction[n]' in the output.log when I adopt the second suggestion. – Eden Harder Aug 09 '13 at 13:52
@user8934 that's because I'm an idiot. Fixed the code. – rcollyer Aug 09 '13 at 13:54
whatif this version? hh = ({ {a, b, c, d, e, f, g}, {0, h, i, j, k, l, f}, {0, 0, m, n, o, k, e}, {0, 0, 0, p, n, j, d}, {0, 0, 0, 0, m, i, c}, {0, 0, 0, 0, 0, h, b}, {0, 0, 0, 0, 0, 0, a} }); orth = Orthogonalize[hh]; negrange = Range[Length[orth], 1, -1]; z3 = orth[[negrange, negrange]].hh.Transpose[orth]; sum = Total[(Flatten[UpperTriangularize[z3, 1]])^2]; FindMaximum[{sum, 2a^2 + 2b^2 + 2c^2 + 2d^2 + 2e^2 + 2f^2 + g^2 + 2h^2 + 2i^2 + 2j^2 + 2k^2 + l^2 + 2m^2 + 2n^2 + o^2 + p^2 == 1}, {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p}] – Eden Harder Aug 09 '13 at 14:10
@user8934 First, QRDecomposition requires a numeric matrix, so you'd have to use Orthogonalize. Second, you lose a lot of the other speed ups, too. So, I wouldn't except with very small matrices. – rcollyer Aug 09 '13 at 14:14
The above one takes 14844.055031s. – Eden Harder Aug 09 '13 at 14:15
@user8934 First question: where is it slow? – rcollyer Aug 09 '13 at 14:16
Too many variables. For a four dimesion version it only take a few seconds! Reseting accuracy may be helpful? How to do it? – Eden Harder Aug 09 '13 at 14:20
@EdenHarder is it slow in Orthogonalize of FindMaximum? And, accuracy is irrelevant in symbolic calcs. – rcollyer Aug 09 '13 at 14:23
@rcollyer Orthogonalize will not take a minute, it is FindMaximum that take a lot of time. – Eden Harder Aug 09 '13 at 14:34
@EdenHarder I don't know. Sounds like another question to me. – rcollyer Aug 09 '13 at 14:51
Yeah, it is another question. Thank you and other answers very much! – Eden Harder Aug 09 '13 at 15:05
@EdenHarder don't forget to mark the best answer as accepted, as per the faq. – rcollyer Aug 09 '13 at 15:15
@EdenHarder thanks for the accept. Would you mind posting a few timings of the code on your machine? – rcollyer Aug 09 '13 at 16:49
For the numerical(original) version, 'n' and 'timing(s)' are '2500, 7.160000', '3000, 11.272000', '3500, 16.968000', '4000, 24.328000'. – Eden Harder Aug 10 '13 at 03:43
For the symbolic version, 'n' and 'timing(s)' for Orthogonize and FindMaximum are '4, 0.048000, 0.296000', '5, 0.500000, 9.200000' '6, 10.824000, 338.924000', '7, unknown, 14844.055031'. – Eden Harder Aug 10 '13 at 04:01

Daniel Lichtblau · Answer 2 · 2013-08-08T17:53:15.570

9

I'm not sure one can get the larger cases due to memory needs. The code below, which uses the array generation of @rcollyer, seems reasonably effective at least for the smaller end.

n = 2500;
Timing[sum = 0;
 diags = Sqrt[
    1./HarmonicNumber[n, 2]]/(Sqrt[n - #] (# + 1) &@Range[0., n - 1]);
  hh = Array[If[#2 < #1, 0., diags[[#2 - #1 + 1]]] &, {n, n}]; 
 orth = Orthogonalize[hh];
 negrange = Range[Length[orth], 1, -1]; 
 z3 = orth[[negrange, negrange]].hh.Transpose[orth]; 
 sum = Total[(Flatten[UpperTriangularize[z3, 1]])^2]]

(* {16.630000, 0.415096142869} *)

--- edit ---

This shows more timings at the lower end of the desired range.

Do[
 time = Timing[sum = 0;
   diags = 
    Sqrt[1./HarmonicNumber[n, 2]]/(Sqrt[n - #] (# + 1) &@
       Range[0., n - 1]); 
   hh = Array[If[#2 < #1, 0., diags[[#2 - #1 + 1]]] &, {n, n}]; 
   orth = Orthogonalize[hh];
   negrange = Range[Length[orth], 1, -1]; 
   z3 = orth[[negrange, negrange]].hh.Transpose[orth];
   sum = Total[(Flatten[UpperTriangularize[z3, 1]])^2]
   ];
 Print[{n, time, sum}];
 {time, sum},
 {n, 2500, 5000, 500}]

{2500,{17.230000,0.415096142869},0.415096142869}

{3000,{29.170000,0.415356679612},0.415356679612}

{3500,{45.930000,0.41554482016},0.41554482016}

{4000,{68.400000,0.415687180722},0.415687180722}

{4500,{96.070000,0.415798728445},0.415798728445}

{5000,{130.300000,0.415888533453},0.415888533453}

--- end edit ---

edited Aug 08 '13 at 17:53

answered Aug 08 '13 at 17:34

Daniel Lichtblau

58,970
2
101
199

Orthogonalize[H.A] per the OP gives a different result than Orthogonalize[H] which may or may not be okay, depending on the purpose of Orthogonalize here. – rcollyer Aug 08 '13 at 17:39
@rcollyer Orthogonalize[H.A] == Orthogonalize[H] . A because A just rearranges rows or columns, depending on which side you multiply from. This also is how i got rid of some dot products (I removed others from the fact that A.A is the identity matrix). I actually tested/validated my results against the code from the OP. – Daniel Lichtblau Aug 08 '13 at 17:49
@rcollyer Let me slightly correct myself. Multiplying by A on the right rearranges columns. This is an important distinction because a row rearrangement would not commute with orthogonalization. – Daniel Lichtblau Aug 08 '13 at 17:52
On Orthogonalize[H.A] == Orthogonalize[H].A, you seem to be correct. I get a fairly low Frobenius norm on their differences, $\sim 10^{-14}$. But, I get a substantial norm with A.Orthogonalize[H].A - Orthogonalize[H], $\sim 68.8$. I don't have a ready explanation as to why, but it explains the differences I get between the calculated values our codes generate. I get $0.391274$, but changing your code to use A.Orthogonalize[H].A, I get $0.246054$. – rcollyer Aug 08 '13 at 19:12
@rcollyer Note though that I did not use Orthogonalize[H] in lieu of A.Orthogonalize[H].A. I instead used a row and column reversed form of Orthogonalize[H]. It is the same as A.Orthogonalize[H].A. – Daniel Lichtblau Aug 08 '13 at 19:38
Really? Your construction of hh looks identical to my construction of H. Where do the reversals come in? Never mind, I see it, at least in part ... – rcollyer Aug 08 '13 at 19:45
I edited my comment above to reflect the fact that I noticed it after I commented. – rcollyer Aug 08 '13 at 19:48
I think you're missing a term, then. Transpose[orth] should be Transpose[orth[[negrange, negrange]]]. Then it gives the same results I get. – rcollyer Aug 08 '13 at 19:50
@rcollyer No, in my code that second one loses the surrounding A factors because Transpose[A]==A and A.A is the identity. – Daniel Lichtblau Aug 08 '13 at 19:53
You're correct. I hadn't fully worked through the math. Next time, I'll try not to be as hasty. :) – rcollyer Aug 11 '13 at 17:05
@rcollyer I notice you put QRDecomposition to good use. I had been meaning to try that but never quite got around to it. Very nice (and an upvote). – Daniel Lichtblau Aug 11 '13 at 17:59

How can I speed up my code?

2 Answers2