
I have some data generated by a program, and matrix multiplication on these data is about 10 times slower than on comparable random data:

Get["tb.dat"];

xls = Range[-500, 500, 1000/(1000 - 1)] // N;

Re[Conjugate[#].(xls*#)] & /@ tb; // AbsoluteTiming
(* {0.067147, Null} *)

tb2 = RandomComplex[{0., 1. + I}, Dimensions[tb]];

Re[Conjugate[#].(xls*#)] & /@ tb2; // AbsoluteTiming
(* {0.004564, Null} *)

So what are the reasons for the performance problem with my data?

The data files are here (6MB).


Update

Here is the same problem with a packed array. Since Save unpacks data, I have to use DumpSave to demonstrate the problem:

DumpGet["tb3PK.dump"];

Developer`PackedArrayQ@tb3PK
(* True *)

xls = Range[-500, 500, 1000/(1000 - 1)] // N;

Re[Conjugate[#].(xls*#)] & /@ tb3PK; // AbsoluteTiming
(* {0.065747, Null} *)

tb2 = RandomComplex[{0., 1. + I}, Dimensions[tb3PK]];

Re[Conjugate[#].(xls*#)] & /@ tb2; // AbsoluteTiming
(* {0.003482, Null} *)

The file tb3PK.dump is here. I'm using OS X 10 and Mathematica version 10.0.1.
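For reference, a minimal round-trip sketch (the file name is a placeholder): DumpSave writes the internal binary representation, so a packed array stays packed when read back, which is why it can demonstrate the problem while Save cannot:

arr = RandomComplex[{0., 1. + I}, {100, 100}];
Developer`PackedArrayQ[arr]
(* True: RandomComplex returns a packed array *)

DumpSave["arr.mx", arr]; (* writes the internal (packed) representation *)
Clear[arr];
Get["arr.mx"]; (* .mx files also load with plain Get *)
Developer`PackedArrayQ[arr]
(* True: packing survives the round trip *)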

xslittlegrass
  • It is because your data is not packed, but the data generated by RandomComplex is. Try this: {{Developer`PackedArrayQ@tb, ByteCount@tb}, {Developer`PackedArrayQ@tb2, ByteCount@tb2}}. However, according to what I have learned about packed arrays, your data should be packable, because Equal @@ Map[Head, tb, {2}] is True, i.e., all of your elements are of the same type. However, Developer`PackedArrayQ@Developer`ToPackedArray[tb] returns False. I am not writing this as an answer, as I hope an answer will address how to pack your data. – C. E. Jan 09 '15 at 04:35
  • @Pickett No wonder tb3 = Developer`ToPackedArray[tb] is as slow as the unpacked data. I didn't know that Developer`ToPackedArray could fail to pack an array without giving any error message. – xslittlegrass Jan 09 '15 at 05:04
  • Computing Tally[ByteCount /@ Flatten[tb]] shows that some of your numbers have different byte counts due to large exponents. I doubt the data can be packed (the sketch after this thread locates those elements). – Andy Ross Jan 09 '15 at 14:17
  • Re update: It's because the calculation is producing numbers that are not machine-sized. In such a case, Mathematica automatically converts them to arbitrary-precision numbers, which take longer to compute with. Also, the result is not packed. -- Compare with tb2 = RandomComplex[{$MinMachineNumber, 1. + I $MinMachineNumber}, Dimensions[tb3PK]] (the underflow sketch after this thread illustrates the mechanism). – Michael E2 Jan 09 '15 at 19:32
  • @MichaelE2 But why does the problem still exist even when I set SetSystemOptions["CatchMachineUnderflow" -> False]? It seems to me that if the problem is caused by the automatic conversion to high precision, turning it off should resolve it. – xslittlegrass Jan 09 '15 at 19:55
  • What makes you think that option prevents the automatic conversion? – Michael E2 Jan 09 '15 at 20:15
  • @MichaelE2 I thought the option removes the process of catching underflow, as its name suggests. Also, I read in the documentation that RuntimeOptions -> "Speed" is equivalent to turning off those underflow and overflow catches, which seems to suggest that the catches take extra time. – xslittlegrass Jan 09 '15 at 20:24
  • Well, I'm not sure. Setting it to False speeds it up, but only by a factor of 4. Perhaps there's a cost to underflow even when it is not "caught." Someone else is going to have to answer this. Sorry. – Michael E2 Jan 09 '15 at 20:42
  • @MichaelE2 No problem, your comments are already very helpful for me. Thanks:) – xslittlegrass Jan 09 '15 at 20:44
  • Where did you get that DumpGet from? It's not documented. – Sjoerd C. de Vries Jan 09 '15 at 22:50
  • @SjoerdC.deVries I guessed it :) – xslittlegrass Jan 09 '15 at 22:53
  • Get on its own should work. DumpGet has no doc page, though an error page for it exists. – Sjoerd C. de Vries Jan 09 '15 at 22:54
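Pulling the diagnostics from these comments together, a short sketch (it assumes tb is loaded; Commonest is used so the typical machine-complex byte count doesn't have to be hardcoded):

common = Commonest[ByteCount /@ Flatten[tb]][[1]]; (* the typical machine-complex size *)
bad = Select[Flatten[tb], ByteCount[#] > common &]; (* the oversized elements that block packing *)
{Length[bad], Union[Precision /@ bad]}

And an illustration of the underflow mechanism Michael E2 describes: when a machine computation falls below $MinMachineNumber, Mathematica (with the default "CatchMachineUnderflow" setting) silently switches to an arbitrary-precision representation:

x = 1.*^-200; (* a machine number *)
MachineNumberQ[x^2]
(* False: x^2 underflowed the machine range *)
Precision[x^2]
(* 15.9546, numerically equal to MachinePrecision *)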

2 Answers


For readers who didn't read all the comments: the slowdown is due to tb not being packed, whereas RandomComplex returns packed arrays when more than 250 elements are generated. Packing tb fails because some elements have a different precision than others, and (I think?) ToPackedArray requires arrays of homogeneous type.

To fix this, chop and multiply by 1. + 0. I, then pack:

<< Developer`
tbP = ToPackedArray[(1. + 0. I) Chop@tb];
PackedArrayQ[tbP]
Re[Conjugate[#].(xls*#)] & /@ tb; // AbsoluteTiming
Re[Conjugate[#].(xls*#)] & /@ tbP; // AbsoluteTiming

producing

True
{0.031200, Null}
{0., Null}

Does anyone know a better way of taking an array composed of heterogeneous-precision real and complex numbers, and converting it into an array of homogeneous-precision complex numbers? The above trick works, but I feel like there ought to be a better way.
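One possibility, as an untested sketch: use $MinMachineNumber as an explicit Chop cutoff, so that only the sub-machine-range elements are zeroed, and let the second argument of ToPackedArray (which specifies a target element type) perform the coercion to Complex:

tbP2 = ToPackedArray[Chop[tb, $MinMachineNumber], Complex];
PackedArrayQ[tbP2]
(* True, provided the underflowed elements were the only obstruction *)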

Update

To address the Update in your question: I'm stumped. Comparing the DumpSave output with the solution I used above:

DumpGet["tb3PK.dump"];
Re[Conjugate[#].(xls*#)] & /@ tbP; // AbsoluteTiming
Re[Conjugate[#].(xls*#)] & /@ tb3PK; // AbsoluteTiming
Tally[Precision /@ Flatten[tbP]]
Tally[Precision /@ Flatten[tb3PK]]
ByteCount[tbP]
ByteCount[tb3PK]

giving

{0., Null}
{0.040000, Null}
{{MachinePrecision, 100000}}
{{MachinePrecision, 100000}}
1600152
1600152

In other words, the DumpSave version is indeed slower, as you observed, despite having the same byte count and precision, and I'm not sure why. You may want to ask that as a separate question.
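One way to probe this further, along the lines of Michael E2's comment under the question (a diagnostic sketch, assuming tb3PK is loaded): check whether the computation itself leaves the machine range, since an underflow inside the dot product would trigger arbitrary-precision arithmetic even though the input is packed and machine-precision:

res = Re[Conjugate[#].(xls*#)] & /@ tb3PK;
PackedArrayQ[res] (* is the result packed? *)
Tally[MachineNumberQ /@ res] (* do any result values leave the machine range? *)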

DumpsterDoofus
  • Could use N[array]. – Daniel Lichtblau Jan 09 '15 at 15:36
  • I think the numbers in tb are all MachinePrecision; for example, And @@ ((Precision[#] == MachinePrecision &) /@ tb) returns True. Also, I'm interested in the log spectrum of the data, where small numbers are important to me, so I would prefer not to Chop if possible. – xslittlegrass Jan 09 '15 at 15:37
  • @DanielLichtblau: Using N[tb] and then packing fails to pack the array, because the result is still heterogeneous-precision. – DumpsterDoofus Jan 09 '15 at 15:47
  • @xslittlegrass: Tally[Precision /@ Flatten[tb]] shows that most, but not all elements are MachinePrecision. – DumpsterDoofus Jan 09 '15 at 15:50
  • @DumpsterDoofus Those few numbers have precisions like 15.954589770191005, and if you compare that with MachinePrecision, it returns True: 15.954589770191005 == MachinePrecision. – xslittlegrass Jan 09 '15 at 15:55
  • @xslittlegrass: It's strange: one number with MachinePrecision occupies 56 bytes, and another with precision 15.9546 occupies 216 bytes. I'm not sure why this is the case. Hopefully someone else will answer with a better explanation. In the meantime, I noticed that the largest elements of your array have an absolute value of about 0.6, so if a log-spectrum of your data is needed, then hopefully Chop will only affect the very-low-amplitude areas of your plot. Otherwise, you may have to use the slower, but more precise, unpacked array tb. – DumpsterDoofus Jan 09 '15 at 16:04
  • @xslittlegrass: Also, you can control the delta-cutoff of Chop, for example, you can use ToPackedArray[(1.0 + 0.0 I) Chop[tb, 10^-32]]. I haven't tested it to see if it works properly, though. – DumpsterDoofus Jan 09 '15 at 16:06
  • If N is not creating machine numbers, that typically means there are numbers outside the range of machine floats (and they will have precision numerically equal to N[MachinePrecision] if they were computed at MachinePrecision). I assume they are small rather than large in magnitude, so your Chop suggestion seems like a viable way to proceed. – Daniel Lichtblau Jan 09 '15 at 16:09
  • @xslittlegrass 15.954589770191005 == MachinePrecision does return True, but when used as a precision they are different: the latter represents machine precision, while the former represents arbitrary precision, and they lead to two different algorithms. It's a little hard for me to explain in my own words, but there are many related posts on this site, for example this one; see also the short sketch below. – xzczd Jan 10 '15 at 05:08
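A minimal sketch of xzczd's point above: the two precisions compare equal numerically, but they describe two different kinds of numbers, which take different arithmetic paths:

a = N[Pi, 15.954589770191005]; (* an arbitrary-precision number *)
b = N[Pi]; (* a machine-precision number *)
Precision[a] == Precision[b]
(* True: numerically equal *)
{MachineNumberQ[a], MachineNumberQ[b]}
(* {False, True}: different kinds of numbers *)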

I propose a silly workaround, instead of a workable explanation:

Get["tb.dat"];

xls = Range[-500, 500, 1000/(1000 - 1)] // N;
test[tb_] := (Re[Conjugate[#].(xls*#)] & /@ tb;) // AbsoluteTiming;
test[RandomComplex[{0., 1. + I}, Dimensions[tb]]]
(* {0.002002, Null} *)
test[tb]
(* {0.040038, Null} *)
test[tb + ConstantArray[0. + 0. I, Dimensions@tb]]
(* {0.003003, Null} *)
test[tb (1.0 + 0. I)]
(* {0.002002, Null} *)    

This tricks Mathematica into doing the right thing. I find it often (too often) helps a lot to multiply by 1. or to add 0. From my tests it doesn't seem to matter whether you multiply by (1.0 + 0. I) or add a constant array; you get the same performance boost. It also doesn't seem to matter whether the array is packed or not.
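A quick way to check what the coercion actually did (a diagnostic sketch, assuming tb is loaded; it verifies the effect rather than explaining it):

tbC = tb (1.0 + 0. I);
Developer`PackedArrayQ[tbC] (* did the multiplication produce a packed array? *)
Tally[MachineNumberQ /@ Flatten[tbC]] (* are all elements machine numbers now? *)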

Gleno