6

I noticed that for the computation of the trace of a product of two matrices, using Tr[Dot[A,B]] is a little inefficient. Dot is computing all the elements of the matrix product, while Tr only needs the diagonal.

Is there a low-level, or fast implementation of trace-dot in Mathematica? (It needs to be able to work on matrices of mixed datatypes)


Look, I made a top-level implementation of trace-dot that is faster than Trace[Dot[...]]:

myTrDot[m1_,m2_]:=Total[MapThread[Dot, {m1, Transpose[m2]}]];

exMat1 = RandomVariate[GaussianOrthogonalMatrixDistribution[1000]];
exMat2 = RandomVariate[GaussianOrthogonalMatrixDistribution[1000]];

Tr[Dot[exMat1, exMat2]]; // AbsoluteTiming
(* 0.020229 *)

myTrDot[exMat1, exMat2]; // AbsoluteTiming
(* 0.015503 *)
QuantumDot
  • 19,601
  • 7
  • 45
  • 121

3 Answers3

4

This seems a bit faster than your myTrDot:

m1 = RandomVariate[GaussianOrthogonalMatrixDistribution[2000]];
m2 = RandomVariate[GaussianOrthogonalMatrixDistribution[2000]];
cf = Compile[{{A, _Real, 2}, {B, _Real, 2}}, 
         Module[{n = Length@A, Bt = Transpose@B}, Sum[A[[i]].Bt[[i]], {i, n}]]]

Tr[m1.m2]; // AbsoluteTiming 
(* 0.20 *)
myTrDot[m1, m2]; // AbsoluteTiming
(* 0.052 *)
cf[m1, m2]; // AbsoluteTiming
(* 0.028 *)
anderstood
  • 14,301
  • 2
  • 29
  • 80
4

I also throw in my hat into the ring:

m1 = RandomVariate[GaussianOrthogonalMatrixDistribution[2000]];
m2 = RandomVariate[GaussianOrthogonalMatrixDistribution[2000]];

a = Tr[m1.m2]; // RepeatedTiming // First
b = cf[m1, m2]; // RepeatedTiming // First
c = Total[Compile[{{x, _Real, 1}, {y, _Real, 1}}, x.y,
       CompilationTarget -> "WVM",
       RuntimeAttributes -> {Listable},
       Parallelization -> True,
       RuntimeOptions -> "Speed"
       ][m1, Transpose[m2]]]; // RepeatedTiming // First
a == b == c

0.106

0.018

0.014

True

Remark:

I am working on a Haswell quad core which seems to behave slightly different on such problems than more recent CPUs. So I am not sure if this method really performs better on other machines.

Henrik Schumacher
  • 106,770
  • 7
  • 179
  • 309
3
myTrDot2 = 
 Compile[{{m1, _Real, 2}, {m2, _Real, 2}}, 
  Flatten[m1].Flatten[Transpose[m2]]]

exMat1 = RandomVariate[GaussianOrthogonalMatrixDistribution[10000]];

exMat2 = RandomVariate[GaussianOrthogonalMatrixDistribution[10000]];

Tr[Dot[exMat1, exMat2]]; // AbsoluteTiming

myTrDot[exMat1, exMat2]; // AbsoluteTiming

myTrDot2[exMat1, exMat2]; // AbsoluteTiming

{18.0692, Null}

{1.64879, Null}

{1.42026, Null}

Henrik Schumacher
  • 106,770
  • 7
  • 179
  • 309
ulvi
  • 1,808
  • 10
  • 15