
I’ve been working with SVD (singular value decomposition). Things weren’t working as expected, so I turned to Matlab and executed the following code:

X = [105 103 103 66; 245 227 242 267; 685 803 750 586; 147 160 122 93;
     193 235 184 209; 156 175 147 139; 720 874 566 1033; 253 265 171 143;
     488 570 418 355; 198 203 220 187; 360 365 337 334; 1102 1137 957 674;
     1472 1582 1462 1494; 57 73 53 47; 1374 1256 1572 1506;
     375 475 458 135; 54 64 62 41];
data = X; [M,N] = size(data);
mn = mean(data,2);               % row means
data = data - repmat(mn,1,N);    % centre each row
Y = data' / sqrt(N-1);
[u,S,PC] = svd(Y);
S = diag(S); V = S.*S;           % variances along each principal component
signals = PC'*data;              % project the data onto the PCs

It results in a sensible PCA diagram for the data. Now, if I run it in Mathematica

countrydata = {{105, 103, 103, 66}, {245, 227, 242, 267},
   {685, 803, 750, 586}, {147, 160, 122, 93}, {193, 235, 184, 209},
   {156, 175, 147, 139}, {720, 874, 566, 1033}, {253, 265, 171, 143},
   {488, 570, 418, 355}, {198, 203, 220, 187}, {360, 365, 337, 334},
   {1102, 1137, 957, 674}, {1472, 1582, 1462, 1494}, {57, 73, 53, 47},
   {1374, 1256, 1572, 1506}, {375, 475, 458, 135}, {54, 64, 62, 41}};

datamean = {{Mean[countrydata[[1]]]}};
Print[datamean];
For[i = 2, i <= Dimensions[countrydata][[1]], i++,
  datamean = Append[datamean, {Mean[countrydata[[i]]]}]];



datashift = countrydata;
For[i = 1, i <= Dimensions[countrydata][[1]], i++,
  For[j = 1, j <= Dimensions[countrydata[[1]]][[1]], j++,
    datashift[[i]][[j]] = countrydata[[i]][[j]] - datamean[[i]][[1]];]];

Y = Transpose[datashift]/Sqrt[Dimensions[countrydata[[1]]][[1]] - 1];


svd = SingularValueDecomposition[N[Y, 100]];
S = Diagonal[svd[[2]]];
V = S*S;

signal = Transpose[-svd[[3]]].datashift;
Print[MatrixForm[N[Y, 10]]];

 Show[ListPlot[{{{signal[[1, 1]], 0}}, {{signal[[1, 2]], 
 0}}, {{signal[[1, 3]], 0}}, {{signal[[1, 4]], 0}}},
    PlotRange -> All,
    PlotLegends -> {"England", "Wales", "Scotland", "N. Ireland" },
    PlotLabel -> "First two principal components for  data",
    FrameLabel -> {"z1", "z2"},
    RotateLabel -> False,
    Axes -> False,
    Frame -> {{True, False}, {True, False}},
    ImageSize -> {300, 300}]]

I get the following information for the V matrix of the singular value decomposition, i.e. the PC matrix in Matlab and svd[[3]] in Mathematica. The other two factors of the decomposition, svd[[1]] and svd[[2]], exactly match their Matlab counterparts.

Matlab PC: -0.0569553797856850 0.0479276281346853 -0.258916658336122 -0.0844149825250836 -0.00519362266004777 -0.0376209828394020 0.401402060296248 -0.151849941562302 -0.243593728990274 -0.0268862325367470 -0.0364882691115938 -0.632640897872237 -0.0477028583736490 -0.0261877559085335 0.232244140472894 -0.463968167976707 -0.0296502010879939

Mathematica svd[[3]]: 0.05695537979 -0.04792762813 0.2589166583 0.08441498253 0.005193622660 0.03762098284 -0.4014020603 0.1518499416 0.2435937290 0.02688623254 0.03648826911 0.6326408979 0.04770285837 0.02618775591 -0.2322441405 0.4639681680 0.02965020109

This is all well and good: the columns match, apart from the sign. Now, if we go to the 4th column,

Matlab gives: -0.400748417916043 -0.150602373645460 -0.676298033445679 -0.0733743113594902 0.165183462293022 0.0375841868716541 -0.150666145597042 -0.103524165890781 0.124583909064645 0.0312459639649717 -0.0235673440745193 -0.150688757752460 0.322911600549900 0.0339945596788104 -0.148559469028165 0.351916694927059 0.0271722903782781

Mathematica gives: -0.9979609973 -0.0009722258735 0.001277619728 0.004860699968 -0.004240772408 0.0005876914821 -0.03923679062 0.01152344604 0.008614386639 0.0005409596602 0.002511166093 0.04306340746 -0.009183918074 -0.00001644119820 -0.008341374583 0.01635552490 0.0005560180332

which is massively different. This trend continues for the rest of the columns. Why does this disparity exist between the two math engines?

I thought it might be because the U and V matrices in a singular value decomposition are not unique, which makes it difficult to compare results from different math engines (a point made in another Stack Overflow post). But that wouldn't explain the massive differences, or the agreement in the first few columns.
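The sign ambiguity part of this is easy to demonstrate. A minimal sketch in NumPy (my own illustration, not from either engine): flipping the sign of the k-th column of U together with the k-th column of V leaves the product unchanged, so both factorizations are equally valid SVDs.

```python
import numpy as np

# Flipping matched columns of U and V gives another valid SVD of A.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

flip = np.array([1.0, -1.0, 1.0])   # flip the sign of the 2nd singular pair
U2 = U * flip                        # scale columns of U
Vt2 = Vt * flip[:, None]             # scale matching rows of V^T

# Both factorizations reconstruct A exactly (up to round-off):
print(np.allclose((U * s) @ Vt, A))    # True
print(np.allclose((U2 * s) @ Vt2, A))  # True
```

Because the flip matrix is its own inverse and commutes with the diagonal of singular values, U2 diag(s) Vt2 equals U diag(s) Vt term by term.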

Note that the graphs returned are the same; only the svd[[3]] and PC matrices differ.

Matlab reference for SVD: http://uk.mathworks.com/help/matlab/ref/svd.html
Based on this tutorial for PCA: http://www.sdss.jhu.edu/~szalay/class/2015/SignalProcPCA.pdf
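For cross-checking outside either engine, the Matlab pipeline above can be sketched in NumPy (a translation for illustration only; note that `np.linalg.svd` returns V transposed, so `vt.T` plays the role of Matlab's PC):

```python
import numpy as np

# NumPy translation of the Matlab PCA-via-SVD pipeline above.
X = np.array([
    [105, 103, 103, 66], [245, 227, 242, 267], [685, 803, 750, 586],
    [147, 160, 122, 93], [193, 235, 184, 209], [156, 175, 147, 139],
    [720, 874, 566, 1033], [253, 265, 171, 143], [488, 570, 418, 355],
    [198, 203, 220, 187], [360, 365, 337, 334], [1102, 1137, 957, 674],
    [1472, 1582, 1462, 1494], [57, 73, 53, 47], [1374, 1256, 1572, 1506],
    [375, 475, 458, 135], [54, 64, 62, 41]], dtype=float)

M, N = X.shape
data = X - X.mean(axis=1, keepdims=True)   # centre each row
Y = data.T / np.sqrt(N - 1)                # 4 x 17, as in the Matlab code

u, s, vt = np.linalg.svd(Y, full_matrices=True)
variances = s**2                           # Matlab's V = S.*S
signals = vt @ data                        # Matlab's PC' * data

# After centring, every column of Y sums to zero, so Y has rank 3 and
# the 4th singular value is numerically zero.
print(s)
```

This makes the key structural fact easy to see: only three singular values are non-zero, so only three columns of V are pinned down by the data.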

edited by J. M.'s missing motivation
asked by Tomi
    It is because $Y^* Y$ has three non-zero eigenvalues and the rest are zero; columns 4 through 17 of PC form an orthonormal basis for the null space. There are infinitely many ways to do that. Given that the residual of the SVD, $U\Sigma V^* - Y$, is different in the two systems, I think the explanation is that they constructed bases in different ways, perhaps due to differences in round-off. I don't think there is a conventional standard method for completing a basis; I'm sure there is not a canonical one. – Michael E2 Feb 14 '16 at 21:40
    In support of what Michael E2 says in his comment: (1) compare the approximation norms of the decompositions in both Matlab and Mathematica, e.g. Norm[svd[[1]].svd[[2]].Transpose[svd[[3]]] - Y]; (2) look at TensorRank[Y]; (3) how do the results from the two systems compare with a low-rank SVD, e.g. SingularValueDecomposition[Y, 3]? – Anton Antonov Feb 14 '16 at 22:17
    I'm voting to close this question as off-topic because the issue it raises is not a Mathematica issue but a mathematical one. That it is formulated in terms of Mathematica is not sufficient to make it an appropriate question for Mathematica.SE. – m_goldberg Feb 14 '16 at 23:04
    I respectfully disagree. However, I admit I am relatively new and do not quite understand the guidelines here. But surely it is the math Mathematica is using which is of interest here, not the math itself, which makes it a Mathematica question? I plan on investigating the ideas Anton put forward and reporting back. Further, the solutions suggested could be the reason Mathematica gives different results to other math engines. – Tomi Feb 14 '16 at 23:23
    Tomi, I won't vote to close, not yet, so that I might see what you find. But with respect to @m_goldberg's comment, as I understand SVD, one cannot say whether Matlab's or Mathematica's construction of PC/svd[[3]] is better. I think that it doesn't matter and that the 4th-17th columns can be ignored. The question of what math is used by Mathematica does not seem interesting in this case. Further, since the norm of the residual is less in Mathematica than in Matlab, one might argue that in this particular case, Mathematica's way is better. But that argument is not terribly convincing. – Michael E2 Feb 15 '16 at 01:08
  • FWIW, Mathematica's columns agree with NullSpace. Try NullSpace[N@Y] - Transpose[svd[[3]]][[4 ;; 17]]. One can also see that the 14 columns of Matlab's PC span the same space: MatrixRank[ Join[Transpose[svd[[3]]][[4 ;; 17]], Transpose[pc][[4 ;; 17]]]]. (See http://mathematica.stackexchange.com/a/24478 for importing Matlab computations into Mathematica.) – Michael E2 Feb 15 '16 at 11:40
  • @MichaelE2 If you feel like making your comments an answer let me know. – Mr.Wizard Feb 16 '16 at 16:56
    @Mr.Wizard Thanks, I think the OP might get a better answer on datascience.SE or math.SE. I think an answer in terms of PCA, instead of linear algebra (which is all I can give) or internal implementation details (which may be unavailable), might be more satisfactory. So I don't think I would give an answer here. – Michael E2 Feb 16 '16 at 17:28
  • @Tomi, try this to centralize your matrix: countrydata // Query[{Identity, Query[All, Mean]} /* Replace[{mat_, mean_} :> Query[# - mean &][mat]]] – alancalvitti May 14 '18 at 23:50
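The points made in Michael E2's and Anton Antonov's comments can be sketched in NumPy (hypothetical data with the same 4 x 17, rank-3 shape as the question's Y, not the actual country data): the trailing 14 columns of V are just some orthonormal basis of the null space, any orthogonal rotation of that block yields another equally valid SVD with the same residual, and a rank-3 truncated SVD already reproduces Y.

```python
import numpy as np

# Hypothetical 4 x 17 rank-3 matrix standing in for the question's Y.
rng = np.random.default_rng(1)
Y = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 17))

U, s, Vt = np.linalg.svd(Y, full_matrices=True)  # U: 4x4, Vt: 17x17
V = Vt.T
S = np.zeros((4, 17)); S[:4, :4] = np.diag(s)

# Columns 4-17 of V lie in the null space of Y (s[3] is numerically 0) ...
print(np.allclose(Y @ V[:, 3:], 0))              # True

# ... so rotating that block by ANY 14 x 14 orthogonal matrix Q gives
# another orthogonal V that factors Y equally well.
Q, _ = np.linalg.qr(rng.standard_normal((14, 14)))
V2 = V.copy()
V2[:, 3:] = V[:, 3:] @ Q
print(np.allclose(V2.T @ V2, np.eye(17)))        # True: still orthogonal
print(np.linalg.norm(U @ S @ V2.T - Y))          # still ~ machine epsilon

# A rank-3 truncated SVD (cf. SingularValueDecomposition[Y, 3]) already
# reproduces Y, so columns 4-17 of V carry no information about the data.
Y3 = (U[:, :3] * s[:3]) @ Vt[:3]
print(np.allclose(Y3, Y))                        # True
```

This is why Matlab and Mathematica can agree on the first three columns (up to sign) while disagreeing wildly on the rest: the remaining columns are an arbitrary completion of an orthonormal basis.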

0 Answers