I’ve been working with SVD (singular value decomposition). Things weren’t working as expected, so I turned to Matlab and executed the following code:
X = [ 105  103  103   66;  245  227  242  267;  685  803  750  586;
      147  160  122   93;  193  235  184  209;  156  175  147  139;
      720  874  566 1033;  253  265  171  143;  488  570  418  355;
      198  203  220  187;  360  365  337  334; 1102 1137  957  674;
     1472 1582 1462 1494;   57   73   53   47; 1374 1256 1572 1506;
      375  475  458  135;   54   64   62   41];
data = X;
[M, N] = size(data);
mn = mean(data, 2);                 % mean of each row
data = data - repmat(mn, 1, N);     % centre each row
Y = data' / sqrt(N - 1);
[u, S, PC] = svd(Y);
S = diag(S);
V = S.*S;                           % squared singular values = component variances
signals = PC' * data;               % data projected onto the principal components
This produces a sensible PCA diagram for the data. Now, if I run the equivalent code in Mathematica:
countrydata = {
   {105, 103, 103, 66}, {245, 227, 242, 267}, {685, 803, 750, 586},
   {147, 160, 122, 93}, {193, 235, 184, 209}, {156, 175, 147, 139},
   {720, 874, 566, 1033}, {253, 265, 171, 143}, {488, 570, 418, 355},
   {198, 203, 220, 187}, {360, 365, 337, 334}, {1102, 1137, 957, 674},
   {1472, 1582, 1462, 1494}, {57, 73, 53, 47}, {1374, 1256, 1572, 1506},
   {375, 475, 458, 135}, {54, 64, 62, 41}};
datamean = {{Mean[countrydata[[1]]]}};
Print[datamean];
For[i = 2, i <= Dimensions[countrydata][[1]], i++,
 datamean = Append[datamean, {Mean[countrydata[[i]]]}]];   (* per-row means *)
datashift = countrydata;
For[i = 1, i <= Dimensions[countrydata][[1]], i++,          (* centre each row *)
 For[j = 1, j <= Dimensions[countrydata[[1]]][[1]], j++,
  datashift[[i]][[j]] = countrydata[[i]][[j]] - datamean[[i]][[1]]]];
Y = Transpose[datashift]/Sqrt[Dimensions[countrydata[[1]]][[1]] - 1];
svd = SingularValueDecomposition[N[Y, 100]];
S = Diagonal[svd[[2]]];
V = S*S;                            (* squared singular values *)
signal = Transpose[-svd[[3]]].datashift;
Print[MatrixForm[N[Y, 10]]];
Show[ListPlot[{{{signal[[1, 1]], 0}}, {{signal[[1, 2]], 0}},
   {{signal[[1, 3]], 0}}, {{signal[[1, 4]], 0}}},
  PlotRange -> All,
  PlotLegends -> {"England", "Wales", "Scotland", "N. Ireland"},
  PlotLabel -> "First two principal components for data",
  FrameLabel -> {"z1", "z2"},
  RotateLabel -> False,
  Axes -> False,
  Frame -> {{True, False}, {True, False}},
  ImageSize -> {300, 300}]]
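For reference, the mean-centring and scaling can be written much more compactly in Mathematica. This is only a sketch of the same computation as above; the names centred, Ycompact, u2, s2 and v2 are mine, not from the original code:

centred = (# - Mean[#]) & /@ countrydata;      (* subtract each row's mean *)
Ycompact = Transpose[centred]/Sqrt[Length[First[countrydata]] - 1];
{u2, s2, v2} = SingularValueDecomposition[N[Ycompact, 100]];
Ycompact == Y                                   (* True: same matrix as Y above *)

The built-in Mean and listability replace the For loops, and the result feeds into SingularValueDecomposition exactly as before.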
I get the following values for the V matrix in the singular value decomposition (the PC matrix in Matlab, svd[[3]] in Mathematica). The other two matrices returned by the singular value decomposition, svd[[1]] and svd[[2]], match exactly between Matlab and Mathematica.
Matlab PC: -0.0569553797856850 0.0479276281346853 -0.258916658336122 -0.0844149825250836 -0.00519362266004777 -0.0376209828394020 0.401402060296248 -0.151849941562302 -0.243593728990274 -0.0268862325367470 -0.0364882691115938 -0.632640897872237 -0.0477028583736490 -0.0261877559085335 0.232244140472894 -0.463968167976707 -0.0296502010879939
Mathematica svd[[3]]: 0.05695537979 -0.04792762813 0.2589166583 0.08441498253 0.005193622660 0.03762098284 -0.4014020603 0.1518499416 0.2435937290 0.02688623254 0.03648826911 0.6326408979 0.04770285837 0.02618775591 -0.2322441405 0.4639681680 0.02965020109
This is all well and good: the values match, apart from the sign. Now, if we go to the 4th column,
Matlab gives: -0.400748417916043 -0.150602373645460 -0.676298033445679 -0.0733743113594902 0.165183462293022 0.0375841868716541 -0.150666145597042 -0.103524165890781 0.124583909064645 0.0312459639649717 -0.0235673440745193 -0.150688757752460 0.322911600549900 0.0339945596788104 -0.148559469028165 0.351916694927059 0.0271722903782781
Mathematica gives: -0.9979609973 -0.0009722258735 0.001277619728 0.004860699968 -0.004240772408 0.0005876914821 -0.03923679062 0.01152344604 0.008614386639 0.0005409596602 0.002511166093 0.04306340746 -0.009183918074 -0.00001644119820 -0.008341374583 0.01635552490 0.0005560180332
These values are massively different, and the same is true for the remaining columns. Why is there such a disparity between the two math engines?
I thought it might be because the U and V matrices in a singular value decomposition are not unique, which makes it difficult to compare results from different math engines (a point made in another Stack Overflow post). But that alone wouldn't explain the massive differences, or the close agreement in the first few columns.
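One way to test that hypothesis directly (a minimal sketch, not part of the original post) is to compare the two V matrices column by column while allowing for a sign flip, since singular vectors are only defined up to sign. Here pc is assumed to hold Matlab's PC matrix imported into Mathematica; it is not defined anywhere in the code above.

(* pc is assumed to be Matlab's 17x17 PC matrix, imported separately. *)
colDiff[a_, b_] := Min[Norm[a - b], Norm[a + b]];   (* distance ignoring the sign *)
Table[colDiff[Transpose[svd[[3]]][[k]], Transpose[pc][[k]]], {k, 1, 4}]

If the first few values come out tiny while the later ones do not, the disagreement is confined to columns that the data does not pin down, which is the kind of non-uniqueness described above.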
Note that the graphs returned are the same; only the svd[[3]] and PC matrices differ.
Matlab reference for SVD: http://uk.mathworks.com/help/matlab/ref/svd.html
Based on this tutorial for PCA: http://www.sdss.jhu.edu/~szalay/class/2015/SignalProcPCA.pdf
Comments:
- "… Norm[svd[[1]].svd[[2]].Transpose[svd[[3]]] - Y]. 2. Look at TensorRank[Y]. 3. How do the results from the two systems compare with a low-rank SVD? E.g. SingularValueDecomposition[Y, 3]." – Anton Antonov Feb 14 '16 at 22:17
- "… PC/svd[[3]] is better. I think that it doesn't matter and that the 4th-17th columns can be ignored. The question of what math is used by Mathematica does not seem interesting in this case. Further, since the norm of the residual is less in Mathematica than in Matlab, one might argue that in this particular case, Mathematica's way is better. But that argument is not terribly convincing." – Michael E2 Feb 15 '16 at 01:08
- "… NullSpace. Try NullSpace[N@Y] - Transpose[svd[[3]]][[4 ;; 17]]. One can also see that the 14 columns of Matlab's PC span the same space: MatrixRank[Join[Transpose[svd[[3]]][[4 ;; 17]], Transpose[pc][[4 ;; 17]]]]. (See http://mathematica.stackexchange.com/a/24478 for importing Matlab computations into Mathematica.)" – Michael E2 Feb 15 '16 at 11:40
- "countrydata // Query[{Identity, Query[All, Mean]} /* Replace[{mat_, mean_} :> Query[# - mean &][mat]]]" – alancalvitti May 14 '18 at 23:50
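For completeness, here is a sketch of the checks suggested in the comments above. As before, pc is assumed to hold Matlab's PC matrix imported into Mathematica (the comments link an answer on importing Matlab computations); it is not part of the original code.

(* 1. Both factorisations should reconstruct Y. *)
Norm[svd[[1]].svd[[2]].Transpose[svd[[3]]] - Y]

(* 2. The numerical rank of Y says how many columns of V are determined by the
      data (up to sign); the rest correspond to zero singular values. *)
MatrixRank[N[Y]]
Chop[N[Diagonal[svd[[2]]]]]

(* 3. Columns beyond that rank only have to span the null space of Y, so compare
      the spans rather than the individual columns. A result of 14 means the two
      sets of columns span the same 14-dimensional space, as noted in the comments. *)
MatrixRank[Join[Transpose[svd[[3]]][[4 ;; 17]], Transpose[pc][[4 ;; 17]]]]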