I am using Eigenvalues a large number of times on a machine with 8 cores. When I run Eigenvalues one time without any parallel implementation, the CPU usage hovers around 50%. I suppose there is some parallelization built into the function.
Ideally, if I had enough memory, I would launch 8 kernels and use ParallelMap: this gives 100% CPU usage. However, I quickly run out of memory.
I do have enough memory to launch two kernels and run in parallel. However, the CPU usage is then only 25%, so this is slower than simple sequential evaluation. It seems that using ParallelMap disables the built-in parallelization of Eigenvalues.
Is there any way I can get 2 parallel kernels to run Eigenvalues, each one taking advantage of the parallelization already built into the function? That is, is there any way I can use the extra memory I have to get 100% out of my CPUs?
[…] `MKLThreadNumber` on the slave kernels. This may or may not result in any performance gain; note that Intel does not recommend the use of more threads than the number of physical cores. – ilian Sep 16 '15 at 14:54

[…] `Eigenvalues` to use 8 hyperthreads, just set `SetSystemOptions["ParallelOptions" -> "MKLThreadNumber" -> 8];` and that will increase the reported CPU utilization. Similarly, if you want to use two kernels in parallel, `LaunchKernels[2]; ParallelEvaluate[SetSystemOptions["ParallelOptions" -> "MKLThreadNumber" -> 4]];` will override the default behavior where each slave kernel is single-threaded. – ilian Sep 17 '15 at 01:05

[…] `Eigenvalues` is working perfectly well without your additional attempts to parallelize it. If you measure the time taken to perform equivalent work, at best you will notice no difference, and more likely the attempt to use "100%" of the CPU will accomplish the task more slowly (i.e., less efficiently) than otherwise. – Oleksandr R. Sep 17 '15 at 01:05

[…] `Eigenvalues` will not make the best parallelization choices by default. On my computer (with 4 cores), even for machine real matrices, there is the strange observation that 4 MKL threads result in slightly better parallelization than 2, but 3 threads are noticeably superior to either of these. In general one can expect that the result may be constrained as much by available memory bandwidth as it is by the resources of the CPU. – Oleksandr R. Sep 20 '15 at 16:24
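
Since the comments disagree on whether any of this helps, the only way to know is to time it on your own machine. Below is a rough sketch of how one might do that, not a tuned benchmark. The matrix sizes, the number of matrices, and the variable names (`mats`, `m`, `oldThreads`, `sweep`, `tSeq`, `tPar`) are made up for illustration; the `"MKLThreadNumber"` setting and the 2-kernel, 4-thread configuration are taken from the comments above.

```mathematica
(* Hypothetical test data: sizes chosen only for illustration *)
mats = Table[RandomReal[{-1, 1}, {400, 400}], {16}];
m = RandomReal[{-1, 1}, {1000, 1000}];

(* Remember the main kernel's current MKL thread count so it can be restored *)
oldThreads = "MKLThreadNumber" /.
   ("ParallelOptions" /. SystemOptions["ParallelOptions"]);

(* Part 1: time a single Eigenvalues call for 1 to 4 MKL threads in the main kernel *)
sweep = Table[
   SetSystemOptions["ParallelOptions" -> "MKLThreadNumber" -> n];
   {n, First@AbsoluteTiming[Eigenvalues[m];]},
   {n, 1, 4}];

(* Restore the original setting *)
SetSystemOptions["ParallelOptions" -> "MKLThreadNumber" -> oldThreads];

(* Part 2: sequential Map versus two subkernels with 4 MKL threads each *)
tSeq = First@AbsoluteTiming[Map[Eigenvalues, mats];];

CloseKernels[];
LaunchKernels[2];
ParallelEvaluate[
  SetSystemOptions["ParallelOptions" -> "MKLThreadNumber" -> 4]];

tPar = First@AbsoluteTiming[ParallelMap[Eigenvalues, mats];];

(* Compare wall-clock times rather than reported CPU utilization *)
{sweep, tSeq, tPar}
```

The comparison is deliberately on elapsed time, because, as noted above, a higher reported CPU percentage does not necessarily mean the work finishes sooner. The range of thread counts and the matrix dimensions should be adjusted to match the real workload, as the best setting may depend on both.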