Parallel Kernels and Hyperthreading

Question

I am using Eigenvalues a large number of times on a machine with 8 cores. When I run Eigenvalues one time without any parallel implementation, the CPU usage hovers around 50%. I suppose there is some parallelization built into the function.

Ideally, if I had enough memory, I would launch 8 kernels and use ParallelMap; this gives 100% CPU usage. However, I quickly run out of memory.

I do have enough memory to launch two kernels and run them in parallel. However, the CPU usage is then 25%, so this is slower than simple sequential evaluation. It seems that using ParallelMap disables the built-in parallelization of Eigenvalues.

Is there any way I can get 2 parallel kernels to run Eigenvalues, each one taking advantage of the parallelization already built into the function? That is, is there any way to use the extra memory I have to get 100% out of my CPUs?

See this answer; in particular, you may need to adjust MKLThreadNumber on the slave kernels. This may or may not result in any performance gain. Note that Intel does not recommend using more threads than the number of physical cores. — ilian, Sep 16 '15 at 14:54
The most effective technique I found is to open another kernel in the front end. Then I am at 100% cpu usage. — user16316, Sep 16 '15 at 23:34
For Eigenvalues to use 8 hyperthreads, just set SetSystemOptions["ParallelOptions" -> "MKLThreadNumber" -> 8]; this will increase the reported CPU utilization. Similarly, if you want to use two kernels in parallel, LaunchKernels[2]; ParallelEvaluate[SetSystemOptions["ParallelOptions" -> "MKLThreadNumber" -> 4]]; will override the default behavior, in which each slave kernel is single-threaded. — ilian, Sep 17 '15 at 01:05
Your question reflects a common misconception about what SMT does and the mistaken idea that scheduler time corresponds to execution unit utilization. To put it briefly: Eigenvalues is working perfectly well without your additional attempts to parallelize it. If you measure the time taken to perform equivalent work, at best you will notice no difference, and more likely the attempt to use "100%" of the CPU will accomplish the task more slowly (i.e., less efficiently) than otherwise. — Oleksandr R., Sep 17 '15 at 01:05
I did as instructed and diagonalized 4 matrices of dimension around 500,000. In series, this required 1300 seconds; in parallel (2 for each kernel), 1050 seconds. I do not understand why, with more cores available, parallelizing would not in general be more efficient. — user16316, Sep 18 '15 at 14:52
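A minimal sketch of the kind of comparison discussed in these comments, assuming dense machine-real matrices. The matrix size, the variable names (mats, serialTime, parallelTime), and the 8 and 4 thread counts are illustrative assumptions taken from the suggested settings, not the asker's actual workload. The parallel timing includes the cost of sending the matrices to the subkernels.

```mathematica
(* Hypothetical workload: four dense machine-real matrices; the size is illustrative only *)
mats = Table[RandomReal[{-1, 1}, {1500, 1500}], {4}];

(* Serial: one kernel, with MKL allowed 8 threads, as suggested in the comments *)
SetSystemOptions["ParallelOptions" -> "MKLThreadNumber" -> 8];
serialTime = First @ AbsoluteTiming[Map[Eigenvalues, mats];];

(* Parallel: two subkernels, each allowed 4 MKL threads *)
LaunchKernels[2];
ParallelEvaluate[SetSystemOptions["ParallelOptions" -> "MKLThreadNumber" -> 4]];
parallelTime = First @ AbsoluteTiming[ParallelMap[Eigenvalues, mats];];

(* Shut down the subkernels and report both wall-clock times in seconds *)
CloseKernels[];
{serialTime, parallelTime}
```

Comparing wall-clock time for the same work, rather than reported CPU utilization, is what shows whether either configuration is actually faster.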
Do you actually have 8 cores, then, or 4 cores with 2 threads able to run simultaneously on each (i.e. SMT, "HyperThreading")? This makes a big difference and since you have mentioned HyperThreading in your question title I assumed the latter. But maybe it is actually the former situation. Also, if the matrices consist of something other than machine reals, the performance characteristics will be different. And if the matrices are small and independent, then perhaps it will be more efficient to manually parallelize it, because the parallelization will not be as effective for small matrices. — Oleksandr R., Sep 20 '15 at 16:06
It also seems that Eigenvalues will not make the best parallelization choices by default. On my computer (with 4 cores), even for machine-real matrices, I made the strange observation that 4 MKL threads result in slightly better parallelization than 2, but 3 threads are noticeably superior to either. In general, one can expect the result to be constrained as much by available memory bandwidth as by the resources of the CPU. — Oleksandr R., Sep 20 '15 at 16:24
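One way to check which thread count works best on a given machine is to time Eigenvalues under each setting. The following is a rough sketch, assuming a 4-core machine like the one described in the comment above. The matrix size and the names m, old, and timings are hypothetical, and results will vary with hardware and matrix type.

```mathematica
(* Hypothetical test matrix; the size is chosen only for illustration *)
m = RandomReal[{-1, 1}, {2000, 2000}];

(* Remember the current setting so it can be restored afterwards *)
old = "MKLThreadNumber" /. ("ParallelOptions" /. SystemOptions["ParallelOptions"]);

(* Time Eigenvalues with 1 to 4 MKL threads; 4 is an assumed core count *)
timings = Table[
   SetSystemOptions["ParallelOptions" -> "MKLThreadNumber" -> n];
   {n, First @ AbsoluteTiming[Eigenvalues[m];]},
   {n, 1, 4}];

(* Restore the original setting and show {threads, seconds} pairs *)
SetSystemOptions["ParallelOptions" -> "MKLThreadNumber" -> old];
timings
```

Repeating each measurement several times and taking the minimum would give more stable numbers, since a single run can be disturbed by other activity on the machine.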
