As exemplified by Jed Brown's answer to Costs of lookups versus calculations, using vectorized rather than non-vectorized floating-point operations can result in much faster code. Many modern compilers claim that they can perform automatic vectorization. How do I find out which parts of my code are being successfully vectorized?
Viewed 4,594 times
3 Answers
10
With any reasonably modern Intel compiler, use -O3 -vec-report3. Optimization level 3 ensures that the compiler tries to vectorize, and the vector report tells you what it is doing.
The GNU page on vectorization says that it's on by default at optimization level 3, but I can't find the equivalent of vec-report.
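As a small test case for these flags, here is a sketch of a loop that compilers can usually vectorize. The file name saxpy.c and the function name are hypothetical, and whether the loop is actually vectorized depends on your compiler version and target, so treat the report (not this example) as the source of truth.

```c
#include <stddef.h>

/* Hypothetical file name: saxpy.c
 * Commands using the flags from this answer:
 *   icc -O3 -vec-report3 -c saxpy.c
 *   gcc -O3 -c saxpy.c
 */

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * `restrict` (C99) promises the compiler that x and y do not overlap,
 * which removes one common obstacle to automatic vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Every iteration is independent of the others, which is the pattern auto-vectorizers handle best.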
Bill Barth
-
Thanks for the quick response. I didn't know about -vec-report3. Do you have a compiler preference when it comes to this type of thing (automatic vectorization)? – Matthew Emmett Jun 13 '12 at 19:54
-
Intel compilers are really good, but only for Intel chips. You just have to add all the right pragmas (#pragma ivdep is easiest). GCC 4.7 has gotten a lot better, but looking over some code with a colleague, it still has bugs (like no vectorization inside OpenMP pragmas). – aterrel Jun 13 '12 at 20:44
-
I would advise double checking how the Intel compiler does with vectorization on AMD chips. I'm not 100% sure that the problems of old still exist. – Bill Barth Jun 13 '12 at 22:18
-
@BillBarth Yes, still an issue. See the Optimization Notice (in many places, e.g. http://software.intel.com/sites/products/collateral/hpc/compilers/intel_linux_compiler_compatibility_with_gnu_compilers.pdf). AMD won the court battle requiring Intel to disclose that they are anti-competitive, not to make them stop being. http://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler#Criticism Agner Fog on workarounds: http://www.agner.org/optimize/blog/read.php?i=49 – Jed Brown Jun 14 '12 at 13:28
-
@JedBrown, sure SSE is still weird, but what's the story with AVX? Those links indicate that things should be fine (since both companies implement AVX), but I haven't tested it on a Bulldozer machine. – Bill Barth Jun 14 '12 at 18:07
-
"Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors." Sounds to me like they are committed to making slow code on all hardware not manufactured by Intel. I have not heard of any evidence that it can produce good code for non-Intel, and I've seen it generate extremely bad code for non-Intel (not restricted to numerical code). – Jed Brown Jun 14 '12 at 21:07
-
I can confirm that Intel 12.1 is inserting code in binaries that checks cpuids and refuses to run AVX code on an AMD Interlagos processor. I don't know if it should be able to support it, or if the binary would run without the cpuid check. – Bill Barth Jun 15 '12 at 19:13
8
Within the GNU compiler collection, you have the option -ftree-vectorizer-verbose=n, where n is a number between 0 and 6, which prints vectorization information similar to the reports from icc/ifort.
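To see what the report distinguishes, it can help to compile two contrasting loops. The sketch below is only an illustration: the file name loops.c and both function names are made up, and the exact report wording varies by GCC version. Note also that later GCC releases deprecated -ftree-vectorizer-verbose in favor of -fopt-info-vec and -fopt-info-vec-missed.

```c
#include <stddef.h>

/* Hypothetical file name: loops.c
 *   gcc -O3 -ftree-vectorizer-verbose=2 -c loops.c
 * On newer GCC releases:
 *   gcc -O3 -fopt-info-vec -fopt-info-vec-missed -c loops.c
 */

/* Independent iterations: a typical candidate for vectorization. */
void add_arrays(size_t n, const float *restrict a,
                const float *restrict b, float *restrict c)
{
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

/* In-place running sum: each iteration reads the value written by the
 * previous one (a loop-carried dependence), so this loop is typically
 * reported as not vectorized. */
void prefix_sum(size_t n, float *a)
{
    for (size_t i = 1; i < n; ++i) {
        a[i] += a[i - 1];
    }
}
```

If the report flags the first loop as vectorized and the second as not, the output is behaving as expected for this kind of code.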
Pedro
5
With GNU compilers, adding -Wa,-ahl=asm.s will dump the generated assembly code to asm.s.
With Intel compilers, adding -fcode-asm -Faasm.s will dump the generated code to asm.s.
You can then inspect the assembly code and look for vector floating-point operations.
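For a concrete idea of what to look for, here is a minimal sketch assuming an x86-64 target with SSE or AVX. The file name kernel.c and the function name are hypothetical, and the exact instructions you see depend on the compiler, its version, and the target flags.

```c
#include <stddef.h>

/* Hypothetical file name: kernel.c
 *   gcc -O3 -c -Wa,-ahl=asm.s kernel.c
 *   icc -O3 -c -fcode-asm -Faasm.s kernel.c
 *
 * What to look for in asm.s (assuming x86-64):
 *   packed (vector) multiply:  mulps / vmulps on xmm or ymm registers
 *   scalar multiply:           mulss / vmulss
 * The "ps" (packed single) or "pd" (packed double) suffix is the signal;
 * a leading "v" alone only indicates AVX encoding, not vectorization.
 */
void multiply(size_t n, const float *restrict a,
              const float *restrict b, float *restrict c)
{
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] * b[i];
    }
}
```

A vectorized loop is often accompanied by a scalar remainder loop for the leftover elements, so seeing both packed and scalar instructions in the same function is not unusual. With GCC, plain -S also writes the assembly to a file if you do not need the interleaved listing.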
Matthew Emmett
-
I completely agree that inspecting assembly output is the only reliable way to determine if code is actually vectorized. There is nothing that requires compilers to be honest about their claims to vectorize code. – Jeff Hammond Apr 13 '13 at 20:06