
Consider a function with two versions, one compiled to the Wolfram virtual machine (WVM) and one compiled to C:

fcp1[C0_, {xmin_, xmax_}, {tmin_, tmax_, dt_}] :=
 Compile[{},
  Module[{xls, Nx, tls, Ct, cdiagT, cdiagh, dx},
   Nx = Length[C0];
   dx = N@(xmax - xmin)/(Nx - 1);
   tls = Range[tmin, tmax, dt] + dt/2.;
   cdiagT = -1./(2. dx^2);
   Do[
    cdiagh = cdiagT;
    , {t, tls}]]
  , CompilationOptions -> {"InlineExternalDefinitions" -> True, 
    "InlineCompiledFunctions" -> True}
  ]


fcp2[C0_, {xmin_, xmax_}, {tmin_, tmax_, dt_}] :=
 Compile[{},
  Module[{xls, Nx, tls, Ct, cdiagT, cdiagh, dx},
   Nx = Length[C0];
   dx = N@(xmax - xmin)/(Nx - 1);
   tls = Range[tmin, tmax, dt] + dt/2.;
   cdiagT = -1./(2. dx^2);
   Do[
    cdiagh = cdiagT;
    , {t, tls}]]
  , CompilationOptions -> {"InlineExternalDefinitions" -> True, 
    "InlineCompiledFunctions" -> True}, CompilationTarget -> "C"
  ]

Now compare how long it takes to build each one (the time MMA needs to compile the function, not the time to execute it). The C compilation is very slow:

Nxgrid = 2000;

Ct0 = Array[Exp[-5. #^2] &, Nxgrid, {-1., 1.}] // 
   Developer`ToPackedArray;

xRange = {-199.9`, 199.9`};

dt = 0.05515999116515042`;

Ntgrid = 20000;

Needs["CompiledFunctionTools`"]


f1 = fcp1[Ct0, xRange, {0, dt*Ntgrid, dt}]; // AbsoluteTiming
f2 = fcp2[Ct0, xRange, {0, dt*Ntgrid, dt}]; // AbsoluteTiming
CompilePrint@f1 == CompilePrint@f2

(* {0.000642, Null} *)
(* {14.959721, Null} *)
(* True *)

So why is the C compilation so much slower than the WVM version, and how can I speed it up?

Update

MarcoB gave a good suggestion to look at the compilation time independently of Mathematica, so I tested the compilation on its own.

The documentation says that CCompilerDriver is invoked automatically when compiling to C, and it does indeed seem quite slow:

Needs["CCompilerDriver`"]    
file = Export[FileNameJoin[{$TemporaryDirectory, "fcp1.c"}], 
    f1]; // AbsoluteTiming
CreateObjectFile[{file}, "fcp1"]; // AbsoluteTiming
(* {0.505879, Null} *)
(* {7.194721, Null} *)

The compiler that CCompilerDriver invokes on my system is Clang:

DefaultCCompiler[]
(* CCompilerDriver`ClangCompiler`ClangCompiler *)

So I also tested it outside MMA:

Import["!clang -v 2>&1", "Text"]
Import["!clang -shared -o " <> 
    ToString@FileNameJoin[{$TemporaryDirectory, "fcp1.so"}] <> " " <> 
        ToString@FileNameJoin[{$TemporaryDirectory, "fcp1.c"}] <> " -I" <>
         ToString[
         FileNameJoin[{$InstallationDirectory, 
       "SystemFiles/IncludeFiles/C/"}]] <> " 2>&1", 
   "Text"]; // AbsoluteTiming

(* "Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.5.0 Thread model: posix" *)

(* {0.750312, Null} *)


Import["!nm " <> ToString@FileNameJoin[{$TemporaryDirectory, "fcp1.so"}], "Text"]

".....
0000000000013140 T _fcp1
0000000000017030 b _funStructCompile
0000000000017020 d _initialize
                 U dyld_stub_binder"

Compiling outside of MMA appears to be very fast. So:

  1. Why does CCompilerDriver take so long to compile?
  2. CCompilerDriver takes about 7 s in the last example, so why does Compile take about 15 s?
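
One way to narrow down question 1 is to have the driver print the exact command line it runs. This is an untested sketch: it assumes file still holds the fcp1.c path exported above, and it relies on the "ShellCommandFunction" and "ShellOutputFunction" options of CCompilerDriver. The manual clang call above passes no optimization flag, so differing flags are one possible explanation for the gap, but that is only a guess until the printed command has been compared.

```mathematica
Needs["CCompilerDriver`"]

(* Assumption: `file` is the path to fcp1.c produced by the Export call above *)
CreateObjectFile[{file}, "fcp1",
   "ShellCommandFunction" -> Print, (* print the command line the driver builds *)
   "ShellOutputFunction" -> Print   (* print whatever the compiler writes back *)
   ]; // AbsoluteTiming
```

If the printed command contains extra flags, rerunning the manual clang test with the same flags should show whether they account for the difference.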
xslittlegrass
  • Let me see if I understand your question: you are asking about the time it takes to compile the function, not to execute it once it has been compiled, correct? – MarcoB Jul 22 '15 at 23:05
  • @MarcoB yes, that's correct. – xslittlegrass Jul 22 '15 at 23:09
  • Compilation to C requires a call to the external compiler/linker, and some disk writes. As far as I understand it, compilation to MVM happens in memory. Have you measured how long it takes your compiler to compile/link something to a DLL? That may end up being a significant part of the overhead you see in compilation to C – MarcoB Jul 23 '15 at 00:29
  • On my machine, Windows 10 with Visual Studio 2015 C compiler, f2 takes about 2 seconds. So maybe a different compiler will help you here. – RunnyKine Jul 23 '15 at 03:36
  • I don't really know why compilation is so comparatively slow. I think I have missed something though: how is this typically going to affect performance? Of course, you would not compile your function every time you evaluate it. That approach does not seem to have any great upsides, but it is likely to kill any advantage you get from the compilation in all but the most extreme of cases. – MarcoB Jul 23 '15 at 03:51
  • With Windows 8.1 and Visual Studio 2013 I am also getting 2 sec for f2. I guess it's a machine-specific compiler issue. – PlatoManiac Jul 23 '15 at 09:06
  • For me, with MinGW GCC on Windows with Mathematica 9, it takes about 7.5 seconds the first time and 3.5 seconds the second time to compile f2. I think this is within the normal range. When you test Clang by itself, you are only compiling, not linking (I think; I've never used Clang). Linking may take considerable time and should certainly be included as well. – Oleksandr R. Jul 23 '15 at 09:58
  • @OleksandrR. Thanks. Do you know how I can test linking? – xslittlegrass Jul 23 '15 at 15:06
  • Use -o switch instead of -c in most compilers. – Oleksandr R. Jul 23 '15 at 15:11
  • @OleksandrR. But what should I link to? I tried clang -o fcp1 fcp1.c and it says "_main" not defined. – xslittlegrass Jul 23 '15 at 15:20
  • Normally there is also a switch that indicates you are trying to build a library rather than an executable. In GCC, it's -shared. But you would be better off to read the Clang manual than to ask me, because I've never used it. – Oleksandr R. Jul 23 '15 at 15:36
  • @OleksandrR. Thanks. I updated. It seems that linking is also fast. – xslittlegrass Jul 23 '15 at 16:21
  • @MarcoB The function I'm working on takes function arguments, so I have to recompile each time I evaluate it with different arguments. – xslittlegrass Jul 24 '15 at 04:58
  • Way too late, but I think the problem here is that you enforce the definition of the huge array C0 to be inlined, although you only need its length. And I am not sure whether the Mathematica and the C compiler are able to optimize this out. I think the MTensor representing C0 has first to be instantiated at runtime before its size can be queried. And instantiation needs to store all its values in the program code. So a lot of useless stuff to let the C compiler chew on. – Henrik Schumacher Jan 03 '24 at 13:40
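
The suggestion in the last comment can be sketched as follows. This is a minimal, untested variant: fcp3 and nx are hypothetical names, not part of the original post. It assumes the slowdown really does come from the 2000 entries of C0 being substituted into the body of Compile. With evaluates Length[C0] before compilation, so only a plain integer reaches the compiler; the unused locals xls, Nx and Ct are dropped.

```mathematica
(* Hypothetical variant: pass only the length of C0 into Compile, not the array itself *)
fcp3[C0_, {xmin_, xmax_}, {tmin_, tmax_, dt_}] :=
 With[{nx = Length[C0]}, (* evaluated once, outside the compiled code *)
  Compile[{},
   Module[{tls, cdiagT, cdiagh, dx},
    dx = N@(xmax - xmin)/(nx - 1);
    tls = Range[tmin, tmax, dt] + dt/2.;
    cdiagT = -1./(2. dx^2);
    Do[
     cdiagh = cdiagT;
     , {t, tls}]]
   , CompilationOptions -> {"InlineExternalDefinitions" -> True, 
     "InlineCompiledFunctions" -> True}, CompilationTarget -> "C"
   ]]

(* Same timing test as for f2 in the question *)
f3 = fcp3[Ct0, xRange, {0, dt*Ntgrid, dt}]; // AbsoluteTiming
```

Comparing this timing with the one for f2 would show whether the embedded array is what the C compiler spends its time on.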

0 Answers