9

I'm looking for a more efficient way to generate an image with a large number of randomly coloured Gaussians. Here's the code I'm using:

Remove["Global`*"]
(* Don't care about underflow of Exp *)
SetSystemOptions["CheckMachineUnderflow" -> False];

(* The minimum allowable determinant of the covariance matrix *) detminscale = 0.1;

(* Make an association with the gaussian parameters. Reject with a gaussian with alpha value zero if det constraint is violated *) MakeGaussian[c_, a_, μ, Σ] := If[Det[Σ] < detminscale || ! PositiveDefiniteMatrixQ[Σ], <|"c" -> {0, 0, 0}, "a" -> 0.0, "μ" -> μ, "Σi" -> IdentityMatrix[2]|>, <|"c" -> Clip[c, {0, 1}], "a" -> Clip[a, {0, 1}], "μ" -> μ, "Σi" -> Inverse[Σ]|> ]

(* Evaluates the gaussian on the WxH grid, returning a 2D grid of RGBA tuples ) cfEvalGaussian = Compile[{{c, _Real, 1}, {a, _Real}, {m, _Real, 1}, {s, _Real, 2}, {w, _Integer}, {h, _Integer}}, Table[ Append[c, aClip[With[{X = ({i, j} - m)}, Exp[-X.s.X]], {0, 1}]], {i, 1, h}, {j, 1, w} ], CompilationTarget -> "C"];

(* Evaluates the gaussian and converts to an RGBA image *) EvalGaussian[g_, w_, h_] := Image[cfEvalGaussian[g["c"], g["a"], g["μ"], g["Σi"], w, h], ColorSpace -> "RGB"]

(* We evaluate all gaussians in parallel, the quiet is to silence underflow warnings. TODO: other kernels don't respect the CheckMachineUnderflow !? *) EvaluateAllGaussians[gs_, w_, h_] := ParallelTable[Quiet@EvalGaussian[g, w, h], {g, gs}]

ComposeRGBA[i1_, i2_] := ImageCompose[Image[i1], Image[i2]]

(* Starting from black, compose all the gaussian images over each other *) ComposeGaussians[gres_, w_, h_] := Fold[ComposeRGBA, ConstantArray[{0, 0, 0, 1}, {w, h}], gres]

random 2D gaussians

And this is how I'm testing it:

(* Generates a random positive semi-definite matrix, rejects bad det scales *)
randomGOMtx[scaler_] := Module[{M},
  Until[detminscale < Det[M],
   M = RandomVariate[
     GaussianOrthogonalMatrixDistribution[RandomReal[scaler], 2]]
   ];
  Return[M]
  ]

(* Test 500 random colour gaussians at 1024x1024 resolution ) With[{dmax = 1024, n = 500}, colours = RandomReal[1, {n, 3}]; alphas = RandomReal[{0.2, 1}, n]; means = RandomReal[{1, dmax}, {n, 2}]; matrices = Table[randomGOMtx[dmax1.0], n]; gaussians = MapThread[MakeGaussian, {colours, alphas, means, matrices}]; evals = EvaluateAllGaussians[gaussians, dmax, dmax]; ComposeGaussians[evals, dmax, dmax] ]

I am also looking to try out the 3D case too (with Image3D), but the 2D case takes 2 minutes to produce an image, and 3d is likely to be much slower.


Update: This is the 3D case and it's painfully slow. I've also improved the matrix generation by using rotation matrices.

MakeGaussian[c_, a_, μ_, Σi_] :=
 <|"c" -> Clip[c, {0, 1}], "a" -> Clip[a, {0, 1}], "μ" -> μ,
  "Σi" -> Σi|>

cfEvalGaussian = Compile[{{c, _Real, 1}, {a, _Real}, {m, _Real, 1}, {s, _Real, 2}, {w, _Integer}, {h, _Integer}, {d, _Integer}}, Table[ Append[c, a*Clip[With[{X = ({i, j, k} - m)}, Exp[-X . s . X]], {0, 1}]], {i, 1, h}, {j, 1, w}, {k, 1, d} ], CompilationTarget -> "C"];

EvalGaussian[g_, w_, h_, d_] := cfEvalGaussian[g["c"], g["a"], g["μ"], g["Σi"], w, h, d]

EvaluateAllGaussians[gs_, w_, h_, d_] := ParallelTable[Quiet@EvalGaussian[g, w, h, d], {g, gs}]

cfComposeRGBA = Compile[{{i1, _Real, 4}, {i2, _Real, 4}}, With[{w = Length[i1], h = Length[i1[[1]]], d = Length[i1[[1, 1]]]}, Table[ With[{c1 = i1[[i, j, k]], c2 = i2[[i, j, k]], a = i2[[i, j, k, 4]]}, c2a + c1(1 - a) ], {i, w}, {j, h}, {k, d}] ], CompilationTarget -> "C"];

randomGOMtx[scaler_] := With[{P = RotationMatrix[{{1, 0, 0}, RandomPoint[Sphere[]]}]}, Inverse[P] . DiagonalMatrix[1/RandomReal[{.1, scaler}, 3]] . P]

With[{dmax = 256, n = 50}, colours = RandomReal[1, {n, 3}]; alphas = RandomReal[{0.2, 1}, n]; means = RandomReal[{1, dmax}, {n, 3}]; matrices = Table[randomGOMtx[dmax*2], n]; gaussians = MapThread[MakeGaussian, {colours, alphas, means, matrices}]; evals = EvaluateAllGaussians[gaussians, dmax, dmax, dmax]; Image3D[ComposeGaussians[evals, dmax, dmax, dmax], ColorSpace -> "RGB"] ]

3d gaussian cloud

flinty
  • 25,147
  • 2
  • 20
  • 86
  • I don't find two minutes a long time. How much time does it take for 3D? – user64494 Nov 03 '23 at 13:23
  • 1
    @user64494 haven't finished that yet, but in C++ this takes seconds. – flinty Nov 03 '23 at 14:40
  • Me neither. That's why erased the comment ^^ – Henrik Schumacher Nov 03 '23 at 16:24
  • @flinty: This forum is on Mathematica. – user64494 Nov 03 '23 at 16:25
  • 1
    @user64494 Yes ... I know (?). My question is about improving the Mathematica code which seems unreasonably slow, to get it closer to C++ / native performance. I'm also looking at calling OpenCL from Mathematica as a last resort. – flinty Nov 03 '23 at 16:34
  • @flinty: Using a known C++ code in Mathematica is a reasonable solution. – user64494 Nov 03 '23 at 17:09
  • @flinty A bit on and off topic - during the ongoing WTC I heard some neat GLSL trick is coming, which can be used as some kinds of "universal" render pass for 2D and 3D graphics. – Silvia Nov 03 '23 at 17:36
  • @flinty If you have a C++ implementation already, I can help you with the LibraryLink code (which is really mostly boilerplate). And I also have a bit of experience with Metal (Apple's follow-up for OpenCL). I cannot do OpenCL because my machine does not want to run it anymore. – Henrik Schumacher Nov 03 '23 at 17:52
  • Got a trick to make it under 2 sec. Will post as answer after a sleep! – Silvia Nov 03 '23 at 19:29
  • 2
    Because it appears you are not making statistical inferences but rather producing a figure, why not just generate a single random sample from a bivariate (or trivariate) normal? Then you can transform those data points for any mean and covariance matrix you want. – JimB Nov 04 '23 at 03:59
  • @JimB Exactly what I did! :D – Silvia Nov 04 '23 at 07:00
  • @HenrikSchumacher I cannot do OpenCL because my machine does not want to run it anymore. I'm getting the same problem in 13.3.1.0 - Needs["OpenCLLink`"] then run OpenCLQ[] and it crashes the kernel - doesn't even reply False. I've filed a bug report. – flinty Nov 06 '23 at 13:16
  • Do you run this on a Mac? Apple has discontinued OpenCL in favor of Metal. – Henrik Schumacher Nov 06 '23 at 13:17
  • @HenrikSchumacher No, Windows 10. CUDALink works which is even more weird. But I dislike posting CUDA here because OpenCL works on AMD and nvidia cards, and it's more helpful to others who don't have expensive nvidia hardware. I even updated the driver and it still doesn't work. I have a lot of Mathematica+OpenCL code I've built up over the years and now it sadly doesn't work! – flinty Nov 06 '23 at 13:18
  • How, yeah. Then you should definitely contact Wolfram Support. (Btw., I also hate being locked-in into CUDA.) – Henrik Schumacher Nov 06 '23 at 14:07
  • 1
    @HenrikSchumacher I discovered OpenCLQ[] crashes the kernel because I had both the Intel oneAPI DPC C++ compiler installed and Visual Studio 2022. When I uninstalled the Intel one, OpenCLQ[] works and returns True! I informed Wolfram. – flinty Nov 07 '23 at 14:51

3 Answers3

7

You use MapThread with a CompiledFunction. But CompiledFunctions have their own threading mechanism, that may also exploit parallel threads. Here is my proposition for you:

cfEvalGaussianThread = 
  Compile[{{c, _Real, 1}, {a, _Real}, {m, _Real, 1}, {s, _Real, 2}, {w, _Real}, {h, _Real}},
   Block[{cnew, s11, s12, s22, r2, val},
    s11 = Compile`GetElement[s, 1, 1];
    s12 = Compile`GetElement[s, 1, 2];
    s22 = Compile`GetElement[s, 2, 2];
    Table[
     r2 = -(x s11 x + 2. x s12 y + y s22 y);
     val = Max[0., Min[a, a Exp[r2]]];
     {Compile`GetElement[c, 1], Compile`GetElement[c, 2], Compile`GetElement[c, 3], val}
     , {x, 1. - m[[1]], h - m[[1]], 1.}, {y, 1. - m[[2]], w - m[[2]], 1.}
     ]
    ],
   CompilationTarget -> "C",
   RuntimeAttributes -> {Listable},
   Parallelization -> True,
   RuntimeOptions -> "Speed"
   ];

On my machine

dataThread = cfEvalGaussianThread[
   gaussians[[All, "c"]],
   gaussians[[All, "a"]],
   gaussians[[All, "\[Mu]"]],
   gaussians[[All, "\[CapitalSigma]i"]],
   w, h
   ];

is 10 faster than

data = ParallelTable[ cfEvalGaussian[g["c"], g["a"], g["\[Mu]"], g["\[CapitalSigma]i"], w, h], {g, gaussians}];

It does not convert to Image, though. And that is the new bottleneck.

The parallelization in Compile is suboptimal, and single precision would suffice for you purpose. So a C++ implementation might be a good any idear. It is quite easy to create a function callable from Mathematica by using LibraryLink.

Henrik Schumacher
  • 106,770
  • 7
  • 179
  • 309
7

Basically the idea is to draw in vector form, then exploit FrontEnd's ability to Rasterize the result.

First we make the base primitive corresponding to a "standard Gaussian" shape.

Note 1: To reduce visual artifect, the mesh needs to be based on concentric circles.

Note 2: For as many as 500 Gaussian primitives, we probably don't need such a fine detailed mesh, so if we reduce the number of r here, we should see an extra performance boost at the final rasterization step.

basePts = Module[{r = N@Subdivide[5], Δr, Δl, n, θ0, θmin = 20 °}
            , Δr = Differences@r
            ; Δl = 2 Δr Tan[θmin]
            ; n = Ceiling[(2 π Rest[r])/Δl]
            ; θ0 = π/Most[n] // Prepend[0] // N // Accumulate
            ; MapThread[CirclePoints[{#1, #2}, #3] &, {Rest[r], θ0, n}] // Prepend[{{0., 0.}}]
            ];
baseMesh = Join @@ basePts // DelaunayMesh

base mesh

Then we give each vertices correct intensity, which will later be used to color them:

baseVtx = MeshCoordinates[baseMesh];
basePoly = Polygon[MeshCells[baseMesh, 2][[;; , 1]]];
baseIntense = basePts[[;; , 1]]^2 . {1, 1} // 
     Thread[Rule[#, Rescale[Exp[-5 #]]]] & //
    Nearest[#, baseVtx^2 . {1, 1}][[;; , 1]] & // 
   Developer`ToPackedArray;

Now we have the primitive, which can be shown like this:

GraphicsComplex[baseVtx, basePoly, 
      VertexColors -> Map[GrayLevel[1, #] &, baseIntense]
      ] //
     Graphics[#, ImageSize -> 40, Background -> Black] &

standard Gaussian

The rest work is just to geometrically transform and colorize it. For that we define two helper functions:

ClearAll[genTF, genCF]

genTF[regionsize_, scale_] := RightComposition[ ScalingTransform@RandomPoint@Cuboid[{.1, .1}, scale {1, 1}] , RotationTransform@RandomReal[2 π] , TranslationTransform@RandomPoint@Cuboid[regionsize {-1, -1}, regionsize {1, 1}] ]

genCF[] :=RandomColor[RGBColor[,,_]] // Inactive[Function][ Append[#, RandomReal[{.1^(1/2), 1}]^2 Inactive[Slot][1]] ] & // Activate

Do the plot, 50 Gaussian primitives take a bit more than 1 second:

Graphics[{
     Function[{cf, tf}
       , GraphicsComplex[tf@baseVtx, {EdgeForm[], basePoly}, VertexColors -> Map[cf, baseIntense]]
       ] // MapThread[#
        , Table[{genCF[](*GrayLevel[1,#]&*), genTF[5, 4]}, 50]//Transpose
        , 1] &
     }, Background -> Black] //
   Rasterize[#, RasterSize -> 300, Background -> Black] & // 
   AbsoluteTiming

50 Gaussian

The result should look better (and probably more correct) if we consider gamma correction (for example, consider the sRGBGamma function from this answer):

50 Gaussian with gamma correction


And generating 500 Gaussian on 1024 size canvas takes only 8 seconds:

500 Gaussian

Silvia
  • 27,556
  • 3
  • 84
  • 164
  • How would you propose extending this to the 3D case? I can see how the transformation of a sphere might work, but we don't have the VertexColor blending for tetrahedra. – flinty Nov 06 '23 at 10:44
  • @flinty Unfortunately I don't think my approach here can be extended to 3D case. But it's possible an approximate solution can be done through tricks similar to that in my other post. – Silvia Nov 06 '23 at 13:03
  • @flinty On the second thought, maybe I misunderstood when you meant by "3D case"? Did you mean rasterized but in 3D -- a 3-dimensional array like Image3D? – Silvia Nov 06 '23 at 13:38
  • yes Image3D or something I scan scroll around like in the other post you linked, but also doesn't need to use the Image3D interface, could just generate an image from a projection I suppose. – flinty Nov 06 '23 at 15:37
  • @flinty Then I think it should do the job if you Inset planar Images in a Graphics3D like my other post. – Silvia Nov 07 '23 at 05:38
5

The 3D Case takes just a few seconds using OpenCL. It was a real pain getting OpenCLLink to work. I also improved the alpha blending.

3d gaussians

Remove["Global`*"]
Needs["OpenCLLink`"];
Assert[OpenCLQ[]];
SetSystemOptions["CheckMachineUnderflow" -> False];

kernelGaussian3D = "__kernel void gaussian3D( __global float4 * out, __constant float4 * colour, __constant float4 * mu, __constant float4 * mtx, int dim) { int index = get_global_id(0); { int z = index % dim; int y = ((index - z)/dim)%dim; int x = (index - z - dimy)/(dimdim); float3 p = (float3)(x,y,z) - mu[0].xyz;

float3 r;
r.x = dot(mtx[0].xyz, p);
r.y = dot(mtx[1].xyz, p);
r.z = dot(mtx[2].xyz, p);

float a = colour[0].w * exp(-dot(p, r));

float3 c1 = out[index].xyz;
float3 c0 = colour[0].xyz;
float a1 = out[index].w;
float a0 = a * colour[0].w;

float aM = (1.f - a0)*a1+a0 + 0.00000001f;
float4 result;
result.xyz = ((1.f-a0)*a1*c1+a0*c0)/aM;
result.w = aM;
out[index] = result;

} }";

clGaussian3D = OpenCLFunctionLoad[kernelGaussian3D, "gaussian3D", {{"Float[4]", 1, "InputOutput"}, {"Float[4]", 1, "Input"}, {"Float[4]", 1, "Input"}, {"Float[4]", 1, "Input"}, "Integer32"}, {8, 8}];

randomGOMtx[scaler_] := With[{P = RotationMatrix[{{1, 0, 0}, RandomPoint[Sphere[]]}]}, Inverse[P] . DiagonalMatrix[1/RandomReal[{.1, scaler}, 3]] . P]

MakeGaussian[c_, μ, Σi] := <|"c" -> c, "μ" -> μ, "Σi" -> Σi|> With[{dmax = 128, n = 50}, colours = RandomVariate[ UniformDistribution[{{0, 1}, {0, 1}, {0, 1}, {0.5, 1}}], n]; means = RandomReal[{1., dmax}, {n, 3}]; matrices = Table[randomGOMtx[dmax*1], n]; gaussians = MapThread[MakeGaussian, {colours, means, matrices}]; mem = OpenCLMemoryLoad[ Flatten@ConstantArray[{1., 1., 1., 0.}, dmax^3], "Float"]; Do[ clGaussian3D[mem, g["c"], Append[g["μ"], 0], Flatten[Append[#, 0] & /@ g["Σi"]], dmax, {dmax^3}]; , {g, gaussians}]; im3d = Image3D[ ArrayReshape[ OpenCLMemoryGet[mem] , {dmax, dmax, dmax, 4}], "Float", ColorSpace -> "RGB", Axes -> True]; OpenCLMemoryUnload[mem]; im3d ]

The 2D case is below. Virtually instantaneous!

kernelGaussian2D = "__kernel void gaussian2D(
__global float4 * out,
__constant float4 * colour,
__constant float2 * mu,
__constant float2 * mtx,
int dim)
{
int index = get_global_id(0);
{
    int x = index % dim;
    int y = ((index - x)/dim);
    float2 p = (float2)(x,y) - mu[0].xy;
float2 r;
r.x = dot(mtx[0].xy, p);
r.y = dot(mtx[1].xy, p);

float a = colour[0].w * exp(-dot(p, r));


float3 c1 = out[index].xyz;
float3 c0 = colour[0].xyz;
float a1 = out[index].w;
float a0 = a * colour[0].w;

float aM = (1.f - a0)*a1+a0 + 0.00000001f;
float4 result;
result.xyz = ((1.f-a0)*a1*c1+a0*c0)/aM;
result.w = aM;
out[index] = result;

} }"; clGaussian2D = OpenCLFunctionLoad[kernelGaussian2D, "gaussian2D", {{"Float[4]", 1, "InputOutput"}, {"Float[4]", 1, "Input"}, {"Float[2]", 1, "Input"}, {"Float[2]", 1, "Input"}, "Integer32"}, {8, 8}]; MakeGaussian[c_, μ, Σi] := <|"c" -> Clip[c, {0, 1}], "μ" -> μ, "Σi" -> Σi|>

randomGOMtx[scaler_] := With[{P = RotationMatrix[{{1, 0}, RandomPoint[Circle[]]}]}, Inverse[P] . DiagonalMatrix[1/RandomReal[{.1, scaler}, 2]] . P]

With[{dmax = 512, n = 100}, colours = RandomVariate[ UniformDistribution[{{0, 1}, {0, 1}, {0, 1}, {0.5, 1}}], n]; means = RandomReal[{1., dmax}, {n, 2}]; matrices = Table[randomGOMtx[dmax*1], n]; gaussians = MapThread[MakeGaussian, {colours, means, matrices}]; mem = OpenCLMemoryLoad[ Flatten@ConstantArray[{1., 1., 1., 0.}, dmax^2], "Float"]; Do[ clGaussian2D[mem, g["c"], g["μ"], Flatten@g["Σi"], dmax, {dmax^2}]; , {g, gaussians}]; im = Image[ ArrayReshape[ OpenCLMemoryGet[mem] , {dmax, dmax, 4}], "Float", ColorSpace -> "RGB"]; OpenCLMemoryUnload[mem]; im ]

2d gaussians

flinty
  • 25,147
  • 2
  • 20
  • 86
  • I think that the following might save you some flops: float3 p = (float3)(x,y,z) - mu[0].xyz; float3 r = { dot(mtx[0].xyz, p), dot(mtx[1].xyz, p), dot(mtx[2].xyz, p) }; float a = colour[0].w * exp(-dot(p, r)); (It computes the difference p - mu[0].xyz just once. – Henrik Schumacher Nov 14 '23 at 13:04
  • @HenrikSchumacher thanks, added. – flinty Nov 14 '23 at 14:55