
Michael E2 explains how to use parallel processing to speed up multiple FindRoot calculations with the same function in https://mathematica.stackexchange.com/a/163295/40271.

An example of code that does this is

f[x_] := x^x + (4 - x)^(4 - x) - 20;
df0[x_?VectorQ] = D[f[x], x];
df[x_?VectorQ] := DiagonalMatrix@SparseArray@df0[x];
params = RandomReal[{2, 3}, 10];
FindRoot[f[x], {x, params}, Jacobian :> df[x]]

I would now like to extend this to work in more than one variable. The syntax to do this is quite simple:

f[x_, y_] := {x^x + y^y - 20, x + y - 4}
inits = RandomReal[{2, 3}, {2, 10}];
FindRoot[f[x,y], {{x, inits[[1]]}, {y, inits[[2]]}}]

However, I can't find the right syntax to add an explicit Jacobian. My example code is

f[x_, y_] := {x^x + y^y - 20, x + y - 4}
df0[x_, y_] = Outer[D, f[x, y], {x,y}];
df[x_, y_] := df0[x, y]
inits = RandomReal[{2, 3}, {2, 10}];
Block[{x, y}, 
  FindRoot[f[x, y], {{x, inits[[1]]}, {y, inits[[2]]}}, 
   Jacobian :> df[x, y]]]

This gives the error that "The Jacobian is not a matrix of numbers" at {x, y} = inits. So I try instead

f[x_, y_] := {x^x + y^y - 20, x + y - 4}
df0[x_, y_] = Outer[D, f[x, y], {x,y}];
SetAttributes[df, Listable];
df[x_, y_] := df0[x, y]
inits = RandomReal[{2, 3}, {2, 10}];
Block[{x, y}, 
  FindRoot[f[x, y], {{x, inits[[1]]}, {y, inits[[2]]}}, 
   Jacobian :> df[x, y]]]

This produces the same error; however, in this case, if I evaluate df @@ inits, I do indeed get a list of matrices of numbers, so I am confused.
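
For example, a quick check of the shape outside FindRoot (using the definitions above) shows what I mean:

(* evaluate the Jacobian directly, outside FindRoot *)
Dimensions[df @@ inits]
(* {10, 2, 2} -- a length-10 list of 2x2 numerical matrices *)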

I have also tried with explicit ?VectorQ patterns, like so:

f[x_?VectorQ, y_?VectorQ] := {x^x + y^y - 20, x + y - 4}
df0[x_?VectorQ, y_?VectorQ] = 
  Outer[D, {x^x + y^y - 20, x + y - 4}, {x, y}];
df[x_?VectorQ, y_?VectorQ] := df0[x, y];
inits = RandomReal[{2, 3}, {2, 10}];
FindRoot[f[x, y], {{x, inits[[1]]}, {y, inits[[2]]}}, 
 Jacobian :> df[x, y]]

And this doesn't work either.

Does anyone know how to do this?

Jojo

1 Answer


The main problem is obtaining the block-diagonal matrix quickly. Moreover, there is a problem with the vectorization of df. In the following, I skip the vectorization part (for complicated functions of several variables, vectorization is not always more performant).

xx = Table[Compile`GetElement[x, i], {i, 1, 2}];
f = x \[Function] Evaluate[N[{xx[[1]]^xx[[1]] + xx[[2]]^xx[[2]] - 20, xx[[1]] + xx[[2]] - 4}]];
Df = x \[Function] Evaluate[N[D[f[xx], {xx, 1}]]];

Regarding the block diagonal matrix, I compute its column indices and row pointers by hand and store them for later use in DF.

m = Length[xx];
n = 10000;
ci = Flatten[
   Transpose[{Partition[Partition[Range[m n], 1], m]}[[ConstantArray[1, m]]]],
   2];
rp = Range[0, m m n, m];
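
To see what ci and rp encode, here is a small self-contained sketch with m = 2 and n = 3 and dummy 2x2 blocks standing in for the Jacobians; it only illustrates the raw CSR-style SparseArray constructor used in DF below and is not needed for the solution:

With[{m = 2, n = 3},
 Module[{ci, rp, blocks},
  (* column indices and row pointers, built exactly as above, but for a tiny system *)
  ci = Flatten[
    Transpose[{Partition[Partition[Range[m n], 1], m]}[[ConstantArray[1, m]]]], 2];
  rp = Range[0, m m n, m];
  (* dummy blocks in place of the per-point Jacobians that BlockMap[Df, X, m] produces later *)
  blocks = Table[RandomReal[1, {m, m}], {n}];
  (* raw constructor: {version, {row pointers, column indices}, nonzero values} *)
  MatrixForm@Normal@
    SparseArray @@ {Automatic, {m n, m n}, 0., {1, {rp, ci}, Flatten[blocks]}}]]

The output is a 6x6 matrix with three 2x2 blocks along the diagonal and zeros elsewhere.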

F and DF are wrapper functions that constitute the high-dimensional system. I use BlockMap in order to apply f and Df to a flat list of length m n. This is actually where the block-diagonal matrix is produced.

F = X \[Function] Flatten@(BlockMap[f, X, m]);
DF = X \[Function] SparseArray @@ {Automatic, {m n, m n}, 
     0., {1, {rp, ci}, Flatten[BlockMap[Df, X, m]]}};

To speed things up even further (and as a substitute for vectorization), I also compile f and Df and create wrapper functions for them.

With[{code = f[xx]},
  cf = Compile[{{x, _Real, 1}},
    code,
    CompilationTarget -> "C",
    RuntimeAttributes -> {Listable},
    Parallelization -> True,
    RuntimeOptions -> "Speed"
    ]
  ];
With[{code = Df[xx]},
  cDf = Compile[{{x, _Real, 1}},
    code,
    CompilationTarget -> "C",
    RuntimeAttributes -> {Listable},
    Parallelization -> True,
    RuntimeOptions -> "Speed"
    ]
  ];
cF = X \[Function] cf[Partition[Abs[X], m]];
cDF = X \[Function] SparseArray @@ {Automatic, {m n, m n}, 0., {1, {rp, ci}, Flatten[cDf[Partition[Abs[X], m]]]}};

Now we have at least 4 ways to solve the systems; let's try them out and compare! (I use Quiet to suppress some errors that get produced in the compiled functions, probably during the line search...)

X = X0 = RandomReal[{2, 3}, {n m}];

Y = Map[x \[Function] FindRoot[f, {{x}}, Jacobian -> Df], 
    Partition[X0, m]]; // AbsoluteTiming // First
cY = Quiet[Map[x \[Function] FindRoot[cf, {{x}}, Jacobian -> cDf], 
     Partition[X0, m]]]; // AbsoluteTiming // First
X = FindRoot[F, {X0}, Jacobian -> DF][[1]]; // AbsoluteTiming // First
cX = Quiet[FindRoot[cF, {X0}, Jacobian -> cDF][[1]]]; // AbsoluteTiming // First

3.29515

2.45185

1.19915

0.353124

Residuals:

Max[Abs[F[Flatten[Y]]]]
Max[Abs[F[Flatten[cY]]]]
Max[Abs[F[X]]]
Max[Abs[F[cX]]]

8.52651*10^-14

8.52651*10^-14

3.55271*10^-15

3.55271*10^-15

And the differences are:

Max[Abs[Flatten[X] - Flatten[Y]]]
Max[Abs[Flatten[X] - Flatten[cY]]]
Max[Abs[Flatten[X] - Flatten[cX]]]

2.88658*10^-15

2.88658*10^-15

6.66134*10^-16

Henrik Schumacher
  • OK so I just threw this into Mathematica and I haven't really understood what's going on in too much detail yet. I get CompiledFunction::cfne: Numerical error encountered; proceeding with uncompiled evaluation. on the lines when you actually do the calculation. Is that what you were suppressing with Quiet? Is this a problem or not? – Jojo Mar 15 '18 at 10:48
  • Also, what is the purpose of Compile`GetElement? I see you and the other Compile wizards use this a lot, and I don't really get what's wrong with Part within Compile? – Jojo Mar 15 '18 at 10:54
  • OK further questions: 1) Why do we have Abs in cF = X \[Function] cf[Partition[Abs[X], 2]];? Isn't this restricting the domain of the function to be only positive real numbers or something? I don't think I want to do that. 2) I guess that the {Automatic, {m n, m n}, 0., {1, {rp, ci}, Flatten[BlockMap[Df, X, m]]}} is a set of options for Jacobian in FindRoot? Can you run me through what each one is? I can't see them in the documentation – Jojo Mar 15 '18 at 11:02
  • The F and the DF, I presumably won't need these if I implement your fourth (fastest) method in my algorithm, and they're just to show that it's quicker with Compile? You also talk about a block diagonal matrix, but I can't actually see any block diagonal matrices anywhere? Why do we have to turn off Part::partd? – Jojo Mar 15 '18 at 11:09
  • Turning off Part::partd: I edited the post. Suppressing error messages should better be avoided and, if necessary, should only be done locally. – Henrik Schumacher Mar 15 '18 at 11:17
  • About Quiet: Your equations are somewhat nasty: the exponential functions easily lead to something that cannot be treated with machine-precision real numbers (for example, if x is negative, then x^x has to be computed in the complex numbers - and it is also not continuous any more). I think this happens when FindRoot makes a step into the negative numbers. Due to the backtracking line search, this does not cause severe problems since the step size is sufficiently reduced; but during the line search, cF might get evaluated for invalid inputs; this is why the warnings occur. – Henrik Schumacher Mar 15 '18 at 11:40
  • Yeah, you don't need F and DF. In general use cF and cDF. – Henrik Schumacher Mar 15 '18 at 11:41
  • Towards the block diagonal matrix: Just set n=10 and evaluate cDF[X0] // MatrixPlot or cDF[X0] // Normal; then you will see it. I build the matrix with quite a dirty trick, but it is much faster than building it with the standard syntax for SparseArray. – Henrik Schumacher Mar 15 '18 at 11:51
  • Towards Compile`GetElement: It is a read-only operation and in that respect, it does basically the same as Part. But it suppresses bounds checks in the generated C code: Instead of asking "Please, please with sugar on top: Is array x long enough that i really indexes into it? If yes, please give me x[[i]]. If no, please throw an error.", it just states "Give me x[[i]]. Believe me, it will work". Of course, that can lead to severe trouble if i does not really index into x. So it is the responsibility of the user to ensure that. – Henrik Schumacher Mar 15 '18 at 11:53
  • Thanks for all of your answers. OK so I now understand the block diagonal matrix. You've made all of the repeated calculations into one big system and you just run FindRoot once. Why was it faster to do it this way? How does FindRoot know to parallelise this kind of computation? Also, can you explain how SparseArray works with the {1, {rp, ci}, Flatten[BlockMap[Df, X, m]]} syntax? The documentation isn't too enlightening for me on this. Finally, am I likely to run into problems if I change _Real to _Complex? – Jojo Mar 15 '18 at 15:54
  • FindRoot parallelizes nothing; it merely applies Newton's algorithm (with some tricks, e.g. line search or trust regions). Newton's method basically computes x = x - LinearSolve[cDF[x], cF[x]] recursively until a suitable stopping criterion is met (a bare-bones sketch of this loop is shown after these comments). The parallelization happens in cf and cDf. To some extent, LinearSolve also profits from the fact that we reduce the number of calls to it. – Henrik Schumacher Mar 15 '18 at 16:40
  • For the SparseArray syntax, see here. – Henrik Schumacher Mar 15 '18 at 16:42
  • Thank you for an excellent solution to my problem, and for all of your answers to my questions. Just to note, there are some 2s in your code that should be m; it might be useful for someone else to change these. I see one in ci, and also in the definitions of cF, cDF, Y and cY. – Jojo Mar 16 '18 at 11:27
  • @Joe You're welcome! And thanks for the feedback! – Henrik Schumacher Mar 16 '18 at 11:41
  • Ah, I've just started to implement this in my real-use case and I'm not sure if it'll work for what I'm doing. I'm solving equations by Monte-Carlo, so what happens is I FindRoot at the n random initial points, to 1 digit of precision but with very many iterations. Then if a solution converges to 1 d.p. I keep it, and any that fail a Check for FindRoot error messages are thrown away. I then home in on the 1.d.p. solutions to higher accuracy, and repeat the process comparing the ones that hit against previously generated solutions. – Jojo Mar 19 '18 at 11:11
  • So of the n trials, I expect that some will converge maybe quite quickly, and I don't want the algorithm to continue running for these ones. But many will run all the way to the end, but never converge, and I want these ones to return some error message. Is this possible somehow with your code? This part of the algorithm also works with a fixed n, but then the second part which homes in on solutions varies n depending on how many we found, so it looks to me like this won't work with what you've written either, as I think your code requires n to be fixed? – Jojo Mar 19 '18 at 11:14
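
To make the Newton's-method comment above concrete, here is a bare-bones sketch of the iteration x -> x - LinearSolve[cDF[x], cF[x]], built from the wrappers cF and cDF defined in the answer. The helper name newtonSketch and the default tolerance and iteration limit are illustrative choices only, and there is no line search or error handling, so unlike FindRoot it assumes the starting point is already reasonably close to a root:

(* plain Newton iteration; assumes cF and cDF (and the globals m, n) from the answer are defined *)
newtonSketch[X0_?VectorQ, tol_ : 10^-12, maxIter_ : 20] :=
  NestWhile[
   X \[Function] X - LinearSolve[cDF[X], cF[X]], (* one Newton step *)
   X0,
   X \[Function] Max[Abs[cF[X]]] > tol,          (* iterate while the residual is still large *)
   1, maxIter];

(* usage, e.g.:
   Xn = newtonSketch[X0];
   Max[Abs[cF[Xn]]]
   convergence from a random X0 is not guaranteed without a line search *)

This is only meant to make the comment concrete; in practice, FindRoot's line search and convergence checks are worth keeping.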