Why is this scipy.root code not converging?

Question

I'm running a test problem to set up larger problems. Solving the simple unsteady heat equation via finite differences:

$$ \frac{\partial T}{\partial t} = \alpha \frac{\partial^2T}{\partial x^2}$$

$\alpha = 0.001$

$T(x=0,t) = 60°C$

$T(x=L,t) = 25°C$

$T(x,t=0) = 25°C$

import matplotlib.pyplot as plt
import numpy as np
import sympy as sp
from scipy.optimize import fsolve
from scipy import optimize
import pandas as pd
from numba import njit
t_max = 25
x_max = 1
N_points_1 = 61
N_points_2 = 11
dt = t_max/(N_points_1-1) 
dx = x_max/(N_points_2-1)
N_variables = N_points_1 * N_points_2
alpha = 0.01 
T_x_0 = 60 
T_x_L = 25
T_t_0 = 25
@njit
def EDP (v):
    T_N =  np.ones((N_points_1, N_points_2))
#Variables
k=0
for i in range(N_points_1):
    for j in range(N_points_2): 
        T_N[i,j]=v[k]
        k=k+1

#Model
sys = []
##Boundary Condition x = 0
for i in range(N_points_1):
    BC1 = T_N[i,0] - T_x_0
    sys.append(BC1)

##Boundary Condition x = L
for i in range(N_points_1):
    BC2 = T_N[i,-1] - T_x_L
    sys.append(BC2)

##Initial Condition
for j in range(1,N_points_2-1):
    IC1 = T_N[0,j] - T_t_0
    sys.append(IC1)

## Energy Balance 
for i in range(N_points_1-1):
    for j in range (1,N_points_2 -1):
        EB = alpha*(T_N[i,j+1]-2*T_N[i,j]+T_N[i,j-1])/dx**2 - (T_N[i+1,j]-T_N[i,j])/dt  
        sys.append(EB)

sys=np.array(sys)


return sys


initial_guess = np.linspace(25,60,N_variables)

function_compile = EDP(initial_guess)

I tested the following scipy.root algorithms:

Solution_T = fsolve(EDP,initial_guess)
Solution_T = optimize.anderson(EDP,initial_guess, verbose = True,maxiter = 1000)  
Solution_T = optimize.newton_krylov(EDP,initial_guess,verbose=True,maxiter = 1000)
Solution_T = optimize.broyden1(EDP,initial_guess,verbose=True,maxiter = 1000)
Solution_T = optimize.broyden2(EDP,initial_guess,verbose=True,maxiter = 1000)

And ONLY fsolve converges (instantly, in fact) but since I am just running a test for a larger run (which will contain about 50000 variables), fsolve won't be able to handle that and the scipy docs recommend the algorithms mentioned above, but none of them converge.

They either diverge with the objective function overflowing or remain stuck in the same value, not converging. As for why I am not using linear solvers, it's because I intend to input more complex non-linear equations later.

To some part this is because with the non-linear (necessarily sequential) solvers, you play some kind of whack-a-mole with the local error. Getting improvements in one place worsens the error in another one. So each step is part global error reduction and a larger part error shifting. In some old methods to solve Poisson equations, this was called "ironing", moving local errors around until they fall off over the boundary conditions. — Lutz Lehmann, Mar 07 '23 at 15:09
Another effect is that with a simple grid the distance to the boundary is linear in the number of nodes. Using multi-grid or multi-resolution (like wavelet) discretizations reduces this distance, purely on graph connectivity, to the logarithm of the node size. This should reduce the "travel time" for local errors to the boundary, thus speed up the solution process. — Lutz Lehmann, Mar 07 '23 at 15:13
So i've been testing with initial conditions and even when providing an initial guess very close to the solution, they still don't converge. I am going to make another question because this is far too bizarre for such a simple system. — Klaus3, Mar 08 '23 at 00:31
Why are you using a root solver for an initial value problem? The right method to use here is scipy.integrate.solve_ivp. — IPribec, Mar 10 '23 at 09:39
@IPribec i'm Sorry but all differential equation problems transform into root finding problems when discretized. Read the "events" part of the link you posted , all values are to be found by a root finding algorithm. Plus, Ivp is only for ODEs, this is a PDE. — Klaus3, Mar 10 '23 at 13:11
After the spatial discretization using FDM you end up with a system of ODEs. You can then integrate this system in time to obtain the temperature values at nodes. This approach is known as the method of lines. The events callback in solve_ivp serves a different purpose, say, to find out when does a particular point reach a certain temperature. — IPribec, Mar 10 '23 at 14:10
The system of ODE's that's discretized by the Runge Kutta or BDF methods of solve.ivp, which involves rootfinding algorithms to solve them. — Klaus3, Mar 10 '23 at 14:29
So, a root finding problem. There should be zero problem to simply use a root finder from the start. Just read this part in the link: "Each event occurs at the zeros of a continuous function of time and state. Each function must have the signature event(t, y) and return a float. The solver will find an accurate value of t at which event(t, y(t)) = 0 using a root-finding algorithm. By default, all zeros will be found. — Klaus3, Mar 10 '23 at 14:40
Without a sink term to get rid of heat, it's kind of difficult to have T_N[N_points_1,:] = 1, when the boundaries are set to 25 and 60. Don't you think? Seems like a violation of the maximum principle to me... — IPribec, Mar 10 '23 at 15:38
Where do you see T_N[N_points_1,:] = 1?. Because thats a clear mistake by me if thats the case — Klaus3, Mar 10 '23 at 16:31
Just below the function definition of EDP. In the energy balance you only iterate till N_points_1-1. Perhaps you should consider a different toy problem, instead of a parabolic one. Edit: I stand corrected... It gets replaced with v[k]. But that gets set to the initial guess np.linspace(25,60,N_variables). But the last row still will not be a linear function from 25 to 60. — IPribec, Mar 10 '23 at 16:51

score 3 · Answer 1 · answered Mar 10 '23 at 14:46

My suggestion would be to use an ODE solver combined with the method of lines instead of trying to use a non-linear system solver.

Here's an example of what this might look like:

import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
def F(t,y,alpha,dx):
dydt = np.empty_like(y)
alpha = alpha/dx**2

# BC at x = 0
dydt[0] = 0
# BC at x = L
dydt[-1] = 0
# Interior nodes
dydt[1:-1] = alpha*(y[2:] - 2*y[1:-1] + y[0:-2])

return dydt


if name == 'main':
t_max = 25
alpha = 0.001
N = 61

x, dx = np.linspace(0,1,N,retstep=True)
t = np.linspace(0,t_max,11)

T0 = 25*np.ones_like(x)
T0[0] = 60

def F_(t,y):
    return F(t,y,alpha,dx)

sol = solve_ivp(F_,(0,t_max),T0,method=&quot;LSODA&quot;,t_eval=t)

plt.plot(x,sol.y)
plt.xlabel(r&quot;<span class="math-container">$x$</span>&quot;)
plt.ylabel(r&quot;<span class="math-container">$T(x,t)$</span> [°C]&quot;)
plt.title(r&quot;Heat equation - <span class="math-container">$T_t = \alpha T_{xx}$</span>&quot;)

plt.legend([&quot;t = {:0.2f}&quot;.format(t_) for t_ in t])

plt.show()

I understand i could do it, but as i said, the intention is to use this just a toy problem to set up larger and more complex ones, which will need usage of nonlinear solvers which can handle potentially millions of variables. — Klaus3, Mar 10 '23 at 14:52

score 1 · Accepted Answer · answered Mar 12 '23 at 05:27

After playing around with this a bit, I am almost certain that this is a conditioning issue. I discovered this by attempting to solve the problem with the initial guess being the result from fsolve with some iid Gaussian noise added. Convergence of the Newton-Krylov method happened once the variance of the Gaussian noise was lower than $10^{-14}$, indicating there are some severe sensitivities to perturbations, i.e., conditioning issues.

I will restrict my discussion here to the Newton-Krylov method (Newton-GMRES by default), since Broyden's method is typically worse anyway. I believe that the stagnation of the Newton iteration is being caused by a failure of GMRES to converge properly with the large condition number combined with the poor globalization properties of a backtracking line search when the Jacobian conditioning is bad.

Poor conditioning absolutely wrecks the convergence properties of GMRES for general problems (read: eigenvalues aren't clustered). See this result from Tim Kelley's book:

I don't know what internal settings SciPy is using for the GMRES iteration, but I believe it is basically getting nowhere and not throwing any errors. On top of this, poor Jacobian conditioning can lead to poor performance of line searches, but I believe this is primarily an issue with GMRES not converging properly since the problem is linear and the line search shouldn't be needed in the first place. A preconditioner in the form of a coarse grid approximation or something similar may work well for this problem, but it is easier to formulate good preconditioners for steady-state problems. Broyden methods and Anderson mixing all attempt to approximate the Jacobian, so they are likely all stymied by the same conditioning issues.

As for why fsolve works, it is because fsolve uses Powell's hybrid method (link), which has much better globalization properties. Powell's method uses trust regions to ensure the iteration is more stable. When the Jacobian conditioning is bad, Powell's method basically reduces to steepest descent with a trust region, which is very stable.

As for how to use this knowledge for your next application, the main takeaway is that preconditioners for large PDE discretizations are absolutely essential. There are a variety of choices depending on the type of PDE, geometry, etc., but not using one can get you in a situation like this, where the fastest and best-scaling methods (Newton-Krylov) are crippled by poor conditioning.

Kelley, C. T., Iterative methods for linear and nonlinear equations, Frontiers in Applied Mathematics. 16. Philadelphia, PA: SIAM, Society for Industrial and Applied Mathematics. xiii, 165 p. (1995). ZBL0832.65046.

Thank you! I am surprised that this system (which is essentially a linear system of equations) causes so much trouble to the solvers. — Klaus3, Mar 13 '23 at 03:50
A poorly conditioned system causes issues with most solvers, regardless of structure and is something that must be taken into account — whpowell96, Mar 13 '23 at 03:52

Why is this scipy.root code not converging?

2 Answers2

Linked