Automatic differentiation methods in general have been the topic of past questions, but this is specifically about automatic differentiation of neural networks.
The motivation is an old but useful trick for solving ODEs and PDEs. For example, consider an ODE of the form $u'(x) = f(x)$, where $f(x)$ is a known function and the goal is to find $u(x)$. Let's assume $f(x) = 2x$ and $u(0) = 1$. We can then implement neural network layers for these functions as follows:
Clear[u, f, g]
g = NetChain[ (* just some ansatz for the auxiliary function *)
{LinearLayer[32], LogisticSigmoid, LinearLayer[32], LogisticSigmoid, LinearLayer[1]},
"Input" -> "Real"]
(* create u from the auxiliary function g *)
u = With[{u0 = 1}, FunctionLayer[#*g[#] + u0 &]]
(* function we want to match (right-hand side) *)
f = FunctionLayer[{2*#} &, "Input" -> "Real"]
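As a side note, the ansatz $u(x) = x\,g(x) + u_0$ satisfies the initial condition $u(0) = u_0 = 1$ by construction, whatever the weights in g happen to be. A quick optional sanity check (the name uCheck below is just illustrative, not part of the original code):

uCheck = NetInitialize[u];
uCheck[0.] (* should return (approximately) 1; the output is a length-1 vector because g ends in LinearLayer[1] *)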
and then fit $u$ by training a neural network function of the form:
(* finite difference approximation of du/dx *)
deriv[netU_] := With[{dx = 0.0001},
NetChain[
  {FunctionLayer[{#, # + dx} &],           (* given input x, compute x and x+dx *)
   NetMapOperator[netU],                   (* apply the u function to both inputs *)
   FunctionLayer[(#[[2]] - #[[1]])/dx &]}, (* take the difference and divide by dx *)
"Input" -> "Real"]
]
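As an illustration (not part of the question itself), deriv can be sanity-checked on a parameter-free net with a known derivative, e.g. $x^2$, whose derivative at $x = 1$ is 2; a forward difference should return roughly $2 + dx$:

(* check the finite-difference helper on x^2; the name dSquare is illustrative only *)
dSquare = deriv[FunctionLayer[#^2 &, "Input" -> "Real"]];
dSquare[1.] (* ≈ 2.0001, i.e. 2 + dx, reflecting the forward-difference error *)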
(* define a network where we minimize the difference between the learned du/dx and known f(x) *)
nn = NetGraph[
{deriv[u], f, MeanSquaredLossLayer[]},
{NetPort["Input"] -> {1, 2} -> 3}]
(* train on randomly generated points *)
trained = NetTrain[nn, RandomReal[{-1, 1}, 64], BatchSize -> 64]
(* extract the trained u function *)
trainedU = NetExtract[trained, {1, 2, "Net"}]
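For completeness, the exact solution of this toy problem is $u(x) = x^2 + 1$, so one way to judge the fit (my own check, not from the original code; it assumes the extraction path above returns the trained copy of u, whose output is a length-1 vector, hence First):

uFit[x_?NumericQ] := First[trainedU[x]]
Plot[{uFit[x], x^2 + 1}, {x, -1, 1}] (* trained u vs. exact solution *)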
where deriv computes the derivative of the neural network function implementing $u(x)$ by finite differences. This is fine in a simple 1D example, but it becomes inefficient as the dimensionality increases and raises the question of which finite-difference step size to use. A better approach would be to compute the derivative of the network defining $u(x)$ with respect to its input $x$ using so-called "autograd" methods.
MXNet (the underlying engine used by Mathematica) supports calculating the gradient of a network with respect to its inputs, as described in the MXNet tutorial on mxnet.autograd, so in principle this is doable. (A related question on how to calculate the Hessian of a neural network in Mathematica was posted two years ago, but at the time MXNet didn't support Hessians.) What's the best way to do this in Mathematica?
My understanding is that this is not quite the functionality of NetPortGradient, which computes the derivative at particular, specified data. That is, while it allows the following:
example = NetInitialize[u]
example[5, NetPortGradient["Input"]] (* computes the derivative at x=5 *)
it cannot be used to define new networks that can be used elsewhere:
FunctionLayer[example[#, NetPortGradient["Input"]] &] (*throws errors*)