
I have the expression:

$$ p_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} $$

which I defined with the following Mathematica code:

Subscript[p, i] = E^Subscript[x, i] / Sum[E^Subscript[x, j], {j, 1, n}]

I tried to take the derivative with respect to an $x_i$:

D[Subscript[p, i], Subscript[x,i]]

but I just get the original expression back as the result:

$$ \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} $$

Obviously, I am doing something wrong. How should I go about working with this type of expression?

RandomBits

4 Answers


In Mathematica 11.1 and later, derivatives of sums behave the way you requested:

Assuming[
    i ∈ Integers && 1<=i<=n,
    Simplify @ D[Subscript[p,i], Subscript[x,i]]
] //TeXForm

$\frac{e^{x_i} \left(\sum _{j=1}^n e^{x_j}-e^{x_i}\right)}{\left(\sum _{j=1}^n e^{x_j}\right){}^2}$
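As a hand check (not part of the original answer), the same form follows from the quotient rule. Here $S = \sum_{j=1}^n e^{x_j}$ is shorthand introduced only for this sketch, and $1 \le i \le n$ is assumed so that exactly one term of the sum depends on $x_i$:

$$ \frac{\partial S}{\partial x_i} = e^{x_i}, \qquad \frac{\partial}{\partial x_i}\,\frac{e^{x_i}}{S} = \frac{e^{x_i}\,S - e^{x_i}\,e^{x_i}}{S^2} = \frac{e^{x_i}\left(S - e^{x_i}\right)}{S^2} $$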

Bob Hanlon
Carl Woll

Following this answer, we can define a few rules for formal differentiation:

Clear[d];
d[Log[x_], a[k_]] := 1/x d[x, a[k]]
d[Sum[x_, y__], a[k_]] := Sum[d[x, a[k]], y]
d[a[k_] b_., a[k_]] := b /; FreeQ[b, a]
d[a[q_] b_., a[k_]] := b Subscript[δ, k, q] /; FreeQ[b, a]
d[c_ b_, a[k_]] := d[c, a[k]] b + d[b, a[k]] c
d[b_ + c_, a[k_]] := d[c, a[k]] + d[b, a[k]]
d[Subscript[δ, r_, q_], a[k_]] := 0
d[x_, a[k_]] := 0 /; FreeQ[x, a]
d[G_^n_, a[k_]] := n G^(n - 1) d[G, a[k]] /; ! FreeQ[G, a]
d[Exp[G_], a[q_]] := Exp[G] d[G, a[q]] /; ! FreeQ[G, a]

and some simplification rules

ds = {Sum[a_ + b_, {s_, 1, p_}] :> 
    Sum[a, {s, 1, p}] + Sum[b, {s, 1, p}], 
   Sum[y_ Subscript[δ, r_, s_], {s_, 1, p_}] :> (y /. s -> r), 
   Sum[y_ Subscript[δ, s_, r_], {s_, 1, p_}] :> (y /. s -> r), 
   Sum[Subscript[δ, s_, r_], {r_, 1, p_}] :> 1, 
   Sum[δ[i_, k_] δ[j_, k_] y_., {k_, n_}] -> δ[i,
       j] (y /. k -> i), 
   Sum[a_ b_, {r_, 1, p_}] :> a Sum[b, {r, 1, p}] /; NumberQ[a]};


 Clear[a]; Format[a[k_]] = Subscript[a, k]

Then considering

    Q = Exp[a[i]]/Sum[Exp[a[j]], {j, 1, n}]; Q /. a -> x


we have for instance

 grad = d[Q, a[p]] //. ds /. a -> x // Simplify

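The key step in this rule set can be traced by hand; the following is a sketch of what the rules are expected to do, not a transcript of the actual output. The Exp rule together with the rule for d[a[q_] b_., a[k_]] turns each term of the sum into a Kronecker delta term, and the third rule in ds then collapses the sum:

$$ \frac{\partial}{\partial a_p} \sum_{j=1}^{n} e^{a_j} = \sum_{j=1}^{n} e^{a_j}\,\delta_{p,j} = e^{a_p} $$

Combined with the product and power rules, the gradient of Q should therefore be equivalent to

$$ \frac{\partial Q}{\partial a_p} = \frac{\delta_{p,i}\, e^{a_i}}{\sum_{j=1}^{n} e^{a_j}} - \frac{e^{a_i}\, e^{a_p}}{\left(\sum_{j=1}^{n} e^{a_j}\right)^2} $$

although the form Mathematica prints after Simplify may be arranged differently.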

It is fairly general (though not fully bulletproof). For instance:

Q = Exp[2 a[i] a[k]]/Sum[Exp[a[j]], {j, 1, n}]^4; Q /. a -> x


  grad = d[Q, a[p]] //. ds; grad /. a -> x // Simplify


  hess = d[grad, a[q]] //. ds /. a -> x // FullSimplify


chris
  • I was hoping to get the result without explicitly adding all of the derivative and simplification rules. After following the link in your answer, I found that I could show Mathematica the relationship I intended between x[i] and x[j] with the following code: x /: D[x[i],x[j],NonConstants -> {x}] = KroneckerDelta[i,j]. – RandomBits Sep 10 '15 at 20:23
  • I also found the use of Format in your answer (Format[a[k_]] = Subscript[a, k]) very useful. – RandomBits Sep 10 '15 at 20:24

If you don't mind being slightly less symbolic and general in exchange for staying closer to standard Mathematica, you could proceed as follows.

Define

p[n_, i_] := Exp[x[i]]/Sum[Exp[x[j]], {j, 1, n}]

Now we have two typical cases for the derivative

With[{n = 5, i = 3, k = 2}, D[p[n, i], x[k]]]

(*
Out[97]= -(E^(x[2] + x[3])/(E^x[1] + E^x[2] + E^x[3] + E^x[4] + E^x[5])^2)
*)

and

With[{n = 5, i = 3, k = 3}, D[p[n, i], x[k]]]

(*
Out[98]= -(E^(2 x[3])/(E^x[1] + E^x[2] + E^x[3] + E^x[4] + E^x[5])^2) + E^x[3]/(
 E^x[1] + E^x[2] + E^x[3] + E^x[4] + E^x[5])
*)

From this the general pattern is easily gathered.
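Written out (this closed form is inferred from the two outputs above rather than stated in the original answer), the pattern is:

$$ \frac{\partial p_i}{\partial x_k} = p_i \left( \delta_{ik} - p_k \right) $$

where $\delta_{ik}$ is the Kronecker delta. For $i = 3$, $k = 2$ this gives $-p_3\,p_2$, which matches Out[97]; for $i = k = 3$ it gives $p_3 - p_3^2$, which matches Out[98].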

Dr. Wolfgang Hintze

Based on the comments from @It's Pronounced Oiler and following the link from @chris's answer, I found this answer which was helpful.

First, instead of using subscripted symbols as variables, it is suggested to use function-call notation such as x[i]. Second, the following code can be used to show Mathematica the intended relationship between x[i] and x[j]:

x /: D[x[i], x[j], NonConstants -> {x}] := KroneckerDelta[i,j]

The expression I want to differentiate then becomes:

p[i] = E^x[i] / Sum[E^x[j], {j,1,n}]

or,

$$ p_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} $$

And, the differentiation:

Assuming[i \[Element] Integers && j \[Element] Integers && 
  1 <= i <= n && 1 <= j <= n, D[p[i], x[i], NonConstants -> {x}]]

This produces the expected result of:

$$ \frac{e^{x_i} \left(-e^{x_i} + \sum_{j=1}^{n} e^{x_j}\right)}{\left(\sum_{j=1}^{n} e^{x_j}\right)^2} $$

which reduces to,

$$ p_i - p_i^2 $$
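To spell out that last step (a worked line added for clarity, with $S = \sum_{j=1}^{n} e^{x_j}$ as shorthand introduced only here), split the fraction and recognize $p_i = e^{x_i}/S$ in each term:

$$ \frac{e^{x_i}\left(S - e^{x_i}\right)}{S^2} = \frac{e^{x_i}}{S} - \left(\frac{e^{x_i}}{S}\right)^2 = p_i - p_i^2 = p_i\,(1 - p_i) $$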

RandomBits