0

I am trying to write Mathematica code to symbolically compute the expectation of $g(X_1, \ldots X_K)$, where $X_1, \ldots, X_K$ are Bernoulli random variables with probabilities $p_1, \ldots, p_K$, respectively. I know how to write code to compute the expectation when I know the numerical value of $K$. For example, the code below is for when $K=2$ and $g(X_1,X_2) = \log\left(1 + \log(1+X_1) + \log(1+X_2)\right)$.

Expectation[ Log[1 + Log[1 + Subscript[x, 1]] + Log[1 + Subscript[x, 2]]], {Subscript[x, 1] \[Distributed] BernoulliDistribution[Subscript[p, 1]], Subscript[x, 2] \[Distributed] BernoulliDistribution[Subscript[p, 2]]}]

However, I want to leave $K$ as it is and have Mathematica (if possible) give me the expectation of my function in terms of sums and products. I had no idea how to even search for this on Google or the Mathematica documentation and that is why I am asking it here. In fact I am not even sure what tags to put, so if a mod is reading this, please add an appropriate tag.

Thanks in advance :)

sepehr78
  • 67
  • 6
  • Your question is unclearly formulated for me. – user64494 Apr 07 '20 at 18:28
  • @user64494 could you explain which part you don't understand? – sepehr78 Apr 07 '20 at 21:07
  • My guess is that your question should be written more like @MikeY 's definition of gg in his answer to clarify things. (And you should avoid using Subscript except maybe for display purposes and use something like x[k] and p[k].) Also, I think that @MikeY 's answer is the answer. – JimB Apr 07 '20 at 21:42
  • This might be useful: https://mathematica.stackexchange.com/questions/65471/how-to-do-algebra-on-summations-of-variable-expressions. – JimB Apr 07 '20 at 23:27

2 Answers2

3

This is just an extended comment as @MikeY has the answer.

For your particular example the function SymmetricReduction can help with seeing patterns:

k = 4;
bernoullies = Table[x[i] \[Distributed] BernoulliDistribution[p[i]], {i, k}];
Expectation[Log[1 + Sum[Log[1 + x[i]], {i, k}]], bernoullies];
SymmetricReduction[%, Table[p[i], {i, k}], Table[s[i], {i, k}]][[1]]
(* Log[1+Log[2]] s[1]+
  (-2 Log[1+Log[2]]+Log[1+Log[4]]) s[2]+
  (3 Log[1+Log[2]]- 3 Log[1+Log[4]]+Log[1+Log[8]]) s[3]+
  (-4 Log[1+Log[2]]+ 6 Log[1+Log[4]]-4 Log[1+Log[8]]+Log[1+Log[16]]) s[4] *)

k = 5;
bernoullies = Table[x[i] \[Distributed] BernoulliDistribution[p[i]], {i, k}];
Expectation[Log[1 + Sum[Log[1 + x[i]], {i, k}]], bernoullies];
SymmetricReduction[%, Table[p[i], {i, k}], Table[s[i], {i, k}]][[1]]
(* Log[1+Log[2]] s[1]+
  (-2 Log[1+Log[2]]+Log[1+Log[4]]) s[2]+
  (3 Log[1+Log[2]]-3 Log[1+Log[4]]+Log[1+Log[8]]) s[3]+
  (-4 Log[1+Log[2]]+6 Log[1+Log[4]]-4 Log[1+Log[8]]+Log[1+Log[16]]) s[4]+
  (5 Log[1+Log[2]]-10 Log[1+Log[4]]+10 Log[1+Log[8]]-5 Log[1+Log[16]]+Log[1+Log[32]]) s[5] *)

k = 6;
bernoullies = Table[x[i] \[Distributed] BernoulliDistribution[p[i]], {i, k}];
Expectation[Log[1 + Sum[Log[1 + x[i]], {i, k}]], bernoullies];
SymmetricReduction[%, Table[p[i], {i, k}], Table[s[i], {i, k}]][[1]]
(* Log[1+Log[2]] s[1]+
  (-2 Log[1+Log[2]]+Log[1+Log[4]]) s[2]+
  (3 Log[1+Log[2]]- 3 Log[1+Log[4]]+Log[1+Log[8]]) s[3]+
  (-4 Log[1+Log[2]]+ 6 Log[1+Log[4]]-4 Log[1+Log[8]]+Log[1+Log[16]]) s[4]+
  (5 Log[1+Log[2]]-10 Log[1+Log[4]]+10 Log[1+Log[8]]- 5 Log[1+Log[16]]+Log[1+Log[32]]) s[5]+
  (-6 Log[1+Log[2]]+15 Log[1+Log[4]]-20 Log[1+Log[8]]+15 Log[1+Log[16]]- 6 Log[1+Log[32]]+Log[1+Log[64]]) s[6] *)

From the observed pattern one can put together a general formula:

mean[k_] := Sum[SymmetricPolynomial[j, Table[p[i], {i, k}]] *
  Sum[(-1)^(j - i) Binomial[j, i] Log[1 + Log[2^i]], {i, 1, j}], {j, 1, k}]
mean[4]
(* (-4 Log[1+Log[2]]+6 Log[1+Log[4]]-4 Log[1+Log[8]]+Log[1+Log[16]])
  p[1] p[2] p[3] p[4]+ Log[1+Log[2]] (p[1]+p[2]+p[3]+p[4])+
  (-2 Log[1+Log[2]]+Log[1+Log[4]]) (p[1] p[2]+p[1] p[3]+p[2] p[3]+
  p[1] p[4]+p[2] p[4]+p[3] p[4])+(3 Log[1+Log[2]]-
  3 Log[1+Log[4]]+Log[1+Log[8]]) (p[1] p[2] p[3]+p[1] p[2] p[4]+p[1] p[3] p[4]+
  p[2] p[3] p[4]) *)

Use of the general formula is hundreds to thousands times faster than using Expectation.

JimB
  • 41,653
  • 3
  • 48
  • 106
2

This sort of question comes up, where the range of K is open-ended. If you are dealing with a generalized g function where you don't already know the answer, Mathematica won't find it for you straightaway (hopefully someone will correct me on this). A method I use is to generate answers with increasing K and look for patterns using the old Mark I eyeball induction process.

gg := Log[1 + Sum[Log[1 + x[k]], {k, 1, nn}]];

Table[
    tab = Table[x[i] \[Distributed] BernoulliDistribution[p[i]], {i, nn}];
    Expectation[gg, tab], 
    {nn, 1, 2}] // TableForm

$$ \left( \begin{array}{l} p(1) \log (1+\log (2)) \\ p(2) \log (1+\log (2))+p(1) (p(2) (\log (1+\log (4))-2 \log (1+\log (2)))+\log (1+\log (2))) \\ \end{array} \right)$$

MikeY
  • 7,153
  • 18
  • 27
  • Thanks for your answer! No, I do know the function, I just do not know K. So in other words expectation of g should be a function of $p_i$ and $K$. I already tried it for a few values of $K$ and have an idea of what the general expectation should be, but I was wondering if Mathematica can derive the general expression in terms of sums and products. – sepehr78 Apr 07 '20 at 21:11