
To start with:

link3 = {679, 531, 379, 272, 198, 143, 99, 67, 46, 32, 22, 14, 9, 5, 3, 1};
klink3 = 20;
B1997 = {14, 10, 6, 3, 3, 6, 2, 3, 2, 1, 1, 1, 0, 3, 2, 0, 3, 1, 1, 1, 0, 0, 1, 2, 0, 0, 1};
kB1997 = 50;
taxicabsA = {142, 81, 49, 7, 3, 1};
ktaxicabsA = 10;
rabbits = {43, 16, 8, 6, 0, 2, 1};
krabbits = 18;


mylik[data_, kdata_] := Block[
    {K, mypdf, fj, pj, loglik},
    K = kdata;
    (* mixture on {0, ..., K}: a binomial with weight w plus a beta-binomial with
       weight 1 - w; f0, p, w, alpha, beta are deliberately left as global symbols
       so that NMaximize can optimize over them *)
    mypdf = ProbabilityDistribution[
        Binomial[K, j]*(
        w*p^j*(1 - p)^(K - j)
        +
        (1 - w)*Beta[alpha + j, beta + K - j]/Beta[alpha, beta]), {j, 0, K, 1}
    ];

    (* observed frequencies padded with zeros up to K, with the unknown
       zero-class frequency f0 prepended *)
    fj = Prepend[PadRight[data, K], f0];
    pj = PDF[mypdf, #] & /@ Range[0, K];
    (* multinomial log-likelihood *)
    loglik =
    LogGamma[Total[fj] + 1] - Total[LogGamma[fj + 1]] + fj.Log[pj]
];


cons = {f0 > 0, alpha > 0, beta > 0, 0 < p < 1, 0 < w < 1};
pars = {f0, p, w, alpha, beta};
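As a quick sanity check (my addition, with arbitrary trial values, not values from any fit), the symbolic log-likelihood can be evaluated by substituting numbers for all five parameters:

mylik[link3, klink3] /. {f0 -> 100., p -> 0.05, w -> 0.5, alpha -> 1., beta -> 2.}

(* should return a single machine-precision number *)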

We will first take a look at:

NMaximize[{mylik[link3, klink3], cons}, pars, Method -> {"RandomSearch", "SearchPoints" -> Automatic}]
NMaximize[{mylik[link3, klink3], cons}, pars, Method -> {"RandomSearch", "SearchPoints" -> 200}]

(* Automatic is better here; using more search points does not help *)

NMaximize[{mylik[B1997, kB1997], cons}, pars, Method -> {"RandomSearch", "SearchPoints" -> Automatic}]
NMaximize[{mylik[B1997, kB1997], cons}, pars, Method -> {"RandomSearch", "SearchPoints" -> 200}]

(* Automatic is not good enough here; more search points help *)

How about some other methods?

Try with DifferentialEvolution:

OptMethod="DifferentialEvolution";

NMaximize[{mylik[B1997, kB1997], cons}, pars, Method -> {OptMethod, "SearchPoints" -> Automatic}]
NMaximize[{mylik[B1997, kB1997], cons}, pars, Method -> {OptMethod, "SearchPoints" -> 200}]
NMaximize[{mylik[link3, klink3], cons}, pars, Method -> {OptMethod, "SearchPoints" -> Automatic}]
NMaximize[{mylik[link3, klink3], cons}, pars, Method -> {OptMethod, "SearchPoints" -> 200}]

(* None of the above results is optimal *)

Try with SimulatedAnnealing:

OptMethod="SimulatedAnnealing";

NMaximize[{mylik[B1997, kB1997], cons}, pars, Method -> {OptMethod, "SearchPoints" -> Automatic}]
NMaximize[{mylik[B1997, kB1997], cons}, pars, Method -> {OptMethod, "SearchPoints" -> 200}]
NMaximize[{mylik[link3, klink3], cons}, pars, Method -> {OptMethod, "SearchPoints" -> Automatic}]
NMaximize[{mylik[link3, klink3], cons}, pars, Method -> {OptMethod, "SearchPoints" -> 200}]

(* None of the above results is optimal *)
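For the record, the tuning that Oleksandr R. suggests in the comments below adjusts the DifferentialEvolution control parameters instead of "SearchPoints":

NMaximize[{mylik[B1997, kB1997], cons}, pars, Method -> {"DifferentialEvolution", "ScalingFactor" -> 0.90, "CrossProbability" -> 0.05}]
NMaximize[{mylik[link3, klink3], cons}, pars, Method -> {"DifferentialEvolution", "ScalingFactor" -> 0.90, "CrossProbability" -> 0.05}]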

So my main questions are:

  1. If I want to write a general routine (function) that works across different datasets, which method should I use? I don't want to use too many search points; as seen above, more points are sometimes even worse. (One possible wrapper is sketched after these questions.)
  2. What is the proper way to set up a mixture distribution? In this example it is a mixture of a binomial and a beta-binomial distribution. I have tried to set up the probabilities in several other ways, such as:

    pj = Table[Binomial[K, j]*(w*p^j*(1 - p)^(K - j) + (1 - w)*Beta[alpha + j, beta + K - j]/Beta[alpha, beta]), {j, 0, K}]

    (* OR *)

    pj = MixtureDistribution[{w, 1 - w}, {BinomialDistribution[K, p], BetaBinomialDistribution[alpha, beta, K]}];
    pj = Table[PDF[pj, j], {j, 0, K}]

    (* OR *)

    pj = Table[w*PDF[BinomialDistribution[K, p], j] + (1 - w)*PDF[BetaBinomialDistribution[alpha, beta, K], j], {j, 0, K}]


  3. I have even tried to compile the likelihood function, but it does not seem to help, either in speed or in the optimization result. In most cases, CompilePrint shows many MainEvaluate calls, which I think is why there is no speed-up. (A compiled sketch follows this list.)
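For reference, here is a minimal compiled sketch of my own (clik is a hypothetical name, not from the original code). It rewrites the Beta ratio in terms of LogGamma so that, ideally, everything compiles to machine code; whether any MainEvaluate calls remain should still be verified with CompilePrint:

(* hypothetical compiled log-likelihood; takes the full frequency vector
   fj = {f0, f1, ..., fK} and numeric parameters *)
clik = Compile[{{fj, _Real, 1}, {K, _Integer}, {p, _Real}, {w, _Real}, {alpha, _Real}, {beta, _Real}},
    Module[{logB, pj},
        (* log of Beta[alpha, beta], reused for every j *)
        logB = LogGamma[alpha] + LogGamma[beta] - LogGamma[alpha + beta];
        pj = Table[
            Binomial[K, j]*(w*p^j*(1 - p)^(K - j) +
                (1 - w)*Exp[LogGamma[alpha + j] + LogGamma[beta + K - j] -
                    LogGamma[alpha + beta + K] - logB]), {j, 0, K}];
        LogGamma[Total[fj] + 1] - Total[LogGamma[fj + 1]] + fj.Log[pj]
    ]
];

To use it with NMaximize, wrap it so that only numeric arguments reach the compiled function, e.g. clikN[f0v_?NumericQ, pv_?NumericQ, wv_?NumericQ, av_?NumericQ, bv_?NumericQ] := clik[Prepend[PadRight[N[B1997], kB1997], f0v], kB1997, pv, wv, av, bv].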

All I want is (hopefully) to speed up the optimization, settle at the global optimum with fewer "SearchPoints", and have a general way to adapt to these toy datasets. Any suggestion would be appreciated.
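As one possible pattern for question 1 (a sketch of mine, not a definitive answer; bestFit is a hypothetical name), one could run NMaximize under several method settings and keep the best result:

(* hypothetical wrapper: try a few global methods, return the best {loglik, rules} pair *)
bestFit[data_, kdata_] := First@MaximalBy[
    Table[Quiet@NMaximize[{mylik[data, kdata], cons}, pars, Method -> m],
        {m, {{"RandomSearch", "SearchPoints" -> 200},
             {"DifferentialEvolution", "ScalingFactor" -> 0.90, "CrossProbability" -> 0.05},
             "SimulatedAnnealing"}}],
    First]

bestFit[taxicabsA, ktaxicabsA]

This trades extra run time for robustness, which may be acceptable for datasets of this size.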

Update:

A lot of the tips (tricks) say that built-in functions are better, so I tried this:

mylik3[data_, kdata_] := Block[
    {K, mypdf, fj, pj, loglik},
    K = kdata;
    (* same mixture as in mylik, but written with the built-in distribution PDFs *)
    mypdf = ProbabilityDistribution[
        w*PDF[BinomialDistribution[K, p], j]
        +
        (1 - w)*PDF[BetaBinomialDistribution[alpha, beta, K], j], {j, 0, K, 1}
    ];

    fj = Prepend[PadRight[data, K], f0];
    pj = PDF[mypdf, #] & /@ Range[0, K];
    loglik =
    LogGamma[Total[fj] + 1] - Total[LogGamma[fj + 1]] + fj.Log[pj]
];

But this is not better (or faster) at all.
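One more variant worth timing (a sketch of mine; mylik4 is a hypothetical name): skip the ProbabilityDistribution wrapper and build the probability table directly from the component PDFs, so nothing has to be re-derived for each j:

(* hypothetical variant: pj built directly from the built-in PDFs *)
mylik4[data_, kdata_] := Block[{K = kdata, fj, pj},
    pj = Table[
        w*PDF[BinomialDistribution[K, p], j] +
            (1 - w)*PDF[BetaBinomialDistribution[alpha, beta, K], j], {j, 0, K}];
    fj = Prepend[PadRight[data, K], f0];
    LogGamma[Total[fj] + 1] - Total[LogGamma[fj + 1]] + fj.Log[pj]
];

Compare with, e.g., AbsoluteTiming[NMaximize[{mylik4[link3, klink3], cons}, pars]].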

Chen Stats Yu
  • I don't understand how this question differs from the last one that I already answered as a comment, apart from the part about the mixture distribution? I already told you that "SearchPoints" has hardly any useful effect for differential evolution and to modify "ScalingFactor" and "CrossProbability" instead. NMaximize[{mylik[B1997, kB1997], cons}, pars, Method -> {"DifferentialEvolution", "ScalingFactor" -> 0.90, "CrossProbability" -> 0.05}] and NMaximize[{mylik[link3, klink3], cons}, pars, Method -> {"DifferentialEvolution", "ScalingFactor" -> 0.90, "CrossProbability" -> 0.05}] ... – Oleksandr R. Dec 19 '14 at 23:09
  • ... work perfectly fine, as expected, although it certainly would help NMaximize if your function also did not return nonsense values like Indeterminate for certain values of the parameters. – Oleksandr R. Dec 19 '14 at 23:10
  • I do appreciate your help! However, if I try your settings on taxicabsA, it returns a likelihood value of -16.3518, but I know there is a better one, -16.3354, which gives very different parameter values. I want to keep search points merely as a way to ensure I have found the best global optimum. I spent a lot of time searching for different ways to produce mixture distributions because I don't know whether they make a difference in the optimization. If built-in functions are faster, would they be more efficient to use? – Chen Stats Yu Dec 19 '14 at 23:35
  • @OleksandrR. For your information, the taxicabsA data should have these values {alpha -> 5714973, beta -> 31443412, w -> 1 - 0.6243, f0 -> 180.3501, p -> 0.0317}, which gives a target function value of -16.3354. – Chen Stats Yu Dec 19 '14 at 23:40
  • Changing the number of search points does not help for differential evolution. In fact, it can even make matters worse to choose an excessive number. There are also no universal values of the tuning parameters that will work for every problem--global optimization is an open question and major field of research, and you just cannot expect to solve it so easily. Normally one either knows something a priori about the problem, or otherwise, another way to approach it is to meta-optimize the tuning parameters, as I did here. – Oleksandr R. Dec 19 '14 at 23:41
  • Yes, I understand it's a difficult question. That is why I spent a lot of time finding the proper way to compute the likelihood function, so that I can use more search points (using RandomSearch). I think RandomSearch will explore a wider parameter space if more points are used, so there is a better chance of finding the global optimum? – Chen Stats Yu Dec 19 '14 at 23:46
  • @OleksandrR. I know that a lot of people tend to use differential evolution, but I personally think RandomSearch (with multiple search points) is more suitable for exploring a wider parameter space? – Chen Stats Yu Dec 19 '14 at 23:48
  • Well, I'm not going to say that you're wrong, because this is your decision to make. But, differential evolution doesn't work like it seems that you think it works; not at all. There are entire books about it that you may find useful. Not that DE is necessarily the best method in general, but I would be confident in saying that it's the best method that's been implemented in Mathematica, if you get the tuning parameters right (even though the implementation is not perfect). Using random search is rather like using Nelder-Mead as in the linked question. They are both pseudo-local optimizers. – Oleksandr R. Dec 19 '14 at 23:52
  • Does Maple do any better on any of these problems, by the way? – Oleksandr R. Dec 19 '14 at 23:56
  • Quick answer: Maple does not have global optimization tools built in; you have to purchase a separate add-on license for them. Shall we continue in chat? – Chen Stats Yu Dec 20 '14 at 00:02

0 Answers