
Let's define F4 and estimate the distribution of the data:

W = Import["https://pastebin.com/raw/kE49s1Fj", "Package"];
F4 = Log[W];

EstimatedDistribution[F4, MixtureDistribution[{0.65, 1 - 0.65},
                      {NormalDistribution[α, c], NormalDistribution[d, e]}]]

EstimatedDistribution[F4, MixtureDistribution[{0.65, 1 - 0.65},
                      {NormalDistribution[Subscript[μ, 4], Subscript[σ, 4]],
                       NormalDistribution[Subscript[ν, 4], Subscript[τ, 4]]}]]

EstimatedDistribution[F4, MixtureDistribution[{0.65, 1 - 0.65},
                      {NormalDistribution[a, c], NormalDistribution[d, e]}]]

The code gives three different results in both version 8.0 and version 10.2. Only the last is appropriate. To my knowledge, all of them use NMinimize (MLE) internally.
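One way to back up "only the last is appropriate" is to compare the maximized log-likelihoods. With the default maximum-likelihood estimator, a higher value means a better fit. The following is a sketch that assumes F4 is defined as above; the names `specs` and `fits` are introduced here only for illustration.

```mathematica
(* The same two-component model, differing only in parameter names *)
specs = {{α, c, d, e},
   {Subscript[μ, 4], Subscript[σ, 4], Subscript[ν, 4], Subscript[τ, 4]},
   {a, c, d, e}};

fits = EstimatedDistribution[F4,
     MixtureDistribution[{0.65, 1 - 0.65},
      {NormalDistribution[#[[1]], #[[2]]],
       NormalDistribution[#[[3]], #[[4]]]}]] & /@ specs;

(* Larger log-likelihood = better maximum-likelihood fit *)
LogLikelihood[#, F4] & /@ fits
```

If the three numbers differ, at least one run stopped at a worse local maximum of the same likelihood.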

Other users have reported similar issues when variables/symbols are renamed in other functions. I thought that had been fixed in version 10?

What does version 11 do?

JHT
  • In 11.1.0 for Mac OS X, I get the last two agreeing to many digits: MixtureDistribution[{0.65, 0.35}, {NormalDistribution[8.69107, 0.568114], NormalDistribution[5.62755, 2.06378]}]; the first one gives MixtureDistribution[{0.65, 0.35}, {NormalDistribution[6.86847, 2.28485], NormalDistribution[8.646, 0.386817]}]. – John Joseph M. Carrasco Aug 24 '17 at 11:33
  • Definitely all about the lexicographic ordering of the variables. Without getting exotic in variable names, the same dichotomy appears between D1 = EstimatedDistribution[Log[W], MixtureDistribution[{0.65, 1 - 0.65}, {NormalDistribution[a1, a2], NormalDistribution[a3, a4]}]] and D2 = EstimatedDistribution[Log[W], MixtureDistribution[{0.65, 1 - 0.65}, {NormalDistribution[a3, a4], NormalDistribution[a1, a2]}]]. – John Joseph M. Carrasco Aug 24 '17 at 11:39
  • So, it is looking better in 11.1. In 8 and 10 there is a huge difference. Can you try replacing 'a' with 'q' in the last one? – JHT Aug 24 '17 at 11:41
  • {NormalDistribution[q, c], NormalDistribution[d, e]} ↦ {NormalDistribution[a3, a4], NormalDistribution[a1, a2]} === {NormalDistribution[α, c], NormalDistribution[d, e]} – John Joseph M. Carrasco Aug 24 '17 at 11:46
  • So, only the first differs from the rest in v11, strange! – JHT Aug 24 '17 at 11:48
  • This probably isn't considered a bug. In many cases this kind of problem doesn't have a single correct solution. When you change the variable names, you're effectively changing the expression given to NMinimize, and NMinimize can't guarantee that it returns the same result in that case, even if the expressions are symbolically equal up to a change of variable names. – Searke Aug 24 '17 at 14:34
  • This function could be made to do what you want and give consistent results regardless of the variable names. I'm afraid, however, that would mean removing any symbolic processing from the core of the function. That could have some serious downsides. – Searke Aug 24 '17 at 14:40
  • That would mean the result could change on every run, which it does not. It changes only when variable names are changed. This is the worst thing that can happen in a CAS. – JHT Aug 24 '17 at 15:45
  • I think it's a precision issue. Using WorkingPrecision -> 30 gets one pretty much the same answer. – JimB Aug 24 '17 at 19:23
  • What exactly is a measure of bugginess in your understanding? You keep using dramatic words like "tremendous" and "huge", but these adjectives do not provide any real information beyond the fact that you think it is a bug (which in this case is not true, according to the comments). – István Zachar Aug 25 '17 at 12:10
  • It doesn't matter what I think; it is provably a bug. It's not about that particular function. If you are doing science, you rely on software. Since you cannot cross-check every computation, you have to trust your software, and that trust is massively undermined by behavior like that shown here and in the link. I'm not aware of similar big bugs in other software. – JHT Aug 25 '17 at 13:00

1 Answer

$Version

(*  "11.1.1 for Mac OS X x86 (64-bit) (April 18, 2017)"  *)

W = ToExpression@Import["https://pastebin.com/raw/kE49s1Fj"];

F4 = Log[W];

The default ParameterEstimator for EstimatedDistribution is "MaximumLikelihood":

Options[EstimatedDistribution, ParameterEstimator]

(*  {ParameterEstimator -> "MaximumLikelihood"}  *)

With this default option the first case differs from the last two.

(dist=EstimatedDistribution[F4,
    MixtureDistribution[{0.65, 1 - 0.65},
     {NormalDistribution @@ #[[1]],
      NormalDistribution @@ #[[2]]}]] & /@
  {{{α, c}, {d, e}},
   {{Subscript[μ, 4], Subscript[σ, 4]},
    {Subscript[ν, 4], Subscript[τ, 4]}},
   {{a, c}, {d, e}}}) // Column


EDIT: With the ParameterEstimator option set as suggested by @JimBaldwin, the results are equivalent across the three cases:

(dist = N[EstimatedDistribution[F4,
       MixtureDistribution[{0.65, 1 - 0.65},
        {NormalDistribution @@ #[[1]],
         NormalDistribution @@ #[[2]]}],
       ParameterEstimator -> {Automatic,
         Method -> {Automatic, WorkingPrecision -> 30,
           MaxIterations -> 300}}]] & /@
   {{{α, c}, {d, e}},
    {{Subscript[μ, 4], Subscript[σ, 4]},
     {Subscript[ν, 4], Subscript[τ, 4]}},
    {{a, c}, {d, e}}}) // Column

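One possible source of the name dependence (an assumption, not confirmed anywhere in this thread) is that the automatically chosen starting point of the optimization depends on the parameter symbols. EstimatedDistribution also accepts explicit starting values in the form {{p, p0}, {q, q0}, …}, so giving every naming the same starting point removes that variable from the comparison. This is a sketch: the starting values below are hypothetical, a fixed start only finds the optimum near that start, and I have not verified that this removes the discrepancy in every version. The name distStart is used so that dist, which the plot below relies on, is left unchanged.

```mathematica
(* Hypothetical starting values for {mean1, sd1, mean2, sd2}; illustration only *)
start = {8.5, 0.5, 5.5, 2.};

distStart = EstimatedDistribution[F4,
     MixtureDistribution[{0.65, 1 - 0.65},
      {NormalDistribution @@ #[[1]], NormalDistribution @@ #[[2]]}],
     (* pair each parameter with its starting value: {{p, p0}, ...} *)
     Transpose[{Flatten[#], start}]] & /@
   {{{α, c}, {d, e}},
    {{Subscript[μ, 4], Subscript[σ, 4]},
     {Subscript[ν, 4], Subscript[τ, 4]}},
    {{a, c}, {d, e}}};

distStart // Column
```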

Legended[
 Show[
  Histogram[F4, Automatic, "PDF"],
  SmoothHistogram[F4,
   PlotStyle -> {{Blue, Thick}}],
  Plot[PDF[dist[[1]], x], {x, 0, 11},
   PlotRange -> All,
   PlotStyle -> {{Red, Thick}}],
  PlotLabel -> Style[dist[[1]], Bold],
  ImageSize -> Large,
  Epilog -> Inset[
    DistributionFitTest[F4, dist[[1]], {"TestDataTable", All}] //
      Rasterize // Image,
    {3, 0.35}]],
 Placed[
  LineLegend[
   {Directive[Blue, Thick], Directive[Red, Thick]},
   {"SmoothHistogram", "PDF"}],
  {0.3, 0.3}]]

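The plot above shows only dist[[1]]. To check visually whether the three parameterizations land on the same optimum, a sketch that overlays all three fitted densities on the histogram (assuming dist is the three-element list from the EDIT above; the name overlay is illustrative):

```mathematica
(* Overlay all three fitted PDFs from dist on the data histogram;
   distinct dash styles keep coinciding curves distinguishable *)
overlay = Show[
   Histogram[F4, Automatic, "PDF"],
   Plot[Evaluate[PDF[#, x] & /@ dist], {x, 0, 11},
    PlotRange -> All,
    PlotStyle -> {{Red, Thick}, {Darker[Green], Thick, Dashed}, {Blue, Dotted}}]];
overlay
```

If the fits are equivalent, the three curves lie on top of each other; a visibly separate curve indicates a different local optimum.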

Bob Hanlon
  • Unfortunately, this favors the very bad solution. The case a, c, d, e provides a much better fit. – JHT Aug 24 '17 at 15:42
  • Explicitly giving the defaults (that sounds kinda circularly redundant?) and including WorkingPrecision and MaxIterations seems to make it give consistent results: ParameterEstimator -> {Automatic, Method -> {Automatic, WorkingPrecision -> 30, MaxIterations -> 300}}. – JimB Aug 25 '17 at 02:10
  • @JimBaldwin - Thanks. Corrected. – Bob Hanlon Aug 25 '17 at 03:13
  • For whatever it's worth, my comment was not intended to be a correction but rather just an observation that might lead someone much more knowledgeable than me to what might be causing the issue. Just putting in WorkingPrecision -> 30 gives one 3 different answers. Another oddity is that the default MaxIterations for NMaximize is 100, and no warnings are given if MaxIterations isn't included or if MaxIterations -> 100. However, one does get warnings of failure to converge for two of the parameterizations if one puts in MaxIterations -> 150. – JimB Aug 25 '17 at 03:24
  • Okay, the above solution also works in v8, although the running time is much longer. But it does not address the bug that the same code gives different results with different variable names, which violates a fundamental principle of such software. See also https://mathematica.stackexchange.com/questions/25182/variable-naming-changes-everything – JHT Aug 25 '17 at 10:38