I have three datasets:
d1 = {7.813540819717949`, 6.529568930239602`, 8.109143155429088`, 7.689731068450451`, 8.74120436001789`, 7.4987912906550225`, 7.615218703959835`, 8.247993113806512`, 7.2855561696238285`, 7.166026422873959`, 7.378283448111686`, 7.3970482801481445`, 7.021646802522941`, 7.1487021619286235`, 7.209809605280611`, 7.872181282365198`, 7.984087026932971`, 7.264607460785361`, 7.491235907642249`, 6.986130172036584`, 8.132032453432691`, 6.738507551311768`, 7.634485996132314`, 7.786727105539017`, 6.747498220290592`, 5.652606213674484`, 6.7145418893159245`, 7.3870764772231245`, 8.14556152457044`, 7.081683553610293`, 7.878354128765276`, 7.502128544096253`, 7.445033772765536`, 8.112365581746175`, 6.964895579434653`, 7.98895773278444`, 7.097718201996728`, 6.6447632395708816`, 7.014871229850273`, 7.596255451339725`, 7.327487147907844`, 7.626051696868463`, 7.973711939912565`, 7.497221611664195`, 7.6858101713490985`, 8.629983423881175`, 6.599398955562014`, 6.833800440038957`, 7.171926480447397`, 5.789311864786593`, 7.089675372368081`, 6.1631268127766665`, 7.639965587796188`, 7.086047166284019`, 5.406270852788866`, 6.93340616203404`, 5.807420811406816`, 6.419517442993749`, 7.414925539201528`, 7.33535213385308`, 8.263651842593319`, 6.164116766463783`, 6.947143869593553`, 6.9901653234057335`};
d2 = {3.5113704043630944, 7.608006873863145, 7.277906729276423, 8.103217766965727, 7.250931121957523, 7.8289541079690075, 6.974346234874238, 8.270349382459646, 8.10910442305307, 6.9048101280375365, 7.325825456049268, 7.73757379706066, 6.765535372291408, 6.785935805992911, 6.688548887697433, 8.459691239773468, 8.186443329842977, 6.460124310707724, 5.331370159943533, 7.509021973926822, 5.895785915422184, 7.155632848713018, 6.297014977893012, 7.024257130369493, 6.7061658261632875, 6.076510324090876, 6.619713727473183, 6.806885656438763, 6.29566021880475, 6.379851689698886, 5.886155198942714, 6.711177147084791, 6.0644975554789955, 6.476320208302014, 5.555972713843852, 5.445658305743026, 4.106563198219526, 9.662304286781078, 7.245526688643017, 7.729224257804472, 7.348730647320301, 6.811708787441987, 6.453437984409927, 6.251602287446226, 6.222552426881395, 6.325257487433763, 6.561448036981885, 6.992186187552647, 6.581755244192236, 6.35555449505844, 6.356911340563265, 5.993472986445495, 6.8937984018722185, 6.475209760379862, 5.709341745086517, 5.993789526904261, 5.81235159806516, 5.929503391669893, 6.692420198312638, 6.897925343017829, 7.935826808183915, 5.21638101511404, 6.0629537994613605, 5.011619213604298, 5.329736171185173, 5.581381723984043, 5.386724423467079, 6.8984666598061235, 6.624173458096634, 5.9758576034447595, 6.262093086610586, 6.191574403265945, 6.345257463708366, 6.691754218111958};
d3 = {11.3197, 10.9668, 10.6479, 11.6099, 10.2554, 11.3928, 11.1466, 8.62521, 9.39976, 8.52043, 9.68226, 9.16244, 9.56907, 9.6331, 10.0117, 11.9325, 11.0703, 10.2413, 10.1749, 11.377, 9.48853, 9.27371, 8.69103, 9.91404, 10.1807, 8.29698, 9.88819, 9.10128, 11.2514, 8.5246, 9.90356, 9.61888, 9.94975, 10.562, 10.3259, 10.5507, 10.3181, 10.4145, 10.6412, 9.67268, 10.5768};
I need to calculate the entropy of each dataset from its histogram, but the bin widths of the histograms are not known. They should be chosen so that the function $g$, defined below, is minimized.
First, I define the entropies as functions of the bin width:
e1[y_] := NIntegrate[With[{f = PDF[HistogramDistribution[d1, {y}], x]}, If[f > 0, -f Log[f], 0]], {x, -\[Infinity], \[Infinity]}]
e2[z_] := NIntegrate[With[{f = PDF[HistogramDistribution[d2, {z}], x]}, If[f > 0, -f Log[f], 0]], {x, -\[Infinity], \[Infinity]}]
e3[w_] := NIntegrate[With[{f = PDF[HistogramDistribution[d3, {w}], x]}, If[f > 0, -f Log[f], 0]], {x, -\[Infinity], \[Infinity]}]
where $y$, $z$, and $w$ are the bin widths of histograms, and the entropy is defined as usual.
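For reference, "as usual" is taken here to mean the differential entropy of the histogram density, which is what the integrands above compute. As a sketch, assume a histogram with equal bin width $h$ and bin probabilities $p_i$ (both symbols are introduced here for illustration only). The density equals $p_i/h$ on bin $i$, so the integral reduces to a finite sum:

$$H(h) = -\int f(x)\,\ln f(x)\,dx = -\sum_i p_i \ln\frac{p_i}{h} = -\sum_i p_i \ln p_i + \ln h$$

The $p_i$ themselves depend on $h$, and they change in jumps as data points move between bins, so $H$ is not a smooth function of the bin width.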
The bin widths should be chosen so that the following function is minimized:
g[x_, y_, z_, w_] := (0.40) Log[(0.40)/((E^(-x (e1[y])))/(E^(-x (e1[y])) + E^(-x (e2[z])) + E^(-x (e3[w]))))] +
  (0.38) Log[(0.38)/((E^(-x (e2[z])))/(E^(-x (e1[y])) + E^(-x (e2[z])) + E^(-x (e3[w]))))] +
  (0.22) Log[(0.22)/((E^(-x (e3[w])))/(E^(-x (e1[y])) + E^(-x (e2[z])) + E^(-x (e3[w]))))]
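Written compactly, $g$ has the form of a Kullback-Leibler divergence. The symbols $p_i$, $q_i$ and $e_i$ below are shorthand introduced here, not notation from the code: $p = (0.40, 0.38, 0.22)$ are the fixed weights, $e_1 = \texttt{e1[y]}$, $e_2 = \texttt{e2[z]}$, $e_3 = \texttt{e3[w]}$, and

$$q_i = \frac{e^{-x e_i}}{\sum_{j=1}^{3} e^{-x e_j}}, \qquad g(x, y, z, w) = \sum_{i=1}^{3} p_i \ln\frac{p_i}{q_i}$$

Since the weights sum to 1, $g \ge 0$, with equality exactly when $q_i = p_i$ for every $i$.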
I then tried:
FindMinimum[{g[x, y, z, w], x > 0 && y > 0 && z > 0 && w > 0}, {x, y, z, w}]
At this point, Mathematica returns errors. I suspect that NIntegrate cannot be evaluated while its integrand still depends on symbolic variables. Any help is appreciated.





`?NumericQ` on the arguments. – Michael E2 Oct 09 '21 at 01:23

`f[x_?NumericQ, y_?NumericQ, z_?NumericQ, w_?NumericQ]` and similarly for other arguments. – bbgodfrey Oct 09 '21 at 01:30

`Plot[e1[y], {y, .01, 50}, MaxRecursion -> 5, PlotPoints -> 50, PlotRange -> All]` produces noisy results for `y < 10`. Is this reasonable? – bbgodfrey Oct 09 '21 at 01:57

`Plot[e1[y], {y, .01, 1}, MaxRecursion -> 3, PlotPoints -> 10, PlotRange -> All]` is noisy throughout. `FindMinimum` will not work well as a result. Try the plot. – bbgodfrey Oct 09 '21 at 02:10

`e1` to see. – bbgodfrey Oct 09 '21 at 02:15
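The `?NumericQ` restriction mentioned in the comments could look roughly like the sketch below. It is not a tested solution. It assumes `d1`, `d2` and `d3` are defined as in the question; the integration variable is renamed to `t` so it cannot clash with the `x` argument of `g`, and the list name `p` and the sample arguments are illustrative assumptions.

```mathematica
(* Entropies as in the question, but only evaluated for numeric bin widths,
   so NIntegrate never sees a symbolic y, z or w. *)
e1[y_?NumericQ] := NIntegrate[With[{f = PDF[HistogramDistribution[d1, {y}], t]}, If[f > 0, -f Log[f], 0]], {t, -Infinity, Infinity}]
e2[z_?NumericQ] := NIntegrate[With[{f = PDF[HistogramDistribution[d2, {z}], t]}, If[f > 0, -f Log[f], 0]], {t, -Infinity, Infinity}]
e3[w_?NumericQ] := NIntegrate[With[{f = PDF[HistogramDistribution[d3, {w}], t]}, If[f > 0, -f Log[f], 0]], {t, -Infinity, Infinity}]

(* Fixed weights from the question; the name p is an assumption. *)
p = {0.40, 0.38, 0.22};

(* Same objective as g in the question, written with lists. *)
g[x_?NumericQ, y_?NumericQ, z_?NumericQ, w_?NumericQ] :=
 Module[{e = {e1[y], e2[z], e3[w]}, q},
  q = Exp[-x e]/Total[Exp[-x e]];  (* normalized weights built from the entropies *)
  Total[p Log[p/q]]]

(* Spot check at arbitrary sample values before trying any optimizer. *)
g[1., 0.5, 0.5, 0.5]
```

With these definitions `g` stays unevaluated for symbolic arguments. As the comments point out, `e1` is a noisy function of the bin width, so `FindMinimum` may still not work well on this objective.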