I'm a novice Mathematica user having some trouble with the "HistogramDensity" chart element function in a DistributionChart. My observations are heavily skewed toward one category, and the bars for the other categories are larger than their number of observations would imply. Any ideas? Here's a stylized example of my problem:

data = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4}
DistributionChart[data, ChartElementFunction -> ChartElementDataFunction["HistogramDensity", "Bins" -> 4]]

enter image description here

kglr
user13947
  • Don't you need to have a list of datasets, something like Gather[data] ? – b.gates.you.know.what Apr 25 '14 at 14:08
  • I don't think it's necessary to have several datasets. It plots the histogram with no problem; it's just the scales that are off... – user13947 Apr 25 '14 at 14:24
  • In which way is Histogram not doing what you need to achieve ? – b.gates.you.know.what Apr 25 '14 at 14:27
  • Below is the resulting histogram – user13947 Apr 25 '14 at 14:55
  • Above I have included the resulting "HistogramDensity" chart. As you can see, the top category is almost as large as the next two, even though it contains only one observation versus three and five in the others ... – user13947 Apr 25 '14 at 15:03
  • It seems as though there's a "minimum" bar size for HistogramDensity, which definitely distorts the distribution. Unfortunately, I can't offer a suggestion for eliminating the problem (other than to choose a different approach to plotting the distribution, such as SmoothDensity). – Cassini Apr 25 '14 at 16:43

1 Answer

As @DavidSkulsky commented, the problem is due to the scaling inside ChartElementDataFunction["HistogramDensity"].
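For reference, the proportions that the bars ought to reflect can be tabulated directly from the question's data. This is only a minimal sketch using the built-in BinCounts; the unit-width bin specification {1, 5, 1} and the names counts and proportions are my own choices for illustration, not something the chart uses internally.

```mathematica
(* The question's data: fifty 1s, five 2s, three 3s and one 4 *)
data = Join[ConstantArray[1, 50], {2, 2, 2, 2, 2, 3, 3, 3, 4}];

(* Count observations in the unit-width bins [1,2), [2,3), [3,4), [4,5) *)
counts = BinCounts[data, {1, 5, 1}]
(* {50, 5, 3, 1} *)

(* Share of observations per bin, which is what the bar sizes should track *)
proportions = counts/Total[counts]
(* {50/59, 5/59, 3/59, 1/59} *)
```

With these numbers the bar for 4 should be a third the size of the bar for 3 and a fifth the size of the bar for 2, which is not what the "HistogramDensity" element shows.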

A workaround is to use a custom ChartElementFunction. For example, the following transforms the Rectangles produced by a standard Histogram into "double-sided", rotated rectangles:

ceF := Dynamic@(Histogram[#2, 4, "Probability", BarOrigin -> Left, 
   ChartStyle -> CurrentValue["Color"]][[1]] /. 
     RectangleBox[{x0_, y0_}, {x1_, y1_}, z___] :> 
       RectangleBox[{-x1 + 2 #[[1, 1]], y0}, {x1 + 2 #[[1, 1]], y1}, z]) &

Using OP's data:

data = Join[ConstantArray[1, 50], {2, 2, 2, 2, 2, 3, 3, 3, 4}];

DistributionChart[{data[[40 ;;]], Join[{0, 0, 0}, data[[30 ;;]]], data}, 
      ChartStyle -> "SandyTerrain", 
      ChartElementFunction -> ceF]

enter image description here
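An alternative that avoids rewriting the boxes of a rendered Histogram is to build the rectangles directly from HistogramList. The following is only a sketch of that idea, not the method used above: histElement is a hypothetical name, and scaling each bar's half-width by its bin probability is my own assumption about what a faithful picture should look like. It relies on the documented convention that a ChartElementFunction is called with the element's region {{xmin, xmax}, {ymin, ymax}}, the data values, and metadata.

```mathematica
(* Hypothetical element function: mirrored bars whose half-width is
   proportional to the probability of each bin. *)
histElement[{{xmin_, xmax_}, {ymin_, ymax_}}, values_, ___] :=
 Module[{edges, probs, center, halfWidth},
  (* Bin edges and per-bin probabilities, requesting 4 bins as in the question *)
  {edges, probs} = HistogramList[values, 4, "Probability"];
  center = (xmin + xmax)/2;
  halfWidth = (xmax - xmin)/2;
  (* One rectangle per bin, spanning the bin vertically and mirrored about the center *)
  MapThread[
   Rectangle[{center - halfWidth #2, First[#1]},
     {center + halfWidth #2, Last[#1]}] &,
   {Partition[edges, 2, 1], probs}]]

data = Join[ConstantArray[1, 50], {2, 2, 2, 2, 2, 3, 3, 3, 4}];
DistributionChart[{data}, ChartElementFunction -> histElement]
```

Because the bars are plain Rectangle primitives, they should pick up whatever ChartStyle is set on the chart. Depending on the bin edges HistogramList chooses, the outermost bar may extend past the default plot range, so an explicit PlotRange may be needed.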

kglr