
After looking at some plots in OriginLab that combine boxplots with the distribution curve and the raw data, I wanted to do something similar in PGFPlots. I found some interesting work already here. Building on this, I created the following plot. I can see that the presentation and some of the visuals could be improved, but the main issue is that the distribution curves are too big. Would it be possible to compress them? To plot these curves, I manually entered the domain values along with the mean and variance of the respective datasets. The code for the plot is below:

\documentclass[border=5mm]{standalone}
\usepackage{pgfplots}
\usepackage{pgfplotstable}
\usepackage{tikz}
\usepgfplotslibrary{statistics}

\definecolor{saffron}{HTML}{FF9933}
\definecolor{brickred}{HTML}{F96302}
\definecolor{grenadier}{HTML}{D44700}
\definecolor{grandisorange}{HTML}{FFCF79}

\pgfplotsset{
    jitter/.style={
        x filter/.code={\pgfmathparse{\pgfmathresult+rnd*#1}}
    },
    jitter/.default=0.1
}

\begin{document}

\begin{tikzpicture}

%--- PLOTTING DATA ---%

\pgfplotstableread[row sep=\\]{
    x data \\
    1 13.7219 \\ 1 10.1599 \\ 1 10.7791 \\ 1 9.5097 \\ 1 6.6125 \\
    1 9.9981 \\ 1 8.1011 \\ 1 11.2725 \\ 1 10.105 \\ 1 12.0993 \\
    1 10.5794 \\ 1 11.7101 \\ 1 9.9047 \\ 1 9.8042 \\ 1 7.6605 \\
    1 8.3959 \\ 1 10.2814 \\ 1 5.2662 \\ 1 9.861 \\ 1 10.2249 \\
    1 9.8222 \\ 1 12.3537 \\ 1 8.3699 \\ 1 10.6791 \\ 1 9.9997 \\
    1 10.1496 \\ 1 10.833 \\ 1 12.2066 \\ 1 6.4986 \\ 1 10.0193 \\
    1 9.1327 \\ 1 8.4368 \\ 1 9.162 \\ 1 7.5179 \\ 1 6.7939 \\
    1 7.9497 \\ 1 12.8426 \\ 1 5.68 \\ 1 12.1573 \\ 1 8.3866 \\
    1 12.8471 \\ 1 7.6095 \\ 1 8.569 \\ 1 9.6969 \\ 1 10.9593 \\
    1 11.5845 \\ 1 9.7342 \\ 1 11.3692 \\ 1 12.067 \\ 1 8.6466 \\
}\dataA

\pgfplotstableread[row sep=\\]{
    x data \\
    2 10.6485 \\ 2 12.1327 \\ 2 8.3516 \\ 2 6.8364 \\ 2 9.087 \\
    2 9.7096 \\ 2 11.559 \\ 2 12.0554 \\ 2 8.6433 \\ 2 12.6913 \\
    2 9.6378 \\ 2 7.6447 \\ 2 8.1357 \\ 2 8.5953 \\ 2 6.7866 \\
    2 4.49 \\ 2 6.3017 \\ 2 15.3823 \\ 2 7.0121 \\ 2 13.2661 \\
    2 10.2045 \\ 2 8.5268 \\ 2 8.4672 \\ 2 8.8128 \\ 2 9.659 \\
    2 7.7304 \\ 2 10.6145 \\ 2 8.4329 \\ 2 10.7728 \\ 2 9.4515 \\
    2 8.1012 \\ 2 9.4598 \\ 2 8.2576 \\ 2 9.2268 \\ 2 6.8337 \\
    2 7.8323 \\ 2 10.8946 \\ 2 10.1366 \\ 2 6.3889 \\ 2 5.5418 \\
    2 9.3919 \\ 2 9.3976 \\ 2 7.3574 \\ 2 11.4605 \\ 2 7.8299 \\
    2 8.0555 \\ 2 9.3363 \\ 2 8.2947 \\ 2 8.9302 \\ 2 4.9484 \\
}\dataB

\pgfplotstableread[row sep=\\]{
    x data \\
    3 11.1575 \\ 3 6.3445 \\ 3 7.6768 \\ 3 6.8726 \\ 3 6.1151 \\
    3 8.0012 \\ 3 7.3762 \\ 3 7.2804 \\ 3 4.5495 \\ 3 8.7247 \\
    3 7.1302 \\ 3 5.6378 \\ 3 9.6559 \\ 3 7.6132 \\ 3 7.7771 \\
    3 7.182 \\ 3 5.9942 \\ 3 7.8322 \\ 3 3.3087 \\ 3 7.5138 \\
    3 9.9824 \\ 3 7.4416 \\ 3 7.5475 \\ 3 8.0151 \\ 3 7.9198 \\
    3 6.2829 \\ 3 7.3886 \\ 3 4.9581 \\ 3 4.36 \\ 3 6.6295 \\
    3 7.805 \\ 3 6.5626 \\ 3 7.0912 \\ 3 7.6083 \\ 3 6.0897 \\
    3 8.5777 \\ 3 4.0153 \\ 3 8.4225 \\ 3 7.2019 \\ 3 5.1663 \\
    3 3.9603 \\ 3 7.5764 \\ 3 7.3596 \\ 3 8.2149 \\ 3 6.9772 \\
    3 4.9117 \\ 3 6.6025 \\ 3 6.8943 \\ 3 7.1555 \\ 3 5.7075 \\
}\dataC

\begin{axis}[
    boxplot/draw direction=y,
    xlabel={Classes},
    ylabel={Value},
    height=6cm,
    ymin=0, ymax=20,
    xtick={1, 2, 3},
    boxplot/box extend=0.1,
    boxplot/whisker extend=0.03,
    every axis plot/.append style={fill, fill opacity=0.5},
    cycle list={{cyan},{orange},{black}},
]

%--- BOXPLOTS ---%
\addplot+ [boxplot] table [y index = 1] {\dataA};
\addplot+ [boxplot] table [y index = 1] {\dataB};
\addplot+ [boxplot] table [y index = 1] {\dataC};

%--- RAW DATA, JITTERED ALONG X ---%
\addplot+ [jitter=0.2, only marks, mark size=.7pt, xshift = 0mm] table [y index = 1] {\dataA};
\addplot+ [jitter=0.2, only marks, mark size=.7pt, xshift = 0mm] table [y index = 1] {\dataB};
\addplot+ [jitter=0.2, only marks, mark size=.7pt, xshift = 0mm] table [y index = 1] {\dataC};

%--- DISTRIBUTION CURVES (domain, mean and spread entered by hand) ---%
\addplot+ [domain=5.266:13.7219, fill=none, smooth] ({1+1*exp(-pow(x-9.76243,2)/3.6622)},x);
\addplot+ [domain=4.489:15.3822, fill=none, smooth] ({2+1*exp(-pow(x-8.98630,2)/4.4150)},x);
\addplot+ [domain=3.309:11.1574, fill=none, smooth] ({3+1*exp(-pow(x-6.96339,2)/2.3290)},x);

\end{axis}
\end{tikzpicture}
\end{document}

[Image: the plot produced by the code above]

crypto
  • You are asking about a lot of different things that you have not tried to do yourself. You should focus on one thing per question and describe what you have tried and what is causing you problems. – hpekristiansen Sep 24 '21 at 02:06
  • To decrease the width of the curves you can simply replace the 1 with a smaller value, e.g. 0.7: 1+0.7*exp(... – dexteritas Sep 24 '21 at 12:00
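
The suggestion in the last comment can be tried in isolation. The following is a minimal sketch, assuming only the first class's curve from the question; the dashed gray reference curve and the axis limits are added here for comparison and are not part of the original code.

```latex
\documentclass[border=5mm]{standalone}
\usepackage{pgfplots}
\pgfplotsset{compat=1.3}

\begin{document}
\begin{tikzpicture}
\begin{axis}[
    xlabel={Classes},
    ylabel={Value},
    xmin=0.5, xmax=2.5,   % illustrative limits, chosen to show both curves
    ymin=0, ymax=20,
    xtick={1, 2},
]
    % Original curve: the prefactor 1 lets the peak reach x = 1 + 1 = 2.
    \addplot [gray, dashed, smooth, domain=5.266:13.7219]
        ({1 + 1*exp(-pow(x-9.76243,2)/3.6622)}, x);

    % Compressed curve: the prefactor 0.7 limits the peak to x = 1.7.
    \addplot [cyan, smooth, domain=5.266:13.7219]
        ({1 + 0.7*exp(-pow(x-9.76243,2)/3.6622)}, x);
\end{axis}
\end{tikzpicture}
\end{document}
```

Only the horizontal extent of the bell changes; its position along the value axis and its vertical spread stay the same.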

1 Answer


Are you searching for something like the following?

% used PGFPlots v1.18.1
\documentclass[border=5pt]{standalone}
\usepackage{pgfplots}
\usepackage{pgfplotstable}
    \usepgfplotslibrary{statistics}
    \pgfplotsset{
        compat=1.3,
        jitter/.style={
            x filter/.code={\pgfmathparse{\pgfmathresult+rnd*#1}}
        },
        jitter/.default=0.1
    }

\begin{document}
\begin{tikzpicture}[
    /pgf/declare function={
        mu1 = 9.76243;
        sigma1 = 3.6622;
        mu2 = 8.98630;
        sigma2 = 4.4150;
        mu3 = 6.96339;
        sigma3 = 2.3290;
        factor = 1.5;
        amplitude = 0.75;
    },
]

\pgfplotstableread{
    a        b        c
    13.7219  10.6485  11.1575
    10.1599  12.1327  6.3445
    10.7791  8.35160  7.6768
    9.50970  6.83640  6.8726
    6.61250  9.08700  6.1151
    9.99810  9.70960  8.0012
    8.10110  11.5590  7.3762
    11.2725  12.0554  7.2804
    10.1050  8.64330  4.5495
    12.0993  12.6913  8.7247
    10.5794  9.63780  7.1302
    11.7101  7.64470  5.6378
    9.90470  8.13570  9.6559
    9.80420  8.59530  7.6132
    7.66050  6.78660  7.7771
    8.39590  4.49000  7.182
    10.2814  6.30170  5.9942
    5.26620  15.3823  7.8322
    9.86100  7.01210  3.3087
    10.2249  13.2661  7.5138
    9.82220  10.2045  9.9824
    12.3537  8.52680  7.4416
    8.36990  8.46720  7.5475
    10.6791  8.81280  8.0151
    9.99970  9.65900  7.9198
    10.1496  7.73040  6.2829
    10.8330  10.6145  7.3886
    12.2066  8.43290  4.9581
    6.49860  10.7728  4.36
    10.0193  9.45150  6.6295
    9.13270  8.10120  7.805
    8.43680  9.45980  6.5626
    9.16200  8.25760  7.0912
    7.51790  9.22680  7.6083
    6.79390  6.83370  6.0897
    7.94970  7.83230  8.5777
    12.8426  10.8946  4.0153
    5.6800   10.1366  8.4225
    12.1573  6.38890  7.2019
    8.38660  5.54180  5.1663
    12.8471  9.39190  3.9603
    7.60950  9.39760  7.5764
    8.56900  7.35740  7.3596
    9.69690  11.4605  8.2149
    10.9593  7.82990  6.9772
    11.5845  8.05550  4.9117
    9.73420  9.33630  6.6025
    11.3692  8.29470  6.8943
    12.0670  8.93020  7.1555
    8.64660  4.94840  5.7075
}\data

\begin{axis}[
    boxplot/draw direction=y,
    xlabel={Classes},
    ylabel={Value},
%    height=6cm,
    ymin=0,ymax=20,
    xtick={1, 2, 3},
    boxplot/box extend=0.1,
    boxplot/whisker extend=0.03,
    every axis plot/.append style={fill,fill opacity=0.5},
    cycle list={{cyan},{orange},{black}},
    smooth,
]

    \foreach \i in {0,1,2} {
        \addplot+ [boxplot] table [y index=\i] {\data};
    }

    \foreach \i in {0,1,2} {
        \addplot+ [
            jitter=0.2,
            only marks,
            mark size=.7pt,
        ] table [
            x expr=\i+1,
            y index=\i,
        ] {\data};
    }

    \pgfplotsinvokeforeach{1,2,3}{
        \addplot+ [
            draw=.!50,
            fill=none,
            domain=mu#1-factor*sigma#1:mu#1+factor*sigma#1,
        ] ({#1+amplitude*exp(-pow(x-mu#1,2)/sigma#1)},x);
    }

\end{axis}

\end{tikzpicture}
\end{document}

image showing the result of above code

Stefan Pinnow
  • Thanks for the answer (and a huge cleanup of my code). When we set the amplitude to 0.75, does it distort any information presented in the graph? And what is the role of 'factor' in the code? – crypto Sep 26 '21 at 22:57
  • You are welcome. Since amplitude changes all x-values I don't think that anything gets distorted. Regarding factor I suggest: Simply change its value and see what changes in the output ;) – Stefan Pinnow Sep 27 '21 at 05:26
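
To make the roles of `amplitude` and `factor` concrete, the curve that the last loop in the answer draws for each class can be written out as below. This is only a reading of the code above, not something stated by the answerer; the symbols A, f, and s_k are shorthand introduced here for `amplitude`, `factor`, and `sigma#1`.

```latex
% Curve for class k = 1, 2, 3; y runs along the value axis, x along the class axis.
% Shorthand (not from the code): A = amplitude, f = factor, \mu_k = mu#1, s_k = sigma#1.
\[
  x_k(y) = k + A \exp\!\left( -\frac{(y - \mu_k)^2}{s_k} \right),
  \qquad
  \mu_k - f\, s_k \;\le\; y \;\le\; \mu_k + f\, s_k .
\]
% The peak sits at y = \mu_k, where x_k(\mu_k) = k + A.
% For comparison, a normal density with variance \sigma^2 has the shape
\[
  \exp\!\left( -\frac{(y - \mu)^2}{2\sigma^2} \right),
\]
% so s_k takes the place of 2\sigma^2 in the plotted expression.
```

Read this way, `amplitude` only sets how far each bell sticks out to the right of its class position, and `factor` only sets how far along the value axis the curve is drawn; neither touches the boxplots or the data points. Two caveats follow from the formula alone. Every bell peaks at the same height regardless of its spread, so the curves convey location and shape rather than normalized densities. And if the numbers stored in `sigma1` to `sigma3` are variances, as the question's wording suggests, a fitted normal curve would use twice that value in the denominator.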