
I have about five thousand observations that I would like to plot in a graph. The range of values is wide: from 1 to 108,000, with 90% of the observations taking the value 2, 99% taking a value between 1 and 6, and just one observation taking the maximum value of 108,000. So I thought that the only way to represent them graphically would be a volume graph, that is, a 3D one. The circumference of the spheres would range from 3.89 to 185.62 (possibly scaled, keeping the same ratio, to 0.08 cm to 3.7 cm in the figure).

Here is my data, along with the circumference (in cm) of a sphere whose volume equals the value of Var1:

[Image: table of the data with the corresponding sphere circumferences]
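For reference, the circumferences follow from inverting the volume formula of a sphere. A worked check of the two extreme values (rounded):

```latex
\[
  V = \tfrac{4}{3}\pi r^{3}
  \quad\Longrightarrow\quad
  r = \Bigl(\frac{3V}{4\pi}\Bigr)^{1/3},
  \qquad
  C = 2\pi r
\]
\[
  V = 1:\ r \approx 0.620,\ C \approx 3.898;
  \qquad
  V = 108\,000:\ r \approx 29.54,\ C \approx 185.62
\]
```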

EDIT: When drawing a circle, TikZ takes the radius as its argument, not the circumference as I wrongly thought. I have corrected the formula below accordingly.

Of course, the graph doesn't need to show all the small values clearly. What I would like to convey is the idea of a big sphere floating in a sea of microspheres. Something like this:

[Image: sketch of one big sphere among many tiny spheres]

First, let's define a function that gets the radius from the volume of the sphere (scaled down by a factor of 10 for the figure). In R:

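radiusFromVolume <- function(volume){
  # invert V = (4/3) * pi * r^3
  radius <- (volume/((4/3)*pi))^(1/3)
  # scale down by 10 so that the spheres fit on the page (radius in cm)
  radius <- radius/10
  radius <- round(radius, digits=2)
  return(radius)
}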
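If you would rather keep everything in LaTeX, the same conversion can be sketched with `\fpeval` from the xfp package. Unlike pgfmath, whose numbers cannot exceed about 16383, l3fp copes with 108,000. The macro name `\radiusFromVolume` is hypothetical, chosen only to mirror the R function:

```latex
\documentclass{article}
\usepackage{xfp} % provides \fpeval (LaTeX3 floating-point expressions)

% Hypothetical macro mirroring the R function radiusFromVolume:
% radius of a sphere of volume #1, divided by 10, rounded to 2 decimals.
\newcommand*\radiusFromVolume[1]{%
  \fpeval{round((#1/((4/3)*pi))^(1/3)/10, 2)}}

\begin{document}
Smallest value: \radiusFromVolume{1}\,cm.

Largest value: \radiusFromVolume{108000}\,cm.
\end{document}
```

For a volume of 1 this prints 0.06, matching the small radius mentioned in the question.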

From the R function I obtain a vector of radius values. Now the hard part: for each item in the vector I would like to draw a sphere with that radius, and place the spheres (randomly?) in my figure.

Thanks to this answer (by Tom Bombadil) I came up with this basic script, which gives me two spheres of radius 0.06 and 2.96:

\documentclass[a4paper]{article}

\usepackage{tikz}

\begin{document}

\begin{tikzpicture}
    % big sphere: outline and shading use the same radius
    \draw (0,0) circle (2.96cm);
    \shade[ball color=blue!10!white,opacity=0.70] (0,0) circle (2.96cm);
    % small sphere
    \draw (3,0) circle (0.06cm);
    \shade[ball color=blue!10!white,opacity=0.70] (3,0) circle (0.06cm);
\end{tikzpicture}

\end{document}

Starting from this script, what I am trying to achieve, in pseudocode, is:

vector <- {volume values} % sorted from largest to smallest, so that the small spheres are not covered by the big ones
figure_box <- {(-4,-4),(4,4)}

\begin{tikzpicture}

for (item in vector) {

size <- radiusFromVolume(item) % the radius (in cm), not the raw volume
coordinates <- ( random(-4, 4) , random(-4, 4) ) % a random point inside figure_box
\draw (coordinates) circle (size cm);
\shade[ball color=blue!10!white,opacity=0.70] (coordinates) circle (size cm);

}
\end{tikzpicture}

Is it possible to obtain this with TeX and TikZ?
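A minimal sketch of the pseudocode in plain TikZ, under a few assumptions: the volumes in the list are illustrative (not the real five thousand values), the seed 42 and the ±4 cm box are arbitrary, and the radius is computed inline with `\fpeval` from xfp because pgfmath cannot handle numbers as large as 108,000:

```latex
\documentclass[tikz,border=2mm]{standalone}
\usepackage{xfp} % \fpeval: floating-point maths beyond pgfmath's 16383 limit

\begin{document}
\begin{tikzpicture}
  \pgfmathsetseed{42}% arbitrary fixed seed, so the layout is reproducible
  \draw[gray] (-4,-4) rectangle (4,4);% the figure box from the pseudocode
  % Illustrative volumes, sorted from largest to smallest, so that the
  % small spheres are drawn last and are not hidden by the big one.
  \foreach \V in {108000,6,5,4,3,3,2,2,2,2,2,1} {
    % radius in cm, as in radiusFromVolume: (V / (4/3 pi))^(1/3) / 10
    \edef\rad{\fpeval{round((\V/((4/3)*pi))^(1/3)/10, 2)}}%
    % rand is in [-1,1]: the centre stays at most 4-r from the origin,
    % so each sphere fits inside the box
    \pgfmathsetmacro\X{(4-\rad)*rand}
    \pgfmathsetmacro\Y{(4-\rad)*rand}
    \shade[ball color=blue!10!white,opacity=0.70] (\X,\Y) circle (\rad);
    \draw (\X,\Y) circle (\rad);
  }
\end{tikzpicture}
\end{document}
```

A unitless radius such as `circle (\rad)` is interpreted in the default 1 cm units, so it matches the "size cm" of the pseudocode.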

EDIT:

Why not use a log/log scale? The problem with my data is that I have one big outlier. A log-log plot of my data would look something like this

[Image: log-log plot of the data]

or like this

[Image: another log-log plot of the data]

Clearly my outlier, being so far from the other values, doesn't even show (you can only tell it is there from the x scale). Also, log scales are not easily understood by the average viewer. (By contrast, differences in ball volumes are much easier to see...)

Francesco
  • Your question is confusing; I don't understand what you're trying to achieve. How are the percentages 90% and 99% supposed to add up to 100%? Could you please rephrase your question and, if possible, add a sketch of what you have in mind as the final plot? – juliohm Oct 30 '13 at 09:54
  • 2 is between 1 and 6. I will add a sketch. – Francesco Oct 30 '13 at 10:02
  • http://meta.tex.stackexchange.com/questions/1272/why-doesnt-maths-render-as-maths – percusse Oct 30 '13 at 10:37
  • I have to say, I don't think this kind of graph is a great idea: it's an effort for the reader to figure out exactly what it is supposed to show, and even then, I don't think the final picture would convey much more information than the phrase "a big sphere floating in a sea of microspheres." Have you considered a histogram with logarithmic scales on both axes? – Charles Staats Oct 30 '13 at 11:37
  • See my edit on log/log plot. – Francesco Oct 30 '13 at 22:04

2 Answers


You should use neither a logarithmic bar chart (you can choose the "base" of the bars to be e.g. 10^1 or 10^-5, which would drastically alter how the "bars" look) nor a continuous line, as you have discrete values (there's no data for 1.5). In my opinion a double logarithmic scatter plot would be best:
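To make the point about the base concrete: on a logarithmic axis, a bar for a value y drawn from a baseline b has a length proportional to log10(y) − log10(b). Taking the two largest counts in the data, 5400 and 180, as an illustration, the ratio of their bar lengths depends heavily on b:

```latex
\[
  b = 10^{1}:\quad
  \frac{\log_{10} 5400 - 1}{\log_{10} 180 - 1}
  \approx \frac{2.732}{1.255} \approx 2.18,
  \qquad
  b = 10^{-5}:\quad
  \frac{\log_{10} 5400 + 5}{\log_{10} 180 + 5}
  \approx \frac{8.732}{7.255} \approx 1.20
\]
```

So the same two counts look almost twice as different with one baseline as with the other.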

Code

\documentclass[tikz,border=2mm]{standalone}
\usepackage{pgfplots}
\pgfplotsset{compat=1.9}
\usepackage{kerkis}

\begin{document}

\begin{tiny}

\pgfplotsset{grid style={very thin,gray!30}}

\begin{tikzpicture}
    \begin{loglogaxis}
    [   scatter,
        scatter src=y,
        only marks,
      x tick label style={align=right,rotate=90},
      enlarge x limits=0.01,
      enlarge y limits=0.02,
      grid=minor,
      log ticks with fixed point,
      xmax=200000,
    ]
        \addplot coordinates {(1,180) (2,5400) (3,240) (4,120) (5,40) (6,20) (108000,1)};
    \end{loglogaxis}
\end{tikzpicture}

\end{tiny}

\end{document}

Output

[Image: log-log scatter plot produced by the code above]

Tom Bombadil
  • Very nice output! I would even compress the x axis further to have only even powers of ten, but that's a matter of taste. Minor nitpick: double logarithm is often used for log(log(x)). – percusse Oct 31 '13 at 12:19
  • @percusse: Thanks. But log(log(x))? I've never seen that before. Do you have an example for that, or at least a field where that is used? – Tom Bombadil Oct 31 '13 at 12:22
  • A quick example here http://tex.stackexchange.com/questions/69032/single-double-logarithmic-axis – percusse Oct 31 '13 at 12:59
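A sketch of percusse's suggestion, assuming that "only even powers of ten" means ticks at 10^0, 10^2, 10^4 and 10^6. The `xtick` list and the larger `xmax` are hypothetical choices; the rest is trimmed from the answer's code:

```latex
\documentclass[tikz,border=2mm]{standalone}
\usepackage{pgfplots}
\pgfplotsset{compat=1.9}

\begin{document}
\begin{tikzpicture}
  \begin{loglogaxis}[
      only marks,
      grid=major,
      log ticks with fixed point,
      % even powers of ten only: 10^0, 10^2, 10^4, 10^6
      xtick={1,100,10000,1000000},
      xmax=1000000,
    ]
    \addplot coordinates {(1,180) (2,5400) (3,240) (4,120) (5,40) (6,20) (108000,1)};
  \end{loglogaxis}
\end{tikzpicture}
\end{document}
```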

I ended up with this solution:

\documentclass[a4paper]{article}

\usepackage{datatool, tikz}
\begin{filecontents*}{test.csv}
id,ball_radius
1,2.96
2,0.14
3,0.14
4,0.13
5,0.12
6,0.12
7,0.12
8,0.12
9,0.12
10,0.12
11,0.12
12,0.12
13,0.12
14,0.12
15,0.11
\end{filecontents*}

\begin{document}

\DTLloaddb[noheader=false]{radius}{"test.csv"}

\begin{tikzpicture}
 \DTLforeach*{radius}{\radius=ball_radius} {%
    \pgfmathsetmacro\X{rand*1.5}
    \pgfmathsetmacro\Y{rand*1.5}
    \draw (\X,\Y) circle (\radius);
    \shade[ball color=green] (\X,\Y) circle (\radius);}  
\end{tikzpicture}

\end{document}
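Two small tweaks to this solution, sketched below with a shortened copy of the data written to a hypothetical `radii.csv` (so an existing `test.csv` is not reused). By default pgf derives its random seed from the time of compilation, so `\pgfmathsetseed` (with an arbitrary seed, 12345 here) gives the same layout on every run. And since the opaque ball shading is painted over the part of the outline that lies inside the circle, drawing the outline after the shading keeps it fully visible:

```latex
\documentclass[a4paper]{article}

\usepackage{datatool, tikz}
\begin{filecontents*}{radii.csv}
id,ball_radius
1,2.96
2,0.14
3,0.12
4,0.11
\end{filecontents*}

\begin{document}

\DTLloaddb[noheader=false]{radius}{radii.csv}

\begin{tikzpicture}
  % arbitrary fixed seed: the same random layout on every compile
  \pgfmathsetseed{12345}
  \DTLforeach*{radius}{\radius=ball_radius}{%
    \pgfmathsetmacro\X{rand*1.5}%
    \pgfmathsetmacro\Y{rand*1.5}%
    % shade first, then draw, so the outline stays on top
    \shade[ball color=green] (\X,\Y) circle (\radius);
    \draw (\X,\Y) circle (\radius);}
\end{tikzpicture}

\end{document}
```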

With output:

[Image: randomly placed green spheres produced by the code above]

Francesco