16

I have a large data set consisting of $\mathcal{O}(10^9)$ two-dimensional points. In order to save memory and time I have pre-binned these into a uniform grid of $500 \times 500$ bins using Fortran. When imported into Mathematica 8.0 as a table the resulting data look like:

data = {{0.388348, 0.388349, 9},{0.388348, 0.776699, 23},...},

where the first two items of each entry correspond to the $x$-$y$-coordinates of the upper-right-hand corner of the bin and the third is the count.

Edit:

  • For a sample of the raw data, raw=RandomReal[1,{1000000000,2}] is a good approximation. This is obviously unworkable.

  • For the binned data: binned=Table[{.01*Ceiling[raw[[i,1]]/.01],.01*Ceiling[raw[[i,2]]/.01],RandomInteger[1000]},{i,1,250000}].

I would like to plot this pre-binned data set in the form of a DensityHistogram, but my data format doesn't fit into what this function is expecting. I have reviewed a similar question for one-dimensional histograms at Histograms with pre-counted data, however I'm at a loss as to how to apply this to 2-D. I have also looked at doing

Image[Rescale[data]]

on the raw data. However, this crashes immediately with a SIGSEGV error that has the Wolfram Support team puzzled. Consequently, I haven't gone very far down this road.

Edit:

  • I have also tried ListDensityPlot[data,InterpolationOrder->0]. For the full data set, Mathematica hangs for over 10 minutes, at which point it runs out of memory and the kernel shuts down. For a subset of the data, I get something more reasonable, but I would need some way to scale this up to $500^2$ data points.

Making these plots seem to be something that is fairly easily done in Matplotlib, but I have already made some other plots in Mathematica and don't want to mess with different styles. I'm fairly new to Mathematica and don't have a good knowledge of all the functionality, unfortunately.

So, how can I make a DensityHistogram when the bins and counts have already been calculated?

cosmoguy
  • 185
  • 1
  • 8
  • 1
    ListPlot3D[data, InterpolationOrder -> 0, Filling -> Bottom, Mesh -> None] - can you try this and tell me what you see? BTW, welcome to MSE! Also can you upload somewhere your binned and original data sets and provide a link? – Vitaliy Kaurov Oct 01 '12 at 03:47
  • @VitaliyKaurov--With the $\mathcal{O}(500^2)$ binned set, this ran out of memory. With 1000 points it gave me a 3D plot with the $z$-axis as the count number. I would need a flat, 2D plot that is similar to DensityHistogram. The binned data looks like what I have given above, just $500^2$ of them, and the unbinned data looks similar to just the first two items in each entry. – cosmoguy Oct 01 '12 at 03:58
  • 1
    ListDensityPlot[data, ColorFunction -> "SouthwestColors"] - then try this and let us know the result. – Vitaliy Kaurov Oct 01 '12 at 04:04
  • @VitaliyKaurov--This was what I tried first, actually. I really need two things: 1) no or very little interpolation and 2) the full $500^2$ data set plotted. With a simple ListDensityPlot Mathematica just hangs forever (10+ minutes) and I'm too impatient to see what the results are. With a truncated data set I get a washed-out density plot that loses sight of substructure. With InterpolationOrder->0 it's starting to resemble what I want, but the plotting time is still very slow. – cosmoguy Oct 01 '12 at 04:14
  • what about uniform binning? binned = BinCounts[raw, {0, 1, 1/100.}, {0, 1, 1/100.}]*1.; binned /= Max[binned]; binned // Image; it works for 10^8 points in about 20 seconds – chris Oct 01 '12 at 06:59
  • @chris--This is very close. I'm running into some memory problems when doing BinCounts on the full data set, but I'm able to extract this information from the bin counting I have already done in Fortran. How can I go about now superimposing some axes and ticks onto the image? – cosmoguy Oct 01 '12 at 20:25
  • I found this, which seems to have an answer for putting ticks on the image. – cosmoguy Oct 01 '12 at 20:40
  • @cosmoguy Wouldn't you want Image[Rescale[data[[All,All,3]]]? – s0rce Oct 04 '12 at 23:36
  • Do you really need a density plot? Cannot you just use a ListContourPlot[]? IMHO, this looks much clearer and you can really see the pattern. What is more, you can also output a color-bar and use many more other features... – Lady InRed Oct 09 '12 at 12:33
  • @s0rce--The binned data is stored as an $N \times 3$ table, so data[[All,All,3]] is undefined. Substituting data[[All,3]], though, gives an error when doing Image: The specified argument ... should be an array of rank 2 or 3 with machine-sized numbers. – cosmoguy Oct 09 '12 at 19:37
  • @AnastasiiaAnishchenko--A ListContourPlot is not ideal when the data has fine substructure as the contour borders obscure the image. A color-bar is very interesting, however, and is something I would like to implement, if possible, from chris' suggestion above. – cosmoguy Oct 09 '12 at 19:40
  • @chris--I have used your suggestion. If you want to write it up, I'll accept it. – cosmoguy Oct 09 '12 at 19:40

1 Answers1

15

You may be able to use the new WeightedData in version 9 with HistogramDistribution to create a weighted histogram. I've reduced the number of points for speed but it should hopefully scale to your actual problem.

raw = RandomReal[1, {10000000, 2}];

binned = Table[{.01*Ceiling[raw[[i, 1]]/.01], .01*
     Ceiling[raw[[i, 2]]/.01], RandomInteger[1000]}, {i, 1, 25000}];

Now I create the WeightedData using your bin counts and fit a HistogramDistribution to them. Note that you can set a different binning if you choose but I'm using the automatic binning.

wd = WeightedData[binned[[All, 1 ;; 2]], binned[[All, 3]]];

hd = HistogramDistribution[wd];

Now to use DensityPlot to visualize the PDF.

DensityPlot[PDF[hd, {x, y}], {x, 0.01, 1}, {y, 0.01, 1}, 
 PlotRange -> All, Exclusions -> None, PlotPoints -> 50, 
 PlotLegends -> Automatic]

enter image description here

Andy Ross
  • 19,320
  • 2
  • 61
  • 93