
I'm working with a set of data-binning values. I'm using BinCounts and BinLists to find bins that meet certain criteria, and then testing a list to see which of its values fall within one of those bins. The list of bin edges is exported to Python code for use in a Python application.

When I test a set of values to identify which fall within any of those bins, my code produces a different count than Total@BinCounts[mydata, binwidth] does.

data = {1, 1, 5, 5, 12, 12}
data // Length
Total@BinCounts[data, 3]

My bin-membership code is wrong. I'd like to get the edges of the bins, with something like this (BinEdges is hypothetical; it shows the output I want):

data = {1, 1, 5, 5, 12, 12}
BinEdges[data, 3]
{{0, 2}, {3, 5}, {6, 8}, {9, 11}, {12, 14}}

data // Length
Total@BinCounts[data, 3]
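Since the edges are consumed in Python, here is a minimal Python sketch of the edges and counts wanted above. It assumes (not verified here against the Wolfram documentation) that BinCounts[data, dx] uses half-open bins [a, a + dx) aligned at multiples of dx, with the first bin starting at Floor[Min[data], dx]. The helper names bin_edges and bin_counts are hypothetical, and the inclusive (lo, hi) pairs only make sense for integer data.

```python
# Hypothetical helpers; ASSUMES BinCounts[data, dx] uses half-open bins
# [a, a + dx) aligned at multiples of dx, starting at Floor[Min[data], dx].

def _first_edge_and_nbins(data, dx):
    start = (min(data) // dx) * dx          # floor min(data) to a multiple of dx
    nbins = (max(data) - start) // dx + 1   # enough bins to include max(data)
    return start, nbins

def bin_edges(data, dx):
    """Inclusive (lo, hi) ranges per bin; only meaningful for integer data."""
    start, nbins = _first_edge_and_nbins(data, dx)
    return [(start + k * dx, start + (k + 1) * dx - 1) for k in range(nbins)]

def bin_counts(data, dx):
    """Values per bin; floor division maps each value to its bin in O(1)."""
    start, nbins = _first_edge_and_nbins(data, dx)
    counts = [0] * nbins
    for x in data:
        counts[(x - start) // dx] += 1
    return counts

data = [1, 1, 5, 5, 12, 12]
print(bin_edges(data, 3))   # [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14)]
print(bin_counts(data, 3))  # [2, 2, 0, 0, 2]
```

Because every value lands in exactly one half-open bin, sum(bin_counts(data, dx)) always equals len(data), which is the same check as comparing data // Length with Total@BinCounts[data, 3].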

Ultimately my code will look something like this.

validdata = RandomInteger[{0, 1000}, 10000];
testdata = RandomInteger[{0, 1000}, 1000000];
MapIndexed[If[ #1 > 10, #2[[1]], Nothing] &, BinCounts[validdata, 10]]

Then identify which values in testdata lie within a bin with more than 10 values in it. These edges will be exported to a Python application.
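One way to sketch the "dense bins, then filter testdata" step on the Python side, assuming bins aligned at multiples of dx (which matches validdata drawn from {0, ..., 1000} if its minimum is 0). Here the bin index k = x // dx is absolute, so bin k is [k*dx, (k+1)*dx) and its edges can be exported directly; a MapIndexed position i is 1-based and counted from the first bin, so under the same assumption it corresponds to k = start/dx + i - 1. The function names are hypothetical.

```python
from collections import Counter

def dense_bins(valid, dx, threshold):
    """Absolute bin indices k (bin k = [k*dx, (k+1)*dx)) holding more than
    `threshold` values. ASSUMES bins are aligned at multiples of dx."""
    counts = Counter(x // dx for x in valid)
    return {k for k, c in counts.items() if c > threshold}

def values_in_bins(test, dx, bins):
    """Keep the test values whose bin index is in `bins` (one set lookup each)."""
    return [x for x in test if x // dx in bins]

def exported_edges(bins, dx):
    """Sorted half-open (lo, hi) edge pairs, e.g. for export."""
    return [(k * dx, (k + 1) * dx) for k in sorted(bins)]

valid = [1, 2, 3, 15, 25, 26, 27]
bins = dense_bins(valid, 10, 2)                               # {0, 2}
print(values_in_bins([0, 9, 10, 19, 20, 29, 30], 10, bins))   # [0, 9, 20, 29]
print(exported_edges(bins, 10))                               # [(0, 10), (20, 30)]
```

Each test value costs one integer division and one set lookup, independent of the number of bins, which matters for a million-element testdata.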

Is this the correct logic for generating bin edges and testing inclusion in a bin?

dx = 10;
data = Range[1, 100];
BinCounts[data, dx]
BinLists[data, dx]
data // Length
Total@BinCounts[data, dx]
binedges = 
 MapIndexed[{#2, start + (dx*#1) - dx, (start + (dx*#1)) - 1} &, 
  Range[1, ((data // Length)/dx) + 1]]
numinbin[nums_, start_, end_] := Map[ start < # <= end &, nums];
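One likely source of the mismatch is the boundary convention: numinbin tests start < # <= end, while BinCounts is assumed here to count lo <= x < hi. A brute-force Python comparison (for illustration only; the helper names and the edges starting at 0 are assumptions) on the dx = 10, data = Range[1, 100] example:

```python
# Compare two boundary conventions on dx = 10, data = Range[1, 100].
# ASSUMPTION: BinCounts counts lo <= x < hi; numinbin tests lo < x <= hi.

def counts_half_open(data, edges):
    """lo <= x < hi, the convention assumed for BinCounts."""
    return [sum(1 for x in data if lo <= x < hi) for lo, hi in edges]

def counts_left_open(data, edges):
    """lo < x <= hi, the convention used by numinbin."""
    return [sum(1 for x in data if lo < x <= hi) for lo, hi in edges]

dx = 10
data = list(range(1, 101))
edges = [(k * dx, (k + 1) * dx) for k in range(11)]  # [0, 10), ..., [100, 110)

print(counts_half_open(data, edges))  # [9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1]
print(counts_left_open(data, edges))  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 0]
```

Both totals are 100 here, but every value that sits exactly on an edge (10, 20, ..., 100) moves to a neighboring bin, so any per-bin threshold test such as "more than 10 values" can select different bins.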

Edit: As I'm working with bignums, brute-force testing against every bin is too slow.
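If the exported edges form one sorted, contiguous list of boundaries (n + 1 values for n bins), the Python side can locate a value's bin with a binary search instead of testing each bin, which is O(log n) per value. Python integers are arbitrary precision, so bignum edges compare exactly. This is a sketch using the standard-library bisect module; bin_of is a hypothetical name.

```python
from bisect import bisect_right

def bin_of(x, boundaries):
    """0-based bin index i with boundaries[i] <= x < boundaries[i + 1],
    or None when x is outside [boundaries[0], boundaries[-1]).
    `boundaries` must be sorted: n + 1 values for n bins."""
    i = bisect_right(boundaries, x) - 1
    if i < 0 or i >= len(boundaries) - 1:
        return None
    return i

# Python ints are arbitrary precision, so bignum edges compare exactly.
base, width = 10**30, 10**20
boundaries = [base + k * width for k in range(4)]  # 3 bins
print(bin_of(base + 15 * 10**19, boundaries))  # 1
print(bin_of(base - 1, boundaries))            # None
print(bin_of(base + 3 * width, boundaries))    # None (right edge is open)
```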

  • I'm still working on this. Every attempt I make gets a different result from BinCounts/BinLists. I really do need to know what BinLists uses for edges. Please help. – kernel density Feb 29 '24 at 02:49

1 Answer


You may define a function whichBin that tells you which bin a number goes into:

whichBin[x_, data_, dx_] := 
 Module[{start = Ceiling[Min[data] - dx, dx], 
   end = Floor[Max[data] + dx, dx], i = 1},
  While[start + i  dx <= x, i++];
  i
  ]

To test it, we can apply it to the numbers 0 to 12:

data = {1, 1, 5, 5, 12, 12};
whichBin[#, data, 3] & /@ Range[0, 12]

{1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5}
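The While loop runs about (x - start)/dx times, which can be enormous for bignums. Because it stops at the smallest i >= 1 with start + i dx > x, for x >= start the result is Floor[(x - start)/dx] + 1 and can be computed directly. A hedged Python translation (which_bin is a hypothetical name) that keeps whichBin's start. If BinCounts starts its first bin at Floor[Min[data], dx] (assumed, not checked here), note that Ceiling[Min[data] - dx, dx] equals that value except when Min[data] is an exact multiple of dx, where the indices shift by one.

```python
def which_bin(x, data, dx):
    """Closed-form version of whichBin: same start, no loop.
    Valid for integer x >= start (whichBin returns 1 for any x below start)."""
    m = min(data) - dx
    start = -((-m) // dx) * dx      # Ceiling[Min[data] - dx, dx] with integers
    return (x - start) // dx + 1    # smallest i >= 1 with start + i*dx > x

data = [1, 1, 5, 5, 12, 12]
print([which_bin(x, data, 3) for x in range(13)])
# [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5]
print(which_bin(10**40 + 7, [10**40], 3))  # 3, with no loop over bins
```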

Daniel Huber
  • This does technically work. But I am using bignums, so this is so slow it never finishes even for small lists of numbers. I'm stuck finding the bin edges. – kernel density Feb 16 '24 at 19:55