I generated a list of 8000000 random numbers in Python between 1 and 100000000000000 (that is, $10^{14}$).

import random

random_numbers = random.sample(range(1, 100000000000000), 8000000)

# Initialize a list of ten zero counters, one per possible leading digit
counters = [0] * 10

for item in random_numbers:
    first_digit = str(item)[0] #get first digit of element
    counters[int(first_digit)] += 1 #Increase corresponding counter

for i, c in enumerate(counters):
    print("Number of elements starting with %d is %d" % (i, c))

My result was

Number of elements starting with 0 is 0
Number of elements starting with 1 is 890472
Number of elements starting with 2 is 889404
Number of elements starting with 3 is 887623
Number of elements starting with 4 is 888416
Number of elements starting with 5 is 889126
Number of elements starting with 6 is 888614
Number of elements starting with 7 is 889199
Number of elements starting with 8 is 888683
Number of elements starting with 9 is 888463

This seems a little odd, because the result does not follow Benford's law. Could anyone explain this?

Thanks in advance!

J.Doe
  • You chose numbers uniformly in that range... that means that each of those buckets should be equally probable. Try the same thing if you double the cap. – lulu Apr 01 '17 at 17:36
  • Benford applies when the cap isn't a power of $10$. If your cap is, say, $2\times 10^n$ then more than half of the numbers will begin with $1$. – lulu Apr 01 '17 at 17:37
  • Could you please expand this into a full answer? – J.Doe Apr 01 '17 at 17:45
  • From Wikipedia: "Benford's law, also called the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data." So, this "law" does not say what you expect. – zoli Apr 01 '17 at 17:51
  • NOTE: This is not how you generate random numbers. Python's random.sample draws subsets of $\{1,\dots,10^{14}\}$, i.e. it doesn't allow for repeated numbers. Given that your sample size is relatively small, this is not a major issue, but as others have pointed out, Benford's law is a heuristic observation about probability distributions occurring in some, but not all, real-world scenarios, not about every possible distribution and certainly not about a fixed uniform distribution. – Bananach Apr 01 '17 at 17:57
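
To make the comments about the cap concrete, here is a minimal sketch that counts leading digits exactly instead of sampling. The helper name `leading_digit_counts` is made up for this illustration, and it assumes the same half-open range `range(1, cap)` that the question's code samples from.

```python
def leading_digit_counts(cap):
    """Exact count of integers n in range(1, cap), grouped by leading decimal digit."""
    counts = [0] * 10
    low = 1  # smallest integer with the current number of digits (1, 10, 100, ...)
    while low < cap:
        for d in range(1, 10):
            start = d * low                # first integer of this length with leading digit d
            end = min((d + 1) * low, cap)  # one past the last such integer below cap
            if end > start:
                counts[d] += end - start
        low *= 10
    return counts


if __name__ == "__main__":
    for cap in (10**14, 2 * 10**14):
        counts = leading_digit_counts(cap)
        total = sum(counts)
        shares = ", ".join(f"{d}: {counts[d] / total:.4f}" for d in range(1, 10))
        print(f"cap = {cap}: {shares}")
```

With a cap of `10**14` every digit gets exactly the same count, which matches the nearly flat sample in the question. With a cap of `2 * 10**14` the share of numbers starting with 1 rises above one half, as lulu's comment says.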

1 Answer

You are looking at numbers in one decade. Benford's Law applies to numbers which occupy a large number of decades:

From the Wikipedia article:

It tends to be most accurate when values are distributed across multiple orders of magnitude.

Furthermore, your data was generated to have a uniform distribution. This does not represent a natural, scale-independent distribution of numbers.
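
For comparison, Benford's law predicts that the leading digit $d$ occurs with probability $\log_{10}(1 + 1/d)$. The sketch below draws numbers whose logarithm is uniform over 14 decades, which is one simple way to model a scale-independent distribution. That modelling choice, the function name `log_uniform_leading_digit_freqs`, the sample size, and the seed are all assumptions made for illustration, not part of the original question.

```python
import math
import random


def log_uniform_leading_digit_freqs(n=100_000, decades=14, seed=0):
    """Leading-digit frequencies of n samples x = 10**u, with u uniform on [0, decades]."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = [0] * 10
    for _ in range(n):
        x = 10 ** rng.uniform(0, decades)  # x >= 1, spread evenly on a log scale
        counts[int(str(int(x))[0])] += 1   # leading digit of the integer part
    return [c / n for c in counts]


# Benford's predicted frequencies, indexed by digit (index 0 is unused)
benford = [0.0] + [math.log10(1 + 1 / d) for d in range(1, 10)]

if __name__ == "__main__":
    freqs = log_uniform_leading_digit_freqs()
    for d in range(1, 10):
        print(f"{d}: observed {freqs[d]:.4f}, Benford {benford[d]:.4f}")
```

Unlike the uniform sample, these frequencies should fall off from roughly 30% for digit 1 to under 5% for digit 9.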

robjohn
  • To elaborate: certainly it is true that numbers between $1$ and $10^{14}$ are distributed across multiple orders of magnitude. However, $90\%$ of them are $14$ digits long, and $90\%$ of the remainder are $13$ digits long, so this is barely noticeable. A "textbook" example of a data set to which Benford's law applies, the Fibonacci numbers, has about an equal number of $k$-digit numbers for any value of $k$. – Misha Lavrov Apr 01 '17 at 18:25
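
The Fibonacci remark in the last comment can be checked with a short sketch. The choice of the first 1000 terms (starting 1, 1, 2, ...) is arbitrary, and the variable names are only illustrative.

```python
import math
from collections import Counter


def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    fibs = []
    a, b = 1, 1
    for _ in range(n):
        fibs.append(a)
        a, b = b, a + b
    return fibs


fibs = fibonacci(1000)
leading = Counter(int(str(f)[0]) for f in fibs)  # how often each leading digit occurs
lengths = Counter(len(str(f)) for f in fibs)     # how many terms have k digits

if __name__ == "__main__":
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        print(f"{d}: observed {leading[d] / len(fibs):.3f}, Benford {expected:.3f}")
    print("terms per digit length, k = 2..10:", [lengths[k] for k in range(2, 11)])
```

Each digit length from 2 upward holds either 4 or 5 Fibonacci numbers, so the terms are spread almost evenly over the decades, whereas the uniform sample puts 90% of its numbers in the top decade.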