
I've tried inurl:http, but it takes forever to get even a handful of sites, and I have to think of new keywords every time to find more.

Is there some kind of directory I could use to get the addresses of all the sites on the public web?

Ayush gangwar

1 Answer


There are an estimated 1.6 billion public websites in the world, of which roughly 200 million are active. But no one really knows, because a website can be just an IP address with no domain, or an .onion link, or temporary or short-lived, or weather-dependent, ...


One way would be to use known DNS records (domain names) from the 2013 DNS Census.

It is a snapshot of DNS registration data taken in 2013: roughly 15 GB compressed and 157 GB uncompressed.

They claim it contains "2,676,380,336 DNS records and 106,928,034 domains".
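Files that size can't be loaded whole; you'd want to stream them. A minimal Python sketch, assuming a gzip-compressed CSV with the domain in the first column (the hypothetical filename and the actual layout of the Census files may differ):

```python
import csv
import gzip
from itertools import islice

# Hypothetical filename; the real DNS Census dumps may be split and laid out differently.
DUMP_PATH = "dns_census_2013.csv.gz"

def iter_domains(path):
    """Stream domains from a huge compressed dump without loading it into memory."""
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as fh:
        for row in csv.reader(fh):
            if row:
                yield row[0].strip().lower()

if __name__ == "__main__":
    # Peek at the first ten entries; deduplicating all ~107M domains would
    # call for an on-disk structure rather than an in-memory set.
    for domain in islice(iter_domains(DUMP_PATH), 10):
        print(domain)
```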


A more modest list would be, for example, the Alexa 1 million list:

Scripts for scanning the Alexa top 1 million sites and providing generic statistics about them.

Direct link: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
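The zip contains a single top-1m.csv with rank,domain rows. A minimal Python sketch for pulling and reading it (assuming the S3 endpoint is still being served, since Alexa has since been retired):

```python
import csv
import io
import urllib.request
import zipfile

URL = "http://s3.amazonaws.com/alexa-static/top-1m.csv.zip"

# Download the archive into memory and open the CSV inside it.
with urllib.request.urlopen(URL) as resp:
    archive = zipfile.ZipFile(io.BytesIO(resp.read()))

with archive.open("top-1m.csv") as fh:
    reader = csv.reader(io.TextIOWrapper(fh, encoding="utf-8"))
    for rank, domain in reader:
        if int(rank) > 10:   # just show the first ten entries
            break
        print(rank, domain)
```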


Or, as I suggested in my comment, loop over IPv4 addresses and record whether each one responds as a valid HTTP/HTTPS server (a rough sketch follows the estimate below).

Here's an estimate of how big that for loop would get:

According to Reserved IP addresses there are 588,514,304 reserved addresses, and since there are 4,294,967,296 (2^32) IPv4 addresses in total, that leaves 3,706,452,992 public addresses.
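A minimal Python sketch of both the arithmetic and the per-address check. It only probes a tiny, arbitrary sample block; scanning the full public space would need massive parallelism or dedicated tools such as masscan or zmap:

```python
import ipaddress
import socket

TOTAL_IPV4 = 2 ** 32              # 4,294,967,296 addresses
RESERVED = 588_514_304            # per the Reserved IP addresses figure above
print("public addresses to scan:", TOTAL_IPV4 - RESERVED)  # 3,706,452,992

def speaks_http(ip, port, timeout=1.0):
    """Return True if the address accepts a TCP connection on the given port."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe an arbitrary /30 as a stand-in for the full loop over public IPv4 space.
for ip in ipaddress.ip_network("93.184.216.0/30"):
    if not ip.is_global:          # skip reserved/private ranges
        continue
    addr = str(ip)
    hits = [port for port in (80, 443) if speaks_http(addr, port)]
    print(addr, hits or "no web server")
```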

philshem