
My website has hundreds of thousands of HTML pages that are open to the public. Each time an HTML page is requested, a call is made to my database to fetch the correct data, so no page request is cheap.

I know there are rotating proxy services out there that let users send each request with a different IP. They have a pool of hundreds or thousands of IP addresses so any rate limit applied to IP addresses will not work.
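To make the problem concrete, here is a minimal sketch of a fixed-window per-IP rate limiter and a simulation of the same request volume sent from one IP versus a rotating pool. The class name, the limit of 10 requests per 60 seconds, and the pool of 1,000 addresses are all assumptions chosen for illustration, not figures from any real service.

```python
from collections import defaultdict


class PerIpRateLimiter:
    """Fixed-window limiter: at most `limit` requests per IP per window."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._counts = defaultdict(int)  # (ip, window index) -> request count

    def allow(self, ip: str, now: float) -> bool:
        key = (ip, int(now // self.window_seconds))
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


def count_allowed(limiter: PerIpRateLimiter, ips: list, total_requests: int) -> int:
    """Send `total_requests` in one window, cycling through `ips`."""
    allowed = 0
    for i in range(total_requests):
        if limiter.allow(ips[i % len(ips)], now=0.0):
            allowed += 1
    return allowed


# Hypothetical limit: 10 requests per IP per 60 seconds.
single_ip_allowed = count_allowed(PerIpRateLimiter(10, 60), ["203.0.113.1"], 5000)

# Hypothetical rotating pool of 1,000 distinct addresses.
pool = [f"10.0.{i // 250}.{i % 250 + 1}" for i in range(1000)]
rotating_allowed = count_allowed(PerIpRateLimiter(10, 60), pool, 5000)

print(single_ip_allowed, rotating_allowed)
```

With a single address almost everything is rejected, while the rotating pool stays under the per-IP limit on every address and all 5,000 requests reach the database.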

What is the best practice for defending my website if an attacker uses a proxy service to spam it with requests? I'd appreciate any help.

I thought about caching the database's data, but since the list of my website's hundreds of thousands of HTML pages is available in the sitemap, an attacker could easily loop through all of them. I cannot cache the entire database.
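The caching concern can be sketched the same way. Below is a small LRU page cache, assuming a capacity of 1,000 pages in front of a site with 100,000 pages; `fake_database_lookup` is a hypothetical stand-in for the real database call. It compares ordinary traffic that keeps revisiting popular pages with a sweep through every page in sitemap order.

```python
from collections import OrderedDict


class LruPageCache:
    """Bounded cache that evicts the least recently used page."""

    def __init__(self, capacity, load_page):
        self.capacity = capacity
        self.load_page = load_page  # called on a miss (the expensive path)
        self._pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        if page_id in self._pages:
            self._pages.move_to_end(page_id)  # mark as most recently used
            self.hits += 1
            return self._pages[page_id]
        self.misses += 1
        html = self.load_page(page_id)
        self._pages[page_id] = html
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)  # drop the least recently used
        return html


def fake_database_lookup(page_id):
    # Hypothetical stand-in for the real database query.
    return f"<html>page {page_id}</html>"


# Ordinary traffic: 50 rounds of visits to the same 100 popular pages.
normal = LruPageCache(1000, fake_database_lookup)
for _ in range(50):
    for page_id in range(100):
        normal.get(page_id)

# Sitemap sweep: two full passes over all 100,000 pages in order.
sweep = LruPageCache(1000, fake_database_lookup)
for _ in range(2):
    for page_id in range(100_000):
        sweep.get(page_id)

print(normal.hits, normal.misses, sweep.hits, sweep.misses)
```

In this simulation the ordinary traffic hits the database only 100 times, whereas the sweep never gets a cache hit: every page has been evicted by the time it is requested again, so each request still costs a database call.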

Steffen Ullrich
  • Welcome to the community. Have you considered setting a robots.txt against crawlers (doesn't protect against malicious ones though), setting no-index and similar tags in your HTML? Also you could and should hide your sitemap if you don't want it indexed by search engines – Sir Muffington Feb 23 '24 at 16:55

1 Answer


This is not trivial, and there is no solution that works forever. It is a common problem though, which basically boils down to detecting automated access (bots) used for a variety of reasons: scanning, grabbing content, spamming, ...

Content delivery networks (like Cloudflare, Akamai, Fastly and others) often provide bot detection and blocking as a service (in addition to caching). There are also security products you can buy to put in front of your server or integrate into your server. Once a bot is detected, it will either be blocked or (if one cannot be sure that it really is a bot) confronted with a captcha or a similar challenge.
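The block-or-challenge decision described above can be sketched roughly as follows. This is only an illustration of the control flow: the signals, weights, and thresholds are invented for the example, and commercial products rely on their own, far more elaborate and usually undisclosed, detection logic.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CHALLENGE = "challenge"  # e.g. show a captcha
    BLOCK = "block"


@dataclass
class RequestSignals:
    # Hypothetical signals; real detectors use many more.
    has_user_agent: bool
    executes_javascript: bool
    pages_per_minute: int  # per client fingerprint, not per IP


def bot_score(signals: RequestSignals) -> int:
    """Return a score from 0 (looks human) to 100 (looks automated)."""
    score = 0
    if not signals.has_user_agent:
        score += 40
    if not signals.executes_javascript:
        score += 30
    if signals.pages_per_minute > 60:
        score += 30
    return score


def decide(score: int, challenge_at: int = 30, block_at: int = 70) -> Action:
    """Block when confident, challenge when unsure, otherwise allow."""
    if score >= block_at:
        return Action.BLOCK
    if score >= challenge_at:
        return Action.CHALLENGE
    return Action.ALLOW


browser = RequestSignals(True, True, 5)
unclear = RequestSignals(True, False, 10)
scraper = RequestSignals(False, False, 200)

for client in (browser, unclear, scraper):
    print(decide(bot_score(client)).value)
```

The middle band is the important design choice: a client that cannot be classified with confidence gets a challenge instead of a hard block, which limits the damage from false positives.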

See also "How can I detect and block bots?"

Steffen Ullrich
  • Thanks, Steffen. The answer in that post is 13 years old. Do you happen to know of more recent discussions on this topic? – Tuan Do Feb 23 '24 at 21:53
  • @TuanDo: There is no significant change in the types of measures taken; they have only become more refined on both the attackers' and the defenders' side. See also https://www.google.com/search?q=bot+protection – Steffen Ullrich Feb 23 '24 at 22:07