Resources
Here are some helpful resources for learning about web scraping.
First, here are a few general-purpose articles that overlap considerably but cover the basics:
- How To Scrape A Website Without Getting Blacklisted
- How to Scrape Websites Without Getting Blocked
- How Websites Detect Web Scraper
- 12 Web Scraping Best Practices You Should Follow in 2021
Second, here are some more technical articles and sites, most with an accompanying test page:
- It is not possible to detect and block Chrome Headless
- Show my request headers
- Show entire request
- What's my User Agent
- Avoiding Bot detection: How to scrape the web without getting blocked
- https://niespodd.github.io/browser-fingerprinting/
- Headless Chrome Detection Tests
- Using Google Cache to crawl a website
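Several of the test pages above (the request-header and User-Agent ones) simply echo back what your client sent. Before pointing a scraper at them, it can help to check locally which headers you have set explicitly. The following is a minimal sketch using Python's standard `urllib`; the function names, the `https://example.com/` URL, and the User-Agent string are placeholders chosen for illustration, not values taken from any of the articles listed.

```python
import urllib.request


def build_request(url, user_agent):
    """Build a Request with an explicit User-Agent, without sending it."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def explicit_headers(req):
    """Return the headers set on the request object so far.

    urllib normalizes header names with str.capitalize(), so the key is
    stored as "User-agent". Headers added at send time (Host, Connection,
    and so on) do not appear here.
    """
    return dict(req.header_items())


if __name__ == "__main__":
    # Placeholder URL and User-Agent string, for illustration only.
    req = build_request(
        "https://example.com/",
        "Mozilla/5.0 (X11; Linux x86_64) ExampleScraper/0.1",
    )
    for name, value in explicit_headers(req).items():
        print(f"{name}: {value}")
```

If no User-Agent is set, `urllib`'s default opener sends one of the form `Python-urllib/3.x`, which is the kind of default the header test pages make easy to spot. This sketch only shows what was set locally; the test pages remain the way to see what actually arrives at a server.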