This article will be just a quick one. It's a few-lines-of-code recipe for mitigating IP restrictions and WAFs when crawling the web. If you're reading this, you've probably already tried web scraping. It's all easy breezy until one day someone managing the website you're harvesting data from realizes what's happening and blocks your IP. If you're running your scrapers in an automated way, you'll start seeing them fail miserably. You'll probably want to solve this problem fast, before any of your precious data slips through your fingers.
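Before we get to the article's recipe, the general idea can be sketched in a few lines. A common way to dodge per-IP blocking is to rotate outgoing requests across a pool of proxies; the proxy addresses below are placeholders, and the helper name is my own, not something from the article:

```python
import itertools

# Hypothetical proxy pool -- swap in real proxy endpoints you control or rent.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]

# Round-robin iterator over the pool: each request goes out through
# the next proxy, so no single IP accumulates suspicious traffic.
proxy_cycle = itertools.cycle(PROXIES)

def next_proxy_config():
    """Return a requests-style `proxies` mapping for the next proxy in the pool."""
    proxy = next(proxy_cycle)
    return {"http": proxy, "https": proxy}

# Usage with the `requests` library (kept commented so the sketch stays offline):
# import requests
# resp = requests.get("https://example.com", proxies=next_proxy_config(), timeout=10)
```

In practice you'd also want to retire proxies that start failing and randomize request timing, but the cycle above is the core of the trick.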
Recently I've seen more and more static sites, like blogs and company landing pages, using S3 behind CloudFront as their hosting infrastructure. While the CloudFront + S3 combo is very cost efficient and under typical use cases never bends under heavy traffic, setting it up correctly is not trivial. If you're using a static site generator like Pelican, Jekyll or Hugo, or any kind of isomorphic single-page application, and plan to host it using the AWS S3/CloudFront combo, this might be a read for you.
I remember when, a few years ago, I bumped into Beginning Scala by David Pollak. At the time I had no idea how this book would impact my future choices on which programming languages to learn, which technologies to invest my time in, and what I consider "good stuff" in general. Reading it must have been one of those movie-like moments of enlightenment, with a heavenly beam of light striking you from above, Mr. Bean style. Even today I can clearly remember the title of the chapter that made the strongest impression. It was Collections and the Joy of Immutability.
This morning my friend messaged me asking whether he should "use monads" to solve his problem. I started writing a short explanation for him, but in some mysterious way it grew longer and longer, finally becoming this beast. I've decided to put it online, but first let me spread a safety net by placing a disclaimer paragraph.