
The sad part is that it's trivial to get around CF's bot protection if you're writing a bot (just use curl-impersonate and buy residential IPs), but it's pretty much impossible to bypass as a human if their magical black box doesn't like your browser and/or IP address.


It's the same for spam email, yet most spam gets caught by SpamAssassin rules that were written 20 years ago and haven't seen much improvement since then. Most bad guys just don't bother to do anything above the bare minimum. For example, I see lots of email getting caught by a rule that checks for an incorrectly formatted pseudo-Outlook mailer header, which is trivial to circumvent if you pay any attention to it (the difference is in excessive whitespace, or a slightly incorrect "Outlook" version, or something like that).


See also: the surprising effectiveness of simply asking the spam server to try again (sometimes called graylisting). It shouldn't work at all, but it proves to filter an awful lot of the worst mail noise.

http://man.openbsd.org/spamd
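
For anyone who hasn't seen the idea, here is a minimal Python sketch of graylisting. It is not how spamd is implemented; the class name, the 300-second delay, and the reply strings are assumptions for illustration. The first delivery attempt for a given (IP, sender, recipient) triple gets a temporary failure, and only a retry after the minimum delay is accepted.

```python
import time


class Greylist:
    """Toy graylist: temp-fail unknown triples until they retry after a delay."""

    def __init__(self, min_delay=300, clock=time.monotonic):
        self.min_delay = min_delay   # seconds a sender must wait before a retry counts (assumed value)
        self.clock = clock           # injectable clock so the logic can be tested
        self.first_seen = {}         # (ip, sender, recipient) -> time of first attempt

    def check(self, ip, sender, recipient):
        key = (ip, sender, recipient)
        now = self.clock()
        # Record the first attempt; later attempts reuse the stored timestamp.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "250 OK"
        # A real MTA queues the message and retries; most spamware never does.
        return "451 Try again later"
```

A real deployment would also expire old entries and whitelist hosts that have retried successfully, which this sketch leaves out.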


> it's pretty much impossible to bypass as a human if their magical black box doesn't like your browser and/or IP address

There are residential-IP-backed VPN services that you can use just like commercial VPN services — but they're mostly built on the backs of botnets, so it's ethically questionable to use them.


FWIW, StarVPN claims to have "ethically sourced" IPs. That is, not from botnets. Their pricing is quite a bit higher than many (cheapest plan is $20/month), but could be worth trying.

https://www.starvpn.com/


The "residential VPN" providers set up fake ISPs or buy AT&T/Verizon business circuits with large blocks of IPs and sell them as residential.

They are easily detected if you are buying IP intelligence from one of the higher quality providers: https://app.spur.us/context?q=STARVPN_PROXY


The linked page shows a sign-in screen.


Spur access requires a free account.


That's helpful to know. I wasn't aware of this.


You could also use Tailscale back to your own IP if the goal is not having to trust public WiFi.


Note that IP is only a part of it, and the full extent of what goes into a CF score will never be made explicit (for obvious reasons).

Since Cloudflare is way past simply blocking IP addresses, I wouldn't be surprised if a mismatch between your IP block and your language/cookies were enough to lower your score.


This is great for bypassing the server side bot detection but not the client side one, where it will attempt to verify the integrity of your browser environment.


Well yeah, if you’re a legitimate user, CF will block you.

It’s only easy to bypass if you’re scraping or doing nefarious stuff.


Surprisingly, it still works as intended. Yes, it won't keep professionals and dedicated bot-fabricators out, but that's like 5% of the botters out there; the rest are the bot equivalent of script kiddies who can't be bothered, and it filters them great. Meanwhile, the script kiddies have a process that still works on non-CF sites, so they don't need to improve their process.


We bypassed it by switching to starlink. Now my IP address is a too-big-to-fail CGNAT.

The old IP address was a mom-and-pop CGNAT.

Thanks CF, for protecting us from capitalism, I guess?


That's the same for almost all surveillance/tracking tech. It's always trivial for criminals/abusers to bypass. The surveillance is just about controlling the sheep.


How does it get around captchas?


If they don't think you're suspicious, they don't make you do the captchas, and as others have mentioned, you can always outsource them to captcha farms. There are also AI models that solve a fair share of them, and since most captchas let you retry with new patterns, you can get past them even with a pretty high error rate. Then there's the ADA, which requires accessibility: many captchas have an audio component as a backup, and those are easy for models to interpret.


curl-impersonate doesn't solve CAPTCHAs, but the goal is to look enough like a human that Cloudflare doesn't present a CAPTCHA in the first place.


Cloudflare Turnstile isn't even a captcha. The user just has to tick a box. Behind the scenes there's a JavaScript challenge to make sure you're vaguely a browser and not some script making a bazillion requests per minute.


It's also used for proof of work, as many scrapers are using thousands of IPs but only a few CPUs.
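
To make the proof-of-work point concrete, here is a hashcash-style sketch in Python. This is not Cloudflare's actual challenge, whose details aren't public; the function names, the SHA-256 choice, and the difficulty parameter are all assumptions. The client has to burn CPU to find a nonce, while the server verifies it with a single hash, so cost scales with CPUs rather than with IP addresses.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash to check the client's work."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce; expected cost is about 2**difficulty hashes."""
    for nonce in itertools.count():
        if verify(challenge, nonce, difficulty):
            return nonce
```

Each extra bit of difficulty doubles the expected client work, which barely matters for one page load but adds up fast across millions of scraped requests.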


You pay contract workers in a third world country a tiny amount of money per day, to spend all day clicking boxes.



