Does finding the number of unique elements in a set actually require comparison of each element with everything else? Can't you use a hashtable? For every element, add it to the table (ignore if already exists), and finally, take a count of keys.
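Something like this (a minimal sketch in Python, where a set plays the role of the hash table):

```python
def count_distinct(items):
    """Exact distinct count via a hash table: adding an element that is
    already present is a no-op, so each duplicate is counted once."""
    seen = set()          # Python's set is a hash table of keys
    for item in items:
        seen.add(item)    # ignored if item already exists
    return len(seen)      # number of keys == number of unique elements

print(count_distinct(["a", "b", "a", "c", "b"]))  # -> 3
```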
Imagine 1 PB of data where you expect 30% of it to be unique. That needs roughly 300 TB of RAM to store the unique elements; keep in mind the keys in a hash table are the elements themselves, so the table itself is on the order of 300 TB. Doing that without that much RAM, even by swapping to disk, can be tough.
Using a hashtable is the "normal" approach mentioned in the article. It works of course, but requires memory to store each unique element (or their hashes). If you have less memory available, the described algorithm can still give a very good approximation.
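Assuming the article is describing the recent CVM-style buffer-sampling estimator, a minimal sketch looks like this (the function name and buffer size are my own choices, not from the article):

```python
import random

def cvm_estimate(stream, buffer_size):
    """Estimate the number of distinct elements using O(buffer_size)
    memory, independent of how many elements are actually unique."""
    p = 1.0        # current sampling probability
    buf = set()    # sample of the distinct elements seen so far
    for x in stream:
        buf.discard(x)                 # forget any earlier decision about x
        if random.random() < p:
            buf.add(x)                 # re-admit x with probability p
        while len(buf) >= buffer_size:
            # Buffer full: evict each element with probability 1/2
            # and halve p so the sample stays consistent.
            buf = {y for y in buf if random.getrandbits(1)}
            p /= 2
    return len(buf) / p               # scale the sample back up

random.seed(0)
data = [random.randrange(50_000) for _ in range(1_000_000)]
print(cvm_estimate(data, 1_000))      # close to 50,000, within a few percent
```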
Using a hashtable is effective because you only compare elements within their hash buckets, not against the entire set. However, hashtables can become inefficient on very large datasets due to memory usage and processing time, which is where approximate counts shine.
This algorithm still generates a lot of random numbers. I would guess that this is much less overhead than hashing, but it still seems like it could be significant.
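One possible mitigation (my own sketch, not from the article): in the buffer-sampling scheme the keep-probability is always a power of two after k halvings, so each coin flip needs only k random bits rather than a full floating-point draw:

```python
import random

def keep_with_prob_2_pow_minus_k(k, rng=random):
    """A probability-2**-k coin flip as a single integer bit draw:
    'keep' iff all k random bits come up zero."""
    return k == 0 or rng.getrandbits(k) == 0

# After k buffer halvings, p == 2**-k, so `random.random() < p`
# can be replaced by:
print(keep_with_prob_2_pow_minus_k(3))  # True about 1 time in 8
```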
That is fine when you have, say, 1 million values and only 1,000 are unique.
But when you have 1 million values and about 900 thousand are unique, you are putting more or less the whole data set into memory.
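A quick way to see that effect (a rough sketch; note sys.getsizeof reports only the set's own table, not the stored elements themselves):

```python
import random
import sys

def distinct_footprint(n_values, n_distinct):
    """Insert n_values random draws over n_distinct possible keys into a
    set; report how many distinct keys survive and the set's table size."""
    seen = set(random.randrange(n_distinct) for _ in range(n_values))
    return len(seen), sys.getsizeof(seen)

print(distinct_footprint(1_000_000, 1_000))      # ~1,000 keys: tens of KB
print(distinct_footprint(1_000_000, 5_000_000))  # ~900k keys: tens of MB of
                                                 # table, plus the elements
```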