http://www.sistrix.com/blog/985-google-farmer-update-quest-f... and http://searchengineland.com/who-lost-in-googles-farmer-algor... are other links I've seen. Bear in mind that much of this third-party analysis compares e.g. US queries vs. queries against Google from Canada, Italy, or India, and geolocation can change the results. Also, different people are running different sets of queries and that subsampling can skew things depending on the query sets. Please bear those disclaimers in mind with any third-party analysis.
Matt, I've seen some high quality smaller sites (Internet versions of traditionally published books with original content) drop as well. Is there any place to submit such sites in order to help this algorithm improve?
Looking through this list of larger sites that dropped, I see one that is similar to the sites I'm talking about, so more people might be able to say what happened: findarticles.com. I have no relation to this site (it is actually a competitor), but I believe the vast majority of its content is licensed versions of articles published by real books, magazines, and newspapers (http://findarticles.com/p/articles/an_1/?browse=A&tag=co...). It seems to have dropped just as much as content mills with low quality articles, such as ezinearticles.com or articlesbase.com. It is possible that I don't know something about findarticles, but this makes it appear like this algorithm change is looking at some superficial common factors and is unable to really distinguish high quality sites from low quality sites.
> licensed versions of articles published by real books, magazines, and newspapers
That makes it sound like Find Articles is quite likely to have a lot of duplicate content, one of the things that Google's update was meant to penalize:
> This update is designed to reduce rankings for low-quality sites—sites which are low-value add for users, copy content from other websites or sites that are just not very useful.
Maybe, but this quote talks about punishing sites copying content from other websites, which is not what they are doing. Most newspaper sites will have the same versions of AP or Reuters articles as well, but this new algorithm didn't affect them. I believe that many older newspaper articles on their site are not republished anywhere else on the Web.
Matt, part of our site is a blog-like service. We'd love to have a Google service where we could submit the content our users are publishing and reject it based on the rating returned.
I'm sure this can also be used for evil purposes by content scrapers, but as a startup with limited resources, an automated (and free :) way to stop such content would be great.
Some high quality user generated content sites seem to have been impacted by this change too. My main site (see my about for details) has lots of great car reviews, which are unique to the site, yet my Google US traffic has fallen off a cliff.
If my site is a content farm, then surely so are sites like Stack Overflow and Trip Advisor, as I'm using the same model of moderated and curated user generated content, and while I don't think my site is quite as useful as Stack Overflow (what is?), I've been running this site as a labour of love since 1997, and I've had countless emails from people who've found the site useful, so I must be doing something right.
We tend not to pre-announce ranking changes, because priorities and timing can always shift. But broadly speaking, anything I mentioned in my blog post a few weeks ago at http://googleblog.blogspot.com/2011/01/google-search-and-sea... is still open for improvement. The topics of that post included scrapers/copiers, spam, and low-quality sites.
Fair enough, but the domain is already shown next to the submitted link on HN. It might not confuse you, but it apparently confused other people, and it's really redundant information anyway.
I wouldn't've clicked if I hadn't understood it that way - I don't really care about the sites that dropped, but I know Quora has decent content and if it was the most penalized, that's a huge problem with those Google changes!
Great question. My first instinct would be that none of the gains are as dramatic as the drops. Basically this change gets rid of most of the negative outliers.
It will be interesting to see how JCalcanis spins this. Last I heard he was congratulating Google on going after the content farms and changing their algorithms, claiming Mahalo had superior original content. Assuming these stats aren't totally bunk, something's gotta change in that statement to avoid massive cognitive dissonance.
As a single live-alone late 20s male, I do not understand this position. eHow's content often seems perfectly tailored to people like me (I suck at cooking, carpet stain removal, calcium buildup removal, ..., all questions I've found perfectly satisfactory answers to via eHow).
In the meantime I'm deeply disappointed to see that the collateral damage of a change largely driven by a problem I don't understand includes faqs.org losing most of its ranking.
My problem with eHow is that it's very common to search for: "how to do X"
And get Google results with eHow as #1 with title: "How to do X"
And when you click it, you get..."you can't do X"
When in reality you actually can do X...all you have to do is hit #2 result for a forum to find out how.
Granted, these were usually automotive questions, but still: the fact that eHow outranks legitimate sources with one BS paragraph of wordy text should be enough to penalize it in the rankings.
I half agree with you. I think everyone loves to hate eHow, but it has its place in answering seemingly common sense questions. For queries like the ones you mentioned, I'd say that Demand Media's usual line about "filling in the gaps" on the web is somewhat true.
My qualm is that eHow often outranks much better content in places where its basic step-by-step articles don't deserve to be ranked. This is an example I've cited before... the two eHow articles with highly relevant titles don't deserve a spot above the State Department's great resource that I see as #3. (Unfortunately, the state.gov site has a crummy title tag that isn't doing it much good.)
I did some casual analysis of data I collected for a small set of queries, and the exact opposite seems to be true - eHow saw substantial increases on many queries following the update. It's possible that this is because smaller sites above it were wiped out in the update, and my set of queries is very small - but it's clear that they weren't hit negatively.
This update seemed to have a much larger impact on user generated article directories than it did on the new-wave content farms like eHow. That ezinearticles and Hubpages finally got hit doesn't surprise me much, because they are frequently exploited as a source for followed links by site owners.
eHow actually improved in the rankings. eHow is usually decent enough relative to the internet. Search for how to change your car's oil and you can find a helpful article that includes video from eHow, for example.
I know there are loads of crap out there, but I'm a bit confused by the overall attitude towards user written articles. It seems like a throw-the-baby-out-with-the-bathwater approach.
The same people most against article sites don't seem to have a problem with search results full of crappy YouTube vids, Yahoo Answers, Amazon and other shopping, Twitter gibberish, and whatever else that isn't exactly Pulitzer Prize winning content.
It took me about 7 articles on AC before I realized Google would eventually catch up with the content farms. That's also the same time I recognized the problem I was contributing to.
Happened a lot sooner than I expected though. :)
ETA: On the plus side, my articles should make the cut if they go on a purge to raise the quality level.