The ongoing scourge of link spammers

Does it make any sense to you that these search terms have been searched for heavily?

  • the surveyor observed the use
  • the rivals shattered the farm
  • the link entitled the names
  • the trolls ambushed the dwarfs
  • the dwarfs ambushed the trolls

Hal Daume doesn't think so. He's analyzing a Google database of words known as the 5-gram corpus. The phrases above come from its list of the most-searched-for queries that contain exactly five keywords. None of them seems like it should be a common search term, and they don't return very many web documents: for some of them, none at all.

Hal can't figure out what's going on, but he asks his readers to leave comments if they have an inkling. Bob Moore writes:
I think there is link spamming going on here. When I searched for "the surveyor observed the use", Google returned the following as the fourth ranked search result:

e-Commerce Writers and Academician | XING
1 101500 the surveyor observed the use 2 30619 the rivals shattered the farm 3 27999 the link entitled the names 4 22928 the trolls ambushed the dwarfs ...

Notice that the first four of Hal's weird 5-grams show up just in this small snippet. When I checked the cached copy of the page that was pointed to, Google told me that the search term occurred only in the referring pages. Probably someone has created a link farm using these 5-grams to boost the rank of the pages they point to when these terms are used as search queries.

Whoa, what? Someone just uncovered a cool trick of link spammers. By making elaborate spam pages (the kind we ban and kill with relish at Posterous) that contain this obscure set of terms, they can get ranked significantly higher. Makes you realize there are many dimensions to fighting blackhat SEO link spammers... not just URL, and not just referrer. Anything is a weapon, even a particularly obscure turn of phrase.
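
For the curious, here is a rough sketch of how a check for this trick might look. It is only an illustration, not how Posterous or Google actually does it: the watchlist is simply the five phrases quoted above, and the function names and the `min_hits` threshold are made up for this example. The idea is to pull every 5-word sequence out of a page and see how many land on the watchlist.

```python
import re

# Hypothetical watchlist: just the odd 5-grams quoted in this post.
SUSPICIOUS_5GRAMS = {
    "the surveyor observed the use",
    "the rivals shattered the farm",
    "the link entitled the names",
    "the trolls ambushed the dwarfs",
    "the dwarfs ambushed the trolls",
}


def five_grams(text):
    """Return the set of lowercase 5-word sequences found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # range() is empty when there are fewer than 5 words, so short text is safe.
    return {" ".join(words[i:i + 5]) for i in range(len(words) - 4)}


def watchlist_hits(text, watchlist=SUSPICIOUS_5GRAMS):
    """Return the watchlist phrases that appear in text."""
    return five_grams(text) & set(watchlist)


def looks_like_ngram_spam(text, min_hits=2):
    """Flag text containing at least min_hits watchlist phrases (assumed threshold)."""
    return len(watchlist_hits(text)) >= min_hits
```

Run against the snippet Bob Moore quoted, `watchlist_hits` finds four of the five phrases, so the page would be flagged. A single hit is deliberately not enough here, since one odd phrase can show up innocently (this post contains all five, after all), so a real filter would need more signals than this.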