Some websites rewrite the same post hundreds of times to increase their link count and traffic while preventing the copies from being treated as duplicate content. Some sites even manage to generate income from this type of content through advertising links. However, since rewriting content by hand is tedious, many sites turn to auto-writing software that automatically replaces nouns and verbs. This usually produces very poor quality content or, in other words, gibberish.
The patent explains how Google identifies this type of content by detecting incomprehensible or incorrect sentences on a web page. The system that Google uses relies on several factors to assign a contextual score to the page: the “gibberish score”. Google uses a language model that can recognize when a sequence of words is artificial. In effect, it identifies and analyzes the n-grams on a page and compares them to n-gram groupings found on other websites. An n-gram is a contiguous sequence of elements (here, words). From there, Google generates a language model score and a query stuffing score. The latter reflects how frequently certain terms are repeated in the content.
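To make the idea of n-grams and term frequency more concrete, here is a minimal Python sketch. It is not the method described in the patent, whose exact formulas are not given here. The function names (`ngrams`, `language_model_score`, `query_stuffing_score`), the tiny reference text, and the choice to score a page as a simple fraction are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(words, n):
    """Return the contiguous n-word sequences in a list of words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def language_model_score(text, reference_counts, n=2):
    """Hypothetical score: fraction of the text's n-grams that also
    appear in a reference corpus. Low values suggest artificial text."""
    grams = ngrams(text.lower().split(), n)
    if not grams:
        return 0.0
    known = sum(1 for g in grams if reference_counts[g] > 0)
    return known / len(grams)


def query_stuffing_score(text, terms):
    """Hypothetical score: share of the words in the text that belong
    to a given set of terms. High values suggest heavy repetition."""
    words = text.lower().split()
    if not words:
        return 0.0
    targets = {t.lower() for t in terms}
    return sum(1 for w in words if w in targets) / len(words)


# Toy reference corpus standing in for "n-grams seen on other websites".
reference = "the cat sat on the mat and the dog sat on the rug"
reference_counts = Counter(ngrams(reference.split(), 2))

natural = language_model_score("the cat sat on the rug", reference_counts)
scrambled = language_model_score("rug the on sat cat", reference_counts)
stuffed = query_stuffing_score("cheap shoes buy cheap shoes now", ["cheap", "shoes"])
```

In this toy setup, the natural sentence shares all of its bigrams with the reference text, while the scrambled one shares none. A real system would rely on a far larger corpus and a proper statistical language model rather than a simple overlap ratio.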
These scores are then combined to calculate the gibberish score, which is used to determine whether the position of the content in the results page should be changed. Although the patent does not explicitly state that this system aims to penalize spun articles, these often contain a lot of gibberish and are therefore the first to be penalized.
Keyword Stuffing
The patent in question: “Detecting spam documents in a phrase based information retrieval system” (December 13, 2011)
Keyword stuffing is one of the oldest so-called “black hat” practices.