Google’s Use of Readability, Reading Level & Vocabulary Metrics in Search Algorithms

We know that Google can distinguish high-quality content from low-quality content, and can tell when content is nonsensical, as is the case with spam.  But how does Google determine this?  Does it evaluate things such as the level of vocabulary, the tone of the content, or readability?  The question came up at the last Google Webmaster Office Hours with John Mueller.

What makes his answer particularly interesting is that he specifies that Google does not have anything public that it uses to determine this, rather than simply saying that Google doesn’t use these metrics as ranking factors when evaluating content.

First, the question:

I’m about to ask you something about related how Google calculates the quality of a content, piece of content.  So what is the importance of the some metrics like fresh reading, like the length of paragraphs, the paragraphs after hearings, and the basic voice tone or for example how difficult the text is written and something like this, in this direction.

John Mueller’s response:

So from from an SEO point of view it’s probably not something that you need to focus on, in the sense that as far as I know we don’t have kind of these basic algorithms that just count words and try to figure out what the reading level is based on these existing algorithms.

But it is something that you should figure out for your audience. So that’s something where I see a lot of issues come up in that a website will be kind of talking past their audience. So maybe you’re making like – a common example is a medical site, you want to provide some medical information for the general public because you know they’re worried about this and all of your article is used like these medical words or twenty characters long. Then technically it’s all tracked and you could calculate like the reading level score of that content you come up with a number.

But it’s not a matter of Google kind of using that reading level score and saying this is good or bad but rather does it matter what the people are searching for and if nobody’s searching for those long words, then nobody’s going to find your content.  Or if they do find your content they’re gonna be like, I don’t know what this means, like does anyone have an English translation for this this long word that I don’t understand and they go somewhere else to either convert or to read more, or to find more information.

On word count, it has long been known that there is no “right” number.  Content only needs to be as long as it takes to answer the question it addresses.  There is no algo or signal that says content needs to be over X words, and there are plenty of examples of pages that rank highly, even earning a featured snippet, with as few as 50 words.

He also uses the word “basic”, which could mean that a more advanced algorithm is being used, although it is likely just a word choice.

The person then asks for more clarification about specific algorithms that gauge this, which prompts Mueller’s response about Google not having anything public for this.

So you don’t have any specific algorithms which calculates these metrics so something like that?

And Mueller’s response:

At least we don’t have anything public that we say this is what we do and this is what happens there.

It’s something that I know the team is still working on this so it’s not like a one-time algorithm thing and we figured it out and now it’s working forever. I know that people here in Zurich that are still working a lot trying to understand the quality of pages better and to figure out where where pages are good and what pages are bad and when to show them, where they’re relevant.

So it does confirm that Google is using something algorithmic to determine the quality of content, taking these factors into account, which has been clear from the way the Google Panda algo works.  Google is fairly good at determining when content is good content and when it is spun or nonsensical spam content.

There are some tools that site owners can use to try to measure these factors, although there is no way to confirm that their results match, or even come close to, what Google is doing in its algo.  There are readability score tools to determine how easy a piece of content is to read, as well as grade level scores to determine the reading level of content.
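To give a sense of what such readability tools calculate, here is a minimal Python sketch of two widely published formulas, Flesch Reading Ease and Flesch-Kincaid Grade Level.  Nothing here reflects what Google does; it is only an illustration of the kind of number these tools produce.  The function names and the syllable counter are assumptions made for this example, and the syllable counter in particular is a rough vowel-group heuristic, so real tools will give somewhat different scores.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption for this sketch): count vowel groups."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (but not "-le") usually adds no syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentence count, word count, syllable count) for the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text: str) -> float:
    """Higher scores mean easier text."""
    sentences, words, syllables = text_stats(text)
    if sentences == 0 or words == 0:
        raise ValueError("text must contain at least one sentence and one word")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(text: str) -> float:
    """Approximate US school grade level needed to read the text."""
    sentences, words, syllables = text_stats(text)
    if sentences == 0 or words == 0:
        raise ValueError("text must contain at least one sentence and one word")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    sample = "Your heart pumps blood. It beats all day."
    print(round(flesch_reading_ease(sample), 1))
    print(round(flesch_kincaid_grade(sample), 1))
```

Both scores depend only on sentence length and syllables per word, which is why a page full of long medical terms scores as harder to read regardless of whether it suits its audience.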

But there are obviously some caveats.  John Mueller uses the example of medical sites targeting non-medical people who need the content to be at a lower reading level.  But a medical site targeting those in the medical profession could risk dumbing down its content too much, with real-life consequences, if it followed some of these reading level guidelines.  So ensure you are using the right readability or grade/reading level tools for your intended audience.
