Google and the Death of Originality

In August of 2022, SEOs around the world were elated when Google unveiled a brand new ranking signal called the “Helpful Content Update”. This new signal would be one of the few known site-wide signals Google uses to determine page quality, and Google claimed it would help elevate content from around the web that was created by humans, for humans. Immediately SEOs, including our team here, began to theorize what Google might deem “helpful” vs. “unhelpful” and began adjusting our clients’ content to match these guesses, experimenting, and measuring. This update came on the heels of the rising popularity of AI writing systems like Jasper and Surfer, which sought to easily replicate or even improve upon content created by a competitor and quickly get it to rank. It also arrived just months before the release of ChatGPT, an event which would kick the AI arms race among tech corporations into high gear and cause an investor-driven frenzy in both private investment markets and the stock market. Many thought the HCU portion of the algorithm would destroy AI-generated content and give preference to content known or believed to be created by humans.

In the first few months of the HCU there was extremely little movement around the web. In the grand scheme of things it appeared that whatever was being labeled helpful vs. unhelpful barely moved the needle. Much like Core Web Vitals before it, the Helpful Content Update was shaping up to be a nothingburger (outside of fields like Medical, which is important for later): an argument SEOs could make to clients to include FAQs and make pages better from a UX standpoint, but that was about all. It is not out of the ordinary for Google to make a big fuss about something whose end result barely impacts websites (especially those not practicing known blackhat tactics).

A little over a year after the initial announcement, Google announced an update to the Helpful Content Update, known dryly as the September 2023 Helpful Content Update. I would like to make an argument that this update should instead be known as the “Death of Originality” update.

To understand why, we need to go back over 13 months earlier, to exactly one week before the original HCU was announced, and to a blog post penned by Google’s VP of Search, Pandu Nayak. The post, titled “New ways we’re helping you find high-quality information”, appears quite innocuous at first glance and did not set off any alarm bells in my head when I first read it.

Legacy of the Helpful Content Update / Helpful Content Classifier

I will break this blog post down in a moment, but first we need to pause for a minute to discuss the Helpful Content Update.

First off, the HCU is dead. Google now calls it a “retired system” and the core functions of the HCU have been rolled into various ranking systems at Google, just like the previously known one-off systems Panda and Penguin. This means it is impossible to get “hit” or “penalized” by the HCU now, but its ghost will still haunt you, and it is also impossible for this system to grant you a recovery. Google has also deleted some of the old information about the HCU.

This does not mean recovery is impossible, it only means the original system is now gone and that several (possibly dozens or hundreds of) ranking systems at Google must now be appeased, possibly using the exact same criteria the original one did. Google will no doubt use these semantics to discredit claims of HCU impacts or recoveries as they march forward.

How the HCU / Helpful Content Classifier (possibly) Works

When it first rolled out, the Helpful Content Update was likely a reranking system that took results from the core ranking algorithm and adjusted them based on its new analysis. Now the same thing is happening, but inside each component of the core ranking algorithm.

Most likely the HCU/HCC works similarly to how we speculate Panda and Penguin worked. If a page or portions of a page are identified as “unhelpful”, that page is given a negative score. This negative score is then passed throughout a website via internal linking mechanisms, dragging down the ability of all pages on the site to rank. If a page or portions of a page are identified as “helpful”, that page is given a positive score which passes throughout the website, granting a boost in ranking potential.
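If that speculation is right, the mechanics would look something like a helpfulness score diffusing across a site’s internal link graph. Below is a minimal sketch of that idea in Python; the function, the damping value, and the sample scores are our own illustrative assumptions, not anything Google has published.

```python
# Hypothetical sketch: a per-page "helpfulness" score spreading through
# internal links. All values here are assumptions for illustration only.

def propagate_site_score(page_scores, internal_links, damping=0.5, rounds=3):
    """page_scores: {url: +1.0 helpful, -1.0 unhelpful, 0.0 neutral}
    internal_links: {url: [urls it links to]}"""
    scores = dict(page_scores)
    for _ in range(rounds):
        updated = dict(scores)
        for source, targets in internal_links.items():
            if not targets:
                continue
            # each page passes a damped share of its score to the pages it links to
            share = damping * scores[source] / len(targets)
            for target in targets:
                updated[target] = updated.get(target, 0.0) + share
        scores = updated
    return scores

pages = {"/": 0.0, "/guide": 1.0, "/thin-seo-page": -1.0}
links = {"/": ["/guide", "/thin-seo-page"], "/guide": ["/"], "/thin-seo-page": ["/"]}
print(propagate_site_score(pages, links))  # every page's rank potential shifts, not just the scored ones
```

In a toy model like this, one strongly negative page drags down every page it links to, and indirectly the whole site, which matches the site-wide behavior people reported after the HCU.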

This all sounds amazing but one key detail is left out – what does Google consider “helpful” content?

What Google considers helpful content (according to Google)

When the HCU was first announced Google was incredibly vague about what would constitute helpful content. Here are the few mentions of it from their now deleted documentation:

  • “The helpful content system aims to better reward content where visitors feel they’ve had a satisfying experience, while content that doesn’t meet a visitor’s expectations won’t perform as well.”
  • “The system automatically identifies content that seems to have little value, low-added value or is otherwise not particularly helpful to people.”
  • “This means that some people-first content on sites…”
  • “If you host third-party content on your main site or in your subdomains, understand that such content may be included in site-wide signals we generate, such as the helpfulness of content. For this reason, if that content is largely independent of the main site’s purpose or produced without close supervision or the involvement of the primary site, we recommend that it should be blocked from being indexed by Google.”

Based on only this information we can surmise very little, but we might take away the following:

  • Content with a low bounce / pogo sticking rate
  • Content with a higher time on site
  • Content that leads to other actions on site
  • Content that is easy to navigate
  • Content that is created for “people first”
  • Content that adds new value over other available content
  • Content that guides or helps people specifically and directly
  • Content that has been reviewed or edited by an expert

Anyone familiar with the sites that have been hit by and the sites that have been rewarded by the HCU and its subsequent updates can attest that these claims by Google (which are all now deleted) are largely untrue. The update seemed to eradicate a lot of content created by people, for people, that provided unique insights.

Thankfully, Google offered up another document to help creators identify what is and is not helpful content and we can quote directly from this document.

Google asked content creators / bloggers / writers / SEOs to ask these questions about each piece of content possibly hit by the HCU or by subsequent updates where this system was likely active. The ones we thought most important to highlight are discussed individually in the next section:

  • Does the content provide original information, reporting, research, or analysis?
  • Does the content provide a substantial, complete, or comprehensive description of the topic?
  • Does the content provide insightful analysis or interesting information that is beyond the obvious?
  • If the content draws on other sources, does it avoid simply copying or rewriting those sources, and instead provide substantial additional value and originality?
  • Does the main heading or page title provide a descriptive, helpful summary of the content?
  • Does the main heading or page title avoid exaggerating or being shocking in nature?
  • Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
  • Would you expect to see this content in or referenced by a printed magazine, encyclopedia, or book?
  • Does the content provide substantial value when compared to other pages in search results?
  • Does the content have any spelling or stylistic issues?
  • Is the content produced well, or does it appear sloppy or hastily produced?
  • Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
  • Does the content present information in a way that makes you want to trust it, such as clear sourcing, evidence of the expertise involved, background about the author or the site that publishes it, such as through links to an author page or a site’s About page?
  • If someone researched the site producing the content, would they come away with an impression that it is well-trusted or widely-recognized as an authority on its topic?
  • Is this content written or reviewed by an expert or enthusiast who demonstrably knows the topic well?
  • Does the content have any easily-verified factual errors?

Here Google adds a lot of color. However, yet again, we can review sites and content hit by the September 2023 HCU and see that they tick all or most of these boxes, so why would they be hit?

Interpreting Google Speak into Human Speak

Much like you, my team and I have been scratching our collective heads over this same thing and built some experiments to determine what exactly was going on. First, let’s look at the questions we thought most important and theorize about how Google might actually measure each:

1. “Does the main heading or page title provide a descriptive, helpful summary of the content?” – We could interpret this a little differently as “Does the main heading or page title look keyword stuffed and provide no additional information beyond the target keywords?” In this context we see that Google would be claiming that keyword-only title tags, or title tags built with more than one keyword, are unhelpful and therefore might draw the ire of the classifier and a negative score. For example, “Keyword 1 | Keyword 2 | Company Name”, long a stalwart of SEO, might now be gaining negative attention from Google’s systems.

2. “Does the main heading or page title avoid exaggerating or being shocking in nature?” – We might interpret this as “Is the main heading or page title clickbait?”. In this context we see that publishing a lot of clickbait content would be a negative to Google. For example “10 Amazing Ways to Improve Your SEO While Saving Thousands of Dollars” would be a worse title tag than “10 Best Ways to Improve Your SEO”.

3. “Is this the sort of page you’d want to bookmark, share with a friend, or recommend?” – You would be excused for interpreting this as “Does this URL get recommended on social media, and is it being sent to email inboxes in a non-spam way?” Google could easily measure this if they had a contract to gain all of, say, Reddit’s data, or if they decided that specific UGC sites would rank well and be a signal in their results but not other sites.

4. “Would you expect to see this content in or referenced by a printed magazine, encyclopedia, or book?” – We could easily interpret this as “Does this URL gain links from online publications with print editions?” Google might consider these higher quality than online-only publications, since print costs money for every letter printed; editors would deeply consider including a link for reference and would only do so when necessary.

5. “Does the content provide substantial value when compared to other pages in search results?” – Most SEOs at first interpreted this as “Does the content provide net new information?” However, numerous sites and documents hit by the September 2023 version of the HCU did indeed have this. Many of those saw increases, slight or large, between August 2022 and September 2023. As such we can toss that interpretation out and try a different one: “Does this page have too much SEO content on it, that is, content which appears designed first to rank in Google and only second to help users?” We can use this interpretation based on earlier phrasing such as “people-first”.

6. “Does the content have any spelling or stylistic issues?” – Here we could interpret this as Google measuring the readability of a page, the formatting, and even the UX. What they consider “stylistic issues” has never been elaborated, but we do know they measure on mobile whether elements are too close to each other and might overlap. That, plus other issues when viewing on a smartphone, might be dragging down a helpfulness rating.

7. “Does the content present information in a way that makes you want to trust it, such as clear sourcing, evidence of the expertise involved, background about the author or the site that publishes it, such as through links to an author page or a site’s About page?” – Interpreted a little differently, Google is likely looking at entities here and measuring their topical authority at scale: either the website’s brand or the author’s name across the web. To do this Google says they want clear evidence that your brand should be trusted in places where a user might look.

8. “Is this content written or reviewed by an expert or enthusiast who demonstrably knows the topic well?” – A big one SEOs have long assumed might be a ranking factor. You could interpret this to mean that Google, in some cases, can somehow determine who an expert on a topic is and that this expertise can be demonstrated. Google of course never elaborates on how they might do this, especially across divergent subject matters where determining such expertise would be wildly different. Suffice it to say you probably cannot just claim to be an expert on your site and have that count for this one.

9. “Does the content have any easily-verified factual errors?” – Google is saying here they know what a fact is and they know when a fact is wrong. If you look at your content and have a fact stated in there that Google believes is wrong, you are likely being labeled as “unhelpful”.

Defining How These Signals Might Be Measured

We can boil these down even further to make them simpler to understand and to design tests for:

1. Is the page/document that appears to have lost in the HCU overly SEO’d? – By this we mean: do the title tag and h1 match exactly? Do they appear to be too “keyword heavy” and less about telling users what the document is about? Do other headings on the page appear to be keyword stuffed (for example “Widget Technical Specs” when “Technical Specs” would do)? And is there content on the page that, when read, appears to speak to a search engine instead of to a user? For example, the sentence “There are a lot of Dallas SEOs to choose from in Dallas, TX but we are the best.” might be considered too much SEO on the page, since a target keyword is shoved into the sentence instead of allowing it to flow more naturally, such as “There are a lot of SEOs to choose from in Dallas, we believe we are the best and here is why.” (A rough sketch of checks like these follows this list.)

2. Is the page/document that appears to have lost in the HCU easy to navigate on mobile and desktop? – By this we mean: is the main navigation easy to find, if the content is long are there jump links, does anything block content or overlap on mobile, and if the site is mobile-first responsive, is the desktop version usable or just a blown-up mobile version?

3. Does the website make it easy to determine if the brand is legitimate or if the author might be a real person with real expertise? – By this we mean: is it reasonable for a visitor to conclude these things are true? We are doubtful Google has committed the resources necessary to build an actual graph of expertise across the web; more likely they are looking for signs that the site goes out of its way to make this known. Spelling is included here: if you are creating content, we assume that Google assumes you will go out of your way to avoid large volumes of spelling and grammar errors across the site, though minor errors here and there are likely tolerated (as are region-specific spelling conventions appearing in related regions, like British “u”s in words appearing on American websites).

4. Does the document state facts that Google knows to be true and only those facts that can be proven by their system? – By this we mean that Google will give a negative score to documents that are too far out of alignment with known facts or worse yet consensus on the web as pulled from larger more trusted brands. This is by far the scariest of those we have highlighted since it means that Google thinks their algorithm knows all and will punish those who disagree with it.

5. Do those whose livelihoods Google believes rely on being trustworthy trust this website or document?
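As a concrete example of the kind of test we built for #1, here is a rough sketch of an over-optimization checker. The thresholds and flag wording are our own assumptions used for internal testing, not a published Google heuristic.

```python
import re

# Rough sketch of the over-optimization checks described in #1 above.
# Thresholds are our own guesses for testing, not anything Google documents.

def over_seo_flags(title, h1, headings, body, target_keyword):
    flags = []
    if title.strip().lower() == h1.strip().lower():
        flags.append("title tag and h1 match exactly")
    if title.count("|") >= 2:
        flags.append("title reads like stacked keywords ('Keyword 1 | Keyword 2 | Brand')")
    keyword = target_keyword.lower()
    stuffed = [h for h in headings if keyword in h.lower()]
    if len(stuffed) > len(headings) / 2:
        flags.append("majority of headings repeat the target keyword")
    # crude density check: exact-keyword occurrences per 100 words of body copy
    words = re.findall(r"\w+", body.lower())
    density = 100 * body.lower().count(keyword) / max(len(words), 1)
    if density > 2.0:
        flags.append(f"keyword density of {density:.1f}% reads as written for the engine")
    return flags

print(over_seo_flags(
    title="Dallas SEO | SEO Dallas TX | Acme",
    h1="Dallas SEO | SEO Dallas TX | Acme",
    headings=["Dallas SEO Services", "Why Dallas SEO", "Contact"],
    body="There are a lot of Dallas SEOs to choose from in Dallas, TX but we are the best.",
    target_keyword="Dallas SEO",
))
```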

Aside from a few long-held content standards, if our interpretations are accurate, we can begin to see a formula appearing here, and we might possibly even be able to finally determine what Google means by “People-First” content, a phrase they used over and over again when discussing the HCU/HCC.

Defining What Google Means by “People-First Content”

People-First content might then be defined as this: Content that is created by, or with a focus on, human web users; that is largely in congruence with other brands/domains/authors considered to be trusted on a topic (because if they were not to be trusted they would cease to exist); and that gains links or mentions among a set of other trusted websites, with small tweaks and adjustments to help search engines better determine the target audience, nuances, and context of the content.

Anything that meets the above litmus test then gets a boost with a “helpful” label. Anything even slightly afoul of this is then defined as “unhelpful” and gets some degree of a negative scoring applied.

We might even be able to write a second definition for another issue that appears in SERPs: Content that is created by users on social media or a website/app that facilitates such interactions where the discussion itself between assumed real human users generates the content to satisfy the query of another real human user and interactive elements on the website/app are used by humans to in some way recognize this content was useful or helpful to them.

The second definition would describe why things like Reddit conversations or Quora answers that are not full-length articles, and do not fully satisfy a query, might rank for queries like “beginner’s guide to dropshipping”.

(p.s. yes you can rewrite this and submit a new definition to me, best one wins a high-five!)

There’s an Exception to Every Rule

This is where things start to break down. Many of you reading this likely caught on to one or more of these potential targets for the HCU/HCC, as did Cyrus Shepard among others.

However, when viewing the SERPs you may have noticed numerous websites that violate these rules ranking well, which destroys your theory. In SEO we always have to tell clients that just because it works for a competitor does not mean it will work for you; there’s always some exception that is hard to explain, and it’s best to just keep doing what you know probably works for your SEO rather than become obsessed with why something that shouldn’t rank is ranking so well. (The keyword “Dallas SEO”, for example, is a source of constant angst as Google almost always ranks spammers and scammers here.)

If we look at most or all of the exceptions to this rule we find one thing in common amongst a large set of these websites:

  • High domain value, no matter what measurement you use – not high because of scams or tricks, but due to real-world quality inbound authority links

In fact, just last month Tom Capper from Moz published an article titled “The Helpful Content Update Was Not What You Think” which says exactly this same thing (ok, not exactly). Tom does use Moz-based data to infer there’s some kind of dividing line between HCU winners and losers. Instead of link data, though, Tom argues that branded search volume data is more in line with this dividing line, using a metric Moz calls “Brand Authority”.

Whether it is link based or search volume based or some combination of both, it seems evident to most everyone looking deeply that the original and September 2023 flavors of the Helpful Content Update considered established sites/brands that are well-known and liked to either be the source of truth for facts or at the very least immune to being labeled as “unhelpful”.

This is where SEOs began to loudly (and rightly IMHO) complain about “reputation abuse”. Suddenly websites like Forbes, USA Today, Reddit, and LinkedIn could rank on nearly anything.

In our evaluation it would appear that (prior to the recent demotion of reputation abuse) high-value websites, as defined by the correlation of high link metrics from virtually any tool, did not have the “People-First” content rule applied to them. Even worse, it is likely that no matter what they produced, it was being labeled as helpful almost immediately.

Thanks to this window of time in which Google did not realize, for some reason, that large brands would see they had a massive benefit in SERPs and abuse it, we were able to clearly see the separation line between sites. Small, independent solopreneur sites were highly susceptible to the HCU/HCC while large, well-known, highly cited brands appeared to be immune.

Testing Proves Too Much SEO is Correlated to HCU Losses

We ran various tests with our clients and each one showed the exact same thing. A URL that lost traffic between the start of the September 2023 HCU and about two weeks after it completed saw a near-instantaneous (12-48 hour) rebound once the over-optimized element we were testing was removed. Never exactly to the position the URL held prior, but always close.

In one case, for example, the URL went from page 2 to #2 seemingly overnight after a chunk of text we determined was created just for SEO was removed.

With this proof in hand it was time to experiment with other possible causes of HCU demotions. That brings us to the point of this article.

How Google is Killing Originality

Let’s go back to that blog post written just one week prior to the release of the Helpful Content Update in August of 2022. That post was written by Pandu Nayak, a long-time Google veteran and VP of Search.

There are a few portions that went from sort of banal or uninteresting to extremely interesting as we dug through our research.

“We design our ranking systems to surface relevant information from the most reliable sources available – sources that demonstrate expertise, authoritativeness and trustworthiness. We train our systems to identify and prioritize these signals of reliability. And we’re constantly refining these systems — we make thousands of improvements every year to help people get high-quality information quickly.”

and

“By using our latest AI model, Multitask Unified Model (MUM), our systems can now understand the notion of consensus, which is when multiple high-quality sources on the web all agree on the same fact. Our systems can check snippet callouts (the word or words called out above the featured snippet in a larger font) against other high-quality sources on the web, to see if there’s a general consensus for that callout, even if sources use different words or concepts to describe the same thing. We’ve found that this consensus-based technique has meaningfully improved the quality and helpfulness of featured snippet callouts.”

With everything Google has told us about the Helpful Content Update / Classifier, and everything our research and the research of others has unveiled, these seemingly innocuous statements now take on a frightening meaning for any small, independent publisher or website that publishes unique content with its own unique takes.

1. To demonstrate expertise, authoritativeness and trustworthiness you must repeat facts Google thinks are 100% true, even if they are false.

2. If you publish content that is in stark disagreement with websites that appear immune to the HCU (i.e. if you’re in Travel, think TripAdvisor, etc…) you will likely lose all of your traffic, even if your content is based on first-hand experiences, is your opinion, and is shared by others.

Going back to our interpretations about what Google might demote as “unhelpful” this means that if your document or lots of documents on your website violate the notion of “consensus” as generated by their AI system based on much larger sites – then your content and site are likely to be labeled as “unhelpful” and you will lose a lot of traffic if not all of it.

Remember in #4 above we said “By this we mean that Google will give a negative score to documents that are too far out of alignment with known facts or worse yet consensus on the web as pulled from larger more trusted brands. This is by far the scariest of those we have highlighted since it means that Google thinks their algorithm knows all and will punish those who disagree with it.”

If this is the case then we could do some thought experiments. For example, if you published a top 10 list of the best restaurants in Dallas and this list was not similar to, say, the current top 10 on TripAdvisor, your document might get an “unhelpful” grade and lose traffic while another document closely in alignment takes your rankings.

We find evidence here, like that of one of our client sites, which has spent over a decade publishing unique content for its audience. That content is rarely in alignment with what big publishers would say, because much of that is paid PR gibberish or AI-generated vomit. In some cases this website even publishes content that is contrary to what a brand says about its own products, because the brand leaves out key details or uses misleading statements. Google’s AIO steals this client’s content and uses it, but downranks the site into oblivion, choosing instead to surface content that aligns more directly with every other website on the web or with the brand’s official documentation.

Why Consensus, Why Now?

Google might be moving slower the older and more bloated they get, but the company is still extremely smart and really good at understanding the near-future impacts of technology. For example, in 2011, when I first predicted smart speakers to executives and clients at the marketing agency I was working at, I believe Google was already conceiving of such an innovation and determining how best it might come to market. And when I wrote about this prediction in early 2014, after Google purchased Nest Labs, I believe Google was in the throes of finalizing their approach, assuming no other tech companies had spotted the opportunity (Amazon would surprise me and Google by releasing the first Alexa device later that year, two years before Google Assistant was launched). We can assume this based on Google always being slightly ahead of and/or slightly behind the emerging tech curve, and based on their hiring of Ray Kurzweil, a noted futurist and engineer who has spent decades predicting technological innovation with fascinating success.

On the heels of my prediction of smart speakers I began discussing (and trying to alert everyone to) a possible future where websites no longer existed, or existed solely as data imports for large automation systems (I did not use the terms LLM or AI for this, but our current tech tree tracks with those predictions). I did not like the term “search engine” for these systems since a search engine’s job was to crawl, index, and rank web documents. Instead I dubbed these “task completion engines”: systems designed to assist humans in completing a task or series of tasks and then fade into the background, sort of like Mr. Meeseeks from Rick & Morty but not organic and not popping out of existence.

Task completion engines would face a problem though. In my prediction I realized it would likely mean a full reset of the web, with the tech starting from scratch. This would trigger what I refer to as the “soup of chaos”, when a bunch of companies emerge all vying to do the same thing. Like the early days of search engines, all of these would use the same basic underlying tech, which would become relatively cheap and easy to access, with no barriers like PageRank. Early search engines all had the same characteristics: use meta tag data, keep a limited index of the web, require URL submission (i.e. crawling on demand, not continuous crawling), and use a rudimentary understanding of content on the page to determine rankings (i.e. keyword stuffing / cloaking etc… worked). This all changed when two students developed a new system for ranking an index of web documents they called “Backrub” and thankfully later changed the name to Google. Google’s innovation here, PageRank, would produce demonstrably higher quality results, rendering the rest of the engines useless, and by 2012 they were all virtually non-existent in terms of global market share (outside of Naver, Yandex, Yahoo! Japan, and Alibaba in their respective countries).

Unfortunately, you cannot just port the same tech from a search engine over to a task completion engine and get quality results. If task completion engines render much of the web useless then content production on the web declines, and so does the engine’s ability to determine quality. Links, which power PageRank, and even other classic SEO signals might start to disappear from the web in large enough volumes that trying to operate as both a search engine and a task completion engine is futile, leaving Google exposed to upstart competition for the first time in decades.

I won’t go deep into details here, but we could theorize that some of the non-link signals that might be counted in similar ways for trust or quality, and that would survive such a transition of the web, could include: reviews thought to be from real humans, recommendations thought to be from real humans, brand mentions from trusted sources, recommendations from trusted sources, and finally a consensus among trusted sources on certain facts / figures / brand mentions.

Search engineers need to use what they know and what they theorize won’t change much in the wake of the death of the web and the rise of task completion engines. Just like Amazon and eBooks did not kill physical books, it is unlikely that such a system kills publishing by top authority figures or news websites. It is unlikely that humans quit using the web cold turkey; instead they begin to use it in very specific ways.

If you could develop a system that recognizes this and rewards such behavior early, you might even gain a double win. Nontrusted publishers / brands / websites shut down and disappear, your engine requires less crawling power, your AI system can understand things more easily, and you can reliably produce responses and answers without the messiness that might come with having to evaluate a larger set of data.

This is why I believe consensus was taken from Featured Snippets and placed into action for the HCU. I believe Google knew LLM-based AIs, using the technology created by their own engineers years earlier, were close to becoming viable, and they were aware of other actors chasing this (OpenAI was well-known in SV tech circles before the launch of ChatGPT). Google needed a way to help their systems cope with the coming changes while also transforming their company through a tumultuous period, and consensus + anti-SEO is a convenient way to cull the web of “low quality content”, removing vast swathes of the web while knowing that a small portion might be real, authentic creators who simply have different opinions or tastes and preferences.

Our team has tracked consensus being used by ChatGPT, Copilot, and Perplexity as well, with varying sourcing methodologies that appear different from Google.

This is part of the “AI search” arms race. At the moment, all players chasing the creation of the next big web discovery breakthrough appear to believe that consensus is at the very least one way they can improve the quality of their results, or possibly the main way, much as the link graph was for early search engines. Maybe we could even call this the “Consensus Graph” when all is settled.

What is Consensus?

Dictionary.com defines consensus as: majority of opinion, general agreement or concord, harmony. The term comes to us straight from antiquity where it was used by ancient Latin speakers to discuss “agreement”.

I believe Google is using consensus a little differently. Instead of demanding 100% agreement, I believe Google is looking for an “alignment of facts”. Pandu said in his blog post that even if the same thing is said in different ways, their systems could recognize it. Consensus here would be more like saying: if the most trusted documents include entity X and your document does not include entity X, then you are out of alignment with the consensus that entity X must be included.
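A toy way to picture this “alignment of facts” is to treat consensus as the set of entities most trusted documents mention and then check how many of them your document includes. The sketch below does exactly that with a faked entity extractor; the function names, threshold, and restaurant examples are invented for illustration and imply nothing about Google’s real implementation.

```python
# Toy sketch of consensus as entity alignment. Everything here is an
# illustrative assumption, not Google's actual system.

def extract_entities(text, known_entities):
    return {e for e in known_entities if e.lower() in text.lower()}

def consensus_alignment(doc_text, trusted_texts, known_entities, required_share=0.6):
    doc_entities = extract_entities(doc_text, known_entities)
    counts = {}
    for text in trusted_texts:
        for entity in extract_entities(text, known_entities):
            counts[entity] = counts.get(entity, 0) + 1
    # an entity is part of the "consensus" if most trusted documents mention it
    consensus = {e for e, c in counts.items() if c / len(trusted_texts) >= required_share}
    if not consensus:
        return 1.0, set()
    missing = consensus - doc_entities
    return 1 - len(missing) / len(consensus), missing

trusted = [
    "Top picks: Restaurant A, Restaurant B, and Restaurant C.",
    "Restaurant A and Restaurant B top our list; Restaurant C is a close third.",
    "You cannot skip Restaurant A or Restaurant B when eating out here.",
]
my_post = "My favorites are two taco trucks nobody writes about, plus Restaurant B."
score, missing = consensus_alignment(my_post, trusted, ["Restaurant A", "Restaurant B", "Restaurant C"])
print(score, missing)  # low alignment: the post skips entities every trusted source includes
```

Under a model like this, a list post that skips what “everyone else” names would score as out of alignment no matter how good or original it is.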

Think of it a little like a gameshow where a contestant is tasked with discovering which one of 10 former NFL players is not actually a former NFL player. The contestant can only ask the supposed players one question at a time, cannot see the players, and the players cannot hear each other answer. You might ask this group a specific question that only NFL players would know, for example “how does the Tampa 2 defense defend against the run game?” Virtually all former NFL players would know this by heart but a normal person may not. By asking the group the question one at a time and getting the same answer, stated differently each time, you might be able to uncover the one fake NFL player and win the game. (By the way, the answer is: by giving gap assignments to each player in order to best utilize speedy linebackers and d-linemen to force the play to the weak side linebacker.)

This is probably how Google is using consensus, as a way of determining correct vs. incorrect and weeding out the fakers.

Fake Consensus

This isn’t actually consensus. Instead Google appears to be falling prey to an argument fallacy known as “argument from authority”. This is an argument fallacy where you rely solely on the perceived expertise of someone in order to determine if what they say is true / credible or not. It takes the opinions of one or more perceived authority figures and uses that as evidence to support the argument that x page is or is not a quality or helpful page.

Think of it like this tactic used in every political cycle. You know your beliefs and have an idea which candidate you want to support, but then their lead opponent gets endorsed by several celebrities / influencers that you also like, and they all repeat the same thing about the candidate you originally wanted to support. This is a variation of “argument from authority”, attempting to use the position of those people to influence how you view a subject matter or person.

Argument from authority is closely related to “argumentum ad populum”, also known as “consensus gentium”, “appeal to the masses”, or “appeal to the majority” – except in this case the population is restricted to perceived experts by Google. We might even need to create a new fallacy for Google called “argumentum ad peritorum”, Latin for “argument to the experts”, or “consensus peritorum” for “consensus of the experts”, to better describe what is happening.

Why is it a Fallacy?

To determine whether an argument from authority is fallacious we must ask a simple question: Are the experts or authority figures being cited REALLY experts in this specific area?

For Google specifically we can follow this up with a second question: How would Google determine the expertise or authority of figures they are judging the rest of the web on for consensus? Is there information for these figures on formal education, higher degrees, or years of experience that can be validated with 100% accuracy?

If the answer to either question is “No” then we are looking at an “Argumentum ad peritorum” or “Consensus peritorum” fallacy.

If you are unable to validate the expertise with 100% certainty (or leniently 95% certainty) then the consensus is false and anything based on it – including search results – is also false.

Here is a complete and total list of websites that determine with any degree of accuracy someone’s actual expertise in any field which Google might use to determine such a consensus:

  • 1.
  • 2.
  • 3.
  • 4.
  • 5.

Exactly How Google is Killing the Web’s Originality

Google is killing the originality of the web.

The HCU was not and is not about helpful content at all; it appears to be more (in part at least) about driving conformity in ways only Google could demand, for some unknown end. Perhaps the original intention was true to its moniker, but something changed in September 2023 that turned it into a bane for the web. Recently Danny Sullivan himself told a small group of creators at Google’s HQ it would never come back.

“… but some people are saying I want to be back to where I was in September. I was talking to somebody and I said September is not coming back, the whole format of search results has changed.” – Danny Sullivan, Google Search Liaison

Google has a long history of making demands of the web that transform how content is published and formatted and how users experience the web, for example their preference for long format content for recipe queries is why you must read a novel about someone’s grandmother before seeing a recipe.

Panda demanded that the web stop using the same content on various pages and eliminate thin or junk pages. Penguin demanded that you stop collaborating on blogs or using overly precise anchor text when linking to other sites.

The Helpful Content Update appears to be yet another in a long line of such demands. The HCU / HCC appears to mostly apply only to smaller sites and demands they conform to what larger sites say and do. They must use the same facts, say things in similar ways, and cannot say anything unique or inventive, cannot use specific keywords to differentiate themselves in title tags or headings anymore, and cannot even collaborate with peers.

The original, creative, and vibrant web powered by solopreneur creators is collapsing and Google is the culprit behind it and the one who stands to gain the most from it.

As the web dies, Google’s profits are rising, significantly bolstered by their blatant content theft, their abuse of other people’s content, and their forceful push for AI superiority. Their executives likely see no wrongdoing, as they boldly told a group of creators they invited out to Mountain View “thanks but no thanks”.

When Fake Consensus is Fine

Fake consensus, or as defined above “argument from authority” or “consensus peritorum” or whatever we want to call this fallacy, is not all bad when it comes to search results quality. In some cases it might even be extremely helpful in weeding out scammers or harmful content. Remember earlier, in parentheses, when I said the first version of the HCU was a bit of a nothingburger except in Medical and similar fields? Well, the website Mayo Clinic for example went from an estimated 134 million monthly organic search clicks in July of 2022 to an estimated 170 million organic search clicks in July of 2024 (Ahrefs data), seemingly completely unfazed by the HCU or any subsequent rollouts, seeing only small seasonal declines before continuing growth again. Meanwhile, sites like WebMD and Healthline appear to have taken a hit (Ahrefs data).

If this data is accurate I see few issues with it. In established fields such as medical, legal, and personal finance there are often facts. For example, “the mitochondria is the powerhouse of the cell” (hope you’re happy Mr. Boline and Mr. Fisher). Consensus can build in these fields amongst trusted sites, and Google can have mechanisms in place to detect a new consensus building (or even a schism) and adjust how the engine treats the content. For example, if the old knowledge is “if you’re feeling dehydrated drink water” and the new medical knowledge is “if you’re feeling dehydrated and want to be weak drink water, but to be strong drink Brawndo”, Google could determine this over the course of weeks, months, or years as trusted suppliers of such information change their content.
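If Google really does watch for a new consensus forming (or a schism), a crude version of that watcher could be as simple as tracking what share of trusted sources make the same claim in each time window. The snapshot data, threshold, and claims below are obviously invented for illustration.

```python
from collections import Counter

# Hypothetical sketch of detecting a consensus shift among trusted sources over time.

def dominant_claim(snapshots, min_share=0.7):
    """snapshots: the claim each trusted source currently makes on a topic."""
    claim, count = Counter(snapshots).most_common(1)[0]
    return claim if count / len(snapshots) >= min_share else None  # None = no consensus yet

history = {
    "2022-07": ["drink water", "drink water", "drink water", "drink water"],
    "2024-07": ["drink Brawndo", "drink water", "drink Brawndo", "drink Brawndo"],
}

for month, snapshots in history.items():
    print(month, "->", dominant_claim(snapshots))
# 2022-07 -> drink water; 2024-07 -> drink Brawndo (a new consensus has formed)
```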

And who would be trusted? Most likely regulated or closely scrutinized groups (well-known hospitals & medical schools for example) and government organizations (NIH and CDC for example).

In these cases consensus is a public safety mechanism as well as an anti-spam mechanism. It works to protect consumers from salacious, false, or dangerous claims like using glue to make pizza.

However, there are 3 major issues with this false consensus approach:

1. Consumers and good actors should be aware they are part of this consensus system. Let’s say for example the CDC does not know that its content is being scanned by Google to determine if Brawndo really is what humans and plants crave. Instead of publishing a study that says this, the organization sits on it for over a year while websites that post this information get labeled as “unhelpful” and sink into oblivion. Oopsie!

2. Outside of areas where failing to protect a consumer could be catastrophic, such as medical and health information, legal information, and personal finance, and outside of areas where those you might trust the most would be heavily liable for publishing false, misleading, or harmful information, false consensus would be a lot more shaky. For example, consider that Google might use TripAdvisor as the sole trusted source on the Best Restaurants in Raleigh. Our friend Jake decides to start a blog called Jake Eats Raleigh, or JER for short. He starts publishing lists of blog posts based on his personal taste preferences that go completely against what TripAdvisor’s data from tourists says, mostly because he eats at restaurants those tourists would never have found in the first place. Instead of being promoted by Google for his hard work and originality as Raleigh’s favorite food son, Jake is buried by the various Helpful Content Classifiers judging him for being so far out of step with the mainstream.

3. This development of false consensus has other social and political implications too. For example, one political party / warring tribe / incumbent party takes over 50% of all major publications in the region assumed to be trusted by Google for factual relevancy and begins publishing complete lies, with no evidence at all, about their opposition’s members. This would then subvert the opposition quite easily since anything they say would be automatically silenced by Google’s algorithm for violation of consensus. Since this is a marketing blog I will leave it here and allow others to cover this in more detail if they so desire.

What I believe happened

I believe that Google did train their MUM model on consensus at first only to impact specific queries that might otherwise be dangerous, and to only impact the Featured Snippet or “call out” as Pandu sometimes refers to it. I believe consensus was either released alongside the original HCU rollout in August of 2022 OR was included in that algorithm’s rollout but only for a relatively small selection of queries, most likely impacting the domains of health, finance, and law. If included in the original HCU it was likely dialed back and/or focused on specific cases.

Then a year later, based on the success of the initial rollout, MUM or Gemini or some other model we’re not yet aware of (let’s be honest, it was absolutely not Gemini at this time) was trained on consensus for the rest of the web and included in the September 2023 HCU. Here specific sites were tagged as authorities – likely automatically – by the model; these sites could not fail to meet consensus and would simultaneously help establish it for other sites. Since I believe Google is both full of smart people and has plenty of “no people” in their ranks, I assume they also understood the potential downsides and placed mechanisms that would limit how this consensus scoring was applied. Unfortunately, it appears that simply applying the concept so broadly caused mass harm and confusion on the web.

In March of 2024 when the HCU was brought to retirement there were no changes made to how consensus works, only that it was probably added to a variety of different core algorithms.

Finally, during the August 2024 Core Update Google “softened” the blow of consensus slightly allowing a handful of recoveries to blossom.

Consensus as part of the helpful content classifier might even be recorded in Google’s memory for a domain, making recovery difficult or impossible. A website that once told kids to drink only water and failed to tell them about the health benefits of Brawndo, for example, could always revert back later, so in Google’s view the site shouldn’t be trusted again just because the content changed. This memory might take a while to develop, and perhaps the negative version of the HCU / HCC only runs periodically enough that swapping domains might show a temporary recovery. Over the long term, if the consensus alignment stays in place, maybe that website finally wins back the HCU/HCC’s trust and starts ranking again; we’re talking a year or longer most likely.

I Could Be Wrong

This might all be a lot of speculation, based on Google’s own statements prior to the HCU from their VP of Search and on observations from the limited set of data we have access to here.

It is not the first attempt at describing what the HCU was / is, and I am doubtful it will be the last either.

However, getting this out there might inspire someone else to dig into their own data and find similar clues, or maybe even inspire a whistleblower at Google to leak something on the topic, or maybe even convince Pandu, Danny, or Prabhakar to come out and tell us what the heck is going on.

Always test theories like this one on your own. See if violating consensus or following it helps your site. Examine the few recoveries that have happened and check archives to see if they adjusted anything to meet what might be consensus. Our attempts all validate this, yours may not.

One thing you should always do though – keep building.


Joe Youngblood


Joe Youngblood is a top Dallas SEO, Digital Marketer, and Marketing Theorist. When he's not working with clients or writing about marketing he spends time supporting local non-profits and taking his dogs to various parks.
