Remember PageRank? I first started playing around with SEO in 2007. Then, PageRank was everything. Ok, I exaggerate. While it didn't tell the entire story on its own, it could in a bare second tell you a very large chunk of it. Here was one quick convenient metric that would tell you with a single digit the relative importance of any page or site on the web. You knew to be way more excited by one link from a PR6 page on a PR8 site than by a hundred links from a PR1 blog. When you did competition research, PageRank values would be among the first things you'd look at when trying to get a feel for how hard it would be to go head to head with a particular page or site.
In 2015 you'll never hear the word uttered. It's not just because Google doesn't update toolbar PageRank anymore, leaving all the available data way out of date. It's that the whole thing seems to be completely irrelevant, the relic of a bygone age.
The popular consensus is that PageRank is dead. The standard advice to any newbie asking about PageRank is not to worry about it, that it used to be important but the world's moved on. In truth it's become more important than ever.
Who remembers the 90s? Ahh, my high school years; like any boy who couldn't kick a football I was already hooked on guitar and the internet. Those “good old days” of the web had buckets of charm. I feel like a huge part of the nostalgia value comes from the fact that back then it was mostly impossible to use the web for anything genuinely useful. Surfing the web back in 1995 was a lot less about intentionally looking for something you had an actual use for, and a lot more about just clicking around aimlessly to see what you might find.. knock knock jokes, detailed plans for pipe bombs, screen savers that showed you boobs, information on how to stop aliens from stealing your thoughts, erotic fan fiction depicting Bart and Lisa Simpson. Life was simpler back then and perhaps we were better for it.
There were search engines. They were very hit and miss. They started off bad and instead of improving as you might expect, they became horribly worse. Enough people worked out you could game search engines by stuffing keywords into meta tags; before long, all they showed was spam and porn. It was usually better to look for things on high quality directory sites like dmoz or Best of The Web. A lot of people had their browser set by default to start at their ISP's home page, or the Netscape home page, and so would start their clicking from there.
The PageRank algorithm was first developed at Stanford University in 1996 by Larry Page and Sergey Brin as a way of mathematically modelling this “random surfer” browsing behaviour. The nuts and bolts of it all are pretty interesting if you're into that kind of thing. The algorithm gave every page on the web a score based on how many pages linked to it, and how important those pages were based on their own PageRank score, with a damping factor applied to model the fact that at some point the random surfer stops clicking links. The score represented the probability that someone would find that page by following links from other pages.
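For the curious, the idea above can be sketched in a few lines of code. This is a minimal illustrative version, not Google's implementation: the four-page “web” is invented, and real PageRank runs over billions of pages with plenty of extra engineering.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links out to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start every page with an equal score
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Each page that links here passes on a share of its own rank,
            # split evenly across all of its outbound links.
            inbound = sum(rank[p] / len(links[p]) for p in pages if page in links[p])
            # The damping factor d models the surfer eventually getting bored
            # and jumping to a random page instead of clicking another link.
            new_rank[page] = (1 - d) / n + d * inbound
        rank = new_rank
    return rank

# A hypothetical four-page site: most pages link back to 'home'.
web = {
    "home":    ["about", "blog"],
    "about":   ["home"],
    "blog":    ["home", "about"],
    "contact": ["home"],
}

ranks = pagerank(web)
```

Run it and 'home' comes out with the highest score, because the most pages link to it, while 'contact', with no inbound links at all, gets only the baseline random-jump score.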
This was Google's great innovation to search: don't just look at what's on the page, look at how the rest of the internet treats that page. After all, people can put anything they want on a web page. If you wanted to mislead search engines by cramming in a whole lot of irrelevant keywords, you had only your pride to stop you. Broadening the data considered from one page to the entire internet makes for a data set that's a lot harder to fake.
It's not impossible though. The original PageRank algorithm was a great way to measure the relative importance of pages on the web as it was in 1996. Back then, the only reason to put a hyperlink on a web page was so someone could click. Google's own success changed this forever.
One of Larry Page and Sergey Brin's Stanford papers on PageRank boasted that it's “virtually immune to manipulation by commercial interests”. This isn't how things worked out. At the time, it was probably hard to anticipate the enormous commercial incentive that businesses and marketers would one day have to acquire commercially relevant search rankings, because back in the '90s it was a bit weird to use the internet to buy anything. Most of us were still laughing at that dancing baby animated gif, while the others muttered that the whole fad would blow over soon.
Google rapidly became popular, and as a maturing web saw more people feel comfortable spending money online, the algorithm became more and more responsible for delivering commercially valuable website traffic. As the new century dawned, PageRank had become a valuable commodity in its own right. With that came all the various schemes to gather and hoard it.
It would take too long to cover every scheme to manipulate PageRank values, and longer still to review all the ways that Google has responded to them. Let's just give them a quick glance.
At the crudest end were spam networks and blog farms. Google had a simple response: they excised them from the index.
At some point people worked out that leaving a comment on a blog was a very easy way to get that website to link to yours, and so would automatically spam inane “nice post” comments on every blog they could find, and so quickly generate thousands of links back to their site. The “nofollow” value was introduced in 2005 to neuter this.
Other PageRank strategies were less obnoxious. If you made some kind of cool widget or theme for people to put on their site, and then gave it away for free, you could get a whole pile of links from every site that took it up. I was fond of this trick myself, back when it worked. I'd go to elance and find a great designer from foreign lands to make a truly awesome Wordpress theme, and then I'd put a link back to my site in the footer and go and submit it to a whole bunch of free Wordpress theme galleries. A lousy hundred bucks and one afternoon's elbow grease would quickly bring in a metric shitload of PageRank: first it came in from the galleries you submitted to, then from website after website that took up the theme. This felt a lot cleaner than what the spammers were doing. It didn't depend on filling the internet up with garbage. Instead it was about creating something genuinely cool for people. It was still an artificial scheme to gather PageRank to your site.
There were more shenanigans: forum spam, mass directory submissions, web 2.0 strategies, article spinning, reciprocal linking.. we're still barely scratching the surface here. For many such antics, the Penguin update in 2012 was the hammer blow that shattered them.
The thing about all these spam networks, blog farms and article directories is that absolutely nobody reads the content or clicks the links. There's really no reason why you'd visit such a site unless you're doing SEO or investigating somebody else's. Life's too short to read that kind of rubbish.
Other links might be on pages that get heaps of hits but still never get a single click. Blog comment spam can definitely fit this category. The same goes for keyword-stuffed links in the bottom of freely available Wordpress themes. No real human actually clicks on those links in the footer, or even sees them. I know this from experience because I used to have a whole bunch of active blogs linking to my old website through the footer link in the Wordpress theme I'd given away, and these links never delivered a single visitor to my website. It was amazing for my site's rankings though.
PageRank was supposed to be a model of real human browsing behaviour. The effectiveness of these schemes massively undermined this. Remember that a given page's PageRank value is meant to represent the probability that a random surfer would arrive at that page by clicking on links. As more and more PageRank got gathered and funnelled in directions that had nothing to do with where people click, the less PageRank worked as an indicator of a page's quality or importance. A link from a high PageRank page on a popular, high quality blog - one written for a real readership - goes a long way to speak to the quality of the target page. A high quality blog with real readers is unlikely to link its readers to a phishing scam or to automatically generated garbage content. A link from a high PageRank page on a blog farm network doesn't give any such assurance.
Many saw Google's moves to neuter PageRank manipulation schemes as a sign they were moving away from PageRank.
The better way to view them is as a considered defence of the original idea.
One reason why so many people think PageRank doesn't matter anymore is because they never really understood it in the first place. For many site owners and SEO guys, PageRank was never anything more than a whole number from 0 to 10 that you got from the Google toolbar or the SEO tool of your choice. This “toolbar PageRank” figure was a very useful metric some years ago. If it didn't tell you everything you needed to know on its own, it was still a free and readily available way to get a rough idea of things in an instant. But published toolbar PageRank figures were never all there was to PageRank.
Toolbar PageRank was only ever an approximation of the real PageRank value Google use in their algorithm. Toolbar PageRank would be updated once every few months, while real PageRank values are recalculated constantly.
One of the worst things for Google about publishing toolbar PageRank figures was that it gave black hat SEO guys a simple way to objectively measure the success of their schemes, and to verify it to the world at large. New toolbar PageRank figures were last published in 2013 and are probably the last that we'll see.
Don't spend a second looking up published PageRank values. There's no point anymore. Don't blindly follow PageRank manipulation strategies from 2009. They don't work.
Instead, engage with the ideas behind PageRank. They are as important as ever. The random surfer still rides the web. He's just a bit more picky about where he clicks.
Let's look at Sergey Brin and Larry Page's own words from their Stanford paper “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, written back when Google was still a university research project. They wrote:
PageRank can be thought of as a model of user behavior. We assume there is a 'random surfer' who is given a web page at random and keeps clicking on links, never hitting 'back' but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the 'random surfer' will get bored and request another random page.
Another intuitive justification is that a page can have a high PageRank if there are many pages that point to it, or if there are some pages that point to it and have a high PageRank. Intuitively, pages that are well cited from many places around the web are worth looking at. Also, pages that have perhaps only one citation from something like the Yahoo! homepage are also generally worth looking at. If a page was not high quality, or was a broken link, it is quite likely that Yahoo's homepage would not link to it.
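That quoted intuition corresponds to the formula published in the same paper, where d is the damping factor, T1 through Tn are the pages linking to page A, and C(T) is the number of outbound links on page T:

```latex
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
```

In plain terms: every page that links to you passes on a slice of its own score, divided among all the links it carries, and the whole thing is damped by the chance the surfer gets bored and wanders off.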
Nearly two decades later, this pattern of reasoning is as alive as ever. The implementation of these ideas has changed greatly in response to a changing web, but the premises behind them have not. Google still mathematically approximates real browsing behaviour to judge the worth of a link. Every update to the Google algorithm has reinforced that the links you most want pointing to your site are those that real people click.
Every big change to Google's treatment of backlinks blindsides a great many site owners and SEO professionals. For many, these changes seem arbitrary and capricious - a constant “moving of the goalposts”. Really, nobody should be caught off guard. Google told the world at large exactly how they think about these things way back in the '90s. All these twists and turns are broadly consistent with the original reasoning behind PageRank. These algorithm changes are a decades-long refinement of the same original idea.