Understanding The Google PageRank Algorithm in 126 Lines of Python

Categories: Beyond nixCraft, Links, Open source coding. Last updated: December 16, 2006.

According to Wikipedia:
PageRank is a link analysis algorithm which assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of “measuring” its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element E is also called the PageRank of E and denoted by PR(E).

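In short, it is based on a mathematical formula that treats links as votes. More votes, especially from pages that themselves have a high PageRank, give a page a higher PageRank. PageRank is one of the factors that help a page rank higher in Google's search results.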
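The voting idea can be written as a formula. The form below is the commonly cited version. Here d is a damping factor, usually set to 0.85, and N is the total number of pages. B(E) is the set of pages that link to E, and L(F) is the number of outbound links on page F. These symbol names are my own choice for this sketch, and the linked article may use different notation.

```latex
PR(E) = \frac{1 - d}{N} + d \sum_{F \in B(E)} \frac{PR(F)}{L(F)}
```

For example, suppose page F has PR(F) = 0.4 and four outbound links. Then F passes d × 0.4 / 4 = 0.85 × 0.1 = 0.085 to each page it links to. A page's vote is therefore split among all of its outbound links, and a link from a high-PageRank page with few outbound links counts for more.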

The linked article explains:
=> The myth
=> The algorithm
=> The actual 126-line Python code for PageRank
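Below is a minimal sketch of the algorithm in Python. It uses power iteration and repeatedly applies the PageRank formula until the scores stop changing. This is not the 126-line code from the article. I chose the function name `pagerank`, the dict-of-lists graph format, and the handling of pages with no outbound links (they are treated as linking to every page) purely for illustration.

```python
def pagerank(links, d=0.85, tol=1.0e-8, max_iter=200):
    """Compute PageRank scores by power iteration (illustrative sketch).

    links: dict mapping each page to the list of pages it links to.
    Assumption: pages with no outbound links ("dangling" pages) are
    treated as linking to every page, so total rank stays at 1.0.
    """
    # Collect every page that appears as a link source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page holding an equal share of the rank.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by dangling pages is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - d) / n + d * dangling / n for p in pages}

        # Each page splits its rank equally among its outbound links ("votes").
        for page, targets in links.items():
            if targets:
                share = d * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share

        # Stop once the scores barely change between iterations.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # A tiny hypothetical web: A, B and D all vote for C.
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

On this small graph, C ends up with the highest score because three pages link to it. D has no inbound links, so it keeps only the baseline (1 - d) / N. Power iteration is the simplest way to compute PageRank. For large graphs, real systems rely on sparse matrices, but the idea is the same.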

A great read for understanding both the math and the Python code behind PageRank. More information about PageRank can be found at the following locations:
=> PageRank from Wikipedia, the free encyclopedia
=> How Google Finds Your Needle in the Web’s Haystack

Posted by: Vivek Gite

The author is the creator of nixCraft and a seasoned sysadmin and a trainer for the Linux operating system/Unix shell scripting. He has worked with global clients and in various industries, including IT, education, defense and space research, and the nonprofit sector.


Comments

  1. Best not to get too involved with the maths. As long as you remember to accumulate inbound links from relevant sites of equal or higher PageRank you’re laughing. Best not to link to too many sites in return.

  2. Yup, it is a good idea not to mess with Google PageRank. This program just gives us an idea of the math, though, right?

    Appreciate your post.
