The Anatomy of a Search Engine
The ins and outs of PageRank, the reasoning behind using anchor text, and other smart features such as location information (proximity) and font size or weight (the bigger, the better: headlines)... with this paper you discover the foundation of the most popular search engine: Google.
Very instructive, and probably still very relevant, as Google has most likely built on and profited from the initial concept.
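To make the PageRank part concrete, here is a minimal sketch of the formula as the paper states it, PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1...Tn are the pages linking to A, C(T) is the number of links going out of T, and d is the damping factor the paper usually sets to 0.85. The `pagerank` function, the fixed iteration count and the three-page link graph are my own illustrative assumptions, not Google's implementation, and this naive loop ignores everything that matters at web scale.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    `links` maps each page to the list of distinct pages it links to (a
    hypothetical toy input); C(T) is the length of that list. Pages with no
    outgoing links simply contribute nothing in this sketch.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1.0 for p in pages}  # start every page at rank 1
    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            # Each page T linking to p passes on an equal share of its rank.
            incoming = sum(pr[t] / len(targets) for t, targets in links.items() if p in targets)
            new_pr[p] = (1 - d) + d * incoming
        pr = new_pr
    return pr

# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

On this toy graph the ranks settle at roughly 1.16 for A, 0.64 for B and 1.19 for C: C wins because it collects links from both other pages. With this form of the formula the ranks add up to the number of pages (when every page has outgoing links), so dividing by that count turns them into a probability distribution.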
Reading the paragraph about "the differences between the Web and well controlled collections", I wonder... At the time this paper was published, blogs were barely nascent, certainly not as widespread as they have become in the last 3 years.
Blogs introduce a new dimension to search. It's data mining done by people... a lot of people... who also produce their own input, which further qualifies a particular web page. That looks like relevant information to track and build indexes on, no?
Technorati, Feedster and Bloglines do offer blog search engines, but so far I am not convinced by their results...
I would love to know whether blogs are next on the Google Labs roadmap...
Another passage that got me thinking is in Appendix A, "Advertising and Mixed Motives":
"But less blatant bias are likely to be tolerated by the market. For example, a search engine could add a small factor to search results from "friendly" companies, and subtract a factor from results from competitors. This type of bias is very difficult to detect but could still have a significant effect on the market."
...hmmm... Might today's corporate Google use such 'hard to detect' techniques to hurt its competitors? It would be interesting to watch how Yahoo performs in Google's results for relevant queries!
Posted by agnes at 10:50 PM