Since the dawn of the Interwebs, there have been three major activities people online have engaged in:
- Generating content (building websites, uploading media, etc.).
- Searching for content.
- Viewing content (created by other people).
People reached content through a process which could generally be described as searching.
Web search had three distinct eras:
The Yahoo Era: They decide the results for your search
A single company maintained directories and built lists. They decided which site fit in which category.
Why this approach eventually failed:
No single company can decide the relevance or accuracy of search results for everyone.
This failure led to the second era of search:
The Google Era: Webmasters and algorithms determine search results
Google came up with a better idea: how about we let other people with websites decide? Google currently uses PageRank as a method of deciding the relevance of results (they use other technologies too, but PageRank is predominantly responsible for assigning the order of results).
PageRank, grossly simplified, works something like this: websites decide each other's relevance for a particular term.
E.g. a user searches for McDonalds. Google checks its index to see “which site has the most incoming links labeled McDonalds” and delivers that site.
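The lookup described above can be sketched in a few lines of Python. This is only the grossly simplified version from this post (count incoming links by their label), not Google's actual algorithm; real PageRank also weights each link by the rank of the page it comes from. The link data and the `top_site_for` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical link data: (source site, link label, target site).
LINKS = [
    ("foodblog.example", "McDonalds", "mcdonalds.com"),
    ("news.example", "McDonalds", "mcdonalds.com"),
    ("fan.example", "McDonalds", "burgerfans.example"),
    ("news.example", "Burger King", "bk.com"),
]


def top_site_for(term, links):
    """Return the target with the most incoming links labeled `term`, or None."""
    counts = Counter(
        target
        for _source, label, target in links
        if label.lower() == term.lower()
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(top_site_for("McDonalds", LINKS))
```

Note that nothing here checks whether the winning site is actually about McDonalds; it only checks what other sites chose to call it.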
Why this method WILL eventually fail / what is wrong with this approach:
When you let other websites decide, you are still at the mercy of those websites. (See the concept of Google bombing, for example.)
Basically, this concept has one major flaw: it’s open to gaming by other webmasters - those with money can buy their way in, and/or game results in many different ways (complex interlinking, paid links, etc.).
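To make the gaming problem concrete, here is a minimal sketch of textbook PageRank computed by power iteration, run on a tiny made-up web before and after someone adds a link farm. It is an illustration under stated assumptions, not Google's production system: the page names, the damping factor of 0.85, and the iteration count are all choices made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical web: "honest" earns genuine links from "a" and "b";
# "spam" only trades links with a partner site.
honest_web = {
    "a": ["honest"],
    "b": ["honest"],
    "honest": ["a"],
    "spam": ["partner"],
    "partner": ["spam"],
}

# Same web plus a link farm: ten throwaway pages that all link to "spam".
farmed_web = dict(honest_web)
for i in range(10):
    farmed_web[f"farm{i}"] = ["spam"]

before = pagerank(honest_web)
after = pagerank(farmed_web)
print("before farm, spam outranks honest:", before["spam"] > before["honest"])
print("after farm, spam outranks honest:", after["spam"] > after["honest"])
```

In this toy web the honest page wins until the farm appears, and loses afterwards, even though nothing about the content of either page changed.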
An ideal search should deliver results which are accurate, not just popular.
The Digg Era (not really search, but still content delivery):
With Digg (or Reddit, FARK, etc.), two interesting things happened to search:
- Search comes to you - Social sites automatically display the most popular content for the day, according to ordinary visitors. So instead of you heading out and searching for content, the content comes to you through a process of statistical analysis by your peers.
- Popularity is decided by ordinary people - Not a single entity (Yahoo) or other people with websites (Google). This makes the results more democratic, but still not perfect.
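The "popular content for the day, according to ordinary visitors" idea can be sketched as a simple vote count. This is a guess at the general shape, not how Digg or any other site actually ranks stories; the vote log, the one-vote-per-user rule, and the `front_page` helper are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical vote log for one day: (user, story) pairs.
VOTES = [
    ("alice", "story-1"),
    ("bob", "story-1"),
    ("alice", "story-1"),  # duplicate vote, counted once below
    ("carol", "story-2"),
    ("dave", "story-1"),
    ("erin", "story-2"),
    ("frank", "story-3"),
]


def front_page(votes, size=2):
    """Return the `size` stories with the most distinct voters."""
    voters = defaultdict(set)
    for user, story in votes:
        voters[story].add(user)
    # Most voters first; ties broken by story id so the order is stable.
    ranked = sorted(voters, key=lambda story: (-len(voters[story]), story))
    return ranked[:size]


print(front_page(VOTES))
```

Every voter counts the same here, which is the democratic part; the count still measures popularity rather than accuracy.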
Note: Digg (and the other social sites) are not search engines, but I believe they are used as search engines, and the concept might be applied to other sites.
What’s wrong with the Digg model:
- Friend networks: Digg may be considered democratic, but still the vast majority of content comes from a few users or groups. It’s not rigged as such, but it just isn’t perfect yet.
- The Reputation concept: Under Digg, something Robert Scoble posts is more likely to make it than something I post. Yes, this makes sense to most people, but it still isn’t perfect - in a perfect system all content should be judged equally on merit, not on who wrote it or what website it was published on.
So how do I want to see the future of Search on the web?
In a perfect system, every piece of content would have an equal weight, and be judged purely on its own merit, not on who posted it, what IP they had, which website they said it on, what their skin color, nationality or political affiliation was, etc.
So how could that ever be achieved? What other concepts do I hope to see in search engines of the future?
- Content existing as itself in blocks: search engines of the future should see data independent of its creator - i.e. without attribution - so as to remove bias. In this perfect system
- Clustering: complex relationships explored in search, including relevance and relationship between similar concepts and searches.
- Independent formats (microformats?): Data items (for example this post) would exist as an independent unit (data may exist independent of formatting, layout, perhaps even independent of language).
- Future engines should ‘understand’ data: i.e. a search engine would do more than just copy this post to its cache and tag keywords - it would attempt to place this post, explore its concepts, and/or compare it to others in its archives.
- True democracy: a search engine would see content independent of the web site it is hosted on, or the person who created it, and rank it purely on its intrinsic value.
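The "without attribution" idea from the list above can be sketched as a ranking step that hides the creator and host from whatever does the scoring. How to measure intrinsic value is the hard, unsolved part; `toy_score` below is a deliberately crude placeholder, and the field names and sample items are hypothetical.

```python
# Hypothetical names for the fields that identify who published something.
ATTRIBUTION_FIELDS = {"author", "site", "ip"}


def strip_attribution(item):
    """Return a copy of the item without any attribution fields."""
    return {k: v for k, v in item.items() if k not in ATTRIBUTION_FIELDS}


def blind_rank(items, score):
    """Rank items with a scorer that only ever sees attribution-free copies."""
    return sorted(
        items,
        key=lambda item: score(strip_attribution(item)),
        reverse=True,
    )


def toy_score(item):
    # Placeholder only: counts distinct words, which is NOT a real measure
    # of merit. A real engine would need something far better here.
    return len(set(item["text"].lower().split()))


ITEMS = [
    {"author": "famous blogger", "site": "famous.example",
     "text": "search search search"},
    {"author": "nobody", "site": "tiny.example",
     "text": "a careful look at how ranking works"},
]

for item in blind_rank(ITEMS, toy_score):
    print(item["site"], "-", item["text"])
```

Because the scorer never receives the author or site, it cannot favor a well-known name even by accident; the attribution is only re-attached when the results are displayed.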
January 3rd, 2007