Acing the Search

Originally published in the October 2010 issue of SPIDER Magazine.

Some (or indeed all) of the information here might be outdated.

Acing Online Search

The Internet is an unimaginably vast and infinitely valuable resource of the world’s most remarkable information. Updated by the millisecond, it holds around 500 billion gigabytes of data, and even as you read this, a few more gigabytes have been uploaded and downloaded several times over.

The Internet is often simply described as ‘a network of networks.’ However, it is much more than just a series of tubes. From traditional Aboriginal recipes and fashion advice to the musings of a teenager from rural Kentucky, the Internet (or the Web specifically) is everyone and everything. It is the collective consciousness of all humanity: a record of everything we were, are and want to be, both as individuals and as a race. You can, quite literally, find anything you want on the Internet.

But how exactly do you find a microscopic needle in a haystack the size of a planet?

The de facto answer to most questions like that is “with a well-worded Google search, of course!” delivered with a snort of contempt. But searching the mind of the human race requires a little more effort, since what you’re trying to find is often more complicated than a list of page-ranked results can satisfy. The problem becomes even more apparent when you’re trying to consolidate different sources of information to reach a decision.

There are a few different ways search engine developers have chosen to tackle this problem. Setting Google and Bing aside for a while, we’ll take a look at some of them here.

Specialist searches

One way to solve the problem of information overload is to focus on just one area and develop expertise therein. Services like OmniMedicalSearch, Nolo (for legal information), VideoSurf and AppStoreHQ focus specifically on searching for content in their own particular niches. These engines therefore search only a subset of the web; AppStoreHQ, for instance, specializes in finding mobile apps for the various smartphone platforms and ranks results by the buzz they generate on social networks across the web. KidRex and Quintura, on the other hand, specialize in child-safe webpages.

The power of today’s Web undoubtedly lies in its constantly updated and highly personal social networks. Product reviews, recommendations and personal opinions matter a lot, and searching the blogosphere and twittersphere for opinions on a particular product or service can be a great tool for anyone trying to decide. By analyzing the language of blog and Twitter postings, engines like Tweet Sentiments and Social Mention can give you a top-down look at what the Internet (i.e. the inhabitants thereof) generally thinks of something. Another facet of social search is people search across the many networks. Pipl, yoName and whozat can track down a user across dozens of social networks by name or username.
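To give a rough feel for what "analyzing the language" of postings can mean, here is a minimal Python sketch of word-list sentiment counting. It is not how Tweet Sentiments or Social Mention work (their methods are not described here); the word lists, the sample posts and the function names are all invented for illustration.

```python
# Hypothetical word lists; a real service would use far richer language models.
POSITIVE = {"love", "great", "good", "amazing", "excellent"}
NEGATIVE = {"hate", "bad", "awful", "terrible", "broken"}

def score(post):
    """Count positive words minus negative words in one post."""
    words = [w.strip(".,!?").lower() for w in post.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def summarize(posts):
    """Tally how many posts lean positive, negative or neutral."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for post in posts:
        s = score(post)
        if s > 0:
            counts["positive"] += 1
        elif s < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts

# Made-up sample posts about an imaginary phone.
SAMPLE_POSTS = [
    "I love this phone, great camera!",
    "Battery life is awful.",
    "Just bought the phone today",
]
print(summarize(SAMPLE_POSTS))
```

Counting words like this misses sarcasm, negation and context, which is exactly where the dedicated engines have to do better.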

Video and image search have also taken giant leaps forward, with sites like TinEye and Viewdle analyzing image and video content, respectively, to provide similar and relevant results.

Searching between the lines

Traditional search engines function on keyword search, i.e. they take each word in an entered query and look it up in their database of webpages. The process is a little more complicated than that, of course, with page and source rankings, but that’s basically it.
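The keyword lookup described above can be sketched in a few lines of Python. This is a toy under stated assumptions: the three "pages" and their text are invented, the function names are hypothetical, and there is no ranking at all, which real engines add on top.

```python
from collections import defaultdict

# Hypothetical mini "web": page names and text are made up for illustration.
PAGES = {
    "everest.html": "Mount Everest is the highest mountain on Earth",
    "angel.html": "Angel Falls is the highest uninterrupted waterfall",
    "recipes.html": "Traditional recipes from around the world",
}

def build_index(pages):
    """Map each lowercased word to the set of pages containing it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def keyword_search(index, query):
    """Return pages containing every word of the query (no ranking)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

INDEX = build_index(PAGES)
print(sorted(keyword_search(INDEX, "highest mountain")))
```

Note that a query such as "is everest higher than angel falls" finds nothing in this toy index, because no page happens to contain the word "higher": the engine matches words, not meaning.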

Semantic search engines, on the other hand, can understand what you’re asking: the semantics of the search query. The biggest names in this space are Powerset (now part of Bing) and TrueKnowledge, two natural-language search engines.

Many of the semantic search engines brand themselves as ‘answer engines’ instead of simple search, since with them you can ask a natural question instead of a keyword-based phrase. For example, suppose I am trying to find out whether Mt. Everest is higher than Angel Falls. I can try two approaches. First, I could simply search for the absolute heights of the two natural wonders and compare them myself, but then what are computers for anyway? The other way would be to enter the question “Is Mt. Everest higher than Angel Falls?” into the query box at TrueKnowledge and instantly get the answer “Yes.”

TrueKnowledge parses my question, recognizes that ‘higher than’ is a left comparison for height, gets the heights of Mt. Everest and Angel Falls, compares them and answers my question. This chain of reasoning is then also presented to the searcher.
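The parse, look up, compare and explain chain can be imitated with a toy Python sketch. It is in no way TrueKnowledge’s actual implementation: the question pattern, the two-entry fact table (commonly cited heights in metres) and the function name are assumptions made purely for illustration.

```python
import re

# Illustrative fact table (heights in metres); a real answer engine
# would draw on a large knowledge base instead.
HEIGHT_M = {
    "mt. everest": 8848,
    "angel falls": 979,
}

# Hypothetical pattern for questions of the form "Is X higher than Y?"
QUESTION = re.compile(r"^is (.+?) higher than (.+?)\?$", re.IGNORECASE)

def answer(question):
    """Return (answer, reasoning steps) for a 'higher than' question."""
    match = QUESTION.match(question.strip())
    if match is None:
        return "I don't understand", []
    left, right = (name.lower() for name in match.groups())
    if left not in HEIGHT_M or right not in HEIGHT_M:
        return "I don't know", []
    steps = [
        "'higher than' is a comparison of height",
        f"height of {left} = {HEIGHT_M[left]} m",
        f"height of {right} = {HEIGHT_M[right]} m",
    ]
    verdict = "Yes" if HEIGHT_M[left] > HEIGHT_M[right] else "No"
    return verdict, steps

result, reasoning = answer("Is Mt. Everest higher than Angel Falls?")
print(result)
for step in reasoning:
    print(" -", step)
```

The returned list of steps plays the role of the chain of reasoning shown to the searcher; anything outside the single hard-coded pattern simply falls through to "I don't understand".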

The power of semantic search lies in its ability to identify concepts rather than just keywords. The same search above would have delivered confusing and entirely unsatisfactory results on regular sites like Google and Bing, which is a definite one-up for these startups.

Cpedia, by the search company Cuil, is another way of parsing the vast amounts of information available online. Dubbed an automated encyclopedia, it generates reports on the fly based on the query entered, instead of simply providing a list of relevant pages. For example, a search for ‘Pakistan floods’ returns an appropriately formatted and sourced document with statements extracted from relevant pages. Cpedia is the first of its kind and is currently in alpha. Since the time of this writing, Cuil has been taken offline and will most probably stay dead.

WolframAlpha is another big name in the answer engine category. Developed by the makers of Wolfram Mathematica, it is primarily a tool for mathematical analysis, but is surprisingly powerful at understanding user input. For example, searching for a particular date will tell you the difference between then and now, important events and other notable facts. Another thing it’s great at is data consolidation: for instance, if for some reason I want to know what the weather in Belgium was the day Steve Jobs was born, I would simply ask “what was the weather in Belgium the day Steve Jobs was born?”
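The "difference between then and now" part of a date query is plain calendar arithmetic, sketched below in Python. This is only an illustration, not WolframAlpha’s method: the function name is hypothetical, "today" is passed in explicitly, and the weather lookup is left out because it would need a historical data source this sketch does not have.

```python
from datetime import date

def time_since(then, today):
    """Return (whole years, total days) elapsed between two dates."""
    days = (today - then).days
    years = today.year - then.year
    # Subtract a year if this year's anniversary has not happened yet.
    if (today.month, today.day) < (then.month, then.day):
        years -= 1
    return years, days

# Steve Jobs was born on 24 February 1955; 1 October 2010 stands in for
# "now", matching the month this article was published.
years, days = time_since(date(1955, 2, 24), date(2010, 10, 1))
print(f"{years} years ({days} days) ago")
```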

Locally speaking

Amidst the controversy over the PTA banning search results on Google and Bing came the news of two Pakistani teenagers launching their very own search engine, aptly called PKSearch. Developed by Moiz (19) and Muqeet (16) Qadir in Karachi, PKSearch indexes Pakistani websites exclusively, and might prove to be an excellent local search tool. Their algorithm seemingly works similarly, at least in concept, to Google’s PageRank and, according to their Facebook page, is undergoing some revision. Future plans include an image search feature and an advertising network.

While PKSearch offers nothing as exciting as WolframAlpha or TrueKnowledge, and can be accused of reinventing the wheel, it is certainly a commendable effort and a refreshing change to see Pakistani developers trying to hit it big.

Powers combined

As computers become smarter in the way they understand language, search engines will naturally become better at parsing queries for meaning and context. It would be folly to assume Google and Microsoft are ignoring these developments in natural language processing. While their algorithms are proprietary, it can be safely said that at least some weight is applied to semantic analysis of search queries.

So what can we expect from the search engines of tomorrow? For one, they will have a better grasp of what we are trying to ask and therefore will more closely approximate the answers we want; they will also be more adept at combining different data sets and presenting an overview of the results. In short, they will be more decision engines than search engines.

One Response to Acing the Search

  1. I think it’s amazing how search engine tech has evolved over the years. Semantic search, or “natural language search” as you put it, like TrueKnowledge’s, is especially impressive with its logic decision tree. With speech recognition getting better and better, pretty soon I’m sure we will have robots listening to and comprehending what we mean! Future, here we come!
