Just what is Kumo, anyway? (And is it open source?)

Some 90,000 Microsoft employees are currently testing a new version of Live Search under the name Kumo (well, some of them are testing it, the other half are using Google, apparently). Expected to be released (or at least announced) in early June, Kumo, or whatever it ends up being called, seems to be taking on a life of its own as Microsoft stays mum and Twitter and the blogs have a field day.

(Note that the Kumo name may not be final, from what we hear, and that the final product may yet be called Bing or Sift or something else entirely. We’ll call it Kumo for now, and wait to see if the name sticks.)

Is Kumo just a rebranding effort, a bold foray into the semantic web, or just a rebate program for eBay?  Lots of speculation but not much in the way of facts out there, so just what is Kumo, anyway?  Let’s take a look at what to expect:

Rebranding

First and foremost, Kumo is a rebranding effort. For whatever reasons, the Live Search brand has fallen out of favor at Microsoft. And a good thing too, given the recent rise of real-time search services like Twitter Search and many others, which are being called, well, “live search”. We wrote a few months ago about our feeling that a strongly Microsoft-branded search product might not fly in a Yahoo! partnership, and that may be part of the thinking. In any event, Microsoft is committed to a new brand for Live Search, with a new $100 million ad campaign to follow.

Live Search moved some time ago out of Windows Live (it’s in the Online Services Division now, under Qi Lu and along with MSN and Microsoft Advertising), and a rebrand may do as much to strengthen the Windows Live brand moving forward as it will to establish a new identity for “the thing formerly known as Live Search”.

Ten Blue Lines

The mantra around search at Microsoft is that there is still lots left to be done, and that the “ten blue lines” presented by Google in search results just aren’t enough. When Microsoft made the switch internally from Live Search to Kumo in early March, screenshots showing a new search results page appeared at some of the big tech news outlets:

[Screenshot: early Kumo search results page]

We’re hearing that the page has evolved quite a bit from the look you see here, but the idea is a better presentation of search results on the page, to make search more useful and attractive, and to differentiate from those “ten blue lines”.

Powerset

Microsoft acquired San Francisco-based Powerset last July, promising at that time that it would help to bring search “to the next level”. Semantic search, which attempts to understand the intent of the search phrase rather than just looking up keywords, might just be the next big thing. The Powerset blog is beginning a series of podcasts on semantic search. The first one, published today, takes a high-level look at semantic search:

[Video: Powercast with Dave Fayram, from officialpowerset on Vimeo]

It isn’t yet clear how much of Powerset is in Kumo (our guess is not much), but as Powerset had only indexed Wikipedia (a fine proof of concept, but of limited usefulness), it is not much more than wishful thinking to say that Kumo=Powerset, at least at this point.  Kumo at its core is still Live Search by another name, not a whole new search offering.

Open Source

Much has been made recently about some comments by members of the Powerset team about their use of open source software, specifically Hadoop.  From that, blogs and tweets have had fun connecting dots: “Kumo includes Powerset and Powerset=Open Source, so Kumo=Open Source, Wheeee!” 

Just a quick primer: there is a specialized area of database development where extremely large data sets (for example, search indexes) are involved. These systems consist of two main parts: a file storage system, and a programming model to query the stored information. Google has its own system, with the Google File System and Bigtable as the storage part, and MapReduce, programmed in C++ or Sawzall, as the execution environment. Yahoo! developed its own (open source) system, using Hadoop and Pig (whose scripting language is called Pig Latin).
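To make the "programming model" half of that primer concrete, here is a minimal single-machine sketch of the map-reduce pattern, used to build a tiny inverted index (word to list of documents). It is only an illustration of the idea, not how Google or Hadoop actually implement it; the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) and the sample documents are made up for this example.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (word, doc_id) pair per word in a document."""
    for word in text.lower().split():
        yield word, doc_id


def shuffle(pairs):
    """Group emitted values by key, the job a real framework does
    between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    """Reduce step: collapse grouped values into a sorted posting list."""
    return word, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce over a dict of doc_id -> text."""
    pairs = (pair
             for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(word, ids)
                for word, ids in shuffle(pairs).items())


# Hypothetical sample data, for illustration only.
docs = {1: "live search", 2: "semantic search", 3: "live results"}
index = build_index(docs)
```

In a real cluster the map and reduce functions run in parallel across many machines, with the storage layer holding the inputs and outputs; the shape of the computation is the same.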

Microsoft has its own system, too.  Cosmos is built to run on Microsoft servers, and SCOPE is a SQL based scripting language to query against it.  Cosmos is in general use internally, is used by Live Search and Windows Live, and according to a white paper published last summer, has significant advantages over the Google or Yahoo! offerings:

Both Google and Yahoo! use a MapReduce execution environment. MapReduce is very rigid, forcing every computation to be structured as a sequence of map-reduce pairs. The Cosmos execution environment is significantly more flexible, handling execution of any computation that can be expressed as an acyclic dataflow graph.
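The difference the white paper is pointing at can be sketched in a few lines. In an acyclic dataflow graph, a step may take input from any number of earlier steps, so a join of two independently processed inputs is a single node rather than something forced into a chain of map-reduce pairs. The sketch below is a toy executor under our own assumptions, not Cosmos or SCOPE code; `run_dataflow`, the node names, and the sample data are all hypothetical.

```python
from collections import Counter
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dataflow(nodes, inputs):
    """Execute a graph given as name -> (function, [upstream names]).

    Each node runs only after all of its upstream nodes have results.
    """
    deps = {name: set(upstream) for name, (_, upstream) in nodes.items()}
    results = dict(inputs)
    for name in TopologicalSorter(deps).static_order():
        if name in results:  # raw input, nothing to compute
            continue
        func, upstream = nodes[name]
        results[name] = func(*(results[u] for u in upstream))
    return results


# Hypothetical graph: two inputs are processed separately, then joined.
nodes = {
    "normalized": (lambda qs: [q.lower() for q in qs], ["queries"]),
    "counts": (lambda qs: Counter(qs), ["normalized"]),
    "clicked": (lambda cs: {c.lower() for c in cs}, ["clicks"]),
    "report": (
        lambda counts, clicked: {q: n for q, n in counts.items() if q in clicked},
        ["counts", "clicked"],
    ),
}
inputs = {"queries": ["Kumo", "kumo", "Bing"], "clicks": ["kumo"]}
results = run_dataflow(nodes, inputs)
```

Here `results["report"]` holds the query counts restricted to queries that were clicked. The graph has to stay acyclic: `TopologicalSorter` raises `CycleError` if a node ends up depending on itself.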

Microsoft may find itself engulfed in Hadoop if a search deal with Yahoo! does come about, and in some ways Powerset may actually help ease a transition. Again, though, as far as we are aware, Microsoft is not ditching its own system for Hadoop, and certainly not for the launch of Kumo.

Search games

Live Search Club, the recently retired Live Search Perks, and Live Search cashback have all been attempts to push people towards Live Search and, in the process, to capture the most lucrative searchers (the ones looking to buy something). While these programs have met with limited success, there have been hints of more to come. This may be an area where some of that $100 million in ad money might be well spent.

We’ll soon find out

If the countdown clock is correct, we should know a lot more about Kumo, or Live Search v.Next, soon.  We’ll be right here to bring you all the latest.