Better search for the Twitter age
Search engines are cropping up to sift through constantly updated sites like Twitter, Flickr, and blogs, but they still have a long way to go.
NEW YORK (Fortune) -- When Iran cracked down on journalists following its recent election, international focus turned to Twitter as citizen journalists posted 140-character reports and links to photos and videos to the site. Trouble was, it was hard to sift the useful and reliable nuggets of information from scores of tweets that included plenty of spam, useless remarks, and stray sentiments.
Few events more clearly define the newest problem on the Web: how to make sense of all the real-time information bubbling up from Twitter, Facebook, Flickr, blogs, and nearly every other self-publishing platform in cyberspace. It simply overwhelms.
Google (GOOG, Fortune 500) -- the tool consumers have relied on for a decade to organize the Internet -- isn't cutting it. At Google's Zeitgeist conference in May, co-founder Larry Page acknowledged that the company has fallen behind Twitter, saying it has "done a relatively poor job of doing things that work on a per second basis."
A fast-growing group of startups aspires to pick up the slack by offering real-time search results. With names like Collecta, OneRiot and Scoopler, these companies attempt to provide answers to the question: what is happening on the web right now? Nearly every week, a new one is added to the mix. IDC analyst Hadley Reynolds explains, "Yahoo, Microsoft and Google will take a while to figure out how to cope with this. There is definitely a window of business opportunity for startups."
But providing these results is not straightforward. Inevitably, there is tension among the information that is most recent, the information that is most popular, and the more subjective notion of which content is most important. There is no perfect solution.
Information filtered only by time is nearly as unwieldy as the data stream itself. But once you begin to add filters, weighting results by the authority of the publisher, for example, or by the rate at which they are spreading across the web, you risk missing important trends and tidbits of information because they are not popular.
The bulk of this stream of constantly updating information comes through Twitter. The site's search engine, which came from Summize, a startup Twitter purchased in 2008, turns up results filtered only by time. Collecta, a search firm started this month, also filters by time, but draws from blogs and other social media sites across the web as well.
Time might not be the magic filter for search results -- or the best way to make sense of the information stream. Many of these startups, like Scoopler, have developed algorithms that attempt to unearth not just the latest tweet, but also the most popular one. Among the better known so far, OneRiot relies on a system called PulseRank, which ranks each piece of information by how fresh it is, how much authority its author has, and how quickly it is spreading.
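To make the trade-off concrete, here is a minimal sketch of how a ranking might blend the three signals mentioned above: freshness, author authority, and rate of spread. It is not OneRiot's PulseRank, whose details are not described here. The field names, the weights, the one-hour half-life, and the saturation constant of 50 shares are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    age_seconds: float       # time since the item was posted
    author_authority: float  # hypothetical authority score in [0, 1]
    shares_last_hour: int    # hypothetical proxy for how fast it is spreading


def score(item, half_life_seconds=3600.0, w_fresh=0.4, w_auth=0.3, w_spread=0.3):
    """Blend freshness, authority, and spread into one number (illustrative weights)."""
    # Freshness halves every `half_life_seconds`, so newer items score higher.
    freshness = 0.5 ** (item.age_seconds / half_life_seconds)
    # Squash the share count into [0, 1) so one viral item cannot dominate.
    spread = 1.0 - math.exp(-item.shares_last_hour / 50.0)
    return w_fresh * freshness + w_auth * item.author_authority + w_spread * spread


def rank_by_time(items):
    """Time-only filter: newest first, with no regard for quality."""
    return sorted(items, key=lambda it: it.age_seconds)


def rank(items, **weights):
    """Weighted filter: highest blended score first."""
    return sorted(items, key=lambda it: score(it, **weights), reverse=True)


# Made-up sample data: a fresh throwaway remark and an hour-old news story.
remark = Item("stray remark", age_seconds=60, author_authority=0.0, shares_last_hour=0)
story = Item("news story", age_seconds=3600, author_authority=0.9, shares_last_hour=200)

newest_first = rank_by_time([story, remark])
weighted = rank([remark, story])
```

With this made-up data, the time-only ordering puts the stray remark first, while the weighted ordering puts the news story first. Shifting more weight toward freshness reverses that again, which is the tension in miniature: every choice of weights buries something.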
So far, however, none of these search engines is particularly good. The interfaces are largely gaudy and none of the results are reliably useful. The day after Bernie Madoff was sentenced to 150 years in prison, a quick search on a smattering of these sites revealed: a "goodbye Bernie" YouTube video, a tweet linking to a Daily Show video, and another tweet asking who should play Bernie Madoff in a potential movie.
Maybe it is simply inevitable that the web needs even a short lag to sort out which information is ultimately useful. A Google search the day after Madoff's sentencing yielded a Los Angeles Times news story on the hearing, followed by the fraudster's Wikipedia entry.