Why Technorati feels slow

Posted by: Stephen Baker on July 08

For the last few weeks, I’ve found Technorati frustratingly slow. So I called the company and asked what’s up. Adam Hertz, vice president and chief engineer, gave me the lowdown. In short, Technorati is struggling to keep pace with the explosive growth of blogs. Making adjustments, he says, is like “changing a flat tire on a moving car.” New services on the site add more complications. And these challenges show no signs of easing. The upshot? While Technorati is the leading brand in blog search, it’s in a daunting tech race. This spells opportunities for others, from Google to PubSub, if they can muster the machinery and algorithms to master the blogosphere.


In the last year, with the blogosphere doubling in size twice, Technorati has had to re-engineer its system. Originally, says Hertz, it kept all the data in one big (and ever-expanding) pool. In the last nine months, engineers have split the data into separate segments. At the same time, they're enabling the system to comb through the data more intelligently, sorting each piece so that it can be cross-referenced. For example, this post can be associated with me as a blogger, with Blogspotting, with BW, with Technorati, with the search industry, and with any of you who link to it. Each of those relations has meaning and value. But offering all these dimensions adds layer upon layer of complexity to blog search. "In general, our traffic isn’t the big gating factor," he says. "It’s the amount of new data that we’re managing."
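To make the segmenting idea a bit more concrete, here's a minimal Python sketch. It assumes a simple hash-based split of posts into a fixed number of segments, plus a side index recording each kind of relation Hertz mentions (author, blog, topic, linking blogs). The class names, field names and segment count are hypothetical illustrations, not Technorati's actual design, which the company hasn't described in detail.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field

NUM_SEGMENTS = 4  # hypothetical; a real system would use far more


@dataclass
class Post:
    url: str
    author: str
    blog: str
    topics: list[str] = field(default_factory=list)
    linked_from: list[str] = field(default_factory=list)  # blogs linking to this post


def segment_for(url: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Stable hash of the URL -> segment number (built-in hash() varies per process)."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


class SegmentedIndex:
    def __init__(self, num_segments: int = NUM_SEGMENTS):
        # The posts themselves, split across segments instead of one big pool.
        self.segments: list[dict[str, Post]] = [{} for _ in range(num_segments)]
        # Cross-references: relation -> value -> set of post URLs.
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, post: Post) -> None:
        self.segments[segment_for(post.url, len(self.segments))][post.url] = post
        self.relations["author"][post.author].add(post.url)
        self.relations["blog"][post.blog].add(post.url)
        for topic in post.topics:
            self.relations["topic"][topic].add(post.url)
        for source in post.linked_from:
            self.relations["linked_from"][source].add(post.url)

    def lookup(self, relation: str, value: str) -> list[Post]:
        """All posts sharing one relation, fetched from whichever segment holds each."""
        urls = self.relations[relation].get(value, set())
        return [self.segments[segment_for(u, len(self.segments))][u] for u in sorted(urls)]


# Usage with made-up URLs:
post = Post("http://example.com/technorati-slow", "Stephen Baker", "Blogspotting",
            topics=["search"], linked_from=["http://example.org/reader-blog"])
index = SegmentedIndex()
index.add(post)
print(len(index.lookup("author", "Stephen Baker")))  # 1
```

Every extra relation is one more index that has to be kept current as new posts stream in, which is one way to see why each added dimension piles on complexity.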

New services will continue to add to the complexity. In the future, says Hertz, Technorati will organize bloggers by their specialties, and perhaps even rank the authority they have on certain subject matters. (Just imagine the controversies that will create: A blogger writes a post slamming Intel's new chip, and another, boasting a far higher semiconductor rank in Technorati, rebuts it.)
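How might such topic rankings work? One simple possibility, and it is only an assumption, not anything Hertz described: count the distinct blogs that link to a blogger in posts on a given topic. The sketch below takes hypothetical (source blog, target blog, topic) triples, ignores self-links, and counts repeat links from the same blog only once.

```python
from collections import defaultdict


def topic_authority(links: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, int]]]:
    """Rank blogs per topic by how many distinct other blogs link to them.

    links: hypothetical (source_blog, target_blog, topic) triples.
    """
    sources: dict = defaultdict(lambda: defaultdict(set))  # topic -> target -> source blogs
    for source, target, topic in links:
        if source != target:  # ignore self-links
            sources[topic][target].add(source)  # a set, so repeat links count once
    return {
        topic: sorted(((t, len(s)) for t, s in targets.items()),
                      key=lambda pair: (-pair[1], pair[0]))
        for topic, targets in sources.items()
    }


links = [
    ("blogA", "chipblog", "semiconductors"),
    ("blogB", "chipblog", "semiconductors"),
    ("blogA", "chipblog", "semiconductors"),     # repeat link, counted once
    ("blogA", "newcomer", "semiconductors"),
    ("chipblog", "chipblog", "semiconductors"),  # self-link, ignored
]
print(topic_authority(links))
# {'semiconductors': [('chipblog', 2), ('newcomer', 1)]}
```

Even this toy version hints at where the controversies would come from: the rank depends entirely on which links get counted and how each post gets assigned a topic.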

For many, the first impulse when faced with a crush of blog data would be to add servers. That's the easiest part, says Hertz. "The trick here is when we have to break things into pieces, or invent brand new systems to do the data management."

What's more, blog search engines, unlike Google, have to update this data continuously. They're providing a look at time as it passes. Yesterday, with the London bombings, traffic exploded, taxing the Technorati system. Instead of the usual 800,000 new posts, Technorati was on track to process 1.2 million of them.
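Some back-of-the-envelope arithmetic on those figures, assuming (unrealistically) that posts arrive evenly over 24 hours:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def posts_per_second(posts_per_day: int) -> float:
    """Average arrival rate, assuming posts are spread evenly over the day."""
    return posts_per_day / SECONDS_PER_DAY


usual = 800_000    # Technorati's usual daily volume of new posts
surge = 1_200_000  # pace on the day of the London bombings
increase = (surge - usual) / usual

print(f"usual: {posts_per_second(usual):.1f} posts/s")  # usual: 9.3 posts/s
print(f"surge: {posts_per_second(surge):.1f} posts/s")  # surge: 13.9 posts/s
print(f"increase: {increase:.0%}")                       # increase: 50%
```

Real traffic on a big news day is burstier than an even spread, so peak rates would run higher than these averages.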

I've attached the notes for those who want to read more.

Reader Comments

Bob Wyman

July 8, 2005 01:50 PM

There is no question that the Blogosphere is growing rapidly. For some indication of the current number of messages being published, I recommend the statistics that we provide at PubSub.com. Take a look at the chart at:
http://www.pubsub.com/linkcounts_graphs.php?type=newentries
There you'll see that we've been averaging over 1.2 million new entries per day for the last month. There was definitely an increase in posts yesterday and I anticipate another today.

The challenge in providing online search services is definitely one of scaling and will remain so. Our expectation is that the Blogosphere is likely to reach 100 million blogs by the end of this year -- if not before. Thus, we all need to be working on scaling for that load today. We can't wait until it appears or we won't be able to keep up.

Unlike most database applications, search services typically need to be scaled on at least two dimensions, not just one. We need to scale both "horizontally" and "vertically." By this I mean, in the case of a retrospective search engine, both partitioning the query load across multiple replicated servers and partitioning each database copy across multiple machines. The result is a "grid," "cube," or multidimensional "hypercube" of servers rather than a single monolithic server. An interesting research paper that provides an easy-to-understand discussion of such scaling can be found on Yahoo!'s research site. See: http://research.yahoo.com/publications/15.pdf
(Note: The stuff in the back of the paper is covered by a patent; however, the discussion on the first few pages is "common knowledge and practice" in this business.)

Fortunately, a great deal was learned about search engine scaling during the '90s, and now the techniques are fairly commonly understood. Of course, as simple as the techniques may be to describe, there always remains a significant engineering and deployment challenge in making it happen.

bob wyman
CTO, PubSub.com
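Wyman's two-dimensional grid can be sketched in a few lines of Python. In this toy version, which is an assumption for illustration and not the scheme in the Yahoo! paper, each row is a full replica of the index split into partitions. A query goes to one row, picked round-robin, and is fanned out to every partition in that row; the partial hits are then merged. Adding rows spreads the query load; adding partitions spreads the data. The class and method names are hypothetical.

```python
import itertools


class SearchGrid:
    """Toy grid of servers: `replicas` rows, each holding every partition (hypothetical)."""

    def __init__(self, partitions: list[list[str]], replicas: int):
        # Each cell is one "server" holding a copy of one partition's documents.
        self.grid = [[list(docs) for docs in partitions] for _ in range(replicas)]
        self.queries_per_row = [0] * replicas
        self._rows = itertools.cycle(range(replicas))

    def search(self, term: str) -> list[str]:
        # Replication axis: pick one full copy of the index, round-robin.
        row = next(self._rows)
        self.queries_per_row[row] += 1
        # Partition axis: scatter the query to every partition in that row.
        hits: list[str] = []
        for server_docs in self.grid[row]:
            hits.extend(doc for doc in server_docs if term in doc)
        # Gather: merge partial results (sorted here; a real engine would rank).
        return sorted(hits)


grid = SearchGrid([["blog search", "london news"], ["technorati search", "pubsub"]], replicas=2)
print(grid.search("search"))  # ['blog search', 'technorati search']
```

A production engine would score and rank the merged hits rather than sort them alphabetically, and would route around failed servers; both are left out here to keep the two axes visible.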

Randy Charles Morin

August 9, 2005 01:31 PM

Here's a new search engine to try out...

http://www.kbcafe.com/links.aspx?q=http%3A%2F%2Fwww.businessweek.com%2Fthe_thread%2Fblogspotting%2F

Heather Green

August 9, 2005 02:49 PM

Hey Randy,

Thanks much for the link. Really interesting service!

About

In Blogspotting, Senior Writer Stephen Baker and Associate Editor Heather Green take a look at how cutting-edge technologies are changing business and society. Whether it's blogs or wikis, data crunching or data targeting, technology’s advances are reshaping the world that we live in.