July 08, 2005
Why Technorati feels slow
For the last few weeks, I've found Technorati frustratingly slow. So I called the company and asked what's up. Adam Hertz, vice president and chief engineer, gave me the lowdown. In short, Technorati is struggling to keep pace with the explosive growth of blogs. Adjustments, he says, are like "changing a flat tire on a moving car." New services on the site add more complications. These challenges show no signs of slowing. The upshot? While Technorati is the leading brand in blog search, it's in a daunting tech race. This spells opportunities for others, from Google to PubSub, if they muster the machinery and algorithms to master the blogosphere.
In the last year, with the blogosphere doubling twice in size, Technorati has had to re-engineer its system. Originally, says Hertz, it dealt with all the data in one big (and ever-expanding) pool. In the last nine months, engineers have rearranged the data in different segments. At the same time, they're enabling it to comb through the data more intelligently, sorting each piece so that it can be cross-referenced. For example, this post can be associated with me as a blogger, with Blogspotting, with BW, with Technorati, with the search industry, and with any of you who link to it. Each one of those relations has meaning and value. But offering all these dimensions adds layer upon layer of complexity to blog search. "In general, our traffic isn't the big gating factor," he says. "It's the amount of new data that we're managing."
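One way to picture the cross-referencing described above is a small index keyed by relation. This is only an illustrative sketch, not Technorati's actual design: the class `CrossReferenceIndex`, its method names, and the dimension labels (`blog`, `publication`, `topic`) are all hypothetical.

```python
from collections import defaultdict


class CrossReferenceIndex:
    """Hypothetical index that files each post under every relation it has."""

    def __init__(self):
        # Maps (dimension, value) -> set of post ids filed under that relation.
        self._by_relation = defaultdict(set)

    def add_post(self, post_id, **relations):
        # Each keyword argument is a dimension with a list of values,
        # e.g. topic=["Technorati", "search industry"].
        for dimension, values in relations.items():
            for value in values:
                self._by_relation[(dimension, value)].add(post_id)

    def lookup(self, dimension, value):
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._by_relation.get((dimension, value), set()))


index = CrossReferenceIndex()
index.add_post(
    "why-technorati-feels-slow",
    blog=["Blogspotting"],
    publication=["BW"],
    topic=["Technorati", "search industry"],
)
print(index.lookup("topic", "Technorati"))
```

Every extra dimension means another set of entries to write for each new post, which is one way to see why the volume of new data, rather than query traffic, becomes the bottleneck.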
New services will continue to add to the complexity. In the future, says Hertz, Technorati will organize bloggers by their specialties, and perhaps even rank the authority they have on certain subject matters. (Just imagine the controversies that will create: A blogger writes a post slamming Intel's new chip, and another, boasting a far higher semiconductor rank in Technorati, rebuts it.)
For many, the first impulse when faced with a crush of blog data would be to add servers. That's the easiest part, says Hertz. "The trick here is when we have to break things into pieces, or invent brand new systems to do the data management."
What's more, blog search engines, unlike Google, have to update this data continuously. They're providing a look at time as it passes. Yesterday, with the London bombing, traffic exploded, taxing the Technorati system. Instead of the usual 800,000 new posts, Technorati was on track yesterday to process 1.2 million of them.
I'll attach the notes here for those who want to read more. Download file
Listed below are links to weblogs that reference Why Technorati feels slow:
» Technorati and the Giant Blog Beanstalk from Micro Persuasion
Steve Baker at BusinessWeek got the scoop on why Technorati's been so slow lately. Apparently, Technorati is struggling to keep up with the explosive growth in blogging. Like Steve, I smell an opportunity for Google. By the way, Steve has [Read More]
Tracked on July 8, 2005 11:33 AM
» Explanation for why Technorati is slow from Emergence Marketing
Like most people, I have been extremely frustrated with Technorati's unreliable server behavior - especially the search feature and the feature that keeps track of in-links. Stephen Baker over at Business Week's Blogspotting spoke with the Chief Engine... [Read More]
Tracked on July 8, 2005 11:37 AM
» Why Companies Must Track Blogs - Redux from B.L. Ochman's weblog - Internet strategy, marketing, public relations, politics with news and commentary
Shel Holtz notes today that Technorati has never responded to his query about why his tags are not being picked up on their site. (Mine aren't either.) And that Apple hasn't responded to two queries either. "I might as well have folded them into paper ... [Read More]
Tracked on July 8, 2005 01:04 PM
» The (Terrible?) State of Blog Search from Backweave
Steve Rubel exposes Yahoo's stealth blog/RSS search engine, which has since been taken down. From Rubel's screenshot, it looks like Yahoo simply applied its web search engine to a corpus of blog posts, ie. yet another Feedster. Unsurprising [Read More]
Tracked on July 8, 2005 06:46 PM
» Reasons Behind Technorati Slow Down from WiRED.Pod
In a recent phone conversation between Stephen Baker (of BusinessWeek) and Adam Hertz (Technorati's vice president and chief engineer), Hertz mentioned that the slow down was caused by sudden explosive growth of blogs and new services added to... [Read More]
Tracked on July 9, 2005 12:18 PM
» Yahoo testing RSS and blog search from Sniptools
yahoo rss search [Read More]
Tracked on July 10, 2005 11:26 AM
» Why Technorati feels slow from New Media Marketer
Why Technorati feels slow. Technorati is struggling to keep pace with explosive growth of blogs. [Read More]
Tracked on July 10, 2005 05:30 PM
» Who Will Be #1 In Blog Search? from Somewhat Frank
There are a number of blog search engines out there today, however, no single one has emerged as the clear champion in this realm. While Technorati has become the current blog search directory front-runner, the recent explosion in the number [Read More]
Tracked on July 10, 2005 07:49 PM
» Technorati's speed from Preoccupations
Technorati has been frustratingly slow f [Read More]
Tracked on July 12, 2005 05:37 AM
There is no question that the Blogosphere is growing rapidly. For some indication of the current number of messages being published, I recommend that you take a look at some of the statistics that we provide at PubSub.com. Take a look at the chart at:
There you'll see that we've been averaging over 1.2 million new entries per day for the last month. There was definitely an increase in posts yesterday and I anticipate another today.
The challenge in providing online search services is definitely one of scaling and will remain so. Our expectation is that the Blogosphere is likely to reach 100 million blogs by the end of this year -- if not before. Thus, we all need to be working on scaling for that load today. We can't wait until it appears or we won't be able to keep up.
Unlike most database applications, search services typically need to be scaled on at least two dimensions -- not just one. We need to scale both "horizontally" and "vertically." By this I mean, in the case of a retrospective search engine, first partitioning the query load across multiple replicated servers as well as partitioning the database copies across multiple machines. The result is a "grid," "cube," or multidimensional "hypercube" of servers rather than a single monolithic server. An interesting research paper which provides an easy to understand discussion of such scaling can be found on Yahoo!'s research site. See: http://research.yahoo.com/publications/15.pdf
(Note: The stuff in the back of the paper is covered by a patent, however, the discussion on the first few pages is "common knowledge and practice" in this business.)
Fortunately, a great deal was learned about search engine scaling during the 90's and now the techniques are fairly commonly understood. Of course, as simple as the techniques may be to describe, there always remains a significant engineering and deployment challenge to making it happen.
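To make the two-dimensional scaling above concrete, here is a minimal in-memory sketch of a server "grid": the index is partitioned across columns, and each row is a full replica that takes a share of the query load. It assumes hash partitioning by document id and round-robin replica selection. The class `SearchGrid` and its methods are hypothetical names for illustration, not any real engine's API.

```python
import hashlib
from itertools import count


class SearchGrid:
    """Toy grid of index shards: `replicas` rows by `partitions` columns."""

    def __init__(self, partitions, replicas):
        self.partitions = partitions
        self.replicas = replicas
        # shards[r][p] is replica r's copy of partition p (term -> doc ids).
        self.shards = [[{} for _ in range(partitions)] for _ in range(replicas)]
        self._query_counter = count()

    def _partition_of(self, doc_id):
        # Stable hash so a document always lands in the same partition.
        digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % self.partitions

    def index(self, doc_id, text):
        # A document lives in one partition, copied into every replica row.
        p = self._partition_of(doc_id)
        for row in self.shards:
            for term in text.lower().split():
                row[p].setdefault(term, set()).add(doc_id)

    def search(self, term):
        # Round-robin over replica rows spreads the query load; within the
        # chosen row, every partition is asked and the results are merged.
        r = next(self._query_counter) % self.replicas
        hits = set()
        for shard in self.shards[r]:
            hits |= shard.get(term.lower(), set())
        return r, hits


grid = SearchGrid(partitions=3, replicas=2)
grid.index("post-1", "Technorati feels slow")
grid.index("post-2", "Yahoo tests blog search")
grid.index("post-3", "Blog search is a scaling problem")
replica, hits = grid.search("search")
print(replica, sorted(hits))
```

In this sketch, adding columns shrinks the slice of the index each machine holds, while adding rows raises the number of queries that can be answered at once. A real deployment would put each shard on its own machine and handle failures, which this toy version ignores.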
Posted by: Bob Wyman at July 8, 2005 01:50 PM
Here's a new search engine to try out...
Posted by: Randy Charles Morin at August 9, 2005 01:31 PM
Thanks much for the link. Really interesting service!
Posted by: Heather Green at August 9, 2005 02:49 PM