Posted by: Rob Hof on October 1, 2009
Udi Manber, Google’s vice-president of technology for core search, joined the company almost four years ago after stints running Amazon.com’s A9 search project and serving as chief scientist at Yahoo. He, like some other leaders on the search quality team, has 20 years of experience in search going back into academia—meaning before the World Wide Web, when it was known as information retrieval.
I talked with Manber on two occasions in recent weeks for my story on Google’s search operation, which appears in the latest issue of BusinessWeek. Customarily stingy with details about Google’s inner workings, to thwart competitors, Manber nonetheless provided a lot of insight into how Google’s core search quality team does its magic. And while he sounds quite confident about Google’s prospects—at first, I wondered, maybe too confident?—it’s also clear that he realizes the threat always looming for any technology company in the form of some unexpected breakthrough from an upstart. After all, Google was one of those upstarts not so very long ago.
This transcript, like the others to come over the next several days, is fairly long. I opted to leave in as much as possible to provide more details for people who are really interested in the inner workings.
Q: How does Google approach the process of improving search?
A: We want to improve search, but unlike networks or disks or some operating system, it’s not easy to measure. If you want to improve a network, you make it faster and bigger and more robust. For search, it’s much more complicated.
You also want to be faster, bigger, more robust. But that’s not the main measure. The main measure is whether we help our users and give people what they need. And that now has to do with people, not necessarily some mathematical measurements. You really have to understand and kind of feel it all the time.
In some sense, it’s against the grain for a computer scientist. As a computer scientist, you learn to deal with science and how you improve particular parameters. A lot of computer science now is directly about helping people.
Q: As you said at Google’s Searchology event in May, the 21st century will be about understanding people.
A: Exactly. But that means that we need to build our systems that not only understand that …but that (allow us to) actually measure things. So we built a measurement system to really understand what people need. That means for us to improve search, we have to figure out what is missing and why it is not perfect for people.
When we suggest an improvement, we need to know whether it’s really an improvement. We have hundreds of engineers and they have great ideas, but a lot of this is kind of intuitive. You may have a great idea and you implement it and it looks good, but it turns out it might hurt more cases than it helps.
Q: How do you determine that a change actually improves a set of results?
A: We ran over 5,000 experiments last year. Probably 10 experiments for every successful launch. We launch on the order of 100 to 120 a quarter. We have dozens of people working just on the measurement part. We have statisticians who know how to analyze data, we have engineers to build the tools. We have at least five or 10 tools where I can go and see here are five bad things that happened. Like this particular query got bad results because it didn’t find something or the pages were slow or we didn’t get some spell correction.
Q: But can those tools really determine what must often be a subjective assessment of whether results were good for a particular query?
A: You have to have really good intuition because you can’t do one query at a time. If you look at all the unique queries we get in a particular day, about a third of them we’ve never seen before. One-third of them every day are unique. If you normalize for traffic in any random query … still about one-sixth of the total traffic is completely new every given day.
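Manber gives two different novelty numbers: about a third of *distinct* queries are new each day, but only about a sixth of *traffic* is. The two differ because new queries tend to be rare, while repeated queries are asked many times. The sketch below computes both rates from one day's log. The log, the `seen_before` set, and the function name are hypothetical toy data chosen to land on the same ratios. They are not Google data.

```python
from collections import Counter

def novelty_rates(todays_log, seen_before):
    """Return (share of distinct queries that are new, share of traffic that is new)."""
    counts = Counter(todays_log)                      # query -> times asked today
    new_queries = [q for q in counts if q not in seen_before]
    distinct_share = len(new_queries) / len(counts)   # each distinct query counts once
    traffic_share = sum(counts[q] for q in new_queries) / len(todays_log)  # weighted by volume
    return distinct_share, traffic_share

# Toy log: "weather" is popular and familiar; one query has never been seen before.
log = ["weather"] * 4 + ["news", "hypothetical never-seen query"]
print(novelty_rates(log, seen_before={"weather", "news"}))  # roughly (0.333, 0.167)
```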
I wish there was a formula. But this formula would have to simulate a person. We don’t know how to simulate a person. Yet. Maybe someday somebody will.
Q: What are some of the measures you use?
A: There are obvious things we can do. If somebody did a search for something and didn’t click on anything, that’s a sign that maybe the results are not good. It’s not a good enough sign. All these signs are just an approximation.
Here’s a very good example: You want to try to improve spell-checking, right? So what you expect is that if the spell-checking is correct, people will click on it. That’s not the case. People use us as a spell-checker—they get the right answer and go away. So even though we didn’t get any click, we got the perfect result. So you want to have no clicks.
There are other cases where the results are in the snippet. You don’t have to click on it. It was the perfect result. So you have to be careful. Sometimes when you improve search, you actually get fewer clicks. And that’s fine.
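To show why "no click" is only an approximate signal, here is a toy heuristic that labels a no-click session differently depending on what the results page showed. Every field is a hypothetical assumption. In particular, `requeried` (the user immediately reformulated the query) is not mentioned in the interview. This is not how Google labels sessions.

```python
from dataclasses import dataclass

@dataclass
class Session:
    query: str
    clicked: bool
    spell_correction_shown: bool  # hypothetical: a "Did you mean" suggestion appeared
    answer_in_snippet: bool       # hypothetical: the answer was visible in a snippet
    requeried: bool               # hypothetical: user tried a new query right away

def abandonment_label(s: Session) -> str:
    """Interpret a session without treating every no-click as a failure."""
    if s.clicked:
        return "clicked"
    if s.requeried:
        return "likely_bad"       # no click and the user had to try again
    if s.spell_correction_shown or s.answer_in_snippet:
        return "likely_good"      # the answer was on the results page itself
    return "ambiguous"            # no click, no other evidence either way

print(abandonment_label(Session("recieve", False, True, False, False)))
```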
Q: Microsoft recently said its studies showed people weren’t getting the results they wanted the vast majority of times they search.
A: I don’t know how they computed that. But the numbers they had, I can tell you, were way too high. Maybe that’s true for them. But I shouldn’t say that because I don’t know what they mean by that.
Q: Can you give me a sense of the types of methods you use to improve search?
A: Humans are involved, formulas are involved, experiments are involved. We often do A/B tests, give one set of people an algorithm, give another set of people another set of algorithms and see how they behave. We measure lots of things, not just clicks.
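A bare-bones version of the A/B setup Manber describes has two parts. Each user is deterministically assigned to a control or experiment arm, and the two arms' rates on some metric are compared. The sketch below hashes a user ID to pick an arm and uses a standard two-proportion z-test. The metric ("satisfied" sessions), the hashing scheme, and the counts are assumptions for illustration. Manber stresses that Google measures far more than a single rate.

```python
import hashlib
import math

def assign_arm(user_id: str, experiment: str) -> str:
    """Deterministically bucket a user into arm "A" (control) or "B" (experiment)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return "B" if digest[0] % 2 else "A"

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """z-statistic for the difference in success rates between arms B and A."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)       # rate if the arms were identical
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Made-up counts: sessions judged "satisfied" out of sessions served, per arm.
z = two_proportion_z(500, 1000, 550, 1000)
print(assign_arm("user-42", "spell-v2"), f"z = {z:.2f}")
```

Hashing on the experiment name plus the user ID keeps each user in the same arm on every visit, while letting different experiments split users independently. A |z| above roughly 1.96 is the usual 5% significance cutoff.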
Q: I can imagine that one change that’s good for particular results can have unintended effects on other results. I’m thinking of the butterfly flapping its wings in South America ultimately causing a storm in the North Atlantic.
A: Oh, we absolutely get that. We improve spell-checking somewhere, and suddenly, something completely unrelated changes.
The way you rank is, you score. So every result has a quality score, which is a result of hundreds of different things, and then you sort the results in order. So you change one thing, and one score goes from, say, 5,000 to 5,001. It’s a small change. But it turns out that the three top results were all (scored) 5,000 and suddenly the 5,001 goes to result No. 1. These things happen all the time.
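The score-then-sort effect Manber describes can be reproduced in a few lines. Three results tied at 5,000 keep their incoming order, but a one-point nudge to one of them moves it to position No. 1. The linear `quality_score` combination and the signal names are assumptions standing in for the "hundreds of different things". Only the sorting behavior comes from the interview.

```python
def quality_score(signals, weights):
    """Combine per-result signals into one score (a linear model is an assumption)."""
    return sum(weights[name] * value for name, value in signals.items())

def rank(scores):
    """Order result IDs by score, highest first. sorted() is stable, so tied
    results keep their incoming order."""
    return sorted(scores, key=scores.get, reverse=True)

before = {"a": 5000, "b": 5000, "c": 5000, "d": 4990}
after = dict(before, c=5001)  # an unrelated tweak nudges result c by a single point
print(rank(before))  # ['a', 'b', 'c', 'd']
print(rank(after))   # ['c', 'a', 'b', 'd']
```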
Q: You’ve been in search 20 years, almost four years at Google. Have you changed the process of how you try to improve search in that time?
A: The processes are much more smooth now. Especially the evaluation process and the decision process is much more smooth. We’re very comfortable with that. To do a ranking improvement, you have all the infrastructure in the world to try your idea quickly, sometimes in a day, and get enough data to know whether it’s good enough. Then you have more infrastructure to do a more thorough analysis and we give you all the support.
It’s not like you need to get approval for anything. Any engineer can come up with any idea, can test his idea very, very well, and then analysis and all the work to evaluate an idea, and then they just have to come to a meeting. We know what numbers to look for. And we can make a decision in five minutes. All that allows people to innovate.
Q: You have a weekly meeting of search quality team leaders. What happens there?
A: The crux of those meetings is usually: what’s wrong, how can we fix it, how can we use this insight to do other things. It’s 20 people that are top in their area. We consult and advise them on what else to do.
In general, this is just an overview of certain areas. But what people try to achieve from those talks is to try to discover some things that may not work well and try to use what they’ve done in other areas. Sometimes we bring other groups and grill them. And sometimes we bring our own groups and grill them. But it’s all productive. We’re not saying “you’re bad.” It’s all technical.
Q: Are people searching for different things or in a different way than five years ago that you need to address today?
A: Yes, it has changed quite a bit. For one thing, we’re much better at search. We’re better at the evaluation part. And there are more people who understand the search engine much better. Five years ago, there were maybe three or four people who knew it, and today there are probably 50. That’s probably more than all the rest outside of Google. It takes a very, very good engineer about two years to really understand search.
Q: So what’s important today specifically—real-time? Social?
A: All those things are important. We look at all those things. Our job is to give people what they need in terms of search. So if you want to follow somebody on Twitter, you should go to Twitter and follow somebody. But if you want to know what happened on that particular topic right now, that’s real-time search. So we need to bring tools. We already have quite a bit, and will have more.
If something is written on the Web that is important, we should bring it back to you in seconds. Right now we’re in minutes. Five years ago, it was once a month. We’ll try to make it faster and faster. Clearly we have the ability to do this. It’s getting possible. Now it’s five minutes, and everybody goes, “Five whole minutes? It should be five seconds.”
Q: But you can’t index everything or even a good number of Web sites that fast.
A: Sure, but not that many things change that fast.
Q: So you have to determine what does change and focus on indexing that?
A: We have to determine from the query whether it can benefit from something in real-time. Like “history of the Renaissance.” It’s possible that somebody on Twitter just mentioned that. But a) it’s not that likely and b) it’s probably not what you want. You want the best article on the Renaissance. So time is not as important on that kind of query.
But search for “earthquake” and time is much more important. Or a particular celebrity that had news in the last five minutes. So we have to change the algorithm based on the query. We do that now.
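One way to picture "change the algorithm based on the query" is a freshness boost that is switched on only for queries that look time-sensitive. The sketch below assumes a hypothetical rule: a query is time-sensitive when its current volume is several times its usual volume. It then adds a recency bonus that halves every hour. The threshold, boost, half-life, and document scores are all made-up parameters, not anything Manber described.

```python
def freshness_weight(recent_rate: float, baseline_rate: float,
                     spike_threshold: float = 3.0) -> float:
    """Hypothetical rule: 1.0 if the query is spiking, else 0.0."""
    if baseline_rate <= 0:
        return 1.0 if recent_rate > 0 else 0.0   # brand-new query with traffic
    return 1.0 if recent_rate / baseline_rate >= spike_threshold else 0.0

def blended_score(base_score: float, age_minutes: float, weight: float,
                  boost: float = 100.0, half_life_minutes: float = 60.0) -> float:
    """Base relevance plus a recency bonus that halves every half_life_minutes."""
    recency = boost * 0.5 ** (age_minutes / half_life_minutes)
    return base_score + weight * recency

def pick_top(results, weight):
    """results: list of (doc_id, base_score, age_minutes) tuples."""
    return max(results, key=lambda r: blended_score(r[1], r[2], weight))[0]

candidates = [("fresh_report", 50.0, 5.0), ("reference_article", 120.0, 10_000.0)]
w_quake = freshness_weight(recent_rate=500, baseline_rate=20)   # "earthquake": spiking
w_history = freshness_weight(recent_rate=10, baseline_rate=9)   # "history of the Renaissance"
print(pick_top(candidates, w_quake), pick_top(candidates, w_history))
```

With the same two documents, the spiking query surfaces the five-minute-old report. The steady query keeps the stronger reference article on top, as in Manber's Renaissance example.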
Q: Aren’t a lot of queries going to be ambiguous in intent?
A: Of course. That’s why we can’t be perfect. But a lot of it is not.
I have a really good example: “New York Times address.” You think, what can be more clear than that query? In fact, the first snippet had the actual address. And all the other links had things like the address and the headquarters. Turns out that’s not what the user wanted. And I can tell because I can see what they clicked on. We inserted somewhere a very new result, a fresh result, that talked about an address given by a New York Times reporter the day before. That’s what the query was about. We got it in the top 10. It’s possible the next day that result will go down.
Q: At some point, does there need to be an interface change that lets you say, “I want results in the last five minutes”?
A: We have that right now. We have (in Search Options the choice to choose) the last 24 hours. We can add “five minutes.” That would be one very clear way to do this.
But that’s not going to be the main thing that we’ll do, because most people don’t want to go in and understand what are the features. We’ll have to understand what you need and bring it to you, and guess as much as we can. It can’t be 100%, so we should give you all those options. But most people just want the right result.
Q: How much of Google’s ability to provide real-time and social search will depend on access to the data, which is limited in the case of both Twitter and Facebook?
A: There are lots of different ways to allow us to get that data.
Q: What about social search, that is searching either your friends’ posts or using their posts or links to inform searches?
A: Sometimes you want to get information your friends recommend. It’s not necessarily the best information. If you want to go to a restaurant or a movie, you can argue that if I give you 7,000 reviews all over the Web, that may have the best information, but you really want to know what the three people that you trust say. For you, that’s better information than everybody.
It’s tricky. We actually experiment with this quite a bit. The question is how often do you really want to be influenced by your friends and how often do you prefer to be influenced by more, by the total knowledge out there?
The way we do it is through personalized search, based on your previous queries if you allow us to do that. I have a feeling for restaurant reviews and movie reviews, these are very specific areas. We’re probably not doing as good a job as specific sites dedicated to that can do. But I think we do pretty well even for that.
I’m of two minds. Maybe I just don’t have enough friends. I find it’s about average. Sometimes I get good things, sometimes not so good.
Q: We’re seeing a lot of companies offering specific kinds of search. Will search become more splintered among many services, or will most people continue to rely on one search engine for most of their needs?
A: Google is central. There will always be cases where very focused, very specialized content will be better for them. That’s absolutely fine. If you’re a researcher and you want medical information, you’re going to go to Medline. It has a much better interface than we have, there’s no question about it. The best we can do is to point you there. We can give you a lot of financial information, but if you’re a trader, you probably want to have data within microseconds.
Q: Will one-size-fits-all be good enough in the future?
A: Aggregation of so many different things (on Google) gives you more chances of finding what you want. If you know ahead of time that what you need is in a particular niche, that’s fine. But if you can go to one place and there’s a good chance you’ll find it there, why not?
I don’t see it as a failure on our part if somebody does a search somewhere else. It doesn’t even bother me. I think that’s good. It’s good to have a lot of diversity. It’s good to have a lot of competition. In some sense, it’ll drive us to improve if it turns out to be something a lot of people need.
Q: Might it not affect Google’s business if people in larger numbers go to more specialized search engines?
A: Sure. (It’s just that) we cannot be one size fits everything. That’s impossible. If it turns out that somebody offers a better service than we do, that’s a concern. If it turns out that we don’t satisfy needs, that’s a problem. What happens now is we satisfy more and more needs.
Q: How do you know that?
A: I see more diverse queries, I see more hard queries, and our market share is going up.
Q: How important is the user interface for search today vs. a few years ago?
A: More and more people are more comfortable with search, and they want more power tools. So we want to provide it to them, even if only 1% or 2% of the population will use it. Maybe in five years, more people will need it.
One conflict I run against is that people want simplicity. But to have really powerful interfaces, you have to have some complexity. So how do you introduce complexity in such a way that you don’t keep people out of that? It has to be optional. And it has to be something you run into slowly or can get it intuitively.
If you search for Harry Potter, it’s going to be very hard to find that person. I bet most of the results are for Harry Potter the book. At some point, we’d like people to be able to say “I want the Harry Potter that’s not in the book.”
Q: A lot of this is what you might call sustaining innovations. Do you have ways to encourage more disruptive kinds of things? Or is that not such a good thing to try to do given Google’s leadership position?
A: Yes, we do. We want to be disruptive. Hill climbing makes it easier for you to step on top of some hill and think you’re on top of the world. But you’re not. We’re very cognizant of that.
So yes, we do it in several ways. One of them is an annual, sometimes semiannual event where we take groups and ask them to take a week or two weeks and build something weird. They take off work and for two weeks they don’t do anything else and build this.
We did it with user interface things, called it Demo Days. People came and spent a whole week building a demo, a working demo. We met once a day to track progress. Thirty-five teams built amazing things. We try to pick some things to move forward. We did it also for ranking, though we gave them more time. That was actually more teams—300 people, mostly in teams of two.
You may see some poster (around the Google headquarters), I call it CSI—it stood for Crazy Search Ideas. We encouraged people to do things that are actually crazy. Something that’s obvious, and everybody agrees it’s a good thing, it will be rejected.
Q: What kind of things do you mean?
A: It’s things that you might find it hard to be approved because they are too small or they’re too controversial or they’re too left-field, but we want people to do them anyway, because they could be disruptive.
Q: Can you give me an example?
A: One thing launched that was technical, and it had to do with how we do ranking in Chinese. There’s something that should launch soon. [Note: Manber corrected himself later; that particular one, which he wouldn’t describe, is still being explored.] It’s something that probably would not have happened otherwise. There were 118 ideas, and we highlighted four.
In this field, there’s so many things you can do that if you just go in one proven, easy path—which you (also) have to do—that’s not good, you have to do all kinds of things.
Q: Some people, even former Googlers, raise the possibility that Google is too much in a groove in search.
A: There’s definitely a risk. We’re aware of that. My main job is to make sure it doesn’t happen. Sometimes it’s not a groove, it’s a hill. My worry is that we’re stuck on top of a hill, but it’s not the right hill. We’re not in a rut. We’re on some kind of a ridge. But that’s not good enough.
Q: How are you feeling about Microsoft’s Bing search engine and the marketing they’re doing vs. Google?
A: I think competition is great. We can use more competition. It’s good for other people, but it’s also good for us. It triggers more innovation. It’s all good. People like (Bing) for some reason, for good reasons. It gives us more motivation to work harder.