A Library as Big as the World


By Heather Green

Brewster Kahle is tackling a big task. And despite some looming clouds, he's pretty darn excited about it. Kahle, a 41-year-old serial entrepreneur, is building the Digital Age's equivalent of the ancient library of Alexandria.

The first installment of his project launched last October. That's when the public could finally use the Internet Archive, a collection of 10 billion pages, including Internet sites, movies, and Usenet postings, that is five times larger than the amount of information at the Library of Congress.

PRESERVING THE PAST. Kahle began building the digital library in 1996, using copies of the entire Web collected by Alexa Internet, a Web software company he founded that gathers information on sites and tracks Net usage. Today, a single copy of everything that's on the Net -- equal to 15,000 copies of Encyclopedia Britannica -- is added to the archive every two months. The National Science Foundation, Library of Congress, Markle Foundation, Compaq, and Alexa all donate money, software, and equipment to keep the Internet Archive up and running.

While skeptics say "Why bother?" Kahle insists that the Net is key to preserving the past. During the opening presentation last fall, he brought up a copy of the Whitehouse.gov Web site from Sept. 10, 1996. On the site was an announcement about measures the Clinton Administration planned to take to prevent terrorist attacks on airlines. "The Net is the No. 1 resource for people," Kahle says. "This is how students learn, it's how business is done. If we don't have a memory, we're living in an Orwellian world of our own making."

Now, Kahle has bigger plans to make that resource more helpful. This year, his nonprofit company will develop research tools that will make it easier to use the archive to answer involved questions, such as "What were the lessons of Enron?" or "What's going on in the Italian biotech industry?" Kahle also wants to expand its content to include digital copies of radio programs, TV shows, and copies of books that are in the public domain.

THE BIG OBSTACLE. That goal is landing him squarely in the middle of one of the digital era's biggest debates: What to do about copyright protection. Lawrence Lessig, a Stanford lawyer who's a critic of copyright laws, calls Kahle his "hero." In his recent book, The Future of Ideas, Lessig outlines his view that publishing companies' moves are endangering innovation and public access to creative works. Lessig is also the lead lawyer in a landmark case, Eldred vs. Ashcroft, that contests a 1998 law extending existing and future term limits on copyrighted works for 20 more years. The Supreme Court will hear the case in the fall.

Kahle's goal of creating a huge digital library is shedding light on just how quickly restrictions on universal access to published works are growing, says Lessig. "He has the technology, he has the money, and he has the business plan," Lessig says. "All he needs is the permission of the lawyers, and he won't get it."

Indeed, the Internet Archive submitted an amicus brief about the Eldred case to the Supreme Court. The archive's lawyers explain in the brief how the site would like to publish the significant number of works that are in the public domain but out of print. For example, of the 10,027 books published in 1930, only 174 are still in print. Yet, while Kahle would like to publish them, the 1998 law means they now won't be available to the public until 2005.

COPY KILLERS. In an attempt to thwart potential piracy, music, book, and movie publishers want to limit consumers' rights to copy content they have purchased. The publishing industry, worried that illegal copying will undercut its business model, is backing laws that would make it more difficult to use or duplicate sections of digital works, even if the use is generally accepted, scholarly, and educational. This year music labels began selling copyright-protected CDs that can't be played on computers or used to make copies.

Already, the American Library Assn. is struggling with how libraries can maintain their traditional role if they have to pay for each use or access of digital content. "You see it slowing down the activities of the libraries," says Kahle. "If the Net is the information resource of all people, we had better figure out as a society how we can help keep it open."

Kahle seems to have tried every way possible to turn technology into a helpful research assistant. His fascination with digital libraries started in the late 1970s, when he studied artificial intelligence as a Massachusetts Institute of Technology undergrad. The spread of the Internet in academic communities like MIT provided a glimpse of what it would be like if people could have automatic access to any information they needed.

"A GOOD THING." He had unalloyed faith that technology could tackle this task -- a belief he picked up from his father, who was a mechanical engineer. "Way back when I tried to figure out what good things you could do with technology, building a big library that anyone in the world could use seemed like a good thing," Kahle says.

Kahle's first attempt used early-'80s technology: supercomputers and artificial intelligence. In 1983, Kahle helped start Thinking Machines under his MIT mentor, Danny Hillis. Hillis pioneered a new technique for building supercomputers, using parallel processing to harness the power of many microprocessors to solve complex problems.

For Kahle, what was interesting in parallel processing was the promise of getting computers to sift through words. The problem, though, with being a pioneer in this field was that computers were expensive and difficult to program. It took years to figure out how to get the processors to work together and quickly. Kahle helped build software that the supercomputer used to search and find patterns in Dow Jones's archives of 500 newspapers and magazines. When that project was finally launched, the response wasn't what he had hoped for. "I thought the sun was going to come up a different color," he says. "It was a pretty good service, but it didn't have the impact we thought it would have. The question then was: 'Why not?'"

KNOWLEDGE STOREHOUSE. The answer, the Thinking Machines team figured, was that people needed to be able to use smaller servers over the Net along with the supercomputer, which functions as a master search engine, to get at information stored on different machines. To make that possible, Kahle came up with WAIS, or wide-area information server technology, which connected a supercomputer with information on corporate and wide-area networks.

With WAIS, he realized there was a way to get more publishers online, creating a storehouse of knowledge. So, in 1992, he founded WAIS Inc., an electronic publishing company that provided services and tools that allowed publications including the Wall Street Journal, New York Times, and Encyclopedia Britannica to publish on the Internet. Kahle sold WAIS Inc. in 1995 to America Online for $15 million.

Kahle's "ah-ha" moment came a little before he sold WAIS. In late 1994, Digital Equipment's research lab was working on a new search engine called AltaVista that could store every word on every page of the Net, making it possible to search the entire Web. Until then, the pioneering search engines like Yahoo! and Excite were directories of selected amounts of information online. "It's a mind-changing idea that was inspiring," says Kahle. "That was the same concept of the library of Alexandria, which had a charter to get a copy of all the books in the world."

OPEN TO ALL. When he started Alexa Internet in 1995, he also founded the Internet Archive. He included a contract in Alexa's business plan requiring the company to contribute copies of everything it collected to the Internet Archive. For five years, he quietly collected copies of the Internet, spending millions of dollars to chronicle the debates of the time period, and the birth, evolution, and death of sites. Alexa was bought by Amazon.com in April, 1999, for an undisclosed sum.

Last year, Kahle finally began devoting more time to making the Internet Archive bloom. Archive employees developed a program called the Wayback Machine, fondly named after the time traveling WABAC Machine developed by the erudite canine Mr. Peabody on the Rocky & Bullwinkle cartoon. The program organizes the billions of pages and allows anyone online to look up the contents of the archive. Now, Kahle is hoping that he can surpass the goal of the ancient library of Alexandria by not only collecting human knowledge -- but making it universally available.

Green covers e-commerce for BusinessWeek in New York.

