The key to Google’s efficiency was buying low-quality equipment dirt cheap and applying brainpower to work around the inevitably high failure rate. It was an outgrowth of Google’s earliest days, when Page and Brin had built a server housed by Lego blocks. “Larry and Sergey proposed that we design and build our own servers as cheaply as we can—massive numbers of servers connected to a high-speed network,” says Reese. The conventional wisdom was that an equipment failure should be regarded as, well, a failure. Generally the server failure rate was between 4 and 10 percent. To keep the failures at the lower end of the range, technology companies paid for high-end equipment from Sun Microsystems or EMC. “Our idea was completely opposite,” says Reese. “We’re going to build hundreds and thousands of cheap servers knowing from the get-go that a certain percentage, maybe 10 percent, are going to fail,” says Reese. Google’s first CIO, Douglas Merrill, once noted that the disk drives Google purchased were “poorer quality than you would put into your kid’s computer at home.”
But Google designed around the flaws. “We built capabilities into the software, the hardware, and the network—the way we hook them up, the load balancing, and so on—to build in redundancy, to make the system fault-tolerant,” says Reese. The Google File System, written by Jeff Dean and Sanjay Ghemawat, was invaluable in this process: it was designed to manage failure by “sharding” data, distributing it to multiple servers. If Google search called for certain information at one server and didn’t get a reply after a couple of milliseconds, there were two other Google servers that could fulfill the request.
“The Google business model was constrained by cost, especially at the very beginning,” says Erik Teetzel, who worked with Google’s data centers. “Every time we would serve a query it cost us money, and generating ad money didn’t happen until later, so Larry and Sergey and Urs set out to build the cheapest infrastructure they could. They didn’t buy the prescribed notion that you must buy your servers from HP and couple it with a Cisco router and software from Linux or Windows. They looked at it holistically, to have control from soup to nuts. That set the stage for this holistic picture where we could do very
efficient computing.”
In The Plex by Steven Levy p.183