library

Umberto Eco on books

From Umberto Eco’s “Vegetal and mineral memory: The future of books” (Al-Ahram Weekly: 20–26 November 2003):

Libraries, over the centuries, have been the most important way of keeping our collective wisdom. They were and still are a sort of universal brain where we can retrieve what we have forgotten and what we still do not know. If you will allow me to use such a metaphor, a library is the best possible imitation, by human beings, of a divine mind, where the whole universe is viewed and understood at the same time. A person able to store in his or her mind the information provided by a great library would emulate in some way the mind of God. In other words, we have invented libraries because we know that we do not have divine powers, but we try to do our best to imitate them. …

First of all, we know that books are not ways of making somebody else think in our place; on the contrary, they are machines that provoke further thoughts. Only after the invention of writing was it possible to write such a masterpiece of spontaneous memory as Proust’s A la Recherche du Temps Perdu. Secondly, if once upon a time people needed to train their memories in order to remember things, after the invention of writing they had also to train their memories in order to remember books. Books challenge and improve memory; they do not narcotise it. …

Yet it is exactly at this point that our unravelling activity must start, because by hypertextual structure we usually mean two very different phenomena. First, there is the textual hypertext. In a traditional book one must read from left to right (or right to left, or up to down, according to different cultures) in a linear way. One can obviously skip through the pages; one, once arrived at page 300, can go back to check or re-read something at page 10; but this implies physical labour. In contrast to this, a hypertextual text is a multidimensional network or a maze in which every point or node can be potentially connected with any other node. Second, there is the systemic hypertext. The WWW is the Great Mother of All Hypertexts, a world-wide library where you can, or you will in short time, pick up all the books you wish. The Web is the general system of all existing hypertexts. …

Simply, books have proved to be the most suitable instrument for transmitting information. There are two sorts of book: those to be read and those to be consulted. As far as books-to-be-read are concerned, the normal way of reading them is the one that I would call the ‘detective story way’. You start from page one, where the author tells you that a crime has been committed, you follow every path of the detection process until the end, and finally you discover that the guilty one was the butler. End of the book and end of your reading experience. …

Then there are books to be consulted, like handbooks and encyclopaedias. Encyclopaedias are conceived in order to be consulted and never read from the first to the last page. …

Hypertexts will certainly render encyclopaedias and handbooks obsolete. Yesterday, it was possible to have a whole encyclopaedia on a CD-ROM; today, it is possible to have it on line with the advantage that this permits cross references and the non-linear retrieval of information. …

Books belong to those kinds of instruments that, once invented, have not been further improved because they are already alright, such as the hammer, the knife, spoon or scissors. …

Two new inventions, however, are on the verge of being industrially exploited. One is printing on demand: after scanning the catalogues of many libraries or publishing houses a reader can select the book he needs, and the operator will push a button, and the machine will print and bind a single copy using the font the reader likes. … Simply put: every book will be tailored according to the desires of the buyer, as happened with old manuscripts.

The second invention is the e-book where by inserting a micro-cassette in the book’s spine or by connecting it to the internet one can have a book printed out in front of us. Even in this case, however, we shall still have a book, though as different from our current ones as ours are different from old manuscripts on parchment, and as the first Shakespeare folio of 1623 is different from the last Penguin edition. Yet, up to now e-books have not proved to be as commercially successful as their inventors hoped. … E-books will probably prove to be useful for consulting information, as happens with dictionaries or special documents. …

Indeed, there are a lot of new technological devices that have not made previous ones obsolete. Cars run faster than bicycles, but they have not rendered bicycles obsolete, and no new technological improvements can make a bicycle better than it was before. The idea that a new technology abolishes a previous one is frequently too simplistic. Though after the invention of photography painters did not feel obliged to serve any longer as craftsmen reproducing reality, this did not mean that Daguerre’s invention only encouraged abstract painting. There is a whole tradition in modern painting that could not have existed without photographic models: think, for instance, of hyper-realism. Here, reality is seen by the painter’s eye through the photographic eye. This means that in the history of culture it has never been the case that something has simply killed something else. Rather, a new invention has always profoundly changed an older one. …

The computer creates new modes of production and diffusion of printed documents. …

Today there are new hypertextual poetics according to which even a book-to-read, even a poem, can be transformed to hypertext. At this point we are shifting to question two, since the problem is no longer, or not only, a physical one, but rather one that concerns the very nature of creative activity, of the reading process, and in order to unravel this skein of questions we have first of all to decide what we mean by a hypertextual link. …

In order to understand how texts of this genre can work we should decide whether the textual universe we are discussing is limited and finite, limited but virtually infinite, infinite but limited, or unlimited and infinite.

First of all, we should make a distinction between systems and texts. A system, for instance a linguistic system, is the whole of the possibilities displayed by a given natural language. A finite set of grammatical rules allows the speaker to produce an infinite number of sentences, and every linguistic item can be interpreted in terms of other linguistic or other semiotic items—a word by a definition, an event by an example, an animal or a flower by an image, and so on and so forth. …

Grammars, dictionaries and encyclopaedias are systems: by using them you can produce all the texts you like. But a text itself is not a linguistic or an encyclopaedic system. A given text reduces the infinite or indefinite possibilities of a system to make up a closed universe. If I utter the sentence, ‘This morning I had for breakfast…’, for example, the dictionary allows me to list many possible items, provided they are all organic. But if I definitely produce my text and utter, ‘This morning I had for breakfast bread and butter’, then I have excluded cheese, caviar, pastrami and apples. A text castrates the infinite possibilities of a system. …

Take a fairy tale, like Little Red Riding Hood. The text starts from a given set of characters and situations—a little girl, a mother, a grandmother, a wolf, a wood—and through a series of finite steps arrives at a solution. Certainly, you can read the fairy tale as an allegory and attribute different moral meanings to the events and to the actions of the characters, but you cannot transform Little Red Riding Hood into Cinderella. … This seems trivial, but the radical mistake of many deconstructionists was to believe that you can do anything you want with a text. This is blatantly false. …

Now suppose that a finite and limited text is organised hypertextually by many links connecting given words with other words. In a dictionary or an encyclopaedia the word wolf is potentially connected to every other word that makes up part of its possible definition or description (wolf is connected to animal, to mammal, to ferocious, to legs, to fur, to eyes, to woods, to the names of the countries in which wolves exist, etc.). In Little Red Riding Hood, the wolf can be connected only with the textual sections in which it shows up or in which it is explicitly evoked. The series of possible links is finite and limited. How can hypertextual strategies be used to ‘open’ up a finite and limited text?

The first possibility is to make the text physically unlimited, in the sense that a story can be enriched by the successive contributions of different authors and in a double sense, let us say either two-dimensionally or three-dimensionally. By this I mean that given, for instance, Little Red Riding Hood, the first author proposes a starting situation (the girl enters the wood) and different contributors can then develop the story one after the other, for example, by having the girl meet not the wolf but Ali Baba, by having both enter an enchanted castle, having a confrontation with a magic crocodile, and so on, so that the story can continue for years. But the text can also be infinite in the sense that at every narrative disjunction, for instance, when the girl enters the wood, many authors can make many different choices. For one author, the girl may meet Pinocchio, for another she may be transformed into a swan, or enter the Pyramids and discover the treasury of the son of Tutankhamen. …

At this point one can raise a question about the survival of the very notion of authorship and of the work of art, as an organic whole. And I want simply to inform my audience that this has already happened in the past without disturbing either authorship or organic wholes. The first example is that of the Italian Commedia dell’arte, in which upon a canovaccio, that is, a summary of the basic story, every performance, depending on the mood and fantasy of the actors, was different from every other so that we cannot identify any single work by a single author called Arlecchino servo di due padroni and can only record an uninterrupted series of performances, most of them definitely lost and all certainly different one from another.

Another example would be a jazz jam session. … What I want to say is that we are already accustomed to the idea of the absence of authorship in popular collective art in which every participant adds something, with experiences of jazz-like unending stories. …

A hypertext can give the illusion of opening up even a closed text: a detective story can be structured in such a way that its readers can select their own solution, deciding at the end if the guilty one should be the butler, the bishop, the detective, the narrator, the author or the reader. They can thus build up their own personal story. Such an idea is not a new one. Before the invention of computers, poets and narrators dreamt of a totally open text that readers could infinitely re-compose in different ways. Such was the idea of Le Livre, as extolled by Mallarmé. Raymond Queneau also invented a combinatorial algorithm by virtue of which it was possible to compose, from a finite set of lines, millions of poems. In the early sixties, Marc Saporta wrote and published a novel whose pages could be displaced to compose different stories, and Nanni Balestrini gave a computer a disconnected list of verses that the machine combined in different ways to compose different poems. …

All these physically moveable texts give an impression of absolute freedom on the part of the reader, but this is only an impression, an illusion of freedom. The machinery that allows one to produce an infinite text with a finite number of elements has existed for millennia, and this is the alphabet. Using an alphabet with a limited number of letters one can produce billions of texts, and this is exactly what has been done from Homer to the present days. In contrast, a stimulus-text that provides us not with letters, or words, but with pre-established sequences of words, or of pages, does not set us free to invent anything we want. …

At the last borderline of free textuality there can be a text that starts as a closed one, let us say, Little Red Riding Hood or The Arabian Nights, and that I, the reader, can modify according to my inclinations, thus elaborating a second text, which is no longer the same as the original one, whose author is myself, even though the affirmation of my authorship is a weapon against the concept of definite authorship. …

A book offers us a text which, while being open to multiple interpretations, tells us something that cannot be modified. … Alas, with an already written book, whose fate is determined by repressive, authorial decision, we cannot do this. We are obliged to accept fate and to realise that we are unable to change destiny. A hypertextual and interactive novel allows us to practice freedom and creativity, and I hope that such inventive activity will be implemented in the schools of the future. But the already and definitely written novel War and Peace does not confront us with the unlimited possibilities of our imagination, but with the severe laws governing life and death. …
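Eco’s mention of Raymond Queneau’s combinatorial algorithm can be made concrete. Queneau’s Cent mille milliards de poèmes (1961) consists of ten sonnets of fourteen lines each, and any line can be replaced by the corresponding line of any of the other sonnets, so the count is 10 to the 14th power, far more than “millions”. The sketch below computes that number and assembles one random poem; the placeholder lines stand in for Queneau’s actual verses and are purely hypothetical.

```python
import random

NUM_SONNETS = 10       # source sonnets in Queneau's book
LINES_PER_SONNET = 14  # a sonnet has fourteen lines


def count_poems(num_sonnets: int, lines_per_sonnet: int) -> int:
    """Each line position is an independent choice among num_sonnets options."""
    return num_sonnets ** lines_per_sonnet


def random_poem(sonnets: list[list[str]], rng: random.Random) -> list[str]:
    """For each line position, take that line from a randomly chosen sonnet."""
    return [rng.choice(sonnets)[i] for i in range(len(sonnets[0]))]


total = count_poems(NUM_SONNETS, LINES_PER_SONNET)
print(f"{total:,}")  # 100,000,000,000,000

# Hypothetical placeholder text standing in for Queneau's real lines.
placeholder = [[f"sonnet {s}, line {i}" for i in range(LINES_PER_SONNET)]
               for s in range(NUM_SONNETS)]
poem = random_poem(placeholder, random.Random(0))
print(len(poem))  # 14
```

This also illustrates Eco’s point about illusory freedom: every poem the machine can produce is drawn from the same 140 pre-established lines, always in the same positions.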


Religion, God, history, morality

From Steve Paulson’s interview with Robert Wright, “God, He’s moody” (Salon: 24 June 2009):

Do you think religions share certain core principles?

Not many. People in the modern world, certainly in America, think of religion as being largely about prescribing moral behavior. But religion wasn’t originally about that at all. To judge by hunter-gatherer religions, religion was not fundamentally about morality before the invention of agriculture. It was trying to figure out why bad things happen and increasing the frequency with which good things happen. Why do you sometimes get earthquakes, storms, disease and get slaughtered? But then sometimes you get nice weather, abundant game and you get to do the slaughtering. Those were the religious questions in the beginning.

And bad things happened because the gods were against you or certain spirits had it out for you?

Yes, you had done something to offend a god or spirit. However, it was not originally a moral lapse. That’s an idea you see as societies get more complex. When you have a small group of hunter-gatherers, a robust moral system is not a big challenge. Everyone knows everybody, so it’s hard to conceal anything you steal. If you mess with somebody too much, there will be payback. Moral regulation is not a big problem in a simple society. But as society got more complex with the invention of agriculture and writing, morality did become a challenge. Religion filled that gap.

For people who claim that Israel was monotheistic from the get-go and its flirtations with polytheism were rare aberrations, it’s interesting that the Jerusalem temple, according to the Bible’s account, had all these other gods being worshiped in it. Asherah was in the temple. She seemed to be a consort or wife of Yahweh. And there were vessels devoted to Baal, the reviled Canaanite god. So Israel was fundamentally polytheistic at this point. Then King Josiah goes on a rampage as he tries to consolidate his own power by wiping out the other gods.

You make the point that the Quran is a different kind of sacred text than the Bible. It was probably written over the course of two decades, while the stories collected in the Bible were written over centuries. That’s why the Bible is such a diverse document.

We think of the Bible as a book, but in ancient times it would have been thought of as a library. There were books written by lots of different people, including a lot of cosmopolitan elites. You also see elements of Greek philosophy. The Quran is just one guy talking. In the Muslim view, he’s mediating the word of God. He’s not especially cosmopolitan. He is, according to Islamic tradition, illiterate. So it’s not surprising that the Quran didn’t have the intellectual diversity and, in some cases, the philosophical depth that you find in the Bible. I do think he was actually a very modern thinker. Muhammad’s argument for why you should be devoted exclusively to this one God is very modern.

Are you also saying we can be religious without believing in God?

By some definitions, yes. It’s hard to find a definition of religion that encompasses everything we call religion. The definition I like comes from William James. He said, “Religious belief consists of the belief that there is an unseen order and that our supreme good lies in harmoniously adjusting to that order.” In that sense, you can be religious without believing in God. In that sense, I’m religious. On the God question, I’m not sure.


What Google’s book settlement means


From Robert Darnton’s “Google & the Future of Books” (The New York Review of Books: 12 February 2009):

As the Enlightenment faded in the early nineteenth century, professionalization set in. You can follow the process by comparing the Encyclopédie of Diderot, which organized knowledge into an organic whole dominated by the faculty of reason, with its successor from the end of the eighteenth century, the Encyclopédie méthodique, which divided knowledge into fields that we can recognize today: chemistry, physics, history, mathematics, and the rest. In the nineteenth century, those fields turned into professions, certified by Ph.D.s and guarded by professional associations. They metamorphosed into departments of universities, and by the twentieth century they had left their mark on campuses…

Along the way, professional journals sprouted throughout the fields, subfields, and sub-subfields. The learned societies produced them, and the libraries bought them. This system worked well for about a hundred years. Then commercial publishers discovered that they could make a fortune by selling subscriptions to the journals. Once a university library subscribed, the students and professors came to expect an uninterrupted flow of issues. The price could be ratcheted up without causing cancellations, because the libraries paid for the subscriptions and the professors did not. Best of all, the professors provided free or nearly free labor. They wrote the articles, refereed submissions, and served on editorial boards, partly to spread knowledge in the Enlightenment fashion, but mainly to advance their own careers.

The result stands out on the acquisitions budget of every research library: the Journal of Comparative Neurology now costs $25,910 for a year’s subscription; Tetrahedron costs $17,969 (or $39,739, if bundled with related publications as a Tetrahedron package); the average price of a chemistry journal is $3,490; and the ripple effects have damaged intellectual life throughout the world of learning. Owing to the skyrocketing cost of serials, libraries that used to spend 50 percent of their acquisitions budget on monographs now spend 25 percent or less. University presses, which depend on sales to libraries, cannot cover their costs by publishing monographs. And young scholars who depend on publishing to advance their careers are now in danger of perishing.

The eighteenth-century Republic of Letters had been transformed into a professional Republic of Learning, and it is now open to amateurs—amateurs in the best sense of the word, lovers of learning among the general citizenry. Openness is operating everywhere, thanks to “open access” repositories of digitized articles available free of charge, the Open Content Alliance, the Open Knowledge Commons, OpenCourseWare, the Internet Archive, and openly amateur enterprises like Wikipedia. The democratization of knowledge now seems to be at our fingertips. We can make the Enlightenment ideal come to life in reality.

What provoked these jeremianic-utopian reflections? Google. Four years ago, Google began digitizing books from research libraries, providing full-text searching and making books in the public domain available on the Internet at no cost to the viewer. For example, it is now possible for anyone, anywhere to view and download a digital copy of the 1871 first edition of Middlemarch that is in the collection of the Bodleian Library at Oxford. Everyone profited, including Google, which collected revenue from some discreet advertising attached to the service, Google Book Search. Google also digitized an ever-increasing number of library books that were protected by copyright in order to provide search services that displayed small snippets of the text. In September and October 2005, a group of authors and publishers brought a class action suit against Google, alleging violation of copyright. Last October 28, after lengthy negotiations, the opposing parties announced agreement on a settlement, which is subject to approval by the US District Court for the Southern District of New York.[2]

The settlement creates an enterprise known as the Book Rights Registry to represent the interests of the copyright holders. Google will sell access to a gigantic data bank composed primarily of copyrighted, out-of-print books digitized from the research libraries. Colleges, universities, and other organizations will be able to subscribe by paying for an “institutional license” providing access to the data bank. A “public access license” will make this material available to public libraries, where Google will provide free viewing of the digitized books on one computer terminal. And individuals also will be able to access and print out digitized versions of the books by purchasing a “consumer license” from Google, which will cooperate with the registry for the distribution of all the revenue to copyright holders. Google will retain 37 percent, and the registry will distribute 63 percent among the rightsholders.

Meanwhile, Google will continue to make books in the public domain available for users to read, download, and print, free of charge. Of the seven million books that Google reportedly had digitized by November 2008, one million are works in the public domain; one million are in copyright and in print; and five million are in copyright but out of print. It is this last category that will furnish the bulk of the books to be made available through the institutional license.

Many of the in-copyright and in-print books will not be available in the data bank unless the copyright owners opt to include them. They will continue to be sold in the normal fashion as printed books and also could be marketed to individual customers as digitized copies, accessible through the consumer license for downloading and reading, perhaps eventually on e-book readers such as Amazon’s Kindle.

After reading the settlement and letting its terms sink in—no easy task, as it runs to 134 pages and 15 appendices of legalese—one is likely to be dumbfounded: here is a proposal that could result in the world’s largest library. It would, to be sure, be a digital library, but it could dwarf the Library of Congress and all the national libraries of Europe. Moreover, in pursuing the terms of the settlement with the authors and publishers, Google could also become the world’s largest book business—not a chain of stores but an electronic supply service that could out-Amazon Amazon.

An enterprise on such a scale is bound to elicit reactions of the two kinds that I have been discussing: on the one hand, utopian enthusiasm; on the other, jeremiads about the danger of concentrating power to control access to information.

Google is not a guild, and it did not set out to create a monopoly. On the contrary, it has pursued a laudable goal: promoting access to information. But the class action character of the settlement makes Google invulnerable to competition. Most book authors and publishers who own US copyrights are automatically covered by the settlement. They can opt out of it; but whatever they do, no new digitizing enterprise can get off the ground without winning their assent one by one, a practical impossibility, or without becoming mired down in another class action suit. If approved by the court—a process that could take as much as two years—the settlement will give Google control over the digitizing of virtually all books covered by copyright in the United States.

Google alone has the wealth to digitize on a massive scale. And having settled with the authors and publishers, it can exploit its financial power from within a protective legal barrier; for the class action suit covers the entire class of authors and publishers. No new entrepreneurs will be able to digitize books within that fenced-off territory, even if they could afford it, because they would have to fight the copyright battles all over again. If the settlement is upheld by the court, only Google will be protected from copyright liability.

Google’s record suggests that it will not abuse its double-barreled fiscal-legal power. But what will happen if its current leaders sell the company or retire? The public will discover the answer from the prices that the future Google charges, especially the price of the institutional subscription licenses. The settlement leaves Google free to negotiate deals with each of its clients, although it announces two guiding principles: “(1) the realization of revenue at market rates for each Book and license on behalf of the Rightsholders and (2) the realization of broad access to the Books by the public, including institutions of higher education.”

What will happen if Google favors profitability over access? Nothing, if I read the terms of the settlement correctly. Only the registry, acting for the copyright holders, has the power to force a change in the subscription prices charged by Google, and there is no reason to expect the registry to object if the prices are too high. Google may choose to be generous in its pricing, and I have reason to hope it may do so; but it could also employ a strategy comparable to the one that proved to be so effective in pushing up the price of scholarly journals: first, entice subscribers with low initial rates, and then, once they are hooked, ratchet up the rates as high as the traffic will bear.


My new book – Google Apps Deciphered – is out!

I’m really proud to announce that my 5th book is now out & available for purchase: Google Apps Deciphered: Compute in the Cloud to Streamline Your Desktop. My other books include:

(I’ve also contributed to two others: Ubuntu Hacks: Tips & Tools for Exploring, Using, and Tuning Linux and Microsoft Vista for IT Security Professionals.)

Google Apps Deciphered is a guide to setting up Google Apps, migrating to it, customizing it, and using it to improve productivity, communications, and collaboration. I walk you through each leading component of Google Apps individually, and then show you exactly how to make them work together on the Web or by integrating them with your favorite desktop apps. I provide practical insights on Google Apps programs for email, calendaring, contacts, wikis, word processing, spreadsheets, presentations, video, and even Google’s new web browser Chrome. My aim was to collect and present the tips and tricks I’ve gained by using and setting up Google Apps for clients, family, and friends.

Here’s the table of contents:

  • 1: Choosing an Edition of Google Apps
  • 2: Setting Up Google Apps
  • 3: Migrating Email to Google Apps
  • 4: Migrating Contacts to Google Apps
  • 5: Migrating Calendars to Google Apps
  • 6: Managing Google Apps Services
  • 7: Setting Up Gmail
  • 8: Things to Know About Using Gmail
  • 9: Integrating Gmail with Other Software and Services
  • 10: Integrating Google Contacts with Other Software and Services
  • 11: Setting Up Google Calendar
  • 12: Things to Know About Using Google Calendar
  • 13: Integrating Google Calendar with Other Software and Services
  • 14: Things to Know About Using Google Docs
  • 15: Integrating Google Docs with Other Software and Services
  • 16: Setting Up Google Sites
  • 17: Things to Know About Using Google Sites
  • 18: Things to Know About Using Google Talk
  • 19: Things to Know About Using Start Page
  • 20: Things to Know About Using Message Security and Recovery
  • 21: Things to Know About Using Google Video
  • Appendix A: Backing Up Google Apps
  • Appendix B: Dealing with Multiple Accounts
  • Appendix C: Google Chrome: A Browser Built for Cloud Computing

If you want to know more about Google Apps and how to use it, then I know you’ll enjoy and learn from Google Apps Deciphered. You can read about and buy the book at Amazon (http://www.amazon.com/Google-Apps-Deciphered-Compute-Streamline/dp/0137004702) for $26.39. If you have any questions or comments, don’t hesitate to contact me at scott at granneman dot com.


An analysis of Google’s technology, 2005

From Stephen E. Arnold’s The Google Legacy: How Google’s Internet Search is Transforming Application Software (Infonortics: September 2005):

The figure Google’s Fusion: Hardware and Software Engineering shows that Google’s technology framework has two areas of activity. There is the software engineering effort that focuses on PageRank and other applications. Software engineering, as used here, means writing code and thinking about how computer systems operate in order to get work done quickly. Quickly means the sub-second response times that Google is able to maintain despite its surging growth in usage, applications and data processing.

Google is hardware plus software

The other effort focuses on hardware. Google has refined server racks, cable placement, cooling devices, and data center layout. The payoff is lower operating costs and the ability to scale as demand for computing resources increases. With faster turnaround and the elimination of such troublesome jobs as backing up data, Google’s hardware innovations give it a competitive advantage few of its rivals can equal as of mid-2005.

How Google Is Different from MSN and Yahoo

Google’s technology is simultaneously just like other online companies’ technology, and very different. A data center is usually a facility owned and operated by a third party where customers place their servers. The staff of the data center manage the power, air conditioning and routine maintenance. The customer specifies the computers and components. When a data center must expand, the staff of the facility may handle virtually all routine chores and may work with the customer’s engineers for certain more specialized tasks.

Before looking at some significant engineering differences between Google and two of its major competitors, review this list of characteristics for a Google data center.

1. Google data centers now number about two dozen, although no one outside Google knows the exact number or their locations. They come online and automatically, under the direction of the Google File System, start getting work from other data centers. These facilities, sometimes filled with 10,000 or more Google computers, find one another and configure themselves with minimal human intervention.

2. The hardware in a Google data center can be bought at a local computer store. Google uses the same types of memory, disc drives, fans and power supplies as those in a standard desktop PC.

3. Each Google server comes in a standard case called a pizza box with one important change: the plugs and ports are at the front of the box to make access faster and easier.

4. Google racks are assembled for Google to hold servers on their front and back sides. This effectively allows a standard rack, normally holding 40 pizza box servers, to hold 80.

5. A Google data center can go from a stack of parts to online operation in as little as 72 hours, unlike more typical data centers that can require a week or even a month to get additional resources online.

6. Each server, rack and data center works in a way that is similar to what is called “plug and play.” Like a mouse plugged into the USB port on a laptop, Google’s network of data centers knows when more resources have been connected. These resources, for the most part, go into operation without human intervention.

Several of these factors are dependent on software. This overlap between the hardware and software competencies at Google, as previously noted, illustrates the symbiotic relationship between these two different engineering approaches. From its inception, Google software and Google hardware have been tightly coupled. Google is neither a software company nor a hardware company. Google is, like IBM, a company that owes its existence to both hardware and software. Unlike IBM, Google has an advertiser-supported business model. Technically, Google is closer to IBM (at one time a hardware and software company) than it is to Microsoft (primarily a software company) or Yahoo! (an integrator of multiple software products).

Software and hardware engineering cannot be easily segregated at Google. At MSN and Yahoo!, hardware and software are more loosely coupled. Two examples illustrate these differences.

Microsoft, with some minor excursions into the Xbox game machine and peripherals, develops operating systems and traditional applications. Microsoft has multiple operating systems, and its engineers are hard at work on the company’s next generation of operating systems.

Several observations are warranted:

1. Unlike Google, Microsoft does not focus on performance as an end in itself. As a result, Microsoft gets performance the way most computer users do: it buys or upgrades machines. Microsoft does not fiddle with its operating systems and their subfunctions to squeeze an extra time slice or two out of the hardware.

2. Unlike Google, Microsoft has to support many operating systems and invest time and energy in making certain that important legacy applications such as Microsoft Office or SQL Server can run on its new operating systems. Microsoft has a boat anchor tied to its engineers’ ankles: the need to ensure that legacy code works in Microsoft’s latest and greatest operating systems.

3. Unlike Google, Microsoft has no significant track record in designing and building hardware for distributed, massively parallelised computing. Its mice and keyboards were a success, but Microsoft has continued to lose money on the Xbox, and the sudden demise of Microsoft’s entry into the home network hardware market provides more evidence that Microsoft does not have a hardware competency equal to Google’s.

Yahoo! operates differently from both Google and Microsoft. As of mid-2005, Yahoo! is a direct competitor to Google for advertising dollars. Yahoo! has grown through acquisitions. In search, for example, Yahoo! acquired 3721.com to handle Chinese-language search and retrieval. Yahoo! bought Inktomi to provide Web search. Yahoo! bought Stata Labs in order to provide users with search and retrieval of their Yahoo! mail. Yahoo! also owns AllTheWeb.com, a Web search site created by FAST Search & Transfer. Yahoo! owns the Overture search technology used by advertisers to locate keywords to bid on. Yahoo! owns AltaVista, the Web search system developed by Digital Equipment Corp. Yahoo! licenses InQuira search for customer support functions. Yahoo! has a jumble of search technology; Google has one search technology.

Historically, Yahoo! has acquired technology companies and allowed each company to operate its technology in a silo. For Yahoo!, integrating these different technologies is a time-consuming, expensive activity. Each of these software applications requires servers and systems particular to its technology. The result is that Yahoo! has a mosaic of operating systems, hardware and systems. Yahoo!’s problem is different from Microsoft’s legacy boat-anchor problem. Yahoo! faces a Balkan-states problem.

There are many voices, many needs, and many opposing interests. Yahoo! must invest in management resources to keep the peace. Yahoo! does not have a core competency in hardware engineering for performance and consistency. Yahoo! may well have considerable competency in supporting a crazy-quilt of hardware and operating systems, however. Yahoo! is not a software engineering company. Its engineers make functions from disparate systems available via a portal.

The figure below provides an overview of the mid-2005 technical orientation of Google, Microsoft and Yahoo.

2005 focuses of Google, MSN, and Yahoo

The Technology Precepts

… five precepts thread through Google’s technical papers and presentations. The following snapshots greatly simplify complex yet fundamental aspects of the Googleplex.

Cheap Hardware and Smart Software

Google approaches the problem of reducing the costs of hardware, setup, burn-in and maintenance pragmatically. Using a large number of cheap devices built from off-the-shelf commodity controllers, cables and memory reduces costs. But cheap hardware fails.

In order to minimize the “cost” of failure, Google conceived of smart software that would perform whatever tasks were needed when hardware devices fail. A single device or an entire rack of devices could crash, and the overall system would not fail. More important, when such a crash occurs, no full-time systems engineering team has to perform technical triage at 3 a.m.
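The idea of smart software that works around failed hardware can be sketched in a few lines. The Python below is a hypothetical illustration, not Google’s code: `Worker`, its `alive` flag and `run_with_failover` are invented names, and a crashed machine is simulated by raising `ConnectionError`. The job simply moves on to the next machine that answers.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")


@dataclass
class Worker:
    """Hypothetical stand-in for one commodity server."""
    name: str
    alive: bool = True

    def run(self, task: Callable[[], T]) -> T:
        # Simulate a hardware crash: a dead machine cannot run anything.
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return task()


def run_with_failover(task: Callable[[], T], workers: List[Worker]) -> T:
    """Try each worker in turn; one crashed machine does not fail the job."""
    errors = []
    for worker in workers:
        try:
            return worker.run(task)
        except ConnectionError as exc:
            # Record the failure and move to the next machine
            # instead of paging an engineer.
            errors.append(str(exc))
    raise RuntimeError("all workers failed: " + "; ".join(errors))


if __name__ == "__main__":
    pool = [Worker("rack1-a", alive=False), Worker("rack1-b", alive=False), Worker("rack2-a")]
    print(run_with_failover(lambda: 6 * 7, pool))  # prints 42, computed on rack2-a
```

Here two of the three machines are down, yet the job completes on the third; the caller sees an error only when every machine has failed.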

The focus on low-cost, commodity hardware and smart software is part of the Google culture.

Logical Architecture

Google’s technical papers do not explicitly describe the architecture of the Googleplex as self-similar. They do, however, provide tantalizing glimpses of an approach to online systems in which a single server shares features and functions with a cluster of servers, a complete data center, and a group of Google’s data centers.

The collection of servers running Google applications on the Google version of Linux is a supercomputer. The Googleplex can perform mundane computing chores like taking a user’s query and matching it to documents Google has indexed. Furthermore, the Googleplex can perform the side calculations needed to embed ads in the results pages shown to users, execute parallelized, high-speed data transfers like computers running state-of-the-art storage devices, and handle necessary housekeeping chores for usage tracking and billing.

When Google needs to add processing capacity or additional storage, Google’s engineers plug in the needed resources. Because of self-similarity, the Googleplex can recognize, configure and use the new resource. Google has almost unlimited flexibility with regard to scaling and accessing the capabilities of the Googleplex.
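Plug-in scaling of this kind can be sketched as a registry that new machines join on their own. The Python below is a hypothetical sketch under simple assumptions: `Cluster`, `plug_in` and `dispatch` are invented names, and round-robin dispatch stands in for whatever scheduling the Googleplex actually uses.

```python
from typing import Dict, List


class Cluster:
    """Hypothetical registry: a newly connected server announces itself
    and starts receiving work with no manual configuration."""

    def __init__(self) -> None:
        self.servers: List[str] = []
        self.assignments: Dict[str, List[str]] = {}
        self._cursor = 0  # position for round-robin dispatch

    def plug_in(self, server: str) -> None:
        # Registration is the only step; duplicate announcements are ignored.
        if server not in self.assignments:
            self.servers.append(server)
            self.assignments[server] = []

    def dispatch(self, job: str) -> str:
        # Round-robin over the servers registered right now, so capacity
        # added a moment ago is eligible for the very next job.
        if not self.servers:
            raise RuntimeError("no servers registered")
        server = self.servers[self._cursor % len(self.servers)]
        self._cursor += 1
        self.assignments[server].append(job)
        return server


if __name__ == "__main__":
    cluster = Cluster()
    cluster.plug_in("server-1")
    cluster.dispatch("query-1")
    cluster.plug_in("server-2")  # new hardware joins mid-stream
    print(cluster.dispatch("query-2"))  # prints server-2
```

A server plugged in between two jobs receives work on the very next dispatch, with no reconfiguration of the servers already running.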

In Google’s self-similar architecture, the loss of an individual device is irrelevant. In fact, a rack or a data center can fail without data loss or taking the Googleplex down. The Google operating system ensures that each file is written three to six times to different storage devices. When a copy of that file is not available, the Googleplex consults a log for the location of the copies of the needed file. The application then uses that replica of the needed file and continues with the job’s processing.
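The replica-and-log scheme described above can be sketched as follows. This is a hypothetical Python illustration, not the Google File System: `ReplicatedStore` and its methods are invented names, the store keeps three copies (the low end of the three-to-six range given above), and the placement rule (a byte-sum offset into a sorted device list) is an arbitrary stand-in for Google’s actual placement logic.

```python
from typing import Dict, List, Set


class ReplicatedStore:
    """Hypothetical sketch: every file is written to several distinct devices,
    and a location log records where each copy lives."""

    def __init__(self, devices: List[str], replicas: int = 3) -> None:
        if not 1 <= replicas <= len(devices):
            raise ValueError("replicas must be between 1 and the number of devices")
        self.contents: Dict[str, Dict[str, bytes]] = {d: {} for d in devices}
        self.replicas = replicas
        self.location_log: Dict[str, List[str]] = {}  # file name -> devices holding a copy
        self.failed: Set[str] = set()

    def write(self, name: str, data: bytes) -> None:
        # Pick `replicas` distinct devices, starting at a deterministic offset.
        devices = sorted(self.contents)
        start = sum(name.encode()) % len(devices)
        targets = [devices[(start + i) % len(devices)] for i in range(self.replicas)]
        for device in targets:
            self.contents[device][name] = data
        self.location_log[name] = targets

    def fail(self, device: str) -> None:
        # Simulate a crashed disk, rack or data center.
        self.failed.add(device)

    def read(self, name: str) -> bytes:
        # Consult the log and use the first copy on a device that is still up.
        for device in self.location_log.get(name, []):
            if device not in self.failed:
                return self.contents[device][name]
        raise OSError(f"no surviving replica of {name!r}")


if __name__ == "__main__":
    store = ReplicatedStore(["disk-a", "disk-b", "disk-c", "disk-d"])
    store.write("index-shard-7", b"posting lists")
    store.fail(store.location_log["index-shard-7"][0])  # lose the first copy
    print(store.read("index-shard-7"))  # served from another replica
```

Losing a device costs nothing as long as one logged copy survives; the read only fails once every replica is gone.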

Speed and Then More Speed

Google uses commodity pizza box servers organized in a cluster. A cluster is a group of computers that are joined together to create a more robust system. Instead of using exotic servers with eight or more processors, Google generally uses servers that have two processors similar to those found in a typical home computer.

Through proprietary changes to Linux and other engineering innovations, Google is able to achieve supercomputer performance from components that are cheap and widely available.

… engineers familiar with Google believe that read rates may in some clusters approach 2,000 megabytes a second. When commodity hardware gets better, Google runs faster without paying a premium for that performance gain.

Another key notion of speed at Google concerns writing computer programs to deploy to Google users. Google has developed shortcuts to programming. One example is Google’s library of canned functions that makes it easy for a programmer to optimize a program to run on the Googleplex computer. At Microsoft or Yahoo!, a programmer must write some code or fiddle with code to get different pieces of a program to execute simultaneously on multiple processors. Not at Google. A programmer writes a program, uses a function from a Google bundle of canned routines, and lets the Googleplex handle the details. Google’s programmers are freed from much of the tedium associated with writing software for a distributed, parallel computer.
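The canned-routine idea can be sketched as a helper that takes two ordinary functions and hides the concurrency. The Python below is hypothetical: `parallel_map_reduce` is an invented name, not a Google function, and threads on one machine stand in for the many machines of the Googleplex.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def parallel_map_reduce(
    map_fn: Callable[[A], B],
    reduce_fn: Callable[[B, B], B],
    inputs: Iterable[A],
    initial: B,
    workers: int = 4,
) -> B:
    """Hypothetical 'canned routine': the caller supplies two plain functions,
    and this helper spreads the per-item work across a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map runs map_fn concurrently but yields results in input order.
        mapped = pool.map(map_fn, inputs)
        return reduce(reduce_fn, mapped, initial)


if __name__ == "__main__":
    pages = ["the cat sat", "on the mat", "the end"]
    # The programmer writes only the per-item logic; no thread handling.
    total_words = parallel_map_reduce(lambda p: len(p.split()), lambda a, b: a + b, pages, 0)
    print(total_words)  # prints 8
```

Threads keep the sketch self-contained; in CPython they mainly help with I/O-bound work, and a real cluster would spread the calls across machines. The point is the division of labor: the programmer writes `map_fn` and `reduce_fn`, and the helper handles the parallel execution.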

Eliminate or Reduce Certain System Expenses

Some lucky investors jumped on the Google bandwagon early. Nevertheless, Google was frugal, partly by necessity and partly by design. The focus on frugality influenced many hardware and software engineering decisions at the company.

Drawbacks of the Googleplex

The Laws of Physics: Heat and Power 101

In reality, no one knows. Google has a rapidly expanding number of data centers. The data center near Atlanta, Georgia, is one of the newest deployed. This state-of-the-art facility reflects what Google engineers have learned about heat and power issues in its other data centers. Within the last 12 months, Google has shifted from concentrating its servers at about a dozen data centers, each with 10,000 or more servers, to about 60 data centers, each with fewer machines. The change is a response to the heat and power issues associated with larger concentrations of Google servers.

The most failure-prone components are:

  • Fans.
  • IDE drives, which fail at the rate of one per 1,000 drives per day.
  • Power supplies, which fail at a lower rate.
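The drive-failure rate quoted above translates directly into daily maintenance load. Assuming, for illustration only, one IDE drive per server in a data center of 10,000 servers (the size mentioned earlier in the text), with N the number of drives and r the per-drive daily failure rate, the expected number of drive failures is:

```latex
\mathbb{E}[\text{drive failures per day}] = N \cdot r
  = 10{,}000 \times \frac{1}{1{,}000} = 10
```

At that scale a drive dies somewhere in the building about ten times a day, which is why failure handling has to live in software rather than with an engineer on call.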

Leveraging the Googleplex

Google’s technology poses a major challenge to Microsoft and Yahoo. So, to conclude this cursory and vastly simplified look at Google technology, consider these items:

1. Google is fast anywhere in the world.

2. Google learns. When the heat and power problems at dense data centers surfaced, Google introduced cooling and power conservation innovations to its two dozen data centers.

3. Programmers want to work at Google. “Google has cachet,” said one recent University of Washington graduate.

4. Google’s operating and scaling costs are lower than those of most other firms in similar businesses.

5. Google squeezes more work out of programmers and engineers by design.

6. Google does not break down, or at least it has not gone offline since 2000.

7. Google’s Googleplex can deliver desktop-server applications now.

8. Google’s applications install and update without burdening the user with gory details and messy crashes.

9. Google’s patents provide basic technology insight pertinent to Google’s core functionality.

Library book returned 92 years late

From AP’s “Borrowed books returned to museum — 92 years later” (CNN: 6 November 2000):

The Field Museum of Natural History recently returned 10 volumes to the American Museum of Natural History in New York — 92 years late.

It seems a researcher from the New York museum took the books with him when he accepted a job at the Field Museum in 1908. American Museum officials suspect anthropologist Bertholt Laufer was using the books for research when he was hired away. …

Laufer had purchased 500 volumes — including texts on medicine and natural history — for the American Museum during an archaeological expedition to China from 1901 to 1904.

The American Museum didn’t even know 10 of the books — each belonging to a larger set — were missing until it decided in 1990 to computerize its collection.

The largest library fine … ever.

I was an undergraduate at Washington University in St. Louis from 1985 to 1989, and a graduate student in English Lit. from 1989 to 1996. During that time, I racked up my share of library fines (not hard to do when the fines were $0.10 a day, per book), a couple of times into three digits. In fact, I always said that Olin Library was one day going to name an extension room after me: the Granneman Procrastination room.

Recently I started teaching at Wash. U. Desiring a library book, I walked into Olin Library for the first time in seven years and tried to get the volume. The student behind the desk told me that there was a problem with my account, but he was puzzled as to what it actually was. He told me that he would talk to his supervisor, who would send me an email once everything was straightened out.

A couple of days later, I received this email:

From: Lisa W—
To: scott@granneman.com
Subject: Olin library Privileges
Date: Tue, 24 Dec 2002 09:40:40

Sir,

Your record has been updated to show current status as a faculty member of UCollege. As to the fines, I looked them up in our archive and there seems to be some disagreement between our archives and your library record. We are showing fines of $714. I’ve showed this to my supervisor, letting her know that you have material you want to put on reserve for a class, and she decided to simply ignore the $714 fine and reduce the $81.60 fine to $20. If this is a bit confusing, we do have the archive printout available for you to look at. The fines seem to date from around 1989 to 1995. The $20 can be paid at the circulation desk. From that point your record will be completely current. If you do have any questions, please let me know.

Thank you.

Wow. This has to be a record!

Needless to say, I paid the $20. Gratefully.
