
David Pogue’s insights about tech over time

From David Pogue’s “The Lessons of 10 Years of Talking Tech” (The New York Times: 25 November 2010):

As tech decades go, this one has been a jaw-dropper. Since my first column in 2000, the tech world has not so much blossomed as exploded. Think of all the commonplace tech that didn’t even exist 10 years ago: HDTV, Blu-ray, GPS, Wi-Fi, Gmail, YouTube, iPod, iPhone, Kindle, Xbox, Wii, Facebook, Twitter, Android, online music stores, streaming movies and on and on.

With the turkey cooking, this seems like a good moment to review, to reminisce — and to distill some insight from the first decade in the new tech millennium.

Things don’t replace things; they just splinter. I can’t tell you how exhausting it is to keep hearing pundits say that some product is the “iPhone killer” or the “Kindle killer.” Listen, dudes: the history of consumer tech is branching, not replacing.

Things don’t replace things; they just add on. Sooner or later, everything goes on-demand. The last 10 years have brought a sweeping switch from tape and paper storage to digital downloads. Music, TV shows, movies, photos and now books and newspapers. We want instant access. We want it easy.

Some people’s gadgets determine their self-esteem. … Today’s gadgets are intensely personal. Your phone or camera or music player makes a statement, reflects your style and character. No wonder some people interpret criticisms of a product as a criticism of their choices. By extension, it’s a critique of them.

Everybody reads with a lens. … feelings run just as strongly in the tech realm. You can’t use the word “Apple,” “Microsoft” or “Google” in a sentence these days without stirring up emotion.

It’s not that hard to tell the winners from the losers. … There was the Microsoft Spot Watch (2003). This was a wireless wristwatch that could display your appointments and messages — but cost $10 a month, had to be recharged nightly and wouldn’t work outside your home city unless you filled out a Web form in advance.

Some concepts’ time may never come. The same “breakthrough” ideas keep surfacing — and bombing, year after year. For the love of Mike, people, nobody wants videophones!

Teenagers do not want “communicators” that do nothing but send text messages, either (AT&T Ogo, Sony Mylo, Motorola V200). People do not want to surf the Internet on their TV screens (WebTV, AOLTV, Google TV). And give it up on the stripped-down kitchen “Internet appliances” (3Com Audrey, Netpliance i-Opener, Virgin Webplayer). Nobody has ever bought one, and nobody ever will.

Forget about forever — nothing lasts a year. Of the thousands of products I’ve reviewed in 10 years, only a handful are still on the market. Oh, you can find some gadgets whose descendants are still around: iPod, BlackBerry, Internet Explorer and so on.

But it’s mind-frying to contemplate the millions of dollars and person-years that were spent on products and services that now fill the Great Tech Graveyard: Olympus M-Robe. PocketPC. Smart Display. MicroMV. MSN Explorer. Aibo. All those PlaysForSure music players, all those Palm organizers, all those GPS units you had to load up with maps from your computer.

Everybody knows that’s the way tech goes. The trick is to accept your gadget’s obsolescence at the time you buy it…

Nobody can keep up. Everywhere I go, I meet people who express the same reaction to consumer tech today: there’s too much stuff coming too fast. It’s impossible to keep up with trends, to know what to buy, to avoid feeling left behind. They’re right. There’s never been a period of greater technological change. You couldn’t keep up with all of it if you tried.


What Google’s book settlement means


From Robert Darnton’s “Google & the Future of Books” (The New York Review of Books: 12 February 2009):

As the Enlightenment faded in the early nineteenth century, professionalization set in. You can follow the process by comparing the Encyclopédie of Diderot, which organized knowledge into an organic whole dominated by the faculty of reason, with its successor from the end of the eighteenth century, the Encyclopédie méthodique, which divided knowledge into fields that we can recognize today: chemistry, physics, history, mathematics, and the rest. In the nineteenth century, those fields turned into professions, certified by Ph.D.s and guarded by professional associations. They metamorphosed into departments of universities, and by the twentieth century they had left their mark on campuses…

Along the way, professional journals sprouted throughout the fields, subfields, and sub-subfields. The learned societies produced them, and the libraries bought them. This system worked well for about a hundred years. Then commercial publishers discovered that they could make a fortune by selling subscriptions to the journals. Once a university library subscribed, the students and professors came to expect an uninterrupted flow of issues. The price could be ratcheted up without causing cancellations, because the libraries paid for the subscriptions and the professors did not. Best of all, the professors provided free or nearly free labor. They wrote the articles, refereed submissions, and served on editorial boards, partly to spread knowledge in the Enlightenment fashion, but mainly to advance their own careers.

The result stands out on the acquisitions budget of every research library: the Journal of Comparative Neurology now costs $25,910 for a year’s subscription; Tetrahedron costs $17,969 (or $39,739, if bundled with related publications as a Tetrahedron package); the average price of a chemistry journal is $3,490; and the ripple effects have damaged intellectual life throughout the world of learning. Owing to the skyrocketing cost of serials, libraries that used to spend 50 percent of their acquisitions budget on monographs now spend 25 percent or less. University presses, which depend on sales to libraries, cannot cover their costs by publishing monographs. And young scholars who depend on publishing to advance their careers are now in danger of perishing.

The eighteenth-century Republic of Letters had been transformed into a professional Republic of Learning, and it is now open to amateurs—amateurs in the best sense of the word, lovers of learning among the general citizenry. Openness is operating everywhere, thanks to “open access” repositories of digitized articles available free of charge, the Open Content Alliance, the Open Knowledge Commons, OpenCourseWare, the Internet Archive, and openly amateur enterprises like Wikipedia. The democratization of knowledge now seems to be at our fingertips. We can make the Enlightenment ideal come to life in reality.

What provoked these jeremianic-utopian reflections? Google. Four years ago, Google began digitizing books from research libraries, providing full-text searching and making books in the public domain available on the Internet at no cost to the viewer. For example, it is now possible for anyone, anywhere to view and download a digital copy of the 1871 first edition of Middlemarch that is in the collection of the Bodleian Library at Oxford. Everyone profited, including Google, which collected revenue from some discreet advertising attached to the service, Google Book Search. Google also digitized an ever-increasing number of library books that were protected by copyright in order to provide search services that displayed small snippets of the text. In September and October 2005, a group of authors and publishers brought a class action suit against Google, alleging violation of copyright. Last October 28, after lengthy negotiations, the opposing parties announced agreement on a settlement, which is subject to approval by the US District Court for the Southern District of New York.[2]

The settlement creates an enterprise known as the Book Rights Registry to represent the interests of the copyright holders. Google will sell access to a gigantic data bank composed primarily of copyrighted, out-of-print books digitized from the research libraries. Colleges, universities, and other organizations will be able to subscribe by paying for an “institutional license” providing access to the data bank. A “public access license” will make this material available to public libraries, where Google will provide free viewing of the digitized books on one computer terminal. And individuals also will be able to access and print out digitized versions of the books by purchasing a “consumer license” from Google, which will cooperate with the registry for the distribution of all the revenue to copyright holders. Google will retain 37 percent, and the registry will distribute 63 percent among the rightsholders.

Meanwhile, Google will continue to make books in the public domain available for users to read, download, and print, free of charge. Of the seven million books that Google reportedly had digitized by November 2008, one million are works in the public domain; one million are in copyright and in print; and five million are in copyright but out of print. It is this last category that will furnish the bulk of the books to be made available through the institutional license.

Many of the in-copyright and in-print books will not be available in the data bank unless the copyright owners opt to include them. They will continue to be sold in the normal fashion as printed books and also could be marketed to individual customers as digitized copies, accessible through the consumer license for downloading and reading, perhaps eventually on e-book readers such as Amazon’s Kindle.

After reading the settlement and letting its terms sink in—no easy task, as it runs to 134 pages and 15 appendices of legalese—one is likely to be dumbfounded: here is a proposal that could result in the world’s largest library. It would, to be sure, be a digital library, but it could dwarf the Library of Congress and all the national libraries of Europe. Moreover, in pursuing the terms of the settlement with the authors and publishers, Google could also become the world’s largest book business—not a chain of stores but an electronic supply service that could out-Amazon Amazon.

An enterprise on such a scale is bound to elicit reactions of the two kinds that I have been discussing: on the one hand, utopian enthusiasm; on the other, jeremiads about the danger of concentrating power to control access to information.

Google is not a guild, and it did not set out to create a monopoly. On the contrary, it has pursued a laudable goal: promoting access to information. But the class action character of the settlement makes Google invulnerable to competition. Most book authors and publishers who own US copyrights are automatically covered by the settlement. They can opt out of it; but whatever they do, no new digitizing enterprise can get off the ground without winning their assent one by one, a practical impossibility, or without becoming mired down in another class action suit. If approved by the court—a process that could take as much as two years—the settlement will give Google control over the digitizing of virtually all books covered by copyright in the United States.

Google alone has the wealth to digitize on a massive scale. And having settled with the authors and publishers, it can exploit its financial power from within a protective legal barrier; for the class action suit covers the entire class of authors and publishers. No new entrepreneurs will be able to digitize books within that fenced-off territory, even if they could afford it, because they would have to fight the copyright battles all over again. If the settlement is upheld by the court, only Google will be protected from copyright liability.

Google’s record suggests that it will not abuse its double-barreled fiscal-legal power. But what will happen if its current leaders sell the company or retire? The public will discover the answer from the prices that the future Google charges, especially the price of the institutional subscription licenses. The settlement leaves Google free to negotiate deals with each of its clients, although it announces two guiding principles: “(1) the realization of revenue at market rates for each Book and license on behalf of the Rightsholders and (2) the realization of broad access to the Books by the public, including institutions of higher education.”

What will happen if Google favors profitability over access? Nothing, if I read the terms of the settlement correctly. Only the registry, acting for the copyright holders, has the power to force a change in the subscription prices charged by Google, and there is no reason to expect the registry to object if the prices are too high. Google may choose to be generous in its pricing, and I have reason to hope it may do so; but it could also employ a strategy comparable to the one that proved to be so effective in pushing up the price of scholarly journals: first, entice subscribers with low initial rates, and then, once they are hooked, ratchet up the rates as high as the traffic will bear.


The limitations of Windows 7 on netbooks

From Farhad Manjoo’s “I, for One, Welcome Our New Android Overlords” (Slate: 5 June 2009):

Microsoft promises that Windows 7 will be able to run on netbooks, but it has announced a risky strategy to squeeze profits from these machines. The company plans to cripple the cheapest versions of the new OS in order to encourage PC makers to pay for premium editions. If you buy a netbook that comes with the low-priced Windows 7 Starter Edition, you won’t be able to change your screen’s background or window colors, you won’t be able to play DVDs, you can’t connect it to another monitor, and you won’t see many of the user-interface advances found in other versions. If you’d like more flexibility, you’ll need to upgrade to a more expensive version of Windows—which will, of course, defeat the purpose of your cheap PC. (Microsoft had originally planned to limit Starter Edition even further—you wouldn’t be able to run more than three programs at a time. It removed that limitation after howls of protest.)


A better alternative to text CAPTCHAs

From Rich Gossweiler, Maryam Kamvar, & Shumeet Baluja’s “What’s Up CAPTCHA?: A CAPTCHA Based On Image Orientation” (Google: 20-24 April 2009):

There are several classes of images which can be successfully oriented by computers: images containing objects such as faces, cars, pedestrians, sky, grass, etc.

Many images, however, are difficult for computers to orient. For example, indoor scenes have variations in lighting sources, and abstract and close-up images provide the greatest challenge to both computers and people, often because no clear anchor points or lighting sources exist.

The average performance on outdoor photographs, architecture photographs and typical tourist type photographs was significantly higher than the performance on abstract photographs, close-ups and backgrounds. When an analysis of the features used to make the discriminations was done, it was found that the edge features play a significant role.

It is important not to simply select random images for this task. There are many cues which can quickly reveal the upright orientation of an image to automated systems; these images must be filtered out. For example, if typical vacation or snapshot photos are used, automated rotation accuracies can be in the 90% range. The existence of any of the cues in the presented images will severely limit the effectiveness of the approach. Three common cues are listed below:

1. Text: Usually the predominant orientation of text in an image reveals the upright orientation of an image.

2. Faces and People: Most photographs are taken with the face(s) / people upright in the image.

3. Blue skies, green grass, and beige sand: These are all revealing clues, and are present in many travel/tourist photographs found on the web. Extending this beyond color, in general, the sky often has few texture/edges in comparison to the ground. Additional cues found important in human tests include "grass", "trees", "cars", "water" and "clouds".

Second, due to sometimes warped objects, lack of shading and lighting cues, and often unrealistic colors, cartoons also make ideal candidates. … Finally, although we did not alter the content of the image, it may be possible to simply alter the color-mapping, overall lighting curves, and hue/saturation levels to reveal images that appear unnatural but remain recognizable to people.
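Two of the cues listed above (faces and blue sky) are easy to check for automatically, which suggests a rough pre-filter along the following lines. This is only a sketch of the idea, assuming OpenCV’s stock Haar face detector; the thresholds and function names are my own inventions, not anything from the paper:

# Illustrative pre-filter: reject candidate images that contain easy
# orientation cues (faces, large blue-sky regions) before using them
# in a rotation CAPTCHA. Thresholds are arbitrary placeholders.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def has_face(img_bgr):
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0

def has_blue_sky(img_bgr, min_fraction=0.25):
    # Look at the top third of the image for a large, mostly blue region.
    top = img_bgr[: img_bgr.shape[0] // 3].astype(int)
    b, g, r = top[:, :, 0], top[:, :, 1], top[:, :, 2]
    blueish = (b > 120) & (b > g + 20) & (b > r + 20)
    return blueish.mean() > min_fraction

def usable_for_captcha(path):
    img = cv2.imread(path)
    return img is not None and not (has_face(img) or has_blue_sky(img))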

To normalize the shape and size of the images, we scaled each image to a 180×180 pixel square and we then applied a circular mask to remove the image corners.
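That normalization step is straightforward to reproduce. Here is a minimal sketch using Pillow, where the 180×180 size and the circular mask come from the paper and the rest (the function name, the black background) is my own guess at the details:

# Scale an image to 180x180 and mask off the corners with a circle so
# that rotating it does not expose tell-tale straight edges or borders.
from PIL import Image, ImageDraw

def normalize(path, size=180):
    img = Image.open(path).convert("RGB").resize((size, size))
    mask = Image.new("L", (size, size), 0)
    ImageDraw.Draw(mask).ellipse((0, 0, size - 1, size - 1), fill=255)
    out = Image.new("RGB", (size, size), (0, 0, 0))
    out.paste(img, (0, 0), mask)
    return out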

We have created a system that has sufficiently high human-success rates and sufficiently low computer-success rates. When using three images, the rotational CAPTCHA system results in an 84% human success metric, and a .009% bot-success metric (assuming random guessing). These metrics are based on two variables: the number of images we require a user to rotate and the size of the acceptable error window (the degrees from upright which we still consider to be upright). Predictably, as the number of images shown becomes greater, the probability of correctly solving them decreases. However, as the error window increases, the probability of correctly solving them increases. The system which results in an 84% human success rate and .009% bot success rate asks the user to rotate three images, each to within a 16-degree error window (8 degrees on either side of upright).

A CAPTCHA system which displayed ≥ 3 images with a ≤ 16-degree error window would achieve a guess success rate of less than 1 in 10,000, a standard acceptable computer success rate for CAPTCHAs.
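Those figures are easy to sanity-check. With a 16-degree window (8 degrees on either side of upright) out of 360 possible degrees, a random guess lands inside the window on one image with probability 16/360, and on three independent images with probability (16/360)^3, which works out to roughly 0.009 percent, or about one success per 11,000 attempts, comfortably below the 1-in-10,000 threshold:

# Probability that random guessing rotates all N images into the
# acceptable error window (window_degrees out of a full 360-degree turn).
def random_guess_success(n_images=3, window_degrees=16):
    return (window_degrees / 360) ** n_images

print(random_guess_success())        # ~8.8e-05, i.e. about 0.009%
print(1 / random_guess_success())    # ~11,390 guesses per expected success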

In our experiments, users moved a slider to rotate the image to its upright position. On small display devices such as a mobile phone, they could directly manipulate the image using a touch screen, as seen in Figure 12, or rotate it via button presses.


Extreme male brains

From Joe Clark’s “The extreme Google brain” (Fawny: 26 April 2009):

… Susan Pinker’s The Sexual Paradox, which explains, using scientific findings, why large majorities of girls and women behave almost identically at different stages of their lives – while large minorities of boys and men show vast variability compared to each other and to male norms.

Some of these boys and men exhibit extreme-male-brain tendencies, including an ability to focus obsessively for long periods of time, often on inanimate objects or abstractions (hence male domination of engineering and high-end law). Paradoxically, other male brains in these exceptional cases may have an ability to experiment with many options for short periods each. Though this trait is often pejoratively diagnosed as attention-deficit disorder, Pinker provides evidence that it is actually a strength for some entrepreneurs.

The male brain, extreme or not, is compatible with visual design. It allows you to learn every font in the Letraset catalogue and work from a grid. In fact, the male-brain capacity for years-long single-mindedness explains why the heads of large ad agencies and design houses are overwhelmingly male. (It isn’t a sexist conspiracy.)

In the computer industry, extreme male brains permit years of concentration on hardware and software design, while also iterating those designs seemingly ad infinitum. The extreme male brain is really the extreme Google brain. It’s somewhat of a misnomer, because such is actually the average brain inside the company, but I will use that as a neologism.

Google was founded by extreme-male-brain nerds and, by all outward appearances, seems to hire only that type of person, not all of them male.


More on Google’s server farms

From Joel Hruska’s “The Beast unveiled: inside a Google server” (Ars Technica: 2 April 2009):

Each Google server is hooked to an independent 12V battery to keep the units running in the event of a power outage. Data centers themselves are built and housed in shipping containers (we’ve seen Sun pushing this trend as well), a practice that went into effect after the brownouts of 2005. Each container holds a total of 1,160 servers and can theoretically draw up to 250kW. Those numbers might seem a bit high for a data center optimized for energy efficiency—it breaks down to around 216W per system—but there are added cooling costs to be considered in any type of server deployment. These sorts of units were built for parking under trees (or at sea, per Google’s patent application).

By using individual batteries hooked to each server (instead of a UPS), the company is able to use the available energy much more efficiently (99.9 percent efficiency vs. 92-95 percent efficiency for a typical battery) and the rack-mounted servers are 2U with 8 DIMM slots. Ironically, for a company talking about power efficiency, the server box in question is scarcely a power sipper. The GA-9IVDP is a custom-built motherboard—I couldn’t find any information about it on Gigabyte’s website—but online research and a scan of Gigabyte’s similarly named products imply that this is a Socket 604 dual-Xeon board running dual Nocona (Prescott) P4 processors.


Google’s server farm revealed

From Nicholas Carr’s “Google lifts its skirts” (Rough Type: 2 April 2009):

I was particularly surprised to learn that Google rented all its data-center space until 2005, when it built its first center. That implies that The Dalles, Oregon, plant (shown in the photo above) was the company’s first official data smelter. Each of Google’s containers holds 1,160 servers, and the facility’s original server building had 45 containers, which means that it probably was running a total of around 52,000 servers. Since The Dalles plant has three server buildings, that means – and here I’m drawing a speculative conclusion – that it might be running around 150,000 servers altogether.

Here are some more details, from Rich Miller’s report:

The Google facility features a “container hanger” filled with 45 containers, with some housed on a second-story balcony. Each shipping container can hold up to 1,160 servers, and uses 250 kilowatts of power, giving the container a power density of more than 780 watts per square foot. Google’s design allows the containers to operate at a temperature of 81 degrees in the hot aisle. Those specs are seen in some advanced designs today, but were rare indeed in 2005 when the facility was built.

Google’s design focused on “power above, water below,” according to [Jimmy] Clidaras, and the racks are actually suspended from the ceiling of the container. The below-floor cooling is pumped into the hot aisle through a raised floor, passes through the racks and is returned via a plenum behind the racks. The cooling fans are variable speed and tightly managed, allowing the fans to run at the lowest speed required to cool the rack at that moment …

[Urs] Holzle said today that Google opted for containers from the start, beginning its prototype work in 2003. At the time, Google housed all of its servers in third-party data centers. “Once we saw that the commercial data center market was going to dry up, it was a natural step to ask whether we should build one,” said Holzle.
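The arithmetic behind these figures is easy to reproduce. The only number below that is not quoted in the two posts above is the container footprint: I am assuming a standard 40-by-8-foot shipping container, which is what makes the power-density figure come out to just over 780 watts per square foot:

# Back-of-the-envelope check of the numbers quoted above.
servers_per_container = 1160
containers_per_building = 45
buildings = 3
container_watts = 250 * 1000

servers_per_building = servers_per_container * containers_per_building
print(servers_per_building)                       # 52,200 -> "around 52,000"
print(servers_per_building * buildings)           # 156,600 -> "around 150,000"

footprint_sqft = 40 * 8                           # assumed 40 ft x 8 ft container
print(container_watts / footprint_sqft)           # ~781 W/sq ft -> "more than 780"
print(container_watts / servers_per_container)    # ~216 W per server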


My new book – Google Apps Deciphered – is out!

I’m really proud to announce that my 5th book is now out & available for purchase: Google Apps Deciphered: Compute in the Cloud to Streamline Your Desktop. My other books include:

(I’ve also contributed to two others: Ubuntu Hacks: Tips & Tools for Exploring, Using, and Tuning Linux and Microsoft Vista for IT Security Professionals.)

Google Apps Deciphered is a guide to setting up Google Apps, migrating to it, customizing it, and using it to improve productivity, communications, and collaboration. I walk you through each leading component of Google Apps individually, and then show you exactly how to make them work together, whether on the Web or by integrating them with your favorite desktop apps. I provide practical insights on Google Apps programs for email, calendaring, contacts, wikis, word processing, spreadsheets, presentations, video, and even Google’s new web browser Chrome. My aim was to collect and present the tips and tricks I’ve gained by using and setting up Google Apps for clients, family, and friends.

Here’s the table of contents:

  • 1: Choosing an Edition of Google Apps
  • 2: Setting Up Google Apps
  • 3: Migrating Email to Google Apps
  • 4: Migrating Contacts to Google Apps
  • 5: Migrating Calendars to Google Apps
  • 6: Managing Google Apps Services
  • 7: Setting Up Gmail
  • 8: Things to Know About Using Gmail
  • 9: Integrating Gmail with Other Software and Services
  • 10: Integrating Google Contacts with Other Software and Services
  • 11: Setting Up Google Calendar
  • 12: Things to Know About Using Google Calendar
  • 13: Integrating Google Calendar with Other Software and Services
  • 14: Things to Know About Using Google Docs
  • 15: Integrating Google Docs with Other Software and Services
  • 16: Setting Up Google Sites
  • 17: Things to Know About Using Google Sites
  • 18: Things to Know About Using Google Talk
  • 19: Things to Know About Using Start Page
  • 20: Things to Know About Using Message Security and Recovery
  • 21: Things to Know About Using Google Video
  • Appendix A: Backing Up Google Apps
  • Appendix B: Dealing with Multiple Accounts
  • Appendix C: Google Chrome: A Browser Built for Cloud Computing

If you want to know more about Google Apps and how to use it, then I know you’ll enjoy and learn from Google Apps Deciphered. You can read about and buy the book at Amazon (http://www.amazon.com/Google-Apps-Deciphered-Compute-Streamline/dp/0137004702) for $26.39. If you have any questions or comments, don’t hesitate to contact me at scott at granneman dot com.


A single medium, with a single search engine, & a single info source

From Nicholas Carr’s “All hail the information triumvirate!” (Rough Type: 22 January 2009):

Today, another year having passed, I did the searches [on Google] again. And guess what:

World War II: #1
Israel: #1
George Washington: #1
Genome: #1
Agriculture: #1
Herman Melville: #1
Internet: #1
Magna Carta: #1
Evolution: #1
Epilepsy: #1

Yes, it’s a clean sweep for Wikipedia.

The first thing to be said is: Congratulations, Wikipedians. You rule. Seriously, it’s a remarkable achievement. Who would have thought that a rag-tag band of anonymous volunteers could achieve what amounts to hegemony over the results of the most popular search engine, at least when it comes to searches for common topics?

The next thing to be said is: what we seem to have here is evidence of a fundamental failure of the Web as an information-delivery service. Three things have happened, in a blink of history’s eye: (1) a single medium, the Web, has come to dominate the storage and supply of information, (2) a single search engine, Google, has come to dominate the navigation of that medium, and (3) a single information source, Wikipedia, has come to dominate the results served up by that search engine. Even if you adore the Web, Google, and Wikipedia – and I admit there’s much to adore – you have to wonder if the transformation of the Net from a radically heterogeneous information source to a radically homogeneous one is a good thing. Is culture best served by an information triumvirate?

It’s hard to imagine that Wikipedia articles are actually the very best source of information for all of the many thousands of topics on which they now appear as the top Google search result. What’s much more likely is that the Web, through its links, and Google, through its search algorithms, have inadvertently set into motion a very strong feedback loop that amplifies popularity and, in the end, leads us all, lemminglike, down the same well-trod path – the path of least resistance. You might call this the triumph of the wisdom of the crowd. I would suggest that it would be more accurately described as the triumph of the wisdom of the mob. The former sounds benign; the latter, less so.


Many layers of cloud computing, or just one?

From Nicholas Carr’s “Further musings on the network effect and the cloud” (Rough Type: 27 October 2008):

I think O’Reilly did a nice job of identifying the different layers of the cloud computing business – infrastructure, development platform, applications – and I think he’s right that they’ll have different economic and competitive characteristics. One thing we don’t know yet, though, is whether those layers will in the long run exist as separate industry sectors or whether they’ll collapse into a single supply model. In other words, will the infrastructure suppliers also come to dominate the supply of apps? Google and Microsoft are obviously trying to play across all three layers, while Amazon so far seems content to focus on the infrastructure business and Salesforce is expanding from the apps layer to the development platform layer. The degree to which the layers remain, or don’t remain, discrete business sectors will play a huge role in determining the ultimate shape, economics, and degree of consolidation in cloud computing.

Let me end on a speculative note: There’s one layer in the cloud that O’Reilly failed to mention, and that layer is actually on top of the application layer. It’s what I’ll call the device layer – encompassing all the various appliances people will use to tap the cloud – and it may ultimately come to be the most interesting layer. A hundred years ago, when Tesla, Westinghouse, Insull, and others were building the cloud of that time – the electric grid – companies viewed the effort in terms of the inputs to their business: in particular, the power they needed to run the machines that produced the goods they sold. But the real revolutionary aspect of the electric grid was not the way it changed business inputs – though that was indeed dramatic – but the way it changed business outputs. After the grid was built, we saw an avalanche of new products outfitted with electric cords, many of which were inconceivable before the grid’s arrival. The real fortunes were made by those companies that thought most creatively about the devices that consumers would plug into the grid. Today, we’re already seeing hints of the device layer – of the cloud as output rather than input. Look at the way, for instance, that the little old iPod has shaped the digital music cloud.


Preserve links after a website move with mod_rewrite

My blog was at http://www.granneman.com/blog, but I then moved it, after several years of living at its old address, to http://blog.granneman.com. I wanted to preserve all my links, however, so that someone going to http://www.granneman.com/blog/2008/04/20/after-a-stroke-he-can-write-but-cant-read/ would instead end up at http://blog.granneman.com/2008/04/20/after-a-stroke-he-can-write-but-cant-read/.

To do this, I edited the .htaccess file in http://www.granneman.com/blog to read as follows:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST} ^granneman\.com$
RewriteRule ^(.*)$ http://blog.granneman.com/$1 [R=301,L]
RewriteCond %{HTTP_HOST} ^www\.granneman\.com$
RewriteRule ^(.*)$ http://blog.granneman.com/$1 [R=301,L]
</IfModule>

Works perfectly.
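If you want to confirm that a redirect like this is behaving, one quick check (a generic sketch, not part of the .htaccess setup itself) is to request one of the old URLs without following redirects and make sure you get a 301 pointing at the new host:

# Verify the 301: request an old URL and inspect the Location header
# without following the redirect (http.client never follows redirects).
import http.client

conn = http.client.HTTPConnection("www.granneman.com")
conn.request("HEAD", "/blog/2008/04/20/after-a-stroke-he-can-write-but-cant-read/")
resp = conn.getresponse()
print(resp.status)                  # expect 301
print(resp.getheader("Location"))   # expect the blog.granneman.com version of the URL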


An analysis of Google’s technology, 2005

From Stephen E. Arnold’s The Google Legacy: How Google’s Internet Search is Transforming Application Software (Infonortics: September 2005):

The figure Google’s Fusion: Hardware and Software Engineering shows that Google’s technology framework has two areas of activity. There is the software engineering effort that focuses on PageRank and other applications. Software engineering, as used here, means writing code and thinking about how computer systems operate in order to get work done quickly. Quickly means the sub one-second response times that Google is able to maintain despite its surging growth in usage, applications and data processing.

Google is hardware plus software

The other effort focuses on hardware. Google has refined server racks, cable placement, cooling devices, and data center layout. The payoff is lower operating costs and the ability to scale as demand for computing resources increases. With faster turnaround and the elimination of such troublesome jobs as backing up data, Google’s hardware innovations give it a competitive advantage few of its rivals can equal as of mid-2005.

How Google Is Different from MSN and Yahoo

Google’s technology is simultaneously just like other online companies’ technology, and very different. A data center is usually a facility owned and operated by a third party where customers place their servers. The staff of the data center manage the power, air conditioning and routine maintenance. The customer specifies the computers and components. When a data center must expand, the staff of the facility may handle virtually all routine chores and may work with the customer’s engineers for certain more specialized tasks.

Before looking at some significant engineering differences between Google and two of its major competitors, review this list of characteristics for a Google data center.

1. Google data centers now number about two dozen, although no one outside Google knows the exact number or their locations. They come online and, under the direction of the Google File System, automatically start getting work from other data centers. These facilities, sometimes filled with 10,000 or more Google computers, find one another and configure themselves with minimal human intervention.

2. The hardware in a Google data center can be bought at a local computer store. Google uses the same types of memory, disc drives, fans and power supplies as those in a standard desktop PC.

3. Each Google server comes in a standard case called a pizza box with one important change: the plugs and ports are at the front of the box to make access faster and easier.

4. Google racks are assembled for Google to hold servers on their front and back sides. This effectively allows a standard rack, normally holding 40 pizza box servers, to hold 80.

5. A Google data center can go from a stack of parts to online operation in as little as 72 hours, unlike more typical data centers that can require a week or even a month to get additional resources online.

6. Each server, rack and data center works in a way that is similar to what is called “plug and play.” Like a mouse plugged into the USB port on a laptop, Google’s network of data centers knows when more resources have been connected. These resources, for the most part, go into operation without human intervention.

Several of these factors are dependent on software. This overlap between the hardware and software competencies at Google, as previously noted, illustrates the symbiotic relationship between these two different engineering approaches. At Google, from its inception, Google software and Google hardware have been tightly coupled. Google is not a software company nor is it a hardware company. Google is, like IBM, a company that owes its existence to both hardware and software. Unlike IBM, Google has a business model that is advertiser supported. Technically, Google is conceptually closer to IBM (at one time a hardware and software company) than it is to Microsoft (primarily a software company) or Yahoo! (an integrator of multiple softwares).

Software and hardware engineering cannot be easily segregated at Google. At MSN and Yahoo hardware and software are more loosely-coupled. Two examples will illustrate these differences.

Microsoft – with some minor excursions into the Xbox game machine and peripherals – develops operating systems and traditional applications. Microsoft has multiple operating systems, and its engineers are hard at work on the company’s next-generation of operating systems.

Several observations are warranted:

1. Unlike Google, Microsoft does not focus on performance as an end in itself. As a result, Microsoft gets performance the way most computer users do. Microsoft buys or upgrades machines. Microsoft does not fiddle with its operating systems and their subfunctions to get that extra time slice or two out of the hardware.

2. Unlike Google, Microsoft has to support many operating systems and invest time and energy in making certain that important legacy applications such as Microsoft Office or SQLServer can run on these new operating systems. Microsoft has a boat anchor tied to its engineers’ ankles. The boat anchor is the need to ensure that legacy code works in Microsoft’s latest and greatest operating systems.

3. Unlike Google, Microsoft has no significant track record in designing and building hardware for distributed, massively parallelised computing. The mice and keyboards were a success. Microsoft has continued to lose money on the Xbox, and the sudden demise of Microsoft’s entry into the home network hardware market provides more evidence that Microsoft does not have a hardware competency equal to Google’s.

Yahoo! operates differently from both Google and Microsoft. Yahoo! is in mid-2005 a direct competitor to Google for advertising dollars. Yahoo! has grown through acquisitions. In search, for example, Yahoo acquired 3721.com to handle Chinese language search and retrieval. Yahoo bought Inktomi to provide Web search. Yahoo bought Stata Labs in order to provide users with search and retrieval of their Yahoo! mail. Yahoo! also owns AllTheWeb.com, a Web search site created by FAST Search & Transfer. Yahoo! owns the Overture search technology used by advertisers to locate key words to bid on. Yahoo! owns Alta Vista, the Web search system developed by Digital Equipment Corp. Yahoo! licenses InQuira search for customer support functions. Yahoo has a jumble of search technology; Google has one search technology.

Historically Yahoo has acquired technology companies and allowed each company to operate its technology in a silo. Integration of these different technologies is a time-consuming, expensive activity for Yahoo. Each of these software applications requires servers and systems particular to each technology. The result is that Yahoo has a mosaic of operating systems, hardware and systems. Yahoo!’s problem is different from Microsoft’s legacy boat-anchor problem. Yahoo! faces a Balkan-states problem.

There are many voices, many needs, and many opposing interests. Yahoo! must invest in management resources to keep the peace. Yahoo! does not have a core competency in hardware engineering for performance and consistency. Yahoo! may well have considerable competency in supporting a crazy-quilt of hardware and operating systems, however. Yahoo! is not a software engineering company. Its engineers make functions from disparate systems available via a portal.

The figure below provides an overview of the mid-2005 technical orientation of Google, Microsoft and Yahoo.

2005 focuses of Google, MSN, and Yahoo

The Technology Precepts

… five precepts thread through Google’s technical papers and presentations. The following snapshots are extreme simplifications of complex, yet extremely fundamental, aspects of the Googleplex.

Cheap Hardware and Smart Software

Google approaches the problem of reducing the costs of hardware, set up, burn-in and maintenance pragmatically. A large number of cheap devices using off-the-shelf commodity controllers, cables and memory reduces costs. But cheap hardware fails.

In order to minimize the “cost” of failure, Google conceived of smart software that would perform whatever tasks were needed when hardware devices fail. A single device or an entire rack of devices could crash, and the overall system would not fail. More important, when such a crash occurs, no full-time systems engineering team has to perform technical triage at 3 a.m.

The focus on low-cost, commodity hardware and smart software is part of the Google culture.

Logical Architecture

Google’s technical papers do not describe the architecture of the Googleplex as self-similar. Google’s technical papers provide tantalizing glimpses of an approach to online systems that makes a single server share features and functions of a cluster of servers, a complete data center, and a group of Google’s data centers.

The collection of servers running Google applications on the Google version of Linux is a supercomputer. The Googleplex can perform mundane computing chores like taking a user’s query and matching it to documents Google has indexed. Furthermore, the Googleplex can perform side calculations needed to embed ads in the results pages shown to users, execute parallelized, high-speed data transfers like computers running state-of-the-art storage devices, and handle necessary housekeeping chores for usage tracking and billing.

When Google needs to add processing capacity or additional storage, Google’s engineers plug in the needed resources. Due to self-similarity, the Googleplex can recognize, configure and use the new resource. Google has an almost unlimited flexibility with regard to scaling and accessing the capabilities of the Googleplex.

In Google’s self-similar architecture, the loss of an individual device is irrelevant. In fact, a rack or a data center can fail without data loss or taking the Googleplex down. The Google operating system ensures that each file is written three to six times to different storage devices. When a copy of that file is not available, the Googleplex consults a log for the location of the copies of the needed file. The application then uses that replica of the needed file and continues with the job’s processing.
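The read path Arnold describes boils down to: try a replica, and if its device is unavailable, consult the location log and fall back to another copy. Here is a toy sketch of that failover idea; the data structures and names are invented for illustration and are in no way Google’s actual implementation:

# Toy illustration of replica failover: each file lives on several
# devices; a read walks the location log and uses the first live copy.
replica_log = {
    # file name -> devices holding a copy (three to six per file)
    "crawl-chunk-0042": ["disk-a17", "disk-b03", "disk-c91"],
}
dead_devices = {"disk-a17"}          # pretend this device just failed

def read_file(name):
    for device in replica_log[name]:
        if device not in dead_devices:
            return "contents of %s read from %s" % (name, device)
    raise IOError("all replicas of %s are unavailable" % name)

print(read_file("crawl-chunk-0042"))  # falls over to disk-b03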

Speed and Then More Speed

Google uses commodity pizza box servers organized in a cluster. A cluster is a group of computers that are joined together to create a more robust system. Instead of using exotic servers with eight or more processors, Google generally uses servers that have two processors similar to those found in a typical home computer.

Through proprietary changes to Linux and other engineering innovations, Google is able to achieve supercomputer performance from components that are cheap and widely available.

… engineers familiar with Google believe that read rates may in some clusters approach 2,000 megabytes a second. When commodity hardware gets better, Google runs faster without paying a premium for that performance gain.

Another key notion of speed at Google concerns writing computer programs to deploy to Google users. Google has developed shortcuts to programming. An example is a Google-built library of canned functions that makes it easy for a programmer to optimize a program to run on the Googleplex computer. At Microsoft or Yahoo, a programmer must write some code or fiddle with code to get different pieces of a program to execute simultaneously using multiple processors. Not at Google. A programmer writes a program, uses a function from a Google bundle of canned routines, and lets the Googleplex handle the details. Google’s programmers are freed from much of the tedium associated with writing software for a distributed, parallel computer.
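Arnold doesn’t name the library, but Google’s published papers from this period describe exactly this kind of abstraction, MapReduce: the programmer supplies two small functions and the framework takes care of spreading the work across machines. A single-machine sketch of the programming model (my simplification, not Google’s code):

# Single-machine sketch of the map/reduce programming model: the
# programmer writes map_fn and reduce_fn; the Pool below stands in for
# the machinery that would normally shard the work across servers.
from collections import defaultdict
from multiprocessing import Pool

def map_fn(line):
    # Emit (word, 1) pairs for a word-count job.
    return [(word.lower(), 1) for word in line.split()]

def reduce_fn(key, values):
    return key, sum(values)

def run_job(lines, workers=4):
    with Pool(workers) as pool:
        mapped = pool.map(map_fn, lines)       # parallel "map" phase
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)          # shuffle/group by key
    return dict(reduce_fn(k, v) for k, v in groups.items())

if __name__ == "__main__":
    print(run_job(["the quick brown fox", "the lazy dog"]))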

Eliminate or Reduce Certain System Expenses

Some lucky investors jumped on the Google bandwagon early. Nevertheless, Google was frugal, partly by necessity and partly by design. The focus on frugality influenced many hardware and software engineering decisions at the company.

Drawbacks of the Googleplex

The Laws of Physics: Heat and Power 101

In reality, no one knows. Google has a rapidly expanding number of data centers. The data center near Atlanta, Georgia, is one of the newest deployed. This state-of-the-art facility reflects what Google engineers have learned about heat and power issues in its other data centers. Within the last 12 months, Google has shifted from concentrating its servers at about a dozen data centers, each with 10,000 or more servers, to about 60 data centers, each with fewer machines. The change is a response to the heat and power issues associated with larger concentrations of Google servers.

The most failure-prone components are:

  • Fans.
  • IDE drives which fail at the rate of one per 1,000 drives per day.
  • Power supplies which fail at a lower rate.

Leveraging the Googleplex

Google’s technology is one major challenge to Microsoft and Yahoo. So to conclude this cursory and vastly simplified look at Google technology, consider these items:

1. Google is fast anywhere in the world.

2. Google learns. When the heat and power problems at dense data centers surfaced, Google introduced cooling and power conservation innovations to its two dozen data centers.

3. Programmers want to work at Google. “Google has cachet,” said one recent University of Washington graduate.

4. Google’s operating and scaling costs are lower than most other firms offering similar businesses.

5. Google squeezes more work out of programmers and engineers by design.

6. Google does not break down, or at least it has not gone offline since 2000.

7. Google’s Googleplex can deliver desktop-server applications now.

8. Google’s applications install and update without burdening the user with gory details and messy crashes.

9. Google’s patents provide basic technology insight pertinent to Google’s core functionality.


An analysis of splogs: spam blogs

From Charles C. Mann’s “Spam + Blogs = Trouble” (Wired: September 2006):

Some 56 percent of active English-language blogs are spam, according to a study released in May by Tim Finin, a researcher at the University of Maryland, Baltimore County, and two of his students. “The blogosphere is growing fast,” Finin says. “But the splogosphere is now growing faster.”

A recent survey by Mitesh Vasa, a Virginia-based software engineer and splog researcher, found that in December 2005, Blogger was hosting more than 100,000 sploggers. (Many of these are likely pseudonyms for the same people.)

Some Title, the splog that commandeered my name, was created by Dan Goggins, the proud possessor of a 2005 master’s degree in computer science from Brigham Young University. Working out of his home in a leafy subdivision in Springville, Utah, Goggins, his BYU friend and partner, John Jonas, and their handful of employees operate “a few thousand” splogs. “It’s not that many,” Goggins says modestly. “Some people have a lot of sites.” Trolling the Net, I came across a PowerPoint presentation for a kind of spammers’ conference that details some of the earnings of the Goggins-Jonas partnership. Between August and October of 2005, they made at least $71,136.89.

In addition to creating massive numbers of phony blogs, sploggers sometimes take over abandoned real blogs. More than 10 million of the 12.9 million profiles on Blogger surveyed by splog researcher Vasa in June were inactive, either because the bloggers had stopped blogging or because they never got started.

Not only do sploggers create fake blogs or take over abandoned ones, they use robo-software to flood real blogs with bogus comments that link back to the splog. (“Great post! For more on this subject, click here!”) Statistics compiled by Akismet, a system put together by WordPress developer Mullenweg that tries to filter out blog spam, suggest that more than nine out of 10 comments in the blogosphere are spam.

Maryland researcher Finin and his students found that splogs produce about three-quarters of the pings from English-language blogs. Another way of saying this is that the legitimate blogosphere generates about 300,000 posts a day, but the splogosphere emits 900,000, inundating the ping servers.

Another giveaway: Both Some Title and the grave-robbing page it links to had Web addresses in the .info domain. Spammers flock to .info, which was created as an alternative to the crowded .com, because its domain names are cheaper – registrars often let people use them gratis for the first year – which is helpful for those, like sploggers, who buy Internet addresses in bulk. Splogs so commonly have .info addresses that many experts simply assume all blogs from that domain are fake.
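The cues Mann lists practically write themselves into a crude scoring heuristic. The weights and phrases below are invented for illustration; real splog filters are considerably more sophisticated:

# Crude splog scorer based on cues from the article: a .info domain,
# comment-spam boilerplate, and heavy outbound linking. The weights are
# arbitrary placeholders, not a real anti-splog system.
from urllib.parse import urlparse

SPAMMY_PHRASES = ("great post", "for more on this subject, click here")

def splog_score(url, post_text, outbound_links):
    score = 0.0
    host = urlparse(url).hostname or ""
    if host.endswith(".info"):
        score += 0.5
    text = post_text.lower()
    score += 0.3 * sum(phrase in text for phrase in SPAMMY_PHRASES)
    if outbound_links > 20:
        score += 0.2
    return min(score, 1.0)

print(splog_score("http://some-title.info/post",
                  "Great post! For more on this subject, click here!", 35))  # 1.0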


Google PageRank explained

From Danny Sullivan’s “What Is Google PageRank? A Guide For Searchers & Webmasters” (Search Engine Land: 26 April 2007):

Let’s start with what Google says. In a nutshell, it considers links to be like votes. In addition, it considers that some votes are more important than others. PageRank is Google’s system of counting link votes and determining which pages are most important based on them. These scores are then used along with many other things to determine if a page will rank well in a search.

PageRank is only a score that represents the importance of a page, as Google estimates it. (By the way, that estimate of importance is considered to be Google’s opinion and protected in the US by the First Amendment. When Google was once sued over altering PageRank scores for some sites, a US court ruled: “PageRanks are opinions–opinions of the significance of particular Web sites as they correspond to a search query…. the court concludes Google’s PageRanks are entitled to full constitutional protection.”)
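For readers who want to see the “link votes” idea concretely, here is a minimal power-iteration sketch of the PageRank formula from the original published paper; it is a simplification, and certainly not whatever Google actually runs today:

# Minimal PageRank by power iteration: each page repeatedly passes its
# score along its outbound links; the damping factor models a surfer
# who sometimes jumps to a random page instead of following a link.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages        # dangling page: spread evenly
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(pagerank(links))   # "c" collects the most "votes" and ranks highest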


Tim O’Reilly defines cloud computing

From Tim O’Reilly’s “Web 2.0 and Cloud Computing” (O’Reilly Radar: 26 October 2008):

Since “cloud” seems to mean a lot of different things, let me start with some definitions of what I see as three very distinct types of cloud computing:

1. Utility computing. Amazon’s success in providing virtual machine instances, storage, and computation at pay-as-you-go utility pricing was the breakthrough in this category, and now everyone wants to play. Developers, not end-users, are the target of this kind of cloud computing.

This is the layer at which I don’t presently see any strong network effect benefits (yet). Other than a rise in Amazon’s commitment to the business, neither early adopter Smugmug nor any of its users get any benefit from the fact that thousands of other application developers have their work now hosted on AWS. If anything, they may be competing for the same resources.

That being said, to the extent that developers become committed to the platform, there is the possibility of the kind of developer ecosystem advantages that once accrued to Microsoft. More developers have the skills to build AWS applications, so more talent is available. But take note: Microsoft took charge of this developer ecosystem by building tools that both created a revenue stream for Microsoft and made developers more reliant on them. In addition, they built a deep — very deep — well of complex APIs that bound developers ever-tighter to their platform.

So far, most of the tools and higher level APIs for AWS are being developed by third-parties. In the offerings of companies like Heroku, Rightscale, and EngineYard (not based on AWS, but on their own hosting platform, while sharing the RoR approach to managing cloud infrastructure), we see the beginnings of one significant toolchain. And you can already see that many of these companies are building into their promise the idea of independence from any cloud infrastructure vendor.

In short, if Amazon intends to gain lock-in and true competitive advantage (other than the aforementioned advantage of being the low-cost provider), expect to see them roll out their own more advanced APIs and developer tools, or acquire promising startups building such tools. Alternatively, if current trends continue, I expect to see Amazon as a kind of foundation for a Linux-like aggregation of applications, tools and services not controlled by Amazon, rather than for a Microsoft Windows-like API and tools play. There will be many providers of commodity infrastructure, and a constellation of competing, but largely compatible, tools vendors. Given the momentum towards open source and cloud computing, this is a likely future.

2. Platform as a Service. One step up from pure utility computing are platforms like Google AppEngine and Salesforce’s force.com, which hide machine instances behind higher-level APIs. Porting an application from one of these platforms to another is more like porting from Mac to Windows than from one Linux distribution to another.

The key question at this level remains: are there advantages to developers in one of these platforms from other developers being on the same platform? force.com seems to me to have some ecosystem benefits, which means that the more developers are there, the better it is for both Salesforce and other application developers. I don’t see that with AppEngine. What’s more, many of the applications being deployed there seem trivial compared to the substantial applications being deployed on the Amazon and force.com platforms. One question is whether that’s because developers are afraid of Google, or because the APIs that Google has provided don’t give enough control and ownership for serious applications. I’d love your thoughts on this subject.

3. Cloud-based end-user applications. Any web application is a cloud application in the sense that it resides in the cloud. Google, Amazon, Facebook, twitter, flickr, and virtually every other Web 2.0 application is a cloud application in this sense. However, it seems to me that people use the term “cloud” more specifically in describing web applications that were formerly delivered locally on a PC, like spreadsheets, word processing, databases, and even email. Thus even though they may reside on the same server farm, people tend to think of gmail or Google docs and spreadsheets as “cloud applications” in a way that they don’t think of Google search or Google maps.

This common usage points up a meaningful difference: people tend to think differently about cloud applications when they host individual user data. The prospect of “my” data disappearing or being unavailable is far more alarming than, for example, the disappearance of a service that merely hosts an aggregated view of data that is available elsewhere (say Yahoo! search or Microsoft live maps.) And that, of course, points us squarely back into the center of the Web 2.0 proposition: that users add value to the application by their use of it. Take that away, and you’re a step back in the direction of commodity computing.

Ideally, the user’s data becomes more valuable because it is in the same space as other users’ data. This is why a listing on craigslist or ebay is more powerful than a listing on an individual blog, why a listing on amazon is more powerful than a listing on Joe’s bookstore, why a listing on the first results page of Google’s search engine, or an ad placed into the Google ad auction, is more valuable than similar placement on Microsoft or Yahoo!. This is also why every social network is competing to build its own social graph rather than relying on a shared social graph utility.

This top level of cloud computing definitely has network effects. If I had to place a bet, it would be that the application-level developer ecosystems eventually work their way back down the stack towards the infrastructure level, and the two meet in the middle. In fact, you can argue that that’s what force.com has already done, and thus represents the shape of things. It’s a platform I have a strong feeling I (and anyone else interested in the evolution of the cloud platform) ought to be paying more attention to.


How technologies have changed politics, & how Obama uses tech

From Marc Ambinder’s “HisSpace” (The Atlantic: June 2008):

Improvements to the printing press helped Andrew Jackson form and organize the Democratic Party, and he courted newspaper editors and publishers, some of whom became members of his Cabinet, with a zeal then unknown among political leaders. But the postal service, which was coming into its own as he reached for the presidency, was perhaps even more important to his election and public image. Jackson’s exploits in the War of 1812 became well known thanks in large measure to the distribution network that the postal service had created, and his 1828 campaign—among the first to distribute biographical pamphlets by mail—reinforced his heroic image. As president, he turned the office of postmaster into a patronage position, expanded the postal network further—the historian Richard John has pointed out that by the middle of Jackson’s first term, there were 2,000 more postal workers in America than soldiers in the Army—and used it to keep his populist base rallied behind him.

Abraham Lincoln became a national celebrity, according to the historian Allen Guelzo’s new book, Lincoln and Douglas: The Debates That Defined America, when transcripts of those debates were reprinted nationwide in newspapers, which were just then reaching critical mass in distribution beyond the few Eastern cities where they had previously flourished. Newspapers enabled Lincoln, an odd-looking man with a reed-thin voice, to become a viable national candidate …

Franklin Delano Roosevelt used radio to make his case for a dramatic redefinition of government itself, quickly mastering the informal tone best suited to the medium. In his fireside chats, Roosevelt reached directly into American living rooms at pivotal moments of his presidency. His talks—which by turns soothed, educated, and pressed for change—held the New Deal together.

And of course John F. Kennedy famously rode into the White House thanks in part to the first televised presidential debate in U.S. history, in which his keen sense of the medium’s visual impact, plus a little makeup, enabled him to fashion the look of a winner (especially when compared with a pale and haggard Richard Nixon). Kennedy used TV primarily to create and maintain his public image, not as a governing tool, but he understood its strengths and limitations before his peers did …

[Obama’s] speeches play well on YouTube, which allows for more than the five-second sound bites that have characterized the television era. And he recognizes the importance of transparency and consistency at a time when access to everything a politician has ever said is at the fingertips of every voter. But as Joshua Green notes in the preceding pages, Obama has truly set himself apart by his campaign’s use of the Internet to organize support. No other candidate in this or any other election has ever built a support network like Obama’s. The campaign’s 8,000 Web-based affinity groups, 750,000 active volunteers, and 1,276,000 donors have provided him with an enormous financial and organizational advantage in the Democratic primary.

What Obama seems to promise is, at its outer limits, a participatory democracy in which the opportunities for participation have been radically expanded. He proposes creating a public, Google-like database of every federal dollar spent. He aims to post every piece of non-emergency legislation online for five days before he signs it so that Americans can comment. A White House blog—also with comments—would be a near certainty. Overseeing this new apparatus would be a chief technology officer.

There is some precedent for Obama’s vision. The British government has already used the Web to try to increase interaction with its citizenry, to limited effect. In November 2006, it established a Web site for citizens seeking redress from their government, http://petitions.pm.gov.uk/. More than 29,000 petitions have since been submitted, and about 9.5 percent of Britons have signed at least one of them. The petitions range from the class-conscious (“Order a independent report to identify reasons that the living conditions of working class people are poor in relation to higher classes”) to the parochial (“We the undersigned petition the Prime Minister to re-open sunderland ice rink”).


I for one welcome our new OS overlords: Google Chrome

As some of you may have heard, Google has announced its own web browser, Chrome. It’s releasing the Windows version today, with Mac & Linux versions to follow.

To educate people about the new browser & its goals, they released a 38-page comic book drawn by the brilliant Scott McCloud. It’s a really good read, but it gets a bit technical at times. However, someone did a “Reader’s Digest” version, which you can read here:

http://technologizer.com/2008/09/01/google-chrome-comic-the-readers-digest-version

I highly encourage you to read it. This browser is doing some very interesting, smart things. And it’s open source, so other browsers can use its code & ideas.

If you want to read the full comic, you can do so here:

http://www.google.com/googlebooks/chrome/

BTW … I don’t think Chrome has the potential of becoming the next big browser; I think instead it has the potential to become the next big operating system. See http://www.techcrunch.com/2008/09/01/meet-chrome-googles-windows-killer/ for more on that.


How Google motivates employees

From Larry Page’s “How to Motivate Your Staff” (Business 2.0: December 2003: 90):

We wrote a program that asks every engineer what they did every week. It sends them e-mail on Monday, and concatenates the e-mails together in a document that everyone can read. And it then sends that out to everyone and shames those who did not answer by putting them on the top of the list. It has run reliably every week since we started, so for every week of our company’s history we have a record of what everyone did. It’s good for performance reviews, and if you’re joining a project team, in five minutes you can read what your team members did the last few weeks or months.
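That workflow is simple enough to sketch. This is an illustrative reimplementation of what Page describes (collect the weekly replies, put the non-responders at the top, concatenate the rest), not Google’s actual tool:

# Sketch of a weekly "snippets" report: engineers who did not answer are
# listed first, followed by everyone else's concatenated updates.
def weekly_report(replies):
    # replies: dict mapping engineer name -> reply text, or None if silent
    missing = sorted(name for name, text in replies.items() if not text)
    answered = {name: text for name, text in replies.items() if text}
    lines = ["Did not answer: " + (", ".join(missing) or "nobody")]
    for name in sorted(answered):
        lines.append("")
        lines.append(name + ":")
        lines.append(answered[name])
    return "\n".join(lines)

print(weekly_report({
    "alice": "Shipped the crawler fix; started on index sharding.",
    "bob": None,
}))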


Great, wonderfully-designed consumer products

From Farhad Manjoo’s “iPod: I love you, you’re perfect, now change” (Salon: 23 October 2006):

There are very few consumer products about which you’d want to read a whole book — the Google search engine, the first Mac, the Sony Walkman, the VW Beetle. Levy proves that the iPod, which turns five years old today, belongs to that club.
