
Google’s number tricks

From “Fuzzy maths” (The Economist: 11 May 2006):

MATHEMATICALLY confident drivers stuck in the usual jam on highway 101 through Silicon Valley were recently able to pass time contemplating a billboard that read: “{first 10-digit prime found in consecutive digits of e}.com.” The number in question, 7427466391, is a sequence that starts at the 101st digit of e, a constant that is the base of the natural logarithm. The select few who worked this out and made it to the right website then encountered a “harder” riddle. Solving it led to another web page where they were finally invited to submit their curriculum vitae.
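The billboard's answer can be recomputed from scratch. The sketch below (not Google's code, just an illustration) generates digits of e with Python's `decimal` module and scans for the first 10-digit window that is prime; trial division is plenty fast for a 10-digit candidate.

```python
# Sketch: recompute the billboard answer by generating digits of e
# and scanning for the first 10-digit prime window.
from decimal import Decimal, getcontext

def digits_of_e(n):
    """Return the first n digits of e as a string (no decimal point)."""
    getcontext().prec = n + 10              # guard digits for rounding
    e, term = Decimal(0), Decimal(1)
    for k in range(1, 2 * n):               # sum of 1/k!; converges very fast
        e += term
        term /= k
    return str(e).replace(".", "")[:n]

def is_prime(n):
    """Trial division -- fine for a 10-digit candidate."""
    if n < 2 or n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def first_ten_digit_prime_in_e():
    d = digits_of_e(200)
    for i in range(len(d) - 9):
        if d[i] == "0":                     # a 10-digit number can't lead with 0
            continue
        window = int(d[i:i + 10])
        if is_prime(window):
            return window

print(first_ten_digit_prime_in_e())         # the billboard's 7427466391
```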

If a billboard can capture the soul of a company, this one did, because the anonymous advertiser was Google, whose main product is the world’s most popular internet search engine. With its presumptuous humour, its mathematical obsessions, its easy, arrogant belief that it is the natural home for geniuses, the billboard spoke of a company that thinks it has taken its rightful place as the leader of the technology industry, a position occupied for the past 15 years by Microsoft. …

To outsiders, however, googley-ness often implies audacious ambition, a missionary calling to improve the world and the equation of nerdiness with virtue.

The main symptom of this, prominently displayed on the billboard, is a deification of mathematics. Google constantly leaves numerical puns and riddles for those who care to look in the right places. When it filed the regulatory documents for its stockmarket listing in 2004, it said that it planned to raise $2,718,281,828, which is $e billion to the nearest dollar. A year later, it filed again to sell another batch of shares – precisely 14,159,265, which represents the first eight digits after the decimal in the number pi (3.14159265). …
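Both filing numbers check out with a couple of lines of arithmetic; standard-library floats carry enough digits for each.

```python
# Verify the two filing numbers against e and pi.
import math

e_billion = round(math.e * 1_000_000_000)   # $e billion, to the nearest dollar
print(e_billion)                            # 2718281828

pi_digits = str(math.pi)[2:10]              # first 8 digits after the decimal
print(pi_digits)                            # 14159265
```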


Google’s data trove tempts the bad guys

From “Fuzzy maths” (The Economist: 11 May 2006):

Slowly, the company is realising that it is so important that it may not be able to control the ramifications of its own actions. “As more and more data builds up in the company’s disk farms,” says Edward Felten, an expert on computer privacy at Princeton University, “the temptation to be evil only increases. Even if the company itself stays non-evil, its data trove will be a massive temptation for others to do evil.”


Google on the Google File System (& Linux)

From Sanjay Ghemawat, Howard Gobioff, & Shun-Tak Leung’s “The Google File System”:

We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. …

The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. …

We have seen problems caused by application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking, and power supplies. Therefore, constant monitoring, error detection, fault tolerance, and automatic recovery must be integral to the system.

Second, files are huge by traditional standards. Multi-GB files are common. Each file typically contains many application objects such as web documents. When we are regularly working with fast growing data sets of many TBs comprising billions of objects, it is unwieldy to manage billions of approximately KB-sized files even when the file system could support it. As a result, design assumptions and parameters such as I/O operation and block sizes have to be revisited.
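A back-of-the-envelope calculation shows why block size had to be revisited. The sketch below compares the metadata entries needed to track 1 TB stored as 64 MB chunks (the chunk size the GFS paper settles on) against the same terabyte held as the "approximately KB-sized" files the authors rule out.

```python
# Metadata entries needed to track 1 TB of data at two granularities.
TB = 2 ** 40
chunk_size = 64 * 2 ** 20              # 64 MB, the GFS chunk size
small_file = 2 ** 10                   # 1 KB

chunks_per_tb = TB // chunk_size       # 16,384 entries to track
small_files_per_tb = TB // small_file  # over a billion entries

print(chunks_per_tb, small_files_per_tb)
```

At GFS scale (hundreds of TBs), the small-file layout would push the master's metadata into the hundreds of billions of entries, while 64 MB chunks keep it in the millions.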

Third, most files are mutated by appending new data rather than overwriting existing data. Random writes within a file are practically non-existent. Once written, the files are only read, and often only sequentially. …
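The access pattern the paper describes — records only ever appended, then consumed sequentially — can be sketched in a few lines. This is an illustration of the pattern, not the GFS record-append API.

```python
# Minimal sketch of an append-then-scan workload: length-prefixed
# records are appended to a log file and read back in order.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log")

def append_record(rec: bytes):
    with open(path, "ab") as f:        # "ab": writes always go to the end
        f.write(len(rec).to_bytes(4, "big") + rec)

def scan_records():
    with open(path, "rb") as f:        # one sequential pass, no random seeks
        while header := f.read(4):
            yield f.read(int.from_bytes(header, "big"))

append_record(b"web-doc-1")
append_record(b"web-doc-2")
print([r.decode() for r in scan_records()])   # ['web-doc-1', 'web-doc-2']
```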

Multiple GFS clusters are currently deployed for different purposes. The largest ones have over 1000 storage nodes, over 300 TB of disk storage, and are heavily accessed by hundreds of clients on distinct machines on a continuous basis. …

Despite occasional problems, the availability of Linux code has helped us time and again to explore and understand system behavior. When appropriate, we improve the kernel and share the changes with the open source community.
