Google on the Google File System (& Linux)

From Sanjay Ghemawat, Howard Gobioff, & Shun-Tak Leung’s “The Google File System”:

We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. …

The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. …

We have seen problems caused by application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking, and power supplies. Therefore, constant monitoring, error detection, fault tolerance, and automatic recovery must be integral to the system.

Second, files are huge by traditional standards. Multi-GB files are common. Each file typically contains many application objects such as web documents. When we are regularly working with fast growing data sets of many TBs comprising billions of objects, it is unwieldy to manage billions of approximately KB-sized files even when the file system could support it. As a result, design assumptions and parameters such as I/O operation and block sizes have to be revisited.

Third, most files are mutated by appending new data rather than overwriting existing data. Random writes within a file are practically non-existent. Once written, the files are only read, and often only sequentially. …

Multiple GFS clusters are currently deployed for different purposes. The largest ones have over 1000 storage nodes, over 300 TB of disk storage, and are heavily accessed by hundreds of clients on distinct machines on a continuous basis. …

Despite occasional problems, the availability of Linux code has helped us time and again to explore and understand system behavior. When appropriate, we improve the kernel and share the changes with the open source community.

Tim O’Reilly’s definition of open source

From Tim O’Reilly’s “Lessons from open source software development”, Communications of the ACM 41 (4): 33-7:

Open source is a term that has recently gained currency as a way to describe the tradition of open standards, shared source code, and collaborative development behind software such as the Linux and FreeBSD operating systems, the Apache Web server, the Perl, Tcl, and Python languages, and much of the Internet infrastructure, including Bind (The Berkeley Internet Name Daemon servers that run the Domain Name System), the Sendmail mail server, and many other programs. … [But] open source … means more than the source code is available. The source must be available for redistribution without restriction and without charge, and the license must permit the creation of modifications and derivative works, and must allow those derivatives to be redistributed under the same terms as the original work.

Professions and clubs

From Giampaolo Garzarelli’s Open Source Software and the Economics of Organization:

Deborah Savage, in an innovative piece, proposes the following economic definition of a profession: a ‘profession is a network of strategic alliances across ownership boundaries among practitioners who share a core competence’ [Savage, D. A. (1994) “The Professions in theory and history: the case of pharmacy”, Business and Economic History 23 (2): 129-60.] …

In sum, the general organizational implications of Savage’s theory of professions are considerable. The most germane implications for our purposes seem to be the following.

  • The theory allows to narrowly define the area of operation of a profession because of its emphasis on core competencies – for example, pharmaceuticals, software, semiconductors, etc. – around which other capabilities and routines evolve and revolve.
  • It allows to distinguish professions from other forms of organization, such as firms, because integration of ownership is not a condicio sine qua non.
  • Professionals are autonomous and authoritative in their fields for their competencies allow them, on the one hand, ‘to solve routine problems easily and non-routine problems routinely’ (Savage 1994: 140) and, on the other, enable them to evaluate, and only be challenged by, other professionals. More concretely, they are independent yet interact in a coordinated and fertile fashion.
  • Professions are decentralized networks in that there’s not a central authority in command. The ‘organization’ of a profession is guaranteed by the exchange of knowledge that reduces uncertainty and stimulates trust amongst members. Professions are thus self-organizing.
  • Relatedly, there’s the role played by reputation as a signalling of quality, viz., reputation is a positive externality. Thus, professions can be interpreted as self-regulating organizations …

In a seminal article published in 1965, ‘An economic theory of clubs’, Buchanan described and formalized the institutional properties of a new category of good (or product) lying between the public and private polar extremes, conventionally called shared good. The good is usually enjoyed only by members participating in a voluntary association – i.e., a club – whose membership may be regulated by some dues. The theory of clubs, in a nutshell, studies the different institutional arrangements governing the supply and demand of the shared good. [Buchanan, J. M. (1965) “An economic theory of clubs”, Economica, N.S., 32 (125): 1-14.] …

Cave or community

From Sandeep Krishnamurthy’s Cave or Community?: An Empirical Examination of 100 Mature Open Source Projects:

I systematically look at the actual number of developers involved in the production of one hundred mature OSS products. What I found is more consistent with the lone developer (or cave) model of production rather than a community model (with a few glaring exceptions, of course). …

… My contention is only that communities do things other than produce the actual product - e.g. provide feature suggestions, try products out as lead users, answer questions etc. …

To be more specific the top 100 most active projects (based on Sourceforge’s activity percentile) in the mature class were chosen for this study. …

Finding 1: The vast majority of mature OSS programs are developed by a small number of individuals. …

Moreover, as shown in Table 2, only 29% of all projects had more than 5 developers while 51% of projects had 1 project administrator. Only 19 out of 100 projects had more than 10 developers. On the other extreme, 22% of projects had only one developer associated with them. …

Finding 2: Very few OSS products generate a lot of discussion. Most products do not generate too much discussion. …

Finding 3: Products with more developers tend to be viewed and downloaded more often. …

Finding 4: The number of developers working on a OSS program was unrelated to the release date.

It could be argued that older projects may have more developers associated with them. However, we found no relationship between the release date and the number of developers associated with a program. …

Even though the discussion here may seem like an example of extreme free-riding, the reader needs to know that all free-riding is not necessarily “bad”. For instance, consider public radio stations in the United States. Even the most successful stations have about a 10% contribution rate or a 90% free-ridership rate. But, they are still able to meet their goals! Similarly, the literature on lurking in e-mail lists has suggested that if everyone in a community contributes it may actually be counter-productive.

Similarly, a recent survey of participants in open-source projects conducted by the Boston Consulting Group and MIT provides more insight. The top five motivations of open-source participants were

1. To take part in an intellectually stimulating project.
2. To improve their skill.
3. To take the opportunity to work with open-source code.
4. Non-work functionality.
5. Work-related functionality.

Man, I lived a lot of this

Ode to the 90s
Found on FuckedCompany.com
I part-time telecommuted
as a Webmaster
for a dot com
in Y2K consulting.
They said it was
temp-to-perm.
it didn't pay
but there were options.
I swung by the office to make trades.
(Not that there's anything
wrong with that.)
cause we had a T1 Line
and there was a bull market
with a strong,
virile President.
and you never knew
when it could
crash.
I was a millionaire at 27
for thirty seconds.
I dug grunge.
then eighties.
Tony Bennett.
then Chumbawamba.
how bizarre.
how bizarre.
smoked Cohibas.
(Not that there's anything
wrong with that.)
but I didn't inhale.
Alrighty, then...
I learned HTML
and swing dancing.
moved to Seattle
but I was back on the redeye.
why did I eat
those krispy kremes?
it all seemed like a good idea
at the time.
I had a Pentium III
yeah
baby
yeah
with 9 gigs and a DVD.
It can do anything
even play movies.
I fell in love
in a chatroom
with a .BMP
I got the .JPEG
I wasn't so sure.....
I got emails,
but I couldn't Reply
my server was down
and our IT can't handle the MIS.
And my email didn't allow enclosures...
her ICQ was in my PDA
but I upgraded and
the memory's gone.

[Boing Boing Blog]

What makes a great hacker?

From Paul Graham’s “Great Hackers”:

… In programming, as in many fields, the hard part isn’t solving problems, but deciding what problems to solve. …

What do hackers want? Like all craftsmen, hackers like good tools. In fact, that’s an understatement. Good hackers find it unbearable to use bad tools. They’ll simply refuse to work on projects with the wrong infrastructure. …

Great hackers also generally insist on using open source software. Not just because it’s better, but because it gives them more control. Good hackers insist on control. This is part of what makes them good hackers: when something’s broken, they need to fix it. …

After software, the most important tool to a hacker is probably his office. Big companies think the function of office space is to express rank. But hackers use their offices for more than that: they use their office as a place to think in. And if you’re a technology company, their thoughts are your product. So making hackers work in a noisy, distracting environment is like having a paint factory where the air is full of soot. …

Indeed, these statistics about Cobol or Java being the most popular language can be misleading. What we ought to look at, if we want to know what tools are best, is what hackers choose when they can choose freely– that is, in projects of their own. When you ask that question, you find that open source operating systems already have a dominant market share, and the number one language is probably Perl. …

Along with good tools, hackers want interesting projects. …

This is an area where managers can make a difference. Like a parent saying to a child, I bet you can’t clean up your whole room in ten minutes, a good manager can sometimes redefine a problem as a more interesting one. Steve Jobs seems to be particularly good at this, in part simply by having high standards. …

Along with interesting problems, what good hackers like is other good hackers. Great hackers tend to clump together …

When I was in grad school I used to hang around the MIT AI Lab occasionally. It was kind of intimidating at first. Everyone there spoke so fast. But after a while I learned the trick of speaking fast. You don’t have to think any faster; just use twice as many words to say everything. …

I’ve found that people who are great at something are not so much convinced of their own greatness as mystified at why everyone else seems so incompetent. …

The key to being a good hacker may be to work on what you like. When I think about the great hackers I know, one thing they have in common is the extreme difficulty of making them work on anything they don’t want to. I don’t know if this is cause or effect; it may be both. …

The best hackers tend to be smart, of course, but that’s true in a lot of fields. Is there some quality that’s unique to hackers? I asked some friends, and the number one thing they mentioned was curiosity. I’d always supposed that all smart people were curious– that curiosity was simply the first derivative of knowledge. But apparently hackers are particularly curious, especially about how things work. That makes sense, because programs are in effect giant descriptions of how things work.

Several friends mentioned hackers’ ability to concentrate– their ability, as one put it, to ‘tune out everything outside their own heads.’ …

Notes

It’s hard to say exactly what constitutes research in the computer world, but as a first approximation, it’s software that doesn’t have users.

Mozilla fixes a bug … fast

One of the arguments anti-open sourcers often try to advance is that open source has just as many security holes as closed source software. On top of that, the anti-OSS folks go on to say that once open source software is as widely used as its closed source equivalents, it will suffer just as many attacks. I’ve argued before that this is a wrong-headed attitude, at least as far as email viruses are concerned, and I think the fact that Apache is the most widely used Web server in the world, yet sees only a fraction of the constant stream of security disasters that IIS does, pretty much belies the argument.

Now a blogger named sacarny has created a timeline detailing a vulnerability that was found in Mozilla and the time it took to fix it. It starts on July 7, at 13:46 GMT, and ends on July 8, at 21:57 GMT – in other words, it took a little over 32 hours for the Mozilla developers to fix a serious hole. And best of all, the whole process was open and documented. Sure, open source has bugs – all software does – but they tend to get fixed. Fast.
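
For anyone who wants to check the arithmetic on that timeline, here is a minimal sketch in Python. The year is my assumption (the timeline as quoted here gives only the day and time), and the variable names are just for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the year is not given in the post; it does not affect the result
# because both timestamps fall in the same July.
ASSUMED_YEAR = 2004

# Start and end of the timeline, both in GMT (treated here as UTC).
start = datetime(ASSUMED_YEAR, 7, 7, 13, 46, tzinfo=timezone.utc)
end = datetime(ASSUMED_YEAR, 7, 8, 21, 57, tzinfo=timezone.utc)

# Subtracting two aware datetimes yields a timedelta.
elapsed = end - start

# Break the elapsed time into whole hours and leftover minutes.
hours, remainder = divmod(int(elapsed.total_seconds()), 3600)
minutes = remainder // 60

print(f"{hours} h {minutes} min")
```

Run as-is, it prints the gap between the two timestamps: 32 h 11 min.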
