
Unix: An Oral History

From “Unix: An Oral History”:

Multics

Gordon M. Brown

[Multics] was designed to include fault-free continuous operation capabilities, convenient remote terminal access and selective information sharing. One of the most important features of Multics was to follow the trend towards integrated multi-tasking and permit multiple programming environments and different human interfaces under one operating system.

Moreover, two key concepts had been picked up on in the development of Multics that would later serve to define Unix. These were that the less important features of the system introduced more complexity and, conversely, that the most important property of algorithms was simplicity. Ritchie explained this to Mahoney, articulating that:

The relationship of Multics to [the development of Unix] is actually interesting and fairly complicated. There were a lot of cultural things that were sort of taken over wholesale. And these include important things, [such as] the hierarchical file system and tree-structure file system – which incidentally did not get into the first version of Unix on the PDP-7. This is an example of why the whole thing is complicated. But at any rate, things like the hierarchical file system, choices of simple things like the characters you use to edit lines as you’re typing, erasing characters were the same as those we had. I guess the most important fundamental thing is just the notion that the basic style of interaction with the machine, the fact that there was the notion of a command line, the notion of an explicit shell program. In fact the name shell came from Multics. A lot of extremely important things were completely internalized, and of course this is the way it is. A lot of that came through from Multics.

The Beginning

Michael Errecart and Cameron Jones

Files to Share

The Unix file system was based almost entirely on the file system for the failed Multics project. The idea was for file sharing to take place with explicitly separated file systems for each user, so that there would be no locking of file tables.

A major part of the answer is that the file system had to be open. The needs of the group dictated that every user had access to every other user’s files, so the Unix system had to be extremely open. This openness is even seen in the password storage, which is not hidden at all but is encrypted. Any user can see all the encrypted passwords, but can only test one candidate per second, which makes it extremely time-consuming to try to break into the system.
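That one-test-per-second limit is doing real work. A quick back-of-the-envelope calculation, sketched in Python (the eight-character, lowercase-only password space is an assumption for illustration):

# Rough arithmetic: at one guess per second, even a small password
# space takes impractically long to search exhaustively.
ALPHABET = 26            # lowercase letters only (illustrative)
LENGTH = 8               # assumed password length
GUESSES_PER_SECOND = 1   # the rate described above

candidates = ALPHABET ** LENGTH
years = candidates / GUESSES_PER_SECOND / (60 * 60 * 24 * 365)
print(f"{candidates:,} candidates -> about {years:,.0f} years")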

The idea of standard input and output for devices eventually found its way into Unix as pipes. Pipes enabled users and programmers to send one program’s output into another program by simply placing a vertical line, a ‘|’, between the two. Piping is one of the most distinctive features of Unix …
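Under the hood, the shell implements ‘|’ by connecting one process’s standard output to the next process’s standard input. A minimal sketch of that wiring in Python (assuming a Unix system where the who and wc commands are available):

# What the shell does for "who | wc -l": wire one program's
# stdout to the next program's stdin.
import subprocess

who = subprocess.Popen(["who"], stdout=subprocess.PIPE)
wc = subprocess.Popen(["wc", "-l"], stdin=who.stdout,
                      stdout=subprocess.PIPE)
who.stdout.close()              # let "who" get SIGPIPE if "wc" exits early
output, _ = wc.communicate()
print(output.decode().strip())  # number of logged-in users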

Language from B to C

… Thompson was intent on having Unix be portable, and the creation of a portable language was intrinsic to this. …

Finding a Machine

Darice Wong & Jake Knerr

… Thompson devoted a month apiece to the shell, editor, assembler, and other software tools. …

Use of Unix started in the patent office of Bell Labs, but by 1972 there were a number of non-research organizations at Bell Labs that were beginning to use Unix for software development. Morgan recalls the importance of text processing in the establishment of Unix. …

Building Unix

Jason Aughenbaugh, Jonathan Jessup, & Nicholas Spicher

The Origin of Pipes

The first edition of Thompson and Ritchie’s The Unix Programmer’s Manual was dated November 3, 1971; however, the idea of pipes is not mentioned until the Version 3 Unix manual, published in February 1973. …

Software Tools

grep was, in fact, one of the first programs that could be classified as a software tool. Thompson designed it at the request of McIlroy, as McIlroy explains:

One afternoon I asked Ken Thompson if he could lift the regular expression recognizer out of the editor and make a one-pass program to do it. He said yes. The next morning I found a note in my mail announcing a program named grep. It worked like a charm. When asked what that funny name meant, Ken said it was obvious. It stood for the editor command that it simulated, g/re/p (global regular expression print)….From that special-purpose beginning, grep soon became a household word. (Something I had to stop myself from writing in the first paragraph above shows how firmly naturalized the idea now is: ‘I used ed to grep out words from the dictionary.’) More than any other single program, grep focused the viewpoint that Kernighan and Plauger christened and formalized in Software Tools: make programs that do one thing and do it well, with as few preconceptions about input syntax as possible.

eqn and grep are illustrative of the Unix toolbox philosophy that McIlroy phrases as, “Write programs that do one thing and do it well. Write programs to work together. Write programs that handle text streams, because that is a universal interface.” This philosophy was enshrined in Kernighan and Plauger’s 1976 book, Software Tools, and reiterated in the “Foreword” to the issue of The Bell System Technical Journal that also introduced pipes.

Ethos

Robert Murray-Rust & Malika Seth

McIlroy says,

This is the Unix philosophy. Write programs that do one thing and do it well. Write programs to work together. Write programs that handle text streams, because that is a universal interface.

The dissemination of Unix, with a focus on what went on within Bell Labs

Steve Chen

In 1973, the first Unix applications were installed on a system involved in updating directory information and intercepting calls to numbers that had been changed. This was the first time Unix had been used to support an actual, ongoing operating business. Soon, Unix was being used to automate the operations systems at Bell Laboratories: it automated monitoring, performed measurement, and helped route calls and ensure call quality.

There were numerous reasons for the friendliness the academic world, especially the academic computer science community, showed towards Unix. John Stoneback relates a few of these:

Unix came into many CS departments largely because it was the only powerful interactive system that could run on the sort of hardware (PDP-11s) that universities could afford in the mid ’70s. In addition, Unix itself was also very inexpensive. Since source code was provided, it was a system that could be shaped to the requirements of a particular installation. It was written in a language considerably more attractive than assembly, and it was small enough to be studied and understood by individuals. (John Stoneback, “The Collegiate Community,” Unix Review, October 1985, p. 27.)

The key features and characteristics of Unix that held it above other operating systems at the time were its software tools, its portability, its flexibility, and the fact that it was simple, compact, and efficient. The development of Unix in Bell Labs was carried on under a set of principles that the researchers had developed to guide their work. These principles included:

(i) Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new features.

(ii) Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.

(iii) Design and build software, even operating systems, to be tried early, ideally within weeks. Don’t hesitate to throw away the clumsy parts and rebuild them.

(iv) Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you’ve finished using them.

(M.D. McIlroy, E.N. Pinson, and B.A. Tague, “Unix Time-Sharing System: Foreword,” The Bell System Technical Journal, July-August 1978, vol. 57, no. 6, part 2, p. 1902.)


Evaluating software features

When developing software, it’s important to rank your features, as you can’t do everything, & not everything is worth doing. One way to rank features is to categorize them in order of importance using the following three categories:

  1. Required/Essential/Necessary: Mission critical features that must be present
  2. Preferred/Conditional: Important features & enhancements that bring better experience & easier management, but can wait until later release if necessary
  3. Optional/Nice To Have: If resources permit, sure, but otherwise…

Of course, you should also group your features based upon the kinds of features they are. Here’s a suggestion for those groups:

  • User experience
  • Management
  • Security
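The ranking and the grouping combine naturally into a small priority matrix. A minimal sketch in Python (the feature names are invented purely for illustration):

# Each feature gets a group and a rank: 1 = required, 2 = preferred,
# 3 = optional.
features = [
    {"name": "password reset",  "group": "user experience", "rank": 1},
    {"name": "admin dashboard", "group": "management",      "rank": 2},
    {"name": "two-factor auth", "group": "security",        "rank": 1},
    {"name": "dark mode",       "group": "user experience", "rank": 3},
]

# Plan the first release: everything required, sorted by group.
release = sorted((f for f in features if f["rank"] == 1),
                 key=lambda f: f["group"])
for f in release:
    print(f"{f['group']}: {f['name']}")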


Programmer jokes

Q: How do you tell an introverted computer scientist from an extroverted computer scientist?

A: An extroverted computer scientist looks at your shoes when he talks to you.


Knock, knock.

Who’s there?

very long pause….

Java.


Saying that Java is nice because it works on every OS is like saying that anal sex is nice because it works on every gender.


A physicist, an engineer and a programmer were in a car driving over a steep alpine pass when the brakes failed. The car was getting faster and faster, they were struggling to get round the corners and once or twice only the feeble crash barrier saved them from crashing down the side of the mountain. They were sure they were all going to die, when suddenly they spotted an escape lane. They pulled into the escape lane, and came safely to a halt.

The physicist said “We need to model the friction in the brake pads and the resultant temperature rise, see if we can work out why they failed”.

The engineer said “I think I’ve got a few spanners in the back. I’ll take a look and see if I can work out what’s wrong”.

The programmer said “Why don’t we get going again and see if it’s reproducible?”


To understand what recursion is you must first understand recursion.


COBOL is much more widely used than you might think

From Darryl Taft’s “Enterprise Applications: 20 Things You Might Not Know About COBOL (as the Language Turns 50)” (eWeek: September 2009). http://www.eweek.com/c/a/Enterprise-Applications/20-Things-You-Might-Not-Know-About-COBOL-As-the-Language-Turns-50-103943/?kc=EWKNLBOE09252009FEA1. Accessed 25 September 2009.

Five billion lines of new COBOL are developed every year.

More than 80 percent of all daily business transactions are processed in COBOL.

More than 70 percent of all worldwide business data is stored on a mainframe.

More than 70 percent of mission-critical applications are in COBOL.

More than 310 billion lines of software are in use today and more than 200 billion lines are COBOL (65 percent of the total software).

There are 200 times more COBOL transactions per day than Google searches worldwide.

An estimated 2 million people are currently working in COBOL in one form or another.


Green Dam is easily exploitable


From Scott Wolchok, Randy Yao, and J. Alex Halderman’s “Analysis of the Green Dam Censorware System” (The University of Michigan: 11 June 2009):

We have discovered remotely-exploitable vulnerabilities in Green Dam, the censorship software reportedly mandated by the Chinese government. Any web site a Green Dam user visits can take control of the PC.

According to press reports, China will soon require all PCs sold in the country to include Green Dam. This software monitors web sites visited and other activity on the computer and blocks adult content as well as politically sensitive material.

We examined the Green Dam software and found that it contains serious security vulnerabilities due to programming errors. Once Green Dam is installed, any web site the user visits can exploit these problems to take control of the computer. This could allow malicious sites to steal private data, send spam, or enlist the computer in a botnet. In addition, we found vulnerabilities in the way Green Dam processes blacklist updates that could allow the software makers or others to install malicious code during the update process.

We found these problems with less than 12 hours of testing, and we believe they may be only the tip of the iceberg. Green Dam makes frequent use of unsafe and outdated programming practices that likely introduce numerous other vulnerabilities. Correcting these problems will require extensive changes to the software and careful retesting. In the meantime, we recommend that users protect themselves by uninstalling Green Dam immediately.


An analysis of Google’s technology, 2005

From Stephen E. Arnold’s The Google Legacy: How Google’s Internet Search is Transforming Application Software (Infonortics: September 2005):

The figure Google’s Fusion: Hardware and Software Engineering shows that Google’s technology framework has two areas of activity. There is the software engineering effort that focuses on PageRank and other applications. Software engineering, as used here, means writing code and thinking about how computer systems operate in order to get work done quickly. Quickly means the sub one-second response times that Google is able to maintain despite its surging growth in usage, applications and data processing.

Google is hardware plus software

The other effort focuses on hardware. Google has refined server racks, cable placement, cooling devices, and data center layout. The payoff is lower operating costs and the ability to scale as demand for computing resources increases. With faster turnaround and the elimination of such troublesome jobs as backing up data, Google’s hardware innovations give it a competitive advantage few of its rivals can equal as of mid-2005.

How Google Is Different from MSN and Yahoo

Google’s technology is simultaneously just like other online companies’ technology, and very different. A data center is usually a facility owned and operated by a third party where customers place their servers. The staff of the data center manage the power, air conditioning and routine maintenance. The customer specifies the computers and components. When a data center must expand, the staff of the facility may handle virtually all routine chores and may work with the customer’s engineers for certain more specialized tasks.

Before looking at some significant engineering differences between Google and two of its major competitors, review this list of characteristics for a Google data center.

1. Google data centers – now numbering about two dozen, although no one outside Google knows the exact number or their locations – come online and automatically, under the direction of the Google File System, start getting work from other data centers. These facilities, sometimes filled with 10,000 or more Google computers, find one another and configure themselves with minimal human intervention.

2. The hardware in a Google data center can be bought at a local computer store. Google uses the same types of memory, disc drives, fans and power supplies as those in a standard desktop PC.

3. Each Google server comes in a standard case called a pizza box with one important change: the plugs and ports are at the front of the box to make access faster and easier.

4. Google’s racks are assembled to hold servers on both their front and back sides. This effectively allows a standard rack, normally holding 40 pizza-box servers, to hold 80.

5. A Google data center can go from a stack of parts to online operation in as little as 72 hours, unlike more typical data centers that can require a week or even a month to get additional resources online.

6. Each server, rack and data center works in a way that is similar to what is called “plug and play.” Like a mouse plugged into the USB port on a laptop, Google’s network of data centers knows when more resources have been connected. These resources, for the most part, go into operation without human intervention.

Several of these factors are dependent on software. This overlap between the hardware and software competencies at Google, as previously noted, illustrates the symbiotic relationship between these two different engineering approaches. At Google, from its inception, Google software and Google hardware have been tightly coupled. Google is not a software company nor is it a hardware company. Google is, like IBM, a company that owes its existence to both hardware and software. Unlike IBM, Google has a business model that is advertiser supported. Technically, Google is conceptually closer to IBM (at one time a hardware and software company) than it is to Microsoft (primarily a software company) or Yahoo! (an integrator of multiple softwares).

Software and hardware engineering cannot be easily segregated at Google. At MSN and Yahoo, hardware and software are more loosely coupled. Two examples will illustrate these differences.

Microsoft – with some minor excursions into the Xbox game machine and peripherals – develops operating systems and traditional applications. Microsoft has multiple operating systems, and its engineers are hard at work on the company’s next-generation of operating systems.

Several observations are warranted:

1. Unlike Google, Microsoft does not focus on performance as an end in itself. As a result, Microsoft gets performance the way most computer users do. Microsoft buys or upgrades machines. Microsoft does not fiddle with its operating systems and their subfunctions to get that extra time slice or two out of the hardware.

2. Unlike Google, Microsoft has to support many operating systems and invest time and energy in making certain that important legacy applications such as Microsoft Office or SQL Server can run on these new operating systems. Microsoft has a boat anchor tied to its engineers’ ankles. The boat anchor is the need to ensure that legacy code works in Microsoft’s latest and greatest operating systems.

3. Unlike Google, Microsoft has no significant track record in designing and building hardware for distributed, massively parallelised computing. The mice and keyboards were a success. Microsoft has continued to lose money on the Xbox, and the sudden demise of Microsoft’s entry into the home network hardware market provides more evidence that Microsoft does not have a hardware competency equal to Google’s.

Yahoo! operates differently from both Google and Microsoft. Yahoo! is in mid-2005 a direct competitor to Google for advertising dollars. Yahoo! has grown through acquisitions. In search, for example, Yahoo acquired 3721.com to handle Chinese language search and retrieval. Yahoo bought Inktomi to provide Web search. Yahoo bought Stata Labs in order to provide users with search and retrieval of their Yahoo! mail. Yahoo! also owns AllTheWeb.com, a Web search site created by FAST Search & Transfer. Yahoo! owns the Overture search technology used by advertisers to locate key words to bid on. Yahoo! owns Alta Vista, the Web search system developed by Digital Equipment Corp. Yahoo! licenses InQuira search for customer support functions. Yahoo has a jumble of search technology; Google has one search technology.

Historically Yahoo has acquired technology companies and allowed each company to operate its technology in a silo. Integration of these different technologies is a time-consuming, expensive activity for Yahoo. Each of these software applications requires servers and systems particular to each technology. The result is that Yahoo has a mosaic of operating systems, hardware and systems. Yahoo!’s problem is different from Microsoft’s legacy boat-anchor problem. Yahoo! faces a Balkan-states problem.

There are many voices, many needs, and many opposing interests. Yahoo! must invest in management resources to keep the peace. Yahoo! does not have a core competency in hardware engineering for performance and consistency. Yahoo! may well have considerable competency in supporting a crazy-quilt of hardware and operating systems, however. Yahoo! is not a software engineering company. Its engineers make functions from disparate systems available via a portal.

The figure below provides an overview of the mid-2005 technical orientation of Google, Microsoft and Yahoo.

2005 focuses of Google, MSN, and Yahoo

The Technology Precepts

… five precepts thread through Google’s technical papers and presentations. The following snapshots are extreme simplifications of complex, yet extremely fundamental, aspects of the Googleplex.

Cheap Hardware and Smart Software

Google approaches the problem of reducing the costs of hardware, set up, burn-in and maintenance pragmatically. A large number of cheap devices using off-the-shelf commodity controllers, cables and memory reduces costs. But cheap hardware fails.

In order to minimize the “cost” of failure, Google conceived of smart software that would perform whatever tasks were needed when hardware devices fail. A single device or an entire rack of devices could crash, and the overall system would not fail. More important, when such a crash occurs, no full-time systems engineering team has to perform technical triage at 3 a.m.

The focus on low-cost, commodity hardware and smart software is part of the Google culture.

Logical Architecture

Google’s technical papers do not describe the architecture of the Googleplex as self-similar. Google’s technical papers provide tantalizing glimpses of an approach to online systems that makes a single server share features and functions of a cluster of servers, a complete data center, and a group of Google’s data centers.

The collection of servers running Google applications on the Google version of Linux is a supercomputer. The Googleplex can perform mundane computing chores like taking a user’s query and matching it to documents Google has indexed. Furthermore, the Googleplex can perform side calculations needed to embed ads in the results pages shown to users, execute parallelized, high-speed data transfers like computers running state-of-the-art storage devices, and handle necessary housekeeping chores for usage tracking and billing.

When Google needs to add processing capacity or additional storage, Google’s engineers plug in the needed resources. Due to self-similarity, the Googleplex can recognize, configure and use the new resource. Google has an almost unlimited flexibility with regard to scaling and accessing the capabilities of the Googleplex.

In Google’s self-similar architecture, the loss of an individual device is irrelevant. In fact, a rack or a data center can fail without data loss or taking the Googleplex down. The Google operating system ensures that each file is written three to six times to different storage devices. When a copy of that file is not available, the Googleplex consults a log for the location of the copies of the needed file. The application then uses that replica of the needed file and continues with the job’s processing.
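That failover logic reads like this in miniature (a toy sketch, not Google’s actual file system; the chunk names and locations are invented):

# Toy replica lookup: consult a log of copy locations and fall
# through to the first replica that is still reachable.
replica_log = {
    "chunk-0042": ["rack1/disk3", "rack7/disk1", "rack9/disk5"],
}
unreachable = {"rack1/disk3"}        # devices currently down

def read_chunk(name):
    for location in replica_log[name]:
        if location not in unreachable:
            return f"read {name} from {location}"
    raise IOError(f"all replicas of {name} lost")

print(read_chunk("chunk-0042"))      # falls through to rack7/disk1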

Speed and Then More Speed

Google uses commodity pizza-box servers organized in a cluster. A cluster is a group of computers that are joined together to create a more robust system. Instead of using exotic servers with eight or more processors, Google generally uses servers that have two processors similar to those found in a typical home computer.

Through proprietary changes to Linux and other engineering innovations, Google is able to achieve supercomputer performance from components that are cheap and widely available.

… engineers familiar with Google believe that read rates may in some clusters approach 2,000 megabytes a second. When commodity hardware gets better, Google runs faster without paying a premium for that performance gain.

Another key notion of speed at Google concerns writing computer programs to deploy to Google users. Google has developed shortcuts to programming. An example is Google’s creation of a library of canned functions to make it easy for a programmer to optimize a program to run on the Googleplex computer. At Microsoft or Yahoo, a programmer must write some code or fiddle with code to get different pieces of a program to execute simultaneously using multiple processors. Not at Google. A programmer writes a program, uses a function from a Google bundle of canned routines, and lets the Googleplex handle the details. Google’s programmers are freed from much of the tedium associated with writing software for a distributed, parallel computer.
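The “bundle of canned routines” pattern resembles the map-and-reduce style of library described in Google’s technical papers: the programmer supplies a small function and the framework decides how to spread the calls across processors. A minimal sketch of the flavor in Python, with the standard multiprocessing pool standing in for a cluster scheduler (this is not Google’s actual library):

# The programmer writes the per-item logic; the "canned routine"
# (pool.map) handles the parallel execution.
from multiprocessing import Pool

def count_words(line):
    return len(line.split())

if __name__ == "__main__":
    documents = ["the quick brown fox", "jumps over", "the lazy dog"]
    with Pool() as pool:
        counts = pool.map(count_words, documents)  # parallel map step
    print(sum(counts))                             # reduce step -> 9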

Eliminate or Reduce Certain System Expenses

Some lucky investors jumped on the Google bandwagon early. Nevertheless, Google was frugal, partly by necessity and partly by design. The focus on frugality influenced many hardware and software engineering decisions at the company.

Drawbacks of the Googleplex

The Laws of Physics: Heat and Power 101

In reality, no one knows. Google has a rapidly expanding number of data centers. The data center near Atlanta, Georgia, is one of the newest deployed. This state-of-the-art facility reflects what Google engineers have learned about heat and power issues in its other data centers. Within the last 12 months, Google has shifted from concentrating its servers at about a dozen data centers, each with 10,000 or more servers, to about 60 data centers, each with fewer machines. The change is a response to the heat and power issues associated with larger concentrations of Google servers.

The most failure prone components are:

  • Fans.
  • IDE drives, which fail at the rate of one per 1,000 drives per day.
  • Power supplies, which fail at a lower rate.

Leveraging the Googleplex

Google’s technology is one major challenge to Microsoft and Yahoo. So to conclude this cursory and vastly simplified look at Google technology, consider these items:

1. Google is fast anywhere in the world.

2. Google learns. When the heat and power problems at dense data centers surfaced, Google introduced cooling and power conservation innovations to its two dozen data centers.

3. Programmers want to work at Google. “Google has cachet,” said one recent University of Washington graduate.

4. Google’s operating and scaling costs are lower than most other firms offering similar businesses.

5. Google squeezes more work out of programmers and engineers by design.

6. Google does not break down, or at least it has not gone offline since 2000.

7. Google’s Googleplex can deliver desktop-server applications now.

8. Google’s applications install and update without burdening the user with gory details and messy crashes.

9. Google’s patents provide basic technology insight pertinent to Google’s core functionality.


Microsoft’s programmers, evaluated by an engineer

From John Wharton’s “The Origins of DOS” (Microprocessor Report: 3 October 1994):

In August of 1981, soon after Microsoft had acquired full rights to 86-DOS, Bill Gates visited Santa Clara in an effort to persuade Intel to abandon a joint development project with DRI and endorse MS-DOS instead. It was I – the Intel applications engineer then responsible for iRMX-86 and other 16-bit operating systems – who was assigned the task of performing a technical evaluation of the 86-DOS software. It was I who first informed Gates that the software he just bought was not, in fact, fully compatible with CP/M 2.2. At the time I had the distinct impression that, until then, he’d thought the entire OS had been cloned.

The strong impression I drew 13 years ago was that Microsoft programmers were untrained, undisciplined, and content merely to replicate other people’s ideas, and that they did not seem to appreciate the importance of defining operating systems and user interfaces with an eye to the future.


A quick tutorial on writing a program that accepts plugins

On the CWE-LUG mailing list, someone asked a question about creating a program that can be extended with plugins. I thought the answer was so useful that I wanted to save it and make it available to others.

On 2/17/07, Mark wrote:

I’m a young programmer (just finishing high school) who has done a fair amount of programming with PHP, MySQL, and other web technologies. … How does one go about designing a program so it can be extended later with plugins, apis, and modules?

Ed Howland, veteran programmer, replied:

Mark, if I understand you correctly, you are seeking how to design a general purpose program that can be extended by others. It would help us to know what your target environment is, especially if it is a dynamic language like Perl, Ruby or Python, or a compiled language like Java or C/C++. The difference lies in linking others’ source code with yours; interpreted languages are easier in this respect.

That said, the general techniques are well-established. For purposes of illustration, I’ll call the code you are wanting to write the host (application) and the external modules the guests (modules). The basic idea is to use various callbacks into the guest module from the host application. But first the guest module must register itself with the host (see, it is like a hotel check-in…). This registration process can take many forms and is usually dictated by the programming environment. Anyway, the host maintains a list of registered guests. Each time a new guest registers, he is appended to said list.

Next, the host will then use the handle that represents the main object of the guest, and call an initialize routine in the guest. That routine sets stuff and gets a handle to the host so it can call things in the framework API to open windows, etc.

So the basic steps are:

  1. Devise a registration process
  2. Maintain a list of registered guest modules
  3. When starting, loop over your registered guests and call their initialize routines
    1. When a guest’s initialize routine is called, it calls pre-defined host API calls to open windows, or other things.
    2. These might cause the framework (in the host) to callback to the guest to display the window, and paint the contents of the windows.

You want to make your plugin callback interface as narrow as possible. And you want your host API to make it simple to create widgets, windows, whatever in a few easy steps. If using an O-O language like Java or C#, use interfaces for both the IPlugin (guest) and IPluginHost (host), and guest module writers will inherit from or implement those interfaces. Ideally, the minimal IPlugin interface could be as small as init() and destroy() (if destroy is needed).

Finally, if starting fresh, you might think about designing your entire application as nothing but the framework, so that your own pieces are simply plugins themselves.
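Here is a minimal sketch of the host/guest pattern described above, in Python (the class and method names are illustrative, not from any particular framework):

class Plugin:
    # The narrow callback interface every guest implements.
    def init(self, host):     # called once; receives a handle to the host
        raise NotImplementedError

    def destroy(self):        # optional cleanup hook
        pass

class Host:
    def __init__(self):
        self.guests = []              # the list of registered guest modules

    def register(self, plugin):       # the "hotel check-in"
        self.guests.append(plugin)

    def open_window(self, title):     # stand-in for the host framework API
        print(f"[host] opening window: {title}")

    def start(self):
        for guest in self.guests:     # loop over guests, call initialize
            guest.init(self)

class HelloPlugin(Plugin):
    def init(self, host):
        host.open_window("Hello from a guest module")

host = Host()
host.register(HelloPlugin())          # steps 1-2: registration and the list
host.start()                          # step 3: call each guest's init()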

The hard part is the registration process. Do you allow files to be uploaded to a web server? Does it write and re-read a config file listing plugins? I haven’t looked at DotNuke or PHPNuke or Typo, WordPress or any of the other ones. But the answer is in there.

Ruby on Rails has a built-in plugin architecture, but not one that you can upload files to, at least not w/o restarting the RoR app itself, IIRC. It looks in a subdirectory for plugin subdirs for a file called init.rb. It just executes whatever is in that file.

http://en.wikipedia.org/wiki/Plugin
http://codex.wordpress.org/Writing_a_Plugin
http://www.codeguru.com/Cpp/misc/misc/plug-insadd-ins/article.php/c3879/

HTH, somewhat.

Ed


Differences between Macintosh & Unix programmers

From Eric Steven Raymond’s “Problems in the Environment of Unix” (The Art of Unix Programming: 19 September 2003):

Macintosh programmers are all about the user experience. They’re architects and decorators. They design from the outside in, asking first “What kind of interaction do we want to support?” and then building the application logic behind it to meet the demands of the user-interface design. This leads to programs that are very pretty and infrastructure that is weak and rickety. In one notorious example, as late as Release 9 the MacOS memory manager sometimes required the user to deallocate memory by manually chucking out exited but still-resident programs. Unix people are viscerally revolted by this kind of mal-design; they don’t understand how Macintosh people could live with it.

By contrast, Unix people are all about infrastructure. We are plumbers and stonemasons. We design from the inside out, building mighty engines to solve abstractly defined problems (like “How do we get reliable packet-stream delivery from point A to point B over unreliable hardware and links?”). We then wrap thin and often profoundly ugly interfaces around the engines. The commands date(1), find(1), and ed(1) are notorious examples, but there are hundreds of others. Macintosh people are viscerally revolted by this kind of mal-design; they don’t understand how Unix people can live with it. …

In many ways this kind of parochialism has served us well. We are the keepers of the Internet and the World Wide Web. Our software and our traditions dominate serious computing, the applications where 24/7 reliability and minimal downtime is a must. We really are extremely good at building solid infrastructure; not perfect by any means, but there is no other software technical culture that has anywhere close to our track record, and it is one to be proud of. …

To non-technical end users, the software we build tends to be either bewildering and incomprehensible, or clumsy and condescending, or both at the same time. Even when we try to do the user-friendliness thing as earnestly as possible, we’re woefully inconsistent at it. Many of the attitudes and reflexes we’ve inherited from old-school Unix are just wrong for the job. Even when we want to listen to and help Aunt Tillie, we don’t know how — we project our categories and our concerns onto her and give her ‘solutions’ that she finds as daunting as her problems.


Why software is difficult to create … & will always be difficult

From Frederick P. Brooks, Jr.’s “No Silver Bullet: Essence and Accidents of Software Engineering” (Computer: Vol. 20, No. 4 [April 1987] pp. 10-19):

The familiar software project, at least as seen by the nontechnical manager, has something of this character; it is usually innocent and straightforward, but is capable of becoming a monster of missed schedules, blown budgets, and flawed products. So we hear desperate cries for a silver bullet–something to make software costs drop as rapidly as computer hardware costs do.

But, as we look to the horizon of a decade hence, we see no silver bullet. There is no single development, in either technology or management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity. …

The essence of a software entity is a construct of interlocking concepts: data sets, relationships among data items, algorithms, and invocations of functions. This essence is abstract in that such a conceptual construct is the same under many different representations. It is nonetheless highly precise and richly detailed.

I believe the hard part of building software to be the specification, design, and testing of this conceptual construct, not the labor of representing it and testing the fidelity of the representation. We still make syntax errors, to be sure; but they are fuzz compared with the conceptual errors in most systems. …

Let us consider the inherent properties of this irreducible essence of modern software systems: complexity, conformity, changeability, and invisibility.

Complexity. Software entities are more complex for their size than perhaps any other human construct because no two parts are alike (at least above the statement level). …

Many of the classic problems of developing software products derive from this essential complexity and its nonlinear increases with size. From the complexity comes the difficulty of communication among team members, which leads to product flaws, cost overruns, schedule delays. From the complexity comes the difficulty of enumerating, much less understanding, all the possible states of the program, and from that comes the unreliability. From complexity of function comes the difficulty of invoking function, which makes programs hard to use. From complexity of structure comes the difficulty of extending programs to new functions without creating side effects. From complexity of structure come the unvisualized states that constitute security trapdoors.

Not only technical problems, but management problems as well come from the complexity. It makes overview hard, thus impeding conceptual integrity. It makes it hard to find and control all the loose ends. It creates the tremendous learning and understanding burden that makes personnel turnover a disaster.

Conformity. … No such faith comforts the software engineer. Much of the complexity that he must master is arbitrary complexity, forced without rhyme or reason by the many human institutions and systems to which his interfaces must conform. …

Changeability. … All successful software gets changed. Two processes are at work. First, as a software product is found to be useful, people try it in new cases at the edge of or beyond the original domain. The pressures for extended function come chiefly from users who like the basic function and invent new uses for it.

Second, successful software survives beyond the normal life of the machine vehicle for which it is first written. If not new computers, then at least new disks, new displays, new printers come along; and the software must be conformed to its new vehicles of opportunity. …

Invisibility. Software is invisible and unvisualizable. …

The reality of software is not inherently embedded in space. Hence, it has no ready geometric representation in the way that land has maps, silicon chips have diagrams, computers have connectivity schematics. As soon as we attempt to diagram software structure, we find it to constitute not one, but several, general directed graphs superimposed one upon another. The several graphs may represent the flow of control, the flow of data, patterns of dependency, time sequence, name-space relationships. These graphs are usually not even planar, much less hierarchical. …

Past Breakthroughs Solved Accidental Difficulties

If we examine the three steps in software technology development that have been most fruitful in the past, we discover that each attacked a different major difficulty in building software, but that those difficulties have been accidental, not essential, difficulties. …

High-level languages. Surely the most powerful stroke for software productivity, reliability, and simplicity has been the progressive use of high-level languages for programming. …

What does a high-level language accomplish? It frees a program from much of its accidental complexity. …

Time-sharing. Time-sharing brought a major improvement in the productivity of programmers and in the quality of their product, although not so large as that brought by high-level languages.

Time-sharing attacks a quite different difficulty. Time-sharing preserves immediacy, and hence enables one to maintain an overview of complexity. …

Unified programming environments. Unix and Interlisp, the first integrated programming environments to come into widespread use, seem to have improved productivity by integral factors. Why?

They attack the accidental difficulties that result from using individual programs together, by providing integrated libraries, unified file formats, and pipes and filters. As a result, conceptual structures that in principle could always call, feed, and use one another can indeed easily do so in practice.


How to get 1 million MySpace friends

From Nate Mook’s “Cross-Site Scripting Worm Hits MySpace” (Beta News: 13 October 2005):

One clever MySpace user looking to expand his buddy list recently figured out how to force others to become his friend, and ended up creating the first self-propagating cross-site scripting (XSS) worm. In less than 24 hours, “Samy” had amassed over 1 million friends on the popular online community.

How did Samy transcend his humble beginnings of only 73 friends to become a veritable global celebrity? The answer is a combination of XSS tricks and lax security in certain Web browsers.

First, by examining the restrictions put into place by MySpace, Samy discovered how to insert raw HTML into his user profile page. But MySpace stripped out the word “javascript” from any text, which would be needed to execute code.

With the help of Internet Explorer, Samy was able to break the word JavaScript into two lines and place script code within a Cascading Style Sheet tag.

The next step was to simply instruct the Web browser to load a MySpace URL that would automatically invite Samy as a friend, and later add him as a “hero” to the visitor’s own profile page. To do this without a user’s knowledge, the code utilized XMLHTTPRequest – a JavaScript object used in AJAX, or Web 2.0, applications such as Google Maps.

Taking the hack even further, Samy realized that he could simply insert the entire script into the visiting user’s profile, creating a replicating worm. “So if 5 people viewed my profile, that’s 5 new friends. If 5 people viewed each of their profiles, that’s 25 more new friends,” Samy explained.

It didn’t take long for friend requests to start rolling in – first in the hundreds, then thousands. By 9:30pm that night, requests topped one million and continued arriving at a rate of 1,000 every few seconds. Less than an hour later, MySpace was taken offline while the worm was removed from all user profiles.


The Vitruvian Triad & the Urban Triad

From Andrés Duany’s “Classic Urbanism”:

From time to time there appears a concept of exceptional longevity. In architecture, the pre-eminent instance is the Vitruvian triad of Comoditas, Utilitas, e Venustas. This Roman epigram was propelled into immortality by Lord Burlington’s felicitous translation as Commodity, Firmness and Delight.

It has thus passed down the centuries and remains authoritative, even if not always applied in practice; Commodity: that a building must accommodate its program; Firmness: that it must stand up to the natural elements, among them gravity; Delight: that it must be satisfying to the eye, is, with the aberrant exception of the tiny current avant garde, the ideal of architecture. …

Let me propose the urban triad of Function, Disposition and Configuration as categories that would both describe and “test” the urban performance of a building.

Function describes the use to which the building lends itself, towards the ideal of mixed-use. In urbanism, a first cut at the range of function may include: exclusively residential, primarily residential, primarily commercial, or exclusively commercial. The middle two are the best in urban performance, although the extremes have justification in the urban-to-rural transect. An elaboration should probably differentiate the function at the all-important sidewalk level from the function above.

Disposition describes the location of the building on its lot or site. This may range from a building placed across the frontage of its lot, creating a most urban condition, to the rural condition of the building freestanding in the center of its site. Perhaps the easiest way to categorize the disposition of the building is by describing it by its yards: the rearyard building has the building along the frontage, the courtyard building internalizes the space and is just as urban, the sideyard building is the zero-lot-line or “Charleston single house,” and the edgeyard building is a freestanding object closest to the rural edge of the transect.

The third component of the urban triad is Configuration. This describes the massing and height of a building and, for those who believe that harmony is a tool of urbanism, the architectural syntax and constructional tectonic. It can be argued that the surface of a building is a tool of urbanism no less than its form. Silence of expression is required to achieve the “wall” that defines public space, and that reserves the exalted configuration to differentiate the public building. Harmony in the architectural language is the secret of mixed-use. People seem not to mind variation of function as long as the container looks similar. It is certainly a concern of urbanism.


Vitruvian Triad terminology

From “Good Architecture”:

In ‘building architecture’, for comparison, we have the 3 classic Vitruvian qualities to which ‘GoodArchitecture’ aspires:

‘Firmitas, Utilitas and Venustas’ (Marcus Vitruvius Pollio ‘The Ten Books of Architecture’ 1st C AD).

These qualities may be translated as: ‘Technology, Function and Form’ (C. St J. Wilson, ‘Architectural Reflections: Studies in the Philosophy and Practice of Architecture’, 1992, ISBN 0-7506-1283-5)

or, in the slightly more familiar but antique: ‘Firmness, Commodity & Delight’

— MartinNoutch


Culture, values, & designing technology systems

From danah boyd’s “G/localization: When Global Information and Local Interaction Collide”:

Culture is the set of values, norms and artifacts that influence people’s lives and worldview. Culture is embedded in material objects and in conceptual frameworks about how the world works. …

People are a part of multiple cultures – the most obvious of which are constructed by religion and nationality, but there are all sorts of cultures that form from identities and communities of practice. … Identification and participation in that culture means sharing a certain set of cultural values and ideas about how the world should work. …

Cultural norms evolve over time, influenced by people, their practices, and their environment. Culture is written into law and laws influence the evolution of culture. Cultures develop their own symbols as a way of conveying information. Often, these symbols make sense to those within a culture but are not parsable to those outside. Part of becoming indoctrinated into a culture is learning the symbols of that culture. …

… there are numerous cultural forces affecting your life at all times. How you see the world and how you design or build technology is greatly influenced by the various cultural concepts you hold onto. …

… algorithms are simply the computer manifestation of a coder’s cultural norms.


A very brief history of programming

From Brian Hayes’ “The Post-OOP Paradigm”:

The architects of the earliest computer systems gave little thought to software. (The very word was still a decade in the future.) Building the machine itself was the serious intellectual challenge; converting mathematical formulas into program statements looked like a routine clerical task. The awful truth came out soon enough. Maurice V. Wilkes, who wrote what may have been the first working computer program, had his personal epiphany in 1949, when “the realization came over me with full force that a good part of the remainder of my life was going to be spent in finding errors in my own programs.” Half a century later, we’re still debugging.

The very first programs were written in pure binary notation: Both data and instructions had to be encoded in long, featureless strings of 1s and 0s. Moreover, it was up to the programmer to keep track of where everything was stored in the machine’s memory. Before you could call a subroutine, you had to calculate its address.

The technology that lifted these burdens from the programmer was assembly language, in which raw binary codes were replaced by symbols such as load, store, add, sub. The symbols were translated into binary by a program called an assembler, which also calculated addresses. This was the first of many instances in which the computer was recruited to help with its own programming.

Assembly language was a crucial early advance, but still the programmer had to keep in mind all the minutiae in the instruction set of a specific computer. Evaluating a short mathematical expression such as x²+y² might require dozens of assembly-language instructions. Higher-level languages freed the programmer to think in terms of variables and equations rather than registers and addresses. In Fortran, for example, x²+y² would be written simply as X**2+Y**2. Expressions of this kind are translated into binary form by a program called a compiler.

… By the 1960s large software projects were notorious for being late, overbudget and buggy; soon came the appalling news that the cost of software was overtaking that of hardware. Frederick P. Brooks, Jr., who managed the OS/360 software program at IBM, called large-system programming a “tar pit” and remarked, “Everyone seems to have been surprised by the stickiness of the problem.”

One response to this crisis was structured programming, a reform movement whose manifesto was Edsger W. Dijkstra’s brief letter to the editor titled “Go to statement considered harmful.” Structured programs were to be built out of subunits that have a single entrance point and a single exit (eschewing the goto command, which allows jumps into or out of the middle of a routine). Three such constructs were recommended: sequencing (do A, then B, then C), alternation (either do A or do B) and iteration (repeat A until some condition is satisfied). Corrado Böhm and Giuseppe Jacopini proved that these three idioms are sufficient to express essentially all programs.

Structured programming came packaged with a number of related principles and imperatives. Top-down design and stepwise refinement urged the programmer to set forth the broad outlines of a procedure first and only later fill in the details. Modularity called for self-contained units with simple interfaces between them. Encapsulation, or data hiding, required that the internal workings of a module be kept private, so that later changes to the module would not affect other areas of the program. All of these ideas have proved their worth and remain a part of software practice today. But they did not rescue programmers from the tar pit.

Object-oriented programming addresses these issues by packing both data and procedures—both nouns and verbs—into a single object. An object named triangle would have inside it some data structure representing a three-sided shape, but it would also include the procedures (called methods in this context) for acting on the data. To rotate a triangle, you send a message to the triangle object, telling it to rotate itself. Sending and receiving messages is the only way objects communicate with one another; outsiders are not allowed direct access to the data. Because only the object’s own methods know about the internal data structures, it’s easier to keep them in sync.

You define the class triangle just once; individual triangles are created as instances of the class. A mechanism called inheritance takes this idea a step further. You might define a more-general class polygon, which would have triangle as a subclass, along with other subclasses such as quadrilateral, pentagon and hexagon. Some methods would be common to all polygons; one example is the calculation of perimeter, which can be done by adding the lengths of the sides, no matter how many sides there are. If you define the method calculate-perimeter in the class polygon, all the subclasses inherit this code.
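A minimal sketch of that class hierarchy in Python (the body of the rotate method is a stand-in, invented for illustration):

# polygon/triangle from the text: data and methods packed together,
# with calculate-perimeter defined once and inherited.
class Polygon:
    def __init__(self, sides):
        self.sides = sides          # side lengths: the internal data

    def calculate_perimeter(self):  # defined once in the superclass
        return sum(self.sides)

class Triangle(Polygon):
    def rotate(self):
        # outsiders send the message t.rotate() rather than touching
        # t.sides directly
        self.sides = self.sides[1:] + self.sides[:1]

t = Triangle([3, 4, 5])
t.rotate()
print(t.calculate_perimeter())      # 12, via the inherited method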


Four principles of modernity

From “Relativity, Uncertainty, Incompleteness and Undecidability”:

In this article four fundamental principles are presented: relativity, uncertainty, incompleteness and undecidability. They were studied by, respectively, Albert Einstein, Werner Heisenberg, Kurt Gödel and Alan Turing. …

Relativity says that there is no privileged, “objective” viewpoint for certain observations. … Now, if things move relative to each other, then obviously their positions at a given time are also measured relative to each other. …

Werner Heisenberg showed that if we built a machine to tell us with high precision where an electron is, this machine could not also tell us the speed of the electron. If we want to measure its speed without altering it, we can use a different light, but then we wouldn’t know where it is. At atomic scale, no instrument can tell us at the same time exactly where a particle is and exactly at what speed it is moving. …

If this system is complete, then anything that is true is provable. Similarly, anything false is provably false. Kurt Gödel got the intuition that traditional mathematical logic was not complete, and devoted several years to trying to find one thing, a single thing, that was inside the mathematics but outside the reach of logic. … Gödel’s incompleteness means that the classical mathematical logic deductive system, and actually any logical system consistent and expressive enough, is not complete: it has “holes” full of expressions that are neither logically true nor false. …

Turing’s halting problem is one of the problems that fall into the category of undecidable problems. It says that it is not possible to write a program that decides whether another program is correctly written, in the sense that it will never hang. This creates a limit on the verification of all programs, as all attempts at building actual computers that are usable in practice and different from Turing machines have proved to be equivalent in power and limitations to the basic Turing machine.
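The proof is short enough to sketch in code. Suppose a function halts(program, data) could always decide the question; the following construction (hypothetical Python; no such oracle can actually be written) then contradicts itself:

def halts(program, data):
    # assumed oracle: returns True if program(data) eventually stops.
    # (This is the function Turing proved cannot exist.)
    ...

def trouble(program):
    if halts(program, program):  # if the oracle says "it halts"...
        while True:              # ...then loop forever;
            pass                 # otherwise, fall through and halt.

# trouble(trouble) halts if and only if it does not halt -- a
# contradiction, so no such halts() can be written.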


A programmer’s poem

From dive into mark:

First, the poem itself (there are many versions, this is just one):

< > ! * ' ' #
^ " ` $ $ -
! * = @ $ _
% * < > ~ # 4
& [ ] . . /
| { , , system halted

In English, this reads:

waka waka bang splat tick tick hash
caret quote back-tick dollar dollar dash
bang splat equal at dollar under-score
percent splat waka waka tilde number four
ampersand bracket bracket dot dot slash
vertical-bar curly-bracket comma comma crash
