“The art of programming is the art of organizing complexity, of mastering multitude and avoiding its bastard chaos as effectively as possible.”
— Edsger Dijkstra, “Notes on Structured Programming”
“Notes on Structured Programming” (1970) is the third and, for now, final Edsger Dijkstra work this series will look at. The paper fills out the terse observations in “Go To Statement Considered Harmful” and the historical overview of “The Humble Programmer” — and offers some detailed examples of the disciplined sort of approach to programming Dijkstra was advocating at the time.
In his 2001 “What led to ‘Notes on Structured Programming'” Dijkstra recalled that he wrote the original “Notes” while emerging from a depression; he felt that his previous work on ALGOL and other systems had “only been agility exercises” and he now faced tackling “the real problem of How to Do Difficult Things.”
At the outset “Notes” declares that its concern is “the composition of large programs” — which may be “say, of the same size as the whole text of this booklet.” (The essay is about 80 pages long, or about 2000 lines of text, which is certainly not small, but might well be dwarfed by latter-day codebase sizes.) Dijkstra takes pains to argue that techniques that work on small programs simply do not scale up: quantity transforms quality; “Any two things that differ in some respect by a factor of already a hundred or more, are utterly incomparable.” “Enumerative reasoning” — by which my mathematically-limited brain understands Dijkstra to mean “attempting to prove a program’s correctness by walking through its steps, line by line” — breaks down as program size grows. Therefore, he argues, “We should appreciate abstraction as our main mental technique to reduce the demands made upon enumerative reasoning.”
In the section “On Understanding Programs,” Dijkstra revisits his “Go To Statement” argument, stressing the need for mechanisms that provide a “co-ordinate system in terms of which the discrete points of computation progress can be identified.” Then, in the section “A first example of step-wise program construction,” he walks us through his method of decomposition — breaking a problem down into smaller and smaller parts so that it can be better understood and solved — using the example of a program to print the first thousand prime numbers.
It’s a fascinating exercise: The first thing he points out is that, if you had an “instruction from the well-understood repertoire” that could make sense of the statement “print first thousand prime numbers,” you would basically be done. Since you didn’t have such an instruction, you could begin by breaking the problem in two: “fill table p with first thousand prime numbers” and “print table p.” Then each of those two statements needs to be broken down further.
Dijkstra works his way down the levels of the problem until he arrives at “primitive”-level instructions that verifiably achieve the program’s aim. Along the way, he observes: “I think that the behavior of the efficient programmer can be described as trying to take the easiest decision first, that is the decision that requires the minimum amount of investigation (trial and error, iterative mutual adjustment, etc.) for the maximum justification of the hope that he will not regret it.”
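To give a flavor of the exercise in a modern language, here is a minimal Java sketch of the same decomposition (my rendering, not Dijkstra’s notation; the class and method names are invented). The top level mirrors his two-statement split, and each statement is then refined until it reaches instructions the machine can execute:

// A sketch of stepwise refinement; one possible set of refinements, not Dijkstra's program.
public class FirstThousandPrimes {
    static final int N = 1000;

    public static void main(String[] args) {
        int[] p = fillPrimeTable(N); // "fill table p with first thousand prime numbers"
        printTable(p);               // "print table p"
    }

    // Refinement of the first statement: trial division by the primes found so far.
    static int[] fillPrimeTable(int n) {
        int[] p = new int[n];
        int count = 0;
        for (int candidate = 2; count < n; candidate++) {
            boolean isPrime = true;
            for (int i = 0; i < count && p[i] * p[i] <= candidate; i++) {
                if (candidate % p[i] == 0) { isPrime = false; break; }
            }
            if (isPrime) p[count++] = candidate;
        }
        return p;
    }

    // Refinement of the second statement.
    static void printTable(int[] p) {
        for (int value : p) {
            System.out.println(value);
        }
    }
}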
At the conclusion of this lengthy exercise, Dijkstra wanly observes that “our treatment of a very simple program has become very long, too long indeed to my taste and wishes.” But he is encouraged that “we succeeded to a large extent in building up this program one decision at a time.”
At the heart of “Notes on Structured Programming” lies Dijkstra’s advocacy of a particular approach to a program’s layered structure: “Each program layer is to be understood all by itself, under the assumption of a suitable machine to execute it, while the function of each layer is to simulate the machine that is assumed to be available on the level immediately above it.”
This “imaginary hardware” vision of abstraction may be foreign to the contemporary programming mindset; Dijkstra explains (in “What Led to ‘Notes…'”) that he’d come to this way of thinking from his experiences early in his career of writing software for hardware that was still being built and couldn’t yet run the programs. He worked that way so the code would be ready in time to ship with the new machines.
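The layered-machine idea can still be illustrated in a modern language, though. In this invented Java sketch (the names are mine, not the paper’s), the upper layer is written against an abstract “machine,” and the layer below it is whatever implements that machine:

// The "machine" the layer above assumes is available.
interface TableMachine {
    void store(int index, int value);
    int fetch(int index);
}

// The layer below "simulates" that machine on real hardware (here, a Java array).
class ArrayTableMachine implements TableMachine {
    private final int[] cells;

    ArrayTableMachine(int size) {
        cells = new int[size];
    }

    public void store(int index, int value) {
        cells[index] = value;
    }

    public int fetch(int index) {
        return cells[index];
    }
}

Code written against TableMachine can be understood, and even completed, before any implementation exists, which is roughly the position Dijkstra was in when the hardware itself was still being built.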
With a second detailed example illustrating this approach, Dijkstra arrives at his conclusion — a vision of program structure as “a necklace, strung from individual pearls… Changing a program will be treated as replacing one or more pearls of the original necklace by one or more other pearls.”
“Necklace strung from individual pearls” is probably Dijkstra’s best known phrase (after “Go To Statement Considered Harmful,” which, as we learned, he didn’t even write). It suggests beauty and fine handiwork; oddly, though, it also suggests circularity, which I don’t think is part of the idea. (Perhaps the structured necklace is to be envisioned unclasped.)
In the end, according to “What Led to ‘Notes’,” Dijkstra felt that his notion of “Structured Programming” was hijacked: “Apparently, IBM did not like the popularity of my text; it stole the term ‘Structured Programming’ and under its auspices Harlan D. Mills trivialized the original concept to the abolishment of the goto statement.”
Reading Dijkstra’s description today, I’m left with a sense of a common thread that runs through so many different visions of an improved programming universe. The theme is always, “Why can’t we build programs using easily exchangeable parts?” Dijkstra chose a more elegant metaphor than the common Lego blocks — but isn’t the vision the same? Or are there fundamental differences between his “pearls” and the objects and classes that would later come to dominate programming work?
I’ve only read the first part of the paper, but it sounds surprisingly modern. The emphasis on thinking of programs as machines isn’t the least bit foreign. Virtual machines go way back, but they are also the latest thing. (Take the Java Virtual Machine, or VMware, or Xen.)
It’s also interesting how much Dijkstra’s argument for structured programming is based on the requirement that programs be testable, and in particular that the parts of a program should be testable in isolation. This same problem is very much with us today, with renewed emphasis.
“[…] we now take the position that it is not only the programmer’s task to produce a correct program but also to demonstrate its correctness in a convincing manner […]” — other than the formal language, any advocate of test-first programming could be making the same statement today.
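For instance, a demonstration of correctness today might be a minimal JUnit-style test like this one (a sketch only; it assumes the hypothetical FirstThousandPrimes class from the post above):

import static org.junit.Assert.assertArrayEquals;

import org.junit.Test;

public class PrimeTableTest {
    // The test itself is the "convincing demonstration": anyone can rerun it.
    @Test
    public void firstFivePrimesAreCorrect() {
        assertArrayEquals(new int[] {2, 3, 5, 7, 11},
                          FirstThousandPrimes.fillPrimeTable(5));
    }
}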
I think the key to having “Lego blocks” in software development is (1) isolation, i.e. a block should stand on its own and have provable semantics, and (2) reusability, i.e. it should be easy to plug the block into a system of existing blocks, with a minimum of fuss, if possible without having to rename variables and things like that.
Code in expression-based languages, as opposed to statement-based ones, is more reusable. You can use a Lisp “if” expression in a different place (assuming you bind all its free variables), while the Java “if” first has to assign a variable before its result can be used in an expression context. Concatenative languages (without variables! like Forth, Factor, Joy) are even better at isolation: because blocks don’t have variables at all, they can be copied and plugged in to different places.
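To put the contrast in Java terms (an invented illustration; the names are placeholders):

class IfContrast {
    static final int LARGE = 100;
    static final int SMALL = 10;

    static int pickWithStatement(int n, int threshold) {
        // The "if" statement cannot appear where a value is expected;
        // a variable must be assigned first.
        int size;
        if (n > threshold) {
            size = LARGE;
        } else {
            size = SMALL;
        }
        return size;
    }

    static int pickWithExpression(int n, int threshold) {
        // The conditional expression plugs directly into a value context,
        // much like Lisp's (if ...).
        return (n > threshold) ? LARGE : SMALL;
    }
}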
Of course every language has its shortcomings. Dijkstra was one of the pioneers in giving good old statement-based languages clear semantics, I’d say, and this included removing constructs that break isolation, like the goto statement. (How would you copy, transfer, or plug in a block when inside it there’s a jump to I-don’t-know-where?)
Dijkstra claims that proving the correctness of a function by testing would take an infinite number of tests. Suppose, though, that I have a function that sums two complex numbers. For Dijkstra the only sufficient test would be running it on the infinite set of complex numbers, but for us it is clear that one simple test will be sufficient.
It may be hard to accept, but the act of writing the program is itself an attempt to prove that it is correct, with no math involved.
Dijkstra misses the point that testing usually comes as an add-on to the proof the programmer has already made in his head. It is not a black box that is being tested. The difference between math and real programming is that right and wrong are not a black-and-white issue here but rather a kind of scale. Even initially, without any testing, the program is correct enough; remember, there is a lot of code out there that was never properly tested. Some basic tests will push it up the scale. The underlying assumption is that if something goes wrong, it will go wrong totally, so that even simple tests will reveal it. And if they don’t, that is an acceptable trade-off against the time and effort of mathematically proving the program’s correctness.
Brian: Of course you’re right about virtual machines. I think that in reading Dijkstra I was thinking in a more literal-minded way about his own explanation of how he ended up working this way, writing for actual machines that simply weren’t yet available. That just threw me right off the path of thinking about the technique of writing to a machine that is always going to remain an abstraction. Sometimes, for me at least, a powerful image or metaphor obscures something simple and obvious!
Having read further into the paper, I think the place where Dijkstra differs from modern programming practice is where he demonstrates a component’s correctness using a proof. That’s pretty far from the mainstream these days – it’s something academics study, but certainly not something that the average web programmer does. (Although, refactoring is pretty closely related in that it’s a series of mechanical steps to transform one program into another without changing what it does.)
I think the reason proofs have largely failed is that few customers or managers understand them. The point is not to convince ourselves but rather to convince a boss or a customer, and a working demo is far more convincing. It reminds me of a famous Knuth quote: “Beware of bugs in the above code; I have only proved it correct, not tried it.”
On the other hand, I am pretty sympathetic towards making a language easy to prove things in; one of the reasons I like Java is that it seems easy to (informally) prove things to myself about some unfamiliar code I’m reading. But the “hip” programmers aren’t into that – the cool thing to do seems to be to come up with increasingly obscure magic incantations in Ruby.
Is it only my impression, or does Dijkstra really try hard to make our profession as boring and mechanistic as it gets, refusing to address creativity, art, intuition and code esthetics :) ?
Most of Dijkstra’s paper sounded quite ancient except for the section by the name of “On a program model”. He did not mean virtual machines in the contemporary sense. In fact, a virtual machine like the JVM or .NET is actually the antithesis of what he was talking about. What Dijkstra was talking about is that every sub-layer is responsible for making the upper layer work. There is currently nothing like it in existence. This is the only section that had my heart beating faster, because he was 37 years ahead of his time and still counting, as it’s still not available. The simplest way to describe it is emulator on top of emulator, but where each level is finally converted to the actual hardware platform used (where the platform matters not).
Unfortunately, most of his paper reflected on the complexity that control statements bring. Even with his “input; manipulation; output;” program model, he could not get away from control statements. He was on the verge of a great discovery 37 years ago. He was going in the right direction all those years ago, but could not see the truth.
Had he used his “input; manipulation; output;” program model and left behind the notion of control statements, we would have quite a different world today. The answers were available back then and this paper proves it. Unfortunately, I believe that Dijkstra could not conceive of the “manipulation” part being anything other than using control statements. This is a serious blow to the computing field.
I say now that this is the turning point in history that “should have been”. Instead, we went down the wrong path. I can say conclusively that we are at least 37 years behind the times as it relates to software development. It is sad to see that he had the answers right before his eyes but could not see them. However, it is unfair to put the responsibility on one man. No one else has been able to see the truth since… until now. But it’ll take another 30 years to become accepted, even when it is finally revealed.
I’d like to suggest that the “imaginary hardware” metaphor is actually highly similar to contemporary software development best practice. Although different writers/methodologies use varying terminology, one can view object-oriented programming from the perspective that each object is an “imaginary computer” with a specific set of responsibilities. A component at one level can be understood by how it utilizes the components “below” it, based on the FACT that each of them discharges its responsibilities (rather than on the MEANS by which they do so).
The extent to which these objects clearly separate the FACT/what from the MEANS/how (AKA interface from implementation, in some OOP approaches) is the extent to which we can replace a flawed component with a correct one, or replace a working component with another that is optimized differently (speed-vs-memory tradeoffs, etc.)
The extent to which these imaginary machines offer more general services, rather than special-cased ones, is the extent to which they are likely to be reusable in solving future problems.
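To sketch the FACT/MEANS separation in Java (an invented example; the names and tradeoffs are mine): callers depend only on the interface, so one implementation can replace another without disturbing the layer above.

// The FACT: what the component is responsible for.
interface Squarer {
    long square(int n);
}

// One MEANS: compute on demand (minimal memory).
class ComputingSquarer implements Squarer {
    public long square(int n) {
        return (long) n * n;
    }
}

// Another MEANS: a precomputed table (faster lookup, more memory).
class TableSquarer implements Squarer {
    private final long[] table;

    TableSquarer(int max) {
        table = new long[max + 1];
        for (int i = 0; i <= max; i++) {
            table[i] = (long) i * i;
        }
    }

    public long square(int n) {
        return table[n]; // valid for 0 <= n <= max
    }
}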
Boris, I must respond to two points, complex addition and black-box testing.
First, a single test cannot possibly suffice to demonstrate the correctness of a method intended to add complex numbers. For example:
public class Complex {
    public final int real;
    public final int imag;

    public Complex(int r, int i) {
        real = r;
        imag = i;
    }

    public Complex add(Complex c) {
        // BUG: the imaginary part of the sum is silently discarded.
        return new Complex(this.real + c.real, 0);
    }
}
This implementation is broken, yet it can survive an enormous number of tests (those which use only pure-real values, as well as those which add complex conjugates). A subtler form of breakage is illustrated by:
public Complex add(Complex c) {
    // Casting to short silently truncates sums outside -32768..32767.
    short r = (short) (this.real + c.real);
    short i = (short) (this.imag + c.imag);
    return new Complex(r, i);
}
which works nicely for small enough values, but breaks as soon as sufficiently large values are introduced: 20000 + 20000 is 40000, which does not fit in a 16-bit short and wraps around to -25536.
I think we’d both agree that the flaws above are obvious as soon as one looks at the code (assuming one knows Java, of course ;-). I hope we’d also agree that if one takes a black-box, try-and-see-what-happens approach, it could take a while to find that a problem exists, and even more time to describe the circumstances in which the problem manifests itself.
It is important to remember that Dijkstra was writing at a time when (as one of my colleagues put it) “code your brains out and then debug it until it works” was all too common an approach. I believe that reading a representative sample of his work (not just the one or two highly-publicized and controversial pieces) supports the view that Dijkstra was focusing almost entirely on the problems of correct DESIGN. I don’t believe he was oblivious to the rest of the software development process, but I believe that he was working to make demonstrably-correct design the foundation of the rest of the process.
As to black boxes, Dijkstra didn’t advocate formal mathematical examination of completed designs. Instead, he argued forcefully (perhaps even stridently) in favor of letting the requirements of the proof drive the design decisions. From where I sit, there’s a strong connection to the contemporary practice of test-driven development. While I am aware that there are still significant differences between formal proofs and operational tests, I see a common theme of “let the obligation to verify the end product guide the way it is designed and built”.
Although some of the challenges of computing in the 60s and 70s are not at the top of our list of concerns in 2006, I freely admit that I am inspired and motivated every time I go back and re-consider the visions many of the computing pioneers articulated. In some aspects we’ve come a long way since those days, but in some others we have learned very little.