June 2009. Rewritten in July 2010 (first version).
Assignment Statement Considered Harmful
In his essay "Go To Statement Considered Harmful" Edsger W. Dijkstra
demonstrated how the use of
goto made programming harder. Now,
goto is considered harmful, and has been replaced by more
reasonable constructs. I attempt here to demonstrate the same about
the assignment statement.
This is old news. Any programmer who has been exposed to functional languages and practices knows about that. I just haven't found it formulated in this way, as a direct attack on this seemingly fundamental feature.
What makes a good program
A good program solves your problem, has no errors, and is easy to understand and modify.
We humans have certain limitations. The greatest here is our short term memory. We can't work with a whole program. We can only deal with it piece by piece.
Therefore, to be easy to understand, a program must be divided into pieces small enough so they fit in our short term memory. Moreover, each of those pieces must stand alone, or require as little external knowledge as possible (they must be loosely coupled).
Assignment, functions and procedures
Most programming languages are built around two features: the assignment statement and functions.
Functions are very simple: given some parameters, they produce a result, which depends only on the parameters. Same parameters, same result. Note that a function exposes a well defined, and typically small interface: its parameters and its result.
The assignment statement is even simpler. It puts a value in a variable. Note that it introduces the notion of time: before the assignment, the variable holds one value. After, it holds another. This is the most basic form of side effect. (I prefer to say just "effect" because most of the time, we want it.)
With both assignment and functions, we can build procedures. Procedures are like functions, but more capable. Like functions, they take parameters and may return a result. Unlike functions, they can directly interact with the outside world, and have effects beyond their result.
This comes with a price, however: a bigger and less explicit interface. Procedures expose more than just their arguments and result. They may depend on things that can change over time (mutable state), and may mutate state themselves. These additional dependencies are often implicit. For instance, a procedure can take no argument, return no result, yet have loads of implicit dependencies and effects.
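The difference in interface can be sketched in a few lines of OCaml. This is only an illustration: the names double, counter and next_ticket are hypothetical, chosen for the example.

```ocaml
(* A function: the result depends only on the parameter. *)
let double n = 2 * n

(* Hidden mutable state (a hypothetical counter) that the procedure
   below depends on without saying so in its interface. *)
let counter = ref 0

(* A procedure: no meaningful argument, yet it reads and mutates
   [counter], so two identical calls give different results. *)
let next_ticket () =
  counter := !counter + 1;   (* effect beyond the result *)
  !counter

let () =
  (* Same parameter, same result. *)
  assert (double 21 = double 21);
  (* Same (empty) parameter, different results. *)
  let first = next_ticket () in
  let second = next_ticket () in
  assert (first <> second)
```

To predict what next_ticket returns, one has to know every piece of code that touched counter before the call. Nothing in its signature hints at that.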
The conclusion is obvious: with their smaller and more explicit interface, functions are easier to deal with than procedures. Therefore, procedures should be avoided whenever possible, and insulated otherwise. And so should the assignment statement (for it makes procedures possible).
Pervasive use of the assignment statement also has concrete, readily visible drawbacks: it encourages the confusion between values and variables, makes program analysis and refactoring harder, and can even hurt performance.
Confusing values and variables
The assignment statement is not directly at fault here. Its pervasive use, however, influenced many programming languages and programming courses. This resulted in a confusion akin to the classic confusion of the map and the territory.
Compare these two programs:
    (* OCaml *)         │ # most imperative languages
    let x = ref 1       │ int x = 1
    and y = ref 42      │ int y = 42
    in x := !y;         │ x := y
       print_int !x     │ print(x)
In OCaml, the assignment statement is discouraged. We can only use it
on "references" (variables). By using the "ref" function, the OCaml
program makes explicit that x is a variable, which holds an
integer. Likewise, the "!" operator explicitly accesses the value
of a variable. The indirection is explicit.
Imperative languages don't discourage the use of the assignment
statement. For the sake of brevity, they don't explicitly distinguish
values and variables. Disambiguation is made from context: at the
left hand side of assignment statements, "x" refers to the variable
itself. Elsewhere, it refers to its value. The indirection is
implicit.
Having this indirection implicit leads to many language abuses. Here,
we might say "x is equal to 1, then changed to be equal to y".
Taking this sentence literally would be making three mistakes:
x is a variable. It can't be equal to 1, which is a value (an integer, here). A variable is not the value it contains.
x and y are not equal, and will never be. They are distinct variables. They can hold the same value, though.
x itself doesn't change. Ever. The value it holds is just replaced by another.
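The distinction can be checked directly in OCaml, which has separate operators for comparing values ("=") and comparing the identity of references ("=="). A minimal sketch, reusing x and y from the program above (the assertions are added for illustration):

```ocaml
let x = ref 1
let y = ref 42

let () =
  x := !y;
  (* The values held by x and y are now the same integer. *)
  assert (!x = !y);
  (* The variables themselves are still two distinct references. *)
  assert (not (x == y));
  (* Replacing the value held by y does not affect the value held by x. *)
  y := 0;
  assert (!x = 42);
  print_int !x
```

Note that "x = y" on two references compares their contents in OCaml, which is why the identity check has to use "==".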
The gap between language abuse and actual misconception is small.
Experts can easily tell a variable from a value, but non-specialists
often don't. That's probably why C pointers are so hard. They
introduce an extra level of indirection. An
int * in C is roughly
equivalent to an
int ref ref in OCaml (plus pointer arithmetic). If
variables themselves aren't understood, no wonder pointers look like
Program analysis and refactoring
In high school, a definition like "let a = x + 1" meant any occurrence of "a" or "(x + 1)" can be replaced by the other without changing the meaning of what is written. They are equivalent, and therefore substitutable. Imperative programs are more complicated:
    int x = 42
    ...
    x := 7
    ...
    print(x);
Q: What does that print?
A: So that prints (looking for the definition of x) 42! Right? Oh, crap, I forgot that x has been modified (err, I mean, the value it initially held has been replaced by another). So, that should be 7. But I don't know this code very well; x may be referenced from elsewhere and modified behind my back… Ahhrrg!! (ripping my hair out)
Same problem with refactoring. If
x was immutable, any of its
occurrences could be replaced by 42. Like you would naturally do in
pen-and-paper mathematics. Unfortunately,
x is not immutable.
Tread carefully, and mind your hair.
Remember: using the assignment statement has a cost. When allowed, many assumptions about the program have to be dropped. Algebraic properties are lost. Some transformations don't preserve meaning any more. Think twice before you use it.
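Here is a small OCaml sketch of a transformation that stops preserving meaning once assignment is involved. The names (a, b, r and the rest) are made up for the example; the substitution is the high school one, replacing a name by its definition.

```ocaml
(* Immutable binding: a and (x + 1) are interchangeable everywhere. *)
let x = 42
let a = x + 1
let with_name = a * a
let substituted = (x + 1) * (x + 1)

(* Reference: an assignment sits between the definition of b and its use. *)
let r = ref 42
let b = !r + 1                              (* 43, computed before the assignment *)
let () = r := 7
let with_name_mut = b * b                   (* 43 * 43 *)
let substituted_mut = (!r + 1) * (!r + 1)   (* 8 * 8 *)

let () =
  (* Substitution preserves meaning without assignment... *)
  assert (with_name = substituted);
  (* ...and silently changes it with assignment. *)
  assert (with_name_mut <> substituted_mut)
```

In the second half, replacing b by its definition is only valid if nothing assigns to r in between, and checking that may require reading the whole program.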
Another thing you lose when you allow assignment is sharing. It can matter when you manipulate big containers or other complex data structures. Basically, there are three ways to manipulate a data structure:
1. Directly modify it (assignment allows that).
2. Copy the whole thing, then work on the copy.
3. Share the parts of the old structure that you didn't want to change in the first place.
Each way has its problems. Way 1 is effectively an assignment. This is often most efficient, but, like I said, that's also Bad™. Way 2 wastes unspeakable amounts of time and memory. Way 3 is reasonably efficient, but is wildly unsafe if you ever allow yourself to use way 1 (modifying shared parts is rarely a good idea).
You often need to take snapshots of the state of a big data structure. Maybe you run an algorithm with backtracking. Maybe you produce intermediate results. Maybe you use the same data structure for different purposes. If assignment is allowed, the only safe way to take your snapshot is to perform a deep copy. That can kill performance. If assignment is not allowed, then taking a snapshot is instantaneous, way 3 becomes safe, and the program is overall more efficient.
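As a rough sketch of way 3 and of cheap snapshots, here is what sharing looks like with OCaml's immutable lists and maps. The state (state0, state1) and its string keys are hypothetical, standing in for whatever big structure a real program would manipulate.

```ocaml
module StrMap = Map.Make (String)

(* Way 3 with lists: both lists reuse the very same tail in memory. *)
let tail = [2; 3; 4]
let a = 1 :: tail
let b = 10 :: tail
let tails_shared = List.tl a == tail && List.tl b == tail

(* Hypothetical state for a backtracking search, kept in an immutable map. *)
let state0 = StrMap.empty |> StrMap.add "x" 1 |> StrMap.add "y" 2

(* Taking a snapshot is just keeping the old binding: nothing is copied. *)
let snapshot = state0

(* "Updating" builds a new map; the old one is left intact. *)
let state1 = StrMap.add "x" 100 state0

let () =
  assert tails_shared;
  assert (StrMap.find "x" snapshot = 1);   (* the snapshot still sees the old value *)
  assert (StrMap.find "x" state1 = 100);
  assert (StrMap.find "y" state1 = 2)
```

Because nothing can assign into tail or state0, nobody holding a shared part can be surprised by a change made through another name.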
Learn to avoid or isolate the assignment statement and other (side) effects. Then apply that knowledge.
Try to use better languages. At the very least, use garbage collection whenever you can. Without it, avoiding the assignment statement is hard.
If you have the time, learn a purely functional language (Haskell is the obvious choice). The total absence of the assignment statement is a great teacher.
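As a small illustration of "avoid or isolate", here are two ways to sum a list in OCaml. The function names are invented for the example; this is one possible sketch, not a recipe.

```ocaml
(* Assignment isolated: [total] never escapes the function, so callers
   still see a plain function (same list, same result). *)
let sum_isolated xs =
  let total = ref 0 in
  List.iter (fun x -> total := !total + x) xs;
  !total

(* Assignment avoided: the accumulator is threaded through a fold. *)
let sum_pure xs = List.fold_left ( + ) 0 xs

let () =
  assert (sum_isolated [1; 2; 3; 4] = 10);
  assert (sum_pure [1; 2; 3; 4] = 10)
```

Both versions expose the same small interface. The first one merely keeps its assignment where no other code can observe it.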