@ Loup's Impossible? Like that would stop me.

June 2015

What is good code?

Good code is cheap code that meets our needs. The cheaper the better.

Well, that assumes code is just a means to an end. Sometimes we want to enjoy the code for itself. Most of the time, however, the overriding concern is the bottom line.

I could almost stop right there, but I feel like I should elaborate.

Good (enough) programs

There is the code, and there is the program. Code is read by a human. Programs are interpreted by computers (or virtual machines).

Programs have many requirements, most of which are domain specific. Some requirements however are universal:

We don’t always think of these requirements explicitly. Still, they must be met, and they have a cost. Want more performance, or want to waste fewer resources? Your code will be more complex. Want fewer bugs? You will have to spend more time shaking them out.

Once a program does meet all its requirements, however, it is good enough. There is little point in trying to make it even better. With that goal set, the only thing left to optimise is the cost of code.

Code as a dependency graph

Before we can hope to estimate the cost of code, we must understand a few things about its structure.

Basically, your code is made up of little chunks that can fit in your head (if they didn’t, you couldn’t have written them in the first place). Most of the time, a chunk is a function (or a method, or a procedure). Each such chunk is a node of your graph, and their dependencies form the edges.

As in any directed graph, when you have N nodes, you can have up to N² edges. This has important consequences when estimating the cost of code. The number of edges relative to the number of nodes is called the density of the graph. Dense graphs have many edges (close to N²). Sparse graphs have relatively few edges (close to N).
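To make the graph picture concrete, here is a minimal sketch, assuming a made-up program of five functions whose names (`main`, `parse`, `tokenize`, `render`, `format`) are purely hypothetical. Each function maps to the set of functions it calls, and density is measured against the N² maximum used above (which counts self-edges such as recursion).

```python
from typing import Dict, Set

# Hypothetical dependency graph: each key is a chunk of code (a function),
# each value is the set of chunks it calls. Every call is one directed edge.
GRAPH: Dict[str, Set[str]] = {
    "main": {"parse", "render"},
    "parse": {"tokenize"},
    "render": {"format"},
    "tokenize": set(),
    "format": set(),
}


def count_edges(graph: Dict[str, Set[str]]) -> int:
    """Total number of dependency edges in the graph."""
    return sum(len(deps) for deps in graph.values())


def density(graph: Dict[str, Set[str]]) -> float:
    """Edges relative to the N² maximum (self-edges, i.e. recursion, included)."""
    n = len(graph)
    return count_edges(graph) / (n * n) if n else 0.0


if __name__ == "__main__":
    print(len(GRAPH), count_edges(GRAPH), density(GRAPH))  # 5 4 0.16
```

With 5 nodes and only 4 edges, this graph is sparse: its edge count is close to N, far from N² = 25.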

Each chunk of code has two levels of understanding: interface (what it does and how to use it), and implementation (how it does it).
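As an illustration only, consider a hypothetical `median` function. A caller needs just the interface (the signature and the docstring); only someone changing the function needs to read the implementation below them.

```python
from typing import List


def median(values: List[float]) -> float:
    """Interface: return the median of a non-empty list of numbers."""
    # Implementation: callers can ignore everything below this line.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:  # odd length: the single middle element
        return ordered[mid]
    # even length: average of the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```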

A few caveats, however:

The cost of code

Basically, code costs however much time we spend on it. I personally break it down into three activities: typing, understanding, and coordination.

Some would talk about development, maintenance, testing… But this is not an interesting way to break things down. Development and maintenance have a lot in common. Even testing involves writing and reading code. And all three of those involve typing, understanding, and coordination.

Typing

It is generally accepted that we spend much more time thinking about our code than typing it. I still find typing interesting, however, because it provides an obvious lower bound: code cannot cost less than the time taken to type it.

Imagine a 20,000-line program. By current standards, this is not a big program. If you were to print it, it would fit in a medium-sized book: 400 pages of 50 lines each. Prose typically has about 12 words per line, but lines of code are shorter. Let’s say 6 words per line. That’s 120,000 words.

Assuming you type 50 words per minute (a fair speed), typing it all down would take 40 hours. A full work week of mindless typing, and the program is not even “big”. (Our perception of “big” is insane.)
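The estimate is simple enough to check. This small sketch reuses the article’s own assumptions (6 words per line, 50 words per minute); the function name `typing_hours` is just for illustration.

```python
def typing_hours(lines: int, words_per_line: int = 6, words_per_minute: int = 50) -> float:
    """Hours needed merely to type `lines` lines of code, with no thinking at all."""
    words = lines * words_per_line        # 20,000 * 6 = 120,000 words
    minutes = words / words_per_minute    # 120,000 / 50 = 2,400 minutes
    return minutes / 60                   # 2,400 / 60 = 40 hours


print(typing_hours(20_000))  # 40.0
```

Changing the defaults shows how sensitive the lower bound is: at 100 words per minute, the same program still takes 20 hours of pure typing.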

Understanding

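You can’t write code randomly. You need to know what you’re doing. More specifically, you need to understand three things: the new code itself, the existing code it interacts with, and the prerequisite knowledge your solution relies on.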

New code

The depth of understanding required to write new code is significant, and acquiring it takes far longer than typing at 50 words per minute. Those 20,000 lines aren’t going to write themselves in a week. Nevertheless, assuming you work on this code piecemeal (it is impossible not to), the time taken to understand new code is still proportional to the length of that code.

Oh, right. Length.

Intuitively, the time required to understand a piece of code is proportional to its complexity, not its length. Length, measured in lines of code, is an incredibly crude proxy. But it works. It is strongly correlated with most complexity measures we have come up with so far, and those don’t have more predictive power than length alone. For instance, if you already know a program’s length, learning its cyclomatic complexity won’t tell you anything more about things like time to completion or number of bugs.

With a few exceptions, complexity and length are roughly proportional. Of two chunks of code solving the same problem, if one is twice as long, it is probably twice as complex. Roughly. Amazingly, this heuristic works across languages: terser languages make the code cheaper. (Again, there may be exceptions.)
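For readers who have not met cyclomatic complexity, here is a rough sketch that measures both quantities on a Python snippet. It uses the standard `ast` module; the approximation (1 plus one per `if`, loop, conditional expression, `except` clause, and extra boolean operand) is a simplification of the usual definition, and the `classify` sample is hypothetical.

```python
import ast

# Node types counted as decision points in this simplified approximation.
_DECISIONS = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def approx_cyclomatic(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISIONS):
            complexity += 1
        elif isinstance(node, ast.BoolOp):  # `a and b and c` adds two branches
            complexity += len(node.values) - 1
    return complexity


def count_lines(source: str) -> int:
    """Length as non-blank lines of code."""
    return sum(1 for line in source.splitlines() if line.strip())


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"
'''

print(count_lines(SAMPLE), approx_cyclomatic(SAMPLE))  # 6 3
```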

Existing code

Any new code you write will use, or be used by, existing code, so you need to understand some of the old code before you write anything new. The only exception is the very start of a project, and you won’t be starting forever.

The ideal case is when you work alone, and everything you have written so far is still fresh in your mind. Understanding it again costs you nothing.

Well, that never happens. You forget about code you have written long ago. You don’t work alone, and must understand code others have written. Or maybe you arrived late in the project. Now the density of your dependency graph matters a great deal:

Prerequisites

This one is highly context-specific. Common knowledge costs nothing, because everyone knows it (by definition), and some knowledge is required merely to understand the problem.

Some background knowledge however relates to how you solve your problem. There are different ways to write a program, and some are more… esoteric than others. Take for instance that little metacompiler I have written in Haskell. Very handy when you need to parse some textual data format. On the other hand, you will need to know about top-down parsing, parsing expression grammars, parser combinators, monads, applicative functors… are you sure you want to learn all that just to parse some text? By the way, I no longer maintain this tool.

Required background knowledge is the reason why lambdas and recursion are often frowned upon in mainstream settings. The typical OOP programmer is not used to this eldritch stuff from FP hell. Not yet, at least.

I don’t have a magic bullet for this problem. If you don’t know some useful concept or external tool, you can’t use it until you spend time learning it. Good luck with the cost-benefit analysis.

Coordination

(I have worked in several teams, but never led one. Take this section with a grain of salt.)

Most of the time, you will work in a team. You have to. Most programmers can’t do everything a full system requires: domain-specific algorithms, user interface, network, database, security… Even if you’re one of those miraculous full-stack experts, you probably won’t have the time to code it all by yourself.

Hence coordination. In the worst case, everyone communicates with everyone else all the time. The cost of that overhead is quadratic with respect to the size of the team. Not a problem with 4 people. Quite a mess with 40. In practice, when teams grow large enough, two things inevitably happen:

(If neither happens, communication overhead explodes and little gets done.)
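The quadratic growth can be made concrete with a common way of modelling the worst case: if everyone talks to everyone, a team of n people has n(n-1)/2 pairwise communication channels. The function name `channels` is illustrative only.

```python
def channels(team_size: int) -> int:
    """Pairwise communication channels when everyone talks to everyone: n(n-1)/2."""
    return team_size * (team_size - 1) // 2


for n in (4, 40):
    print(n, channels(n))
# 4 6
# 40 780
```

Growing the team tenfold, from 4 to 40 people, multiplies the number of channels by 130.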

How that relates to code is explained by Conway’s law:

organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations

This works both ways. The organisation of your team will shape the code it produces, and the code you ask of your team will shape its organisation. You just can’t build a big monolith with separate sub-teams. Either the teams will communicate a lot (effectively merging themselves), or they will come up with a more modular design.

Driving the costs down

From the above we can deduce 4 levers to reduce the cost of code:

Pretty standard stuff. I just want to stress one thing: those 4 levers are probably the only ones that matter. Found a new technique, a new language, a new methodology? It has to do one of those:

Otherwise it won’t reduce your costs.