March 2013
From Imperative to Functional: how to make the leap
Many programmers find functional programming difficult, if not downright cryptic. This is because functional programming is not merely about relinquishing the assignment statement and using closures liberally. It’s a whole different mindset.
The functional mindset
…basically boils down to three things:
- Your program isn’t about what it does, but about what things are.
- Your code isn’t just a linear sequence of instructions, but a tree of nested expressions.
- Functions are ordinary mathematical objects, just like an integer is.
Some notations first
For this tutorial, I won’t be using the syntax of any particular programming language. That would obscure my point. Instead, I will borrow from high school mathematics. Here is an example:
n : ℕ
n = 42
Which means, “let n be a natural number whose value is 42”. More generally, when I say
x : T
I mean that x has type T. Basically, x is an element of the set T (let’s just assume that types are sets). And when I say
x = Expr
I just mean that x is the value denoted by the expression Expr. Now let’s define a function:
f : ℕ → ℝ
f = x ↦ x + π
Which means “Let f be a function from natural numbers to real numbers, which returns its input plus π”. To make reading easier, we will use this more familiar syntax whenever possible:
f(x) = x + π
I will however insist on using type annotations such as this one:
f : ℕ → ℝ
They’re not central to my point, but they make for useful documentation.
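(If you would rather follow along in a real language, this notation maps almost directly onto Haskell. Here is a rough equivalent, with Integer standing in for ℕ and Double for ℝ; both are only approximations of the mathematical sets.)
n :: Integer
n = 42

f :: Integer -> Double
f x = fromIntegral x + pi   -- fromIntegral converts the natural number before adding π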
Descriptive mindset
Compare the following definitions:
-- imperative │ -- functional
│
square : ℕ → ℕ │ square : ℕ → ℕ
square(n) │ square(n) = n * n
{ sq = n * n │
return sq │
} │
Even though the two definitions above are strictly equivalent, they are very different in spirit:
- The imperative side reads like a procedure to follow: “compute n times n, put it in sq, then return sq to whoever called you”. (I’m anthropomorphising square() here, because that’s how the imperative mindset often operates.)
- The functional code reads like a description: “the square of n is n times n”.
Don’t be fooled by this apparent simplicity. In functional programming, the descriptive mindset is mandatory. If you stay stuck in the active mindset, not only will you miss the assignment statement dearly, but the other difficulties (reading order and first class functions) will be almost insurmountable.
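(For what it’s worth, the functional column above is essentially real code already. In Haskell, for instance, it reads almost letter for letter the same, with Integer again standing in for ℕ:)
square :: Integer -> Integer
square n = n * n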
Reading order
Functional code is rarely a sequence of instructions that one can just read from top to bottom. To quote Paul Graham’s On Lisp, “an imperative program is a functional program turned inside out”. Before we see an example, I need to introduce a bit of syntax: tuples.
p : ℕ × ℝ
p = (42, π)
As you may have guessed, ℕ × ℝ denotes the Cartesian product. So, let’s see our example (inspired by §3.2 of the book):
-- imperative │ -- functional
│
compute : ℝ×ℝ → ℕ×ℝ │ compute : ℝ×ℝ → ℕ×ℝ
compute(x,y) │ compute(x, y) = (42, |x| + 2)
{ abs = |x| │
res = abs + 2 │
return (42, res) │
} │
This silly compute function takes a pair of real numbers as input, and returns a pair of numbers (one natural, one real). Now look at how we can convert the imperative program into the functional one, step by step:
return (42, res) -- original return statement
return (42, abs + 2) -- replacing res by its value
return (42, |x| + 2) -- replacing abs by its value
What we did was take the imperative program and turn it outside in: we stuffed the expressions on the right hand side of abs and res inside the final expression. So, after the modification, we have this:
compute : ℝ×ℝ → ℕ×ℝ
compute(x,y)
{ abs = |x| -- we no longer need
res = abs + 2 -- those two lines
return (42, |x| + 2)
}
After removing dead code:
compute : ℝ×ℝ → ℕ×ℝ
compute(x,y)
{ return (42, |x| + 2)
}
And now we can use the functional syntax directly:
compute : ℝ×ℝ → ℕ×ℝ
compute(x, y) = (42, |x| + 2)
Now, going back and forth between the two programming styles is all well and good, but what does it have to do with reading order?
Simple: the imperative program has an obvious reading order: instruction by instruction, from the first to the last. As we have seen, the first statement in the imperative program corresponds to the inner-most expression in the functional program. Likewise, the last imperative statement corresponds to the outer-most functional expression.
So, the functional program typically reads from the inner-most expressions to the outer-most ones. This is no more difficult than reading a sequence of statements, but it is different, and requires some getting used to.
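(To make this inside-out reading concrete, here is compute in Haskell, once with intermediate names and once fully inlined. Integer and Double stand in for ℕ and ℝ, and compute' is a name of my own choosing:)
compute :: (Double, Double) -> (Integer, Double)
compute (x, y) =
  let a   = abs x
      res = a + 2
  in  (42, res)

-- the same thing, inlined: read it from the inside out
compute' :: (Double, Double) -> (Integer, Double)
compute' (x, y) = (42, abs x + 2)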
By the way, I believe this getting used to is a major reason why the ternary operator in C is frowned upon:
// What I would write: │ // What I often see
int abs(int n) │ int abs(int n)
{ │ {
return n > 0 │ if (n > 0)
? n │ return n;
: -n; │ else
} │ return -n;
┌─────────────────┘ }
│
│ // When QA forbids multiple returns
│ int abs(int n)
│ {
│ int result;
│ if (n > 0)
│ result = n;
│ else
│ result = -n;
│ return result;
│ }
│
│ // When QA forbids brace-less ifs
│ ...
The code on the left shows my slant towards functional programming, and can’t really be read as a sequence of statements. The program on the right, on the other hand, is clearly a set of instructions of the familiar form “if this, then do that, otherwise do those”.
There are other differences in style:
// What I sometimes see │ // What I would write
if (cond_1) │ if (cond_1 && cond_2)
{ │ {
if (cond_2) │ // some code
{ │ }
// some code │
} │
} │
I know some of you will really believe the code on the left is more readable than the code on the right. It is not. While each separate line on the left is at least as readable as the corresponding line on the right, there are more of them. Overall, both listings have the same structural complexity: we just trade an if statement for a boolean expression. Nevertheless, I think writing the version on the left is a mistake, for two reasons:
- The syntax is significantly heavier in many languages.
- The control flow is more obvious with one if instead of two.
And then there are outright mistakes:
// What I sometimes see │ // What I'd write
bool flag = false; │ bool flag = (userInput == "yes");
if (userInput == "yes") │
flag = true; │
A cursory look tells me there is just no way the code on the left is more readable. It has an initialisation, an if statement, a relational expression, and an assignment statement. The code on the right only has an initialisation and a relational expression. Yet some programmers still prefer the code on the left. How so?
Well, one still cannot read the code on the right from top to bottom without thinking. Again, that requires some getting used to. But there is another “problem”: this code actually goes meta.
While booleans are first class values in nearly every language I know of, they are rarely used as such. For instance, booleans are often not thought of as truth values even when it is warranted:
// What I often see │ // What I would write
if (flag == true) │ if (flag)
Many people would tell you that the listings on the left are more readable, even though they’re strictly more complex. The reason is, they mentally separate the world of propositions, used for branching and looping, and the world of ordinary values, used for more ordinary calculations. They don’t think of truth (or lack thereof) as an ordinary value. But believe me, when you do think of truth values as something as ordinary as integers, your code just gets a bit simpler:
-- naive approach
imply : (bool × bool) → bool │ imply : (bool × bool) → bool
imply(A, B) │ imply(A, B) =
{ if A = false │ if A = false
then return true │ then true
if B = true │ else if B = true
then return true │ then true
else return false │ else false
} │
-- first class booleans
imply : (bool × bool) → bool │ imply : (bool × bool) → bool
imply(A, B) │ imply(A, B) = impl
{ │ with impl = nulA ∨ B
nulA = ¬A │ nulA = ¬A
impl = nulA ∨ B │
return impl │
} │
-- full functional approach
imply : (bool × bool) → bool │ imply : (bool × bool) → bool
imply(A, B) │ imply(A, B) = ¬A ∨ B
{ return ¬A ∨ B │
} │
(I hope we do agree that the programs at the bottom are by far the more readable ones.)
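(Incidentally, the bottom-right version is exactly how one would write it in Haskell, where not and || play the roles of ¬ and ∨:)
imply :: Bool -> Bool -> Bool
imply a b = not a || b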
I digress, but I needed to introduce meta thinking and first class booleans before I get to the real meat.
First class functions
If there is one thing in functional programming that sends imperative minds running and screaming into the night, it’s those.
Don’t panic.
The trick is to remember that a function is a mathematical object like any other. You may think it is not, most probably because, unlike a number or a set, a function is not a passive piece of data: when you give it an input, it produces a result.
Nevertheless, functions can be seen as a passive piece of data. See this?
[figure: a function drawn as a set of arrows, mapping each object to its colour]
Well, in this image the function (which happens to map objects to their colours) is just the set of arrows. A set. If you want more precision, a function f from X to Y is a subset of the Cartesian product X × Y, with the property that for each x, there is one and only one y such that the pair (x, y) is an element of f. If not, just envision the graph of a function: it describes the function completely, yet is a completely passive object.
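(You can even build such a function as plain data in code. A quick Haskell sketch, with made-up names colourOf and applyTable:)
-- the colour function, as a plain set (well, list) of pairs
colourOf :: [(String, String)]
colourOf = [("apple", "red"), ("sky", "blue"), ("grass", "green")]

-- "applying" it is just finding the pair whose first element matches
applyTable :: [(String, String)] -> String -> Maybe String
applyTable pairs x = lookup x pairs

-- applyTable colourOf "sky"    evaluates to Just "blue"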
So, functions are “ordinary” mathematical objects. Yet, many programming languages don’t treat them as such. Mostly for ease of implementation, functions in programming languages are subject to various restrictions. In those languages, you may not be able to:
- give a function as a parameter to another function;
- write expressions that describe functions directly;
- return a function as the result of a function call.
Making functions “first class” is all about lifting those restrictions, so you can use them as freely as you would an integer. The question is, what are the practical implications?
Let’s begin with three examples (one for each of the restrictions above):
functional parameters
apply : ((ℕ → ℝ) × ℕ) → ℝ
apply(f, n) = f(n)
This is a re-implementation of function application, which uses the built-in one. I know it looks contrived, but I swear it does have its uses. So, apply takes two arguments:
- a function from natural numbers to real numbers, and
- a natural number.
With those, it returns a real number.
(The astute reader will have noticed that there is no mention of natural numbers or real numbers in the definition of apply. Indeed, it could work on any types A and B instead. We are not ready to discuss System F just yet, however, so I will just pretend parametric polymorphism doesn’t exist.)
Using apply() is easy:
f : ℕ → ℝ
f(x) = x + π
a : ℝ
a = apply(f, 42) -- now, the number a equals 42 + π
Simplicity itself.
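(The same apply in Haskell, keeping the pair of arguments so it stays close to the notation above:)
apply :: (Integer -> Double, Integer) -> Double
apply (g, n) = g n

f :: Integer -> Double
f x = fromIntegral x + pi

a :: Double
a = apply (f, 42)   -- a now equals 42 + π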
literal function
We usually use the term anonymous function, because it’s a function without a name. I don’t like the term, because it suggests that functions without a name are the exception rather than the rule. Hence the term “literal”, as in “literal string”. Strings can be written directly, right?
"Hello World!"
Well, so can functions:
x ↦ x + π
This is the same syntax I used to define f at the beginning of this tutorial. I can use it to simplify my example on the usage of apply above:
a : ℝ
a = apply(x ↦ x + π, 42)
I can also apply the function directly, and still get the same result.
a : ℝ
a = (x ↦ x + π)(42)
Though I concede that at that point, I might as well write
a : ℝ
a = 42 + π
directly. But in practice, functions can be given as arguments to more complicated functions than apply, so simplifications such as this one aren’t always possible.
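(In Haskell, a literal function is written with a backslash instead of ↦, but it works the same way:)
a :: Double
a = (\x -> x + pi) 42   -- same as writing 42 + π directly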
returning functions
Here is another example from high school: function composition:
∘ : ((ℚ → ℝ) × (ℕ → ℚ)) → (ℕ → ℝ)
g ∘ f = (x ↦ g(f(x)))
(Let’s pretend my language can define infix operators. And again, function composition can be more generic than that.)
Now, can you guess the value of this expression?
((x ↦ x + π) ∘ (x ↦ x ÷ 2)) (13)
To know this, we will have to perform step by step reduction. First, let’s use the ∘ operator as a prefix function, instead of an infix operator:
(∘) ((x ↦ x + π), (x ↦ x ÷ 2)) (13)
Then, we replace (∘) by its value:
((g, f) ↦ (x ↦ g(f(x)))) ((x ↦ x + π), (x ↦ x ÷ 2)) (13)
(This is starting to look seriously unreadable…) Then, we start doing actual substitutions: first, we replace g and f by the values of the corresponding call parameters (with a bit of renaming so names don’t clash):
(x ↦ (z ↦ z + π) ((y ↦ y ÷ 2) (x))) (13)
(z ↦ z + π) ((y ↦ y ÷ 2) (13))
(z ↦ z + π) (13 ÷ 2)
(13 ÷ 2) + π
I could also evaluate the body of (∘) first. It makes it more obvious what function composition actually does:
(x ↦ (z ↦ z + π) ((y ↦ y ÷ 2) (x))) (13)
(x ↦ (z ↦ z + π) (x ÷ 2)) (13)
(x ↦ (x ÷ 2) + π) (13)
(13 ÷ 2) + π
Compare the last line with the original expression:
((x ↦ x + π) ∘ (x ↦ x ÷ 2)) (13)
That was a lot of complications for such a simple result, wasn’t it? Actually, that’s kinda the point: you don’t want to do yourself what the compiler could have done for you. Conceptually, chaining functions together isn’t that complicated. This for instance:
h ∘ g ∘ f
is a function that works by first applying f, then g to the result of f, then h to the result of g. Without the composition operator, however, I would have to write:
x ↦ h(g(f(x)))
which is arguably much less readable. That would be a shame, since functional programming is often about chaining functions together.
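(Haskell ships function composition as the (.) operator, but defining it by hand is a one-liner, very close to the definition of ∘ above:)
compose :: (b -> c) -> (a -> b) -> (a -> c)
compose g f = \x -> g (f x)

result :: Double
result = ((\x -> x + pi) `compose` (\x -> x / 2)) 13   -- 13/2 + π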
The point of it all
Passing functions around is cool and all, but it looks like a heap of trouble. Why would we do that in our day to day work?
The answer is, we already do.
So-called object oriented languages such as C++, Java, or Python have this nice thing called “subtype polymorphism”, which means that when you pass an object around, its actual behaviour may not be determined at compile time. This is achieved through inheritance and method overriding. It can be a great mechanism for code reuse, but it’s quite cumbersome.
The functional approach is more direct. Instead of manipulating whole objects, we pass around the behaviour (a function) directly. With syntactic support from the language, this is way more straightforward, and enables a number of powerful idioms.
One can, for instance, write one’s own loops. Let’s say you want to process lists. Well, my language doesn’t have lists, so we’ll work with pairs instead:
inc-all : ℝ×ℝ → ℝ×ℝ
inc-all(x, y) = (1 + x, y + 1)
dbl-all : ℝ×ℝ → ℝ×ℝ
dbl-all(x, y) = (2 × x, y × 2)
Simple, but not simple enough. There’s quite a bit of duplication here. We need to factor out these commonalities:
map : ((ℝ → ℝ) × (ℝ×ℝ)) → ℝ×ℝ
map(f, (x, y)) = (f(x), f(y))
As you may have noticed, the first argument of map is a function (from numbers to numbers). And now we can redefine inc-all and dbl-all in a simpler fashion:
inc-all : ℝ×ℝ → ℝ×ℝ
inc-all(p) = map ((x ↦ 1 + x), p)
dbl-all : ℝ×ℝ → ℝ×ℝ
dbl-all(p) = map ((x ↦ 2 × x), p)
Well, I admit it’s not exactly simpler. Nevertheless, being able to give a function argument to map did allow us to repeat ourselves less. It can still be improved, though. Right now, inc-all and dbl-all have the following structure:
f(x) = map(g, x)
This x is line noise. I’d like to be able to remove it. Like this:
f = map(g)
To do that, I need not only to accept a function argument, but also to return a function result. So let’s redefine map:
map : (ℝ → ℝ) → (ℝ×ℝ → ℝ×ℝ)
map(f) = (x, y) ↦ (f(x), f(y))
Now I can write:
inc-all : ℝ×ℝ → ℝ×ℝ
inc-all = map(x ↦ 1 + x)
dbl-all : ℝ×ℝ → ℝ×ℝ
dbl-all = map(x ↦ 2 × x)
Better, isn’t it? Still, this is not enough. See the literal functions? Their structures are the same; the only thing that changes is the operator. Now, we have seen with function composition that operators are actually functions with two arguments, or rather functions with a pair argument. We can take advantage of that:
curry2 : (ℝ×ℝ → ℝ) → (ℝ → (ℝ → ℝ))
curry2(f) = x ↦ (y ↦ f(x, y))
inc-all : ℝ×ℝ → ℝ×ℝ
inc-all = map(curry2(+)(1))
dbl-all : ℝ×ℝ → ℝ×ℝ
dbl-all = map(curry2(×)(2))
Now we have factored everything out.
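(Haskell already ships curry2 under the name curry, and since Haskell’s own operators are curried from the start, you rarely need it. Still, here is what it might look like written out, with Double standing in for ℝ:)
curry2 :: ((Double, Double) -> Double) -> Double -> Double -> Double
curry2 f = \x y -> f (x, y)

-- example: a pair-taking addition, then its curried form
addPair :: (Double, Double) -> Double
addPair (x, y) = x + y

-- curry2 addPair 1 2    evaluates to 3.0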
on syntax
The last listing still leaves me a bit uneasy. We can do better.
The thing is, we’re trying to do functional programming here, which means we’re mostly calling functions. And whatever we do the most should incur the least cognitive load possible. In programming languages, that means using the lightest possible syntax. My language does not do that:
f(x) -- Function call in my language and high school math.
f x -- Lightest possible syntax: nothing at all.
If I change my language to use the better syntax, then the definitions above will change a little:
map : (ℝ → ℝ) → ℝ×ℝ → ℝ×ℝ -- arrows associate to the right
map f = (x, y) ↦ (f(x), f(y))
curry2 : (ℝ×ℝ → ℝ) → ℝ → ℝ → ℝ
curry2(f) = x ↦ y ↦ f(x, y) -- arrows associate to the right
inc-all : ℝ×ℝ → ℝ×ℝ
inc-all = map (curry2 (+) 1) -- function application associates
-- to the left.
dbl-all : ℝ×ℝ → ℝ×ℝ
dbl-all = map (curry2 (×) 2)
Better, but frankly, not by much. We’re not finished, though. Remember that I wrote map and curry2 to factor my code as much as possible. This suggests that functions that are in “curried” form in the first place:
f : ℝ → ℝ → ℝ
f = x ↦ y ↦ whatever
f x y = whatever -- short hand
are more useful than functions that are in “un-curried” form:
f : ℝ×ℝ → ℝ
f(x, y) = whatever
(Now you know why curry2 is named the way it is: it “curryfies” functions with two arguments.) So, I will make one last change to my language: operators are now curried by default:
map : (ℝ → ℝ) → ℝ×ℝ → ℝ×ℝ
map f (x, y) = (f(x), f(y))
inc-all : ℝ×ℝ → ℝ×ℝ
inc-all = map ((+) 1)
dbl-all : ℝ×ℝ → ℝ×ℝ
dbl-all = map ((×) 2)
I don’t even need to define curry2.
My point is, some languages use juxtaposition for function calls, and their standard libraries are curried by default. The most notable examples are ML and Haskell. The advantage is extreme conciseness in many situations. It can be quite cryptic however if you’re not prepared for this.
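(Indeed, the final listing above might look like this in actual Haskell. I use the hypothetical names mapPair, incAll and dblAll, since Haskell’s own map works on lists rather than pairs:)
mapPair :: (Double -> Double) -> (Double, Double) -> (Double, Double)
mapPair f (x, y) = (f x, f y)

incAll :: (Double, Double) -> (Double, Double)
incAll = mapPair (1 +)        -- (1 +) is the curried "+" partially applied to 1

dblAll :: (Double, Double) -> (Double, Double)
dblAll = mapPair (2 *)

-- incAll (1, 2)   evaluates to (2.0, 3.0)
-- dblAll (1, 2)   evaluates to (2.0, 4.0)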
The regular syntax for function calls, however, is more common, so I used it in the hope of preparing you for functional code in JavaScript and the like.