March 2013
From Imperative to Functional: how to make the leap
Many programmers find functional programming difficult, if not downright cryptic. This is because functional programming is not merely about relinquishing the assignment statement and using closures liberally. It’s a whole different mindset.
The functional mindset
…basically boils down to three things:
- Your program isn’t about what it does, but about what things are.
- Your code isn’t just a linear sequence of instructions, but a tree of nested expressions.
- Functions are ordinary mathematical objects, just like an integer is.
Some notations first
For this tutorial, I won’t be using the syntax of any particular programming language. That would obscure my point. Instead, I will borrow from high school mathematics. Here is an example:
n : ℕ
n = 42
Which means, “let n be a natural number whose value is 42”. More generally, when I say
x : T
I mean that x has type T. Basically, x is an element of the set T (let’s just assume that types are sets). And when I say
x = Expr
I just mean that x is the value denoted by the expression Expr. Now let’s define a function:
f : ℕ → ℝ
f = x ↦ x + π
Which means “Let f be a function from natural numbers to real numbers, which returns its input plus π”. To make reading easier, we will use this more familiar syntax whenever possible:
f(x) = x + π
I will however insist on using type annotations such as this one:
f : ℕ → ℝ
They’re not central to my point, but they make for useful documentation.
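(If you would rather follow along in a real language, this notation maps almost directly onto Haskell. Here is a rough equivalent, with Integer standing in for ℕ and Double for ℝ; both are only approximations of the mathematical sets.)
n :: Integer
n = 42

f :: Integer -> Double
f x = fromIntegral x + pi   -- fromIntegral converts the natural number before adding π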
Descriptive mindset
Compare the following definitions:
-- imperative │ -- functional
│
square : ℕ → ℕ │ square : ℕ → ℕ
square(n) │ square(n) = n * n
{ sq = n * n │
return sq │
} │
Even though the two definitions above are strictly equivalent, they are very different in spirit:
- The imperative side reads like a procedure to follow: “compute n times n, put it in sq, then return sq to whoever called you”. (I’m anthropomorphising square() here, because that’s how the imperative mindset often operates.)
- The functional code reads like a description: “the square of n is n times n”.
Don’t be fooled by this apparent simplicity. In functional programming, the descriptive mindset is mandatory. If you stay stuck in the active mindset, not only will you miss the assignment statement dearly, but the other difficulties (reading order and first class functions) will be almost insurmountable.
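(For what it’s worth, the functional column above is essentially real code already. In Haskell, for instance, it reads almost letter for letter the same, with Integer again standing in for ℕ:)
square :: Integer -> Integer
square n = n * n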
Reading order
Functional code is rarely a sequence of instructions that one can just read from top to bottom. To quote Paul Graham’s On Lisp, “an imperative program is a functional program turned inside out”. Before we see an example, I need to introduce a bit of syntax: tuples.
p : ℕ × ℝ
p = (42, π)
As you may have guessed, ℕ × ℝ denotes the Cartesian product. So, let’s see our example (inspired by §3.2 of the book):
-- imperative │ -- functional
│
compute : ℝ×ℝ → ℕ×ℝ │ compute : ℝ×ℝ → ℕ×ℝ
compute(x,y) │ compute(x, y) = (42, |x| + 2)
{ abs = |x| │
res = abs + 2 │
return (42, res) │
} │
This silly compute function takes a pair of real numbers as input, and returns a pair of numbers (one natural, one real). Now look at how we can convert the imperative program into the functional one, step by step:
return (42, res) -- original return statement
return (42, abs + 2) -- replacing res by its value
return (42, |x| + 2) -- replacing abs by its value
What we did was take the imperative program and turn it outside in: we stuffed the expressions on the right hand side of abs and res inside the final expression. So, after the modification, we have this:
compute : ℝ×ℝ → ℕ×ℝ
compute(x,y)
{ abs = |x| -- we no longer need
res = abs + 2 -- those two lines
return (42, |x| + 2)
}
After removing dead code:
compute : ℝ×ℝ → ℕ×ℝ
compute(x,y)
{ return (42, |x| + 2)
}
And now we can use the functional syntax directly:
compute : ℝ×ℝ → ℕ×ℝ
compute(x, y) = (42, |x| + 2)
Now, going back and forth between the two programming styles is all well and good, but what does it have to do with reading order?
Simple: the imperative program has an obvious reading order: instruction by instruction, from the first to the last. As we have seen, the first statement in the imperative program corresponds to the inner-most expression in the functional program. Likewise, the last imperative statement corresponds to the outer-most functional expression.
So, the functional program typically reads from the inner-most expressions to the outer-most ones. This is no more difficult than reading a sequence of statements, but it is different, and requires some getting used to.
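(To make this inside-out reading concrete, here is compute in Haskell, once with intermediate names and once fully inlined. Integer and Double stand in for ℕ and ℝ, and compute' is a name of my own choosing:)
compute :: (Double, Double) -> (Integer, Double)
compute (x, y) =
  let a   = abs x
      res = a + 2
  in  (42, res)

-- the same thing, inlined: read it from the inside out
compute' :: (Double, Double) -> (Integer, Double)
compute' (x, y) = (42, abs x + 2)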
By the way, I believe this getting used to is a major reason why the ternary operator in C is frowned upon:
// What I would write: │ // What I often see
int abs(int n) │ int abs(int n)
{ │ {
return n > 0 │ if (n > 0)
? n │ return n;
: -n; │ else
} │ return -n;
┌─────────────────┘ }
│
│ // When QA forbids multiple returns
│ int abs(int n)
│ {
│ int result;
│ if (n > 0)
│ result = n;
│ else
│ result = -n;
│ return result;
│ }
│
│ // When QA forbids brace-less ifs
│ ...
The code on the left shows my slant towards functional programming, and can’t really be read as a sequence of statements. The program on the right, on the other hand, is clearly a set of instructions of the familiar form “if this, then do that, otherwise do those”.
There are other differences in style:
// What I sometimes see │ // What I would write
if (cond_1) │ if (cond_1 && cond_2)
{ │ {
if (cond_2) │ // some code
{ │ }
// some code │
} │
} │
I know some of you will really believe the code on the left is more readable than the code on the right. It is not. While each separate line on the left is at least as readable as the corresponding line on the right, there are more of them. Overall, both listings have the same structural complexity: we just trade an if statement for a boolean expression. Nevertheless, I think writing the version on the left is a mistake, for two reasons:
- The syntax is significantly heavier in many languages.
- The control flow is more obvious with one if instead of two.
And then there are outright mistakes:
// What I sometimes see │ // What I'd write
bool flag = false; │ bool flag = (userInput == "yes");
if (userInput == "yes") │
flag = true; │
A cursory look tells me there is just no way the code on the left is more readable. It has an initialisation, an if statement, a relational expression, and an assignment statement. The code on the right only has an initialisation and a relational expression. Yet some programmers still prefer the code on the left. How so?
Well, one still cannot read the code on the right from top to bottom without thinking. Again, that requires some getting used to. But there is another “problem”: this code actually goes meta.
While booleans are first class values in nearly every language I know of, they are rarely used as such. For instance, booleans are often not thought of as truth values even when it is warranted:
// What I often see │ // What I would write
if (flag == true) │ if (flag)
Many people would tell you that the listings on the left are more readable, even though they’re strictly more complex. The reason is, they mentally separate the world of propositions, used for branching and looping, and the world of ordinary values, used for more ordinary calculations. They don’t think of truth (or lack thereof) as an ordinary value. But believe me, when you do think of truth values as something as ordinary as integers, your code just gets a bit simpler:
-- naive approach
imply : (bool × bool) → bool │ imply : (bool × bool) → bool
imply(A, B) │ imply(A, B) =
{ if A = false │ if A = false
then return true │ then true
if B = true │ else if B = true
then return true │ then true
else return false │ else false
} │
-- first class booleans
imply : (bool × bool) → bool │ imply : (bool × bool) → bool
imply(A, B) │ imply(A, B) = impl
{ │ with impl = nulA ∨ B
nulA = ¬A │ nulA = ¬A
impl = nulA ∨ B │
return impl │
} │
-- full functional approach
imply : (bool × bool) → bool │ imply : (bool × bool) → bool
imply(A, B) │ imply(A, B) = ¬A ∨ B
{ return ¬A ∨ B │
} │
(I hope we do agree that the programs at the bottom are by far the more readable ones.)
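(Incidentally, the bottom-right version is exactly how one would write it in Haskell, where not and || play the roles of ¬ and ∨:)
imply :: Bool -> Bool -> Bool
imply a b = not a || b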
I digress, but I needed to introduce meta thinking and first class booleans before I get to the real meat.
First class functions
If there is one thing in functional programming that sends imperative minds running and screaming into the night, it’s those.
Don’t panic.
The trick is to remember that a function is a mathematical object like any other. You may think it is not, most probably because, unlike a number or a set, a function is not a passive piece of data: when you give it an input, it produces a result.
Nevertheless, functions can be seen as a passive piece of data. See this?
[figure: a function drawn as a set of arrows, mapping each object to its colour]
Well, in this image the function (which happens to map objects to their colours) is just the set of arrows. A set. If you want more precision, a function f from X to Y is a subset of the Cartesian product X × Y, with the property that for each x, there is one and only one y such that the pair (x, y) is an element of f. If not, just envision the graph of a function: it describes the function completely, yet is a completely passive object.
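(You can even build such a function as plain data in code. A quick Haskell sketch, with made-up names colourOf and applyTable:)
-- the colour function, as a plain set (well, list) of pairs
colourOf :: [(String, String)]
colourOf = [("apple", "red"), ("sky", "blue"), ("grass", "green")]

-- "applying" it is just finding the pair whose first element matches
applyTable :: [(String, String)] -> String -> Maybe String
applyTable pairs x = lookup x pairs

-- applyTable colourOf "sky"    evaluates to Just "blue"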
So, functions are “ordinary” mathematical objects. Yet, many programming languages don’t treat them as such. Mostly for ease of implementation, functions in programming languages are subject to various restrictions. In those languages, you may not be able to:
- give a function as a parameter to another function;
- write expressions that describe functions directly;
- return a function as the result of a function call.
Making functions “first class” is all about lifting those restrictions, so you can use them as freely as you would an integer. The question is, what are the practical implications?
Let’s begin with three examples (one for each of the restrictions above):
functional parameters
apply : ((ℕ → ℝ) × ℕ) → ℝ
apply(f, n) = f(n)
This is a re-implementation of function application, which uses the built-in one. I know it looks contrived, but I swear it does have its uses. So, apply takes two arguments:
- a function from natural numbers to real numbers, and
- a natural number.
With those, it returns a real number.
(The astute reader will have noticed that there is no mention of natural numbers or real numbers in the definition of apply. Indeed, it could work on any types A and B instead. We are not ready to discuss System F just yet, however, so I will just pretend parametric polymorphism doesn’t exist.)
Using apply() is easy:
f : ℕ → ℝ
f(x) = x + π
a : ℝ
a = apply(f, 42) -- now, the number a equals 42 + π
Simplicity itself.
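(The same apply in Haskell, keeping the pair of arguments so it stays close to the notation above:)
apply :: (Integer -> Double, Integer) -> Double
apply (g, n) = g n

f :: Integer -> Double
f x = fromIntegral x + pi

a :: Double
a = apply (f, 42)   -- a now equals 42 + π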
literal function
We usually use the term anonymous function, because it’s a function without a name. I don’t like the term, because it suggests that functions without a name are the exception rather than the rule. Hence the term “literal”, as in “literal string”. Strings can be written directly, right?
"Hello World!"
Well, so can functions:
x ↦ x + π
This is the same syntax I used to define f at the beginning of this tutorial. I can use it to simplify my example on the usage of apply above:
a : ℝ
a = apply(x ↦ x + π, 42)
I can also apply the function directly, and still get the same result.
a : ℝ
a = (x ↦ x + π)(42)
Though I concede that at that point, I might as well write
a : ℝ
a = 42 + π
directly. But in practice, functions can be given as arguments to more complicated functions than apply, so simplifications such as this one aren’t always possible.
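(In Haskell, a literal function is written with a backslash instead of ↦, but it works the same way:)
a :: Double
a = (\x -> x + pi) 42   -- same as writing 42 + π directly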
returning functions
Here is another example from high school: function composition:
∘ : ((ℚ → ℝ) × (ℕ → ℚ)) → (ℕ → ℝ)
g ∘ f = (x ↦ g(f(x)))
(Let’s pretend my language can define infix operators. And again, function composition can be more generic than that.)
Now, can you guess the value of this expression?
((x ↦ x + π) ∘ (x ↦ x ÷ 2)) (13)
To know this, we will have to perform step by step reduction. First, let’s use the ∘ operator as a prefix function, instead of an infix operator:
(∘) ((x ↦ x + π), (x ↦ x ÷ 2)) (13)
Then, we replace (∘) by its value:
((g, f) ↦ (x ↦ g(f(x)))) ((x ↦ x + π), (x ↦ x ÷ 2)) (13)
(This is starting to look seriously unreadable…) Then, we start doing actual substitutions: first, we replace g and f by the values of the corresponding call parameters (with a bit of renaming so names don’t clash):
(x ↦ (z ↦ z + π) ((y ↦ y ÷ 2) (x))) (13)
(z ↦ z + π) ((y ↦ y ÷ 2) (13))
(z ↦ z + π) (13 ÷ 2)
(13 ÷ 2) + π
I could also evaluate the body of (∘) first. It makes it more obvious what function composition actually does:
(x ↦ (z ↦ z + π) ((y ↦ y ÷ 2) (x))) (13)
(x ↦ (z ↦ z + π) (x ÷ 2)) (13)
(x ↦ (x ÷ 2) + π) (13)
(13 ÷ 2) + π
Compare the last line with the original expression:
((x ↦ x + π) ∘ (x ↦ x ÷ 2)) (13)
That was a lot of complications for such a simple result, wasn’t it? Actually, that’s kinda the point: you don’t want to do yourself what the compiler could have done for you. Conceptually, chaining functions together isn’t that complicated. This for instance:
h ∘ g ∘ f
is a function that works by first applying f, then g to the result of f, then h to the result of g. Without the composition operator, however, I would have to write:
x ↦ h(g(f(x)))
which is arguably much less readable. That would be a shame, since functional programming is often about chaining functions together.
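(Haskell ships function composition as the (.) operator, but defining it by hand is a one-liner, very close to the definition of ∘ above:)
compose :: (b -> c) -> (a -> b) -> (a -> c)
compose g f = \x -> g (f x)

result :: Double
result = ((\x -> x + pi) `compose` (\x -> x / 2)) 13   -- 13/2 + π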
The point of it all
Passing functions around is cool and all, but it looks like a heap of trouble. Why would we do that in our day to day work?
The answer is, we already do.
So-called object oriented languages such as C++, Java, or Python have this nice thing called “subtype polymorphism”, which means that when you pass an object around, its actual behaviour may not be determined at compile time. This is achieved through inheritance and method overriding. It can be a great mechanism for code reuse, but it’s quite cumbersome.
The functional approach is more direct. Instead of manipulating whole objects, we pass around the behaviour (a function) directly. With syntactic support from the language, this is way more straightforward, and enables a number of powerful idioms.
One can, for instance, write one’s own loops. Let’s say you want to process lists. Well, my language doesn’t have lists, so we’ll work with pairs instead:
inc-all : ℝ×ℝ → ℝ×ℝ
inc-all(x, y) = (1 + x, y + 1)
dbl-all : ℝ×ℝ → ℝ×ℝ
dbl-all(x, y) = (2 × x, y × 2)
Simple, but not simple enough. There’s quite a bit of duplication here. We need to factor out these commonalities:
map : ((ℝ → ℝ) × (ℝ×ℝ)) → ℝ×ℝ
map(f, (x, y)) = (f(x), f(y))
As you may have noticed, the first argument of map is a function (from numbers to numbers). And now we can redefine inc-all and dbl-all in a simpler fashion:
inc-all : ℝ×ℝ → ℝ×ℝ
inc-all(p) = map ((x ↦ 1 + x), p)
dbl-all : ℝ×ℝ → ℝ×ℝ
dbl-all(p) = map ((x ↦ 2 × x), p)
Well, I admit it’s not exactly simpler. Nevertheless, being able to give a function argument to map did allow us to repeat ourselves less. It can still be improved, though. Right now, inc-all and dbl-all have the following structure:
f(x) = map(g, x)
This x is line noise. I’d like to be able to remove it. Like this:
f = map(g)
To do that, I need not only to accept a function argument, but also to return a function result. So let’s redefine map:
map : (ℝ → ℝ) → (ℝ×ℝ → ℝ×ℝ)
map(f) = (x, y) ↦ (f(x), f(y))
Now I can write:
inc-all : ℝ×ℝ → ℝ×ℝ
inc-all = map(x ↦ 1 + x)
dbl-all : ℝ×ℝ → ℝ×ℝ
dbl-all = map(x ↦ 2 × x)
Better, isn’t it? Still, this is not enough. See the literal functions? Their structures are the same; the only thing that changes is the operator. Now, we have seen with function composition that operators are actually functions with two arguments, or rather functions with a pair argument. We can take advantage of that:
curry2 : (ℝ×ℝ → ℝ) → (ℝ → (ℝ → ℝ))
curry2(f) = x ↦ (y ↦ f(x, y))
inc-all : ℝ×ℝ → ℝ×ℝ
inc-all = map(curry2(+)(1))
dbl-all : ℝ×ℝ → ℝ×ℝ
dbl-all = map(curry2(×)(2))
Now we have factored everything out.
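(Haskell already ships curry2 under the name curry, and since Haskell’s own operators are curried from the start, you rarely need it. Still, here is what it might look like written out, with Double standing in for ℝ:)
curry2 :: ((Double, Double) -> Double) -> Double -> Double -> Double
curry2 f = \x y -> f (x, y)

-- example: a pair-taking addition, then its curried form
addPair :: (Double, Double) -> Double
addPair (x, y) = x + y

-- curry2 addPair 1 2    evaluates to 3.0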
on syntax
The last listing still leaves me a bit uneasy. We can do better.
The thing is, we’re trying to do functional programming here, which means we’re mostly calling functions. And whatever we do the most should incur the least cognitive load possible. In programming languages, that means using the lightest possible syntax. My language does not do that:
f(x) -- Function call in my language and high school math.
f x -- Lightest possible syntax: nothing at all.
If I change my language to use the better syntax, then the definitions above will change a little:
map : (ℝ → ℝ) → ℝ×ℝ → ℝ×ℝ -- arrows associate to the right
map f = (x, y) ↦ (f(x), f(y))
curry2 : (ℝ×ℝ → ℝ) → ℝ → ℝ → ℝ
curry2(f) = x ↦ y ↦ f(x, y) -- arrows associate to the right
inc-all : ℝ×ℝ → ℝ×ℝ
inc-all = map (curry2 (+) 1) -- function application associates
-- to the left.
dbl-all : ℝ×ℝ → ℝ×ℝ
dbl-all = map (curry2 (×) 2)
Better, but frankly, not by much. We’re not finished, though. Remember that I wrote map and curry2 to factor my code as much as possible. This suggests that functions that are in “curried” form in the first place:
f : ℝ → ℝ → ℝ
f = x ↦ y ↦ whatever
f x y = whatever -- short hand
are more useful than functions that are in “un-curried” form:
f : ℝ×ℝ → ℝ
f(x, y) = whatever
(Now you know why curry2 is named the way it is: it “curryfies” functions with two arguments.) So, I will make one last change to my language: operators are now curried by default:
map : (ℝ → ℝ) → ℝ×ℝ → ℝ×ℝ
map f (x, y) = (f(x), f(y))
inc-all : ℝ×ℝ → ℝ×ℝ
inc-all = map ((+) 1)
dbl-all : ℝ×ℝ → ℝ×ℝ
dbl-all = map ((×) 2)
I don’t even need to define curry2.
My point is, some languages use juxtaposition for function calls, and their standard libraries are curried by default. The most notable examples are ML and Haskell. The advantage is extreme conciseness in many situations. It can be quite cryptic however if you’re not prepared for this.
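(Indeed, the final listing above might look like this in actual Haskell. I use the hypothetical names mapPair, incAll and dblAll, since Haskell’s own map works on lists rather than pairs:)
mapPair :: (Double -> Double) -> (Double, Double) -> (Double, Double)
mapPair f (x, y) = (f x, f y)

incAll :: (Double, Double) -> (Double, Double)
incAll = mapPair (1 +)        -- (1 +) is the curried "+" partially applied to 1

dblAll :: (Double, Double) -> (Double, Double)
dblAll = mapPair (2 *)

-- incAll (1, 2)   evaluates to (2.0, 3.0)
-- dblAll (1, 2)   evaluates to (2.0, 4.0)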
The regular syntax for function calls, however, is more common, so I used it in the hope of preparing you for functional code in JavaScript and the like.