February 2011
Class-based Programming as Syntactic Sugar
Class based programming in statically typed languages like Java and C++ is syntactic sugar over functions, records, and closures. It means that a good deal of class based programming is naturally done even in C, and the rest of it can easily be emulated in ML.
This tutorial will show how.
Defining class based programming
I basically mean classes, inheritance, and subtype polymorphism, in statically type languages. I do not mean genericity which, while tremendously usefull, is orthogonal to class based programming (It wasn’t implemented in Java for a long time, after all).
Classes
A class is a record. Like with a C struct
, an instance
of a class (let’s call it “object”) contains a bunch of “data” and a
bunch of “methods”. It is implied that a piece of data can be anything
but a function. Methods are just functions (whose first argument, the
object they come from, is usually implicit).
Typically, the methods are defined when the class itself is defined. The data, on the other hand, is typically defined at instantiation time, and is often modified after that.
Inheritance
This is a way of defining a class in terms of another class. A derived class is identical to the base class it is build from, except that:
- It may specify additional data and methods.
- It may specify that some methods are different.
There is also the notion of subtyping, but I’ll sweep that under the rug, because subtyping is only interesting for…
subtype polymorphism
Subtype polymorphism is when you use an object of type A, but its actual methods might be those of a derived class of A. This is useful because one often want flexibility about the behaviour (methods) of an object, as well as about its content (data).
abstraction and encapsulation
I didn’t mention them as necessary components for class based programming, but some programmers insist they are. They’re not:
Abstraction is not a fundamental mechanism of class based programming. It is an effect, that arises from the various mechanisms. If abstract data types can be provided by classes, and classes are syntax sugar, then abstract data types don’t go away when we remove the sugar.
Encapsulation mostly refers to the ability to “hide” an object’s data, most often by using the
private
keyword. This is a mechanism, but a minor one, that can be enforced by convention, as in Python.
Classes alone
We will start by emulating class based programming without inheritance nor subtype polymorphism. Let’s begin with the genuine thing (in C++). First, the interface of an object:
// point.hpp
class Point {
public:
Point();
void move(int x, int y);
void lift();
void draw(void) const;
private:
int x_, y_; // This line doesn't belong in the interface
};
Then, its implementation:
// point.cpp
#include "point.hpp"
#include <cstdio>
Point::Point(): x_(0), y_(0) {}
void Point::move(int x, int y) {
x_ += x;
y_ += y;
}
void Point::lift() {
move(1, 0);
}
void Point::draw(void) const {
std::printf("(%d, %d)\n", x_, y_);
}
Finally, a use case of that object.
// main.cpp
#include point.hpp
int main(int argc, char* argv[])
{
Point p; p.draw();
p.move(4, 2); p.draw();
p.lift(); p.draw();
return 0;
}
(This is a just an example, not acceptable production code. I just wanted more evocative names than fooBarBaz.) Now, here’s how we could do the same thing in C. First the interface:
// point.h
typedef struct Point_t Point;
Point * make_point(void);
void move_point(Point* this, int x, int y);
void lift_point(Point* this);
void draw_point(const Point* this);
void unmake_point(Point* this);
Then its implementation:
// point.c
#include "point.h"
#include <stdio.h>
#include <stdlib.h>
struct Point_t { int x, y; };
Point * make_point(void) {
Point * this = malloc(sizeof(Point));
p->x = 0;
p->y = 0;
return this;
}
void move_point(Point* this, int x, int y) {
this->x += x;
this->y += y;
}
void lift_point(Point* this) {
move_point(this, 1, 0);
}
void draw_point(const Point * this) {
// Let's pretend it draws the point.
printf("x = %d; y = %d\n", this->x, this->y);
}
void unmake_point(Point* this) {
free(this);
}
Finally, a use case.
// main.c
#include "point.h"
int main(int argc, char* argv[])
{
Point* p = make_point(); draw_point(p);
move_point(p, 4, 2); draw_point(p);
lift_point(p); draw_point(p);
unmake_point(p);
return 0;
}
Note that this is basically how modular programming is naturally done in C. An old
example is FILE*
in stdio.h
. Yet those two
listings are very similar. Indeed, most of their differences lie in
syntactic sugar:
- The methods and the constructor are declared outside the class.
- Their names changed to avoid name clashes.
- The
public
andprivate
keywords have been removed,class
becamestruct
, andcstdio
becamestdio.h
. - Methods became ordinary functions.
Some of the changes, however, are a bit deeper.
- The
this
pointer became explicit. - The point is allocated on the heap.
- The destructor became explicit.
- The constructor got an explicit allocation, and lost the initialization list.
ML, thanks to its garbage collection, can yield purer sugar. Here’s how classes could look like if I was to implement them in Ocaml. First the interface:
interface point = {
move : int -> int -> unit;
lift : unit -> unit;
draw : unit -> unit;
}
Then, the implementation:
class point
{
members:
x_ : int ref;
y_ : int ref;
constructors:
point () = class point (ref 0) (ref 0);
methods:
move = (fun x y ->
x_ := !x_ + x;
y_ := !y_ + y);
lift = (fun () -> this.move 1 0);
draw = (fun () -> print_string
("(" ^ string_of_int !x_ ^
", " ^ string_of_int !y_ ^ ")\n"));
}
Finally, a use case:
(* Main program (happens to be valid Ocaml) *)
let _ =
let p = point () in
p.draw ();
p.move 4 2; p.draw ();
p.lift (); p.draw ()
Quick reference for C/C++ programmers who don’t read Ocaml:
unit
in Ocaml meansvoid
in C.move : int -> int -> unit;
is a function prototype. The C equivalent would bevoid move(int, int);
.int ref
in Ocaml meansint * const
in C. You can also think of it as a variable that contains anint
value.x_ := !x_ + x;
in Ocaml means*x_ = *x + x;
in C.(fun x y -> (2 * x) + y)
is a litteral function, in the same way that42
is a litteral integer. In english it reads “the function which given x and y gives backs 2x + y).(* This is a comment. *)
let _ =
is line noise. (it declares a variable that won’t be used).let p = point () in
is a variable declaration and initialization.
Note that the interface
, class
,
members
, constructors
, and
methods
keywords are my own invention. Think of them as
“Ocaml++” features.
A fun fact about the way I wrotes the methods, is that methods don’t
have to be functions (much like the actual methods in the actual Ocaml
classes). I could have added answer: int;
in my interface,
and answer = 42;
in my implementation with no problem.
Now let’s see the de-sugared version, in pure Ocaml. It is so similar to the sugared version above that I think we shouldn’t bother in real programs. First the interface:
type point = {
move : int -> int -> unit;
lift : unit -> unit;
draw : unit -> unit;
}
Then, the implementation:
let class_point
(x_ : int ref)
(y_ : int ref) =
let rec this = {
move = (fun x y ->
x_ := !x_ + x;
y_ := !y_ + y);
lift = (fun () -> this.move 1 0);
draw = (fun () -> print_string
("(" ^ string_of_int !x_ ^
", " ^ string_of_int !y_ ^ ")\n"));
} in this
let point () = class_point (ref 0) (ref 0)
The use case doesn’t change (it was valid ocaml already).
So now we have pure sugar. There are still some notable differences with the C++ program, though:
The
this
pointer is explicit when methods refer to each other (private member variables are referred to directly). Heavier sugar could make it implicit, but this example is easier to follow.The type of the class is completely separated from the definition of the class. Actually, the definition of the class itself is something like a class constructor.
On the other hand, I got rid of allocation and destruction problems (thanks to Ocaml’s garbage collection), and had my initialization list back (they’re not essential to class based programming, but I like them for other reasons).
Let’s implement polymorphism
We have polymorphism when two classes share the same type, but have different methods. In C++ and Java, one would use subtyping. No need to here.
Here, a class is just a record (a struct
in C), and its
methods are just fields of the record. There is no real distinction
between a method and member variable: they are both values stored in the
record. Functions and integers work the same:
(* numbers *) (* functions *)
type foo = { type bar {
v1 : int; f1 : int -> int;
v2 : float; f2 : float -> float;
} }
let foo1 = { let bar1 = {
v1 = 1 f1 = (fun x -> x + 1);
v2 = 1. f2 = (fun x -> x +. 1.);
} }
let foo2 = { let bar1 = {
v1 = 2 f1 = (fun x -> x + 2);
v2 = 2. f2 = (fun x -> x +. 2.);
} }
The same goes for the point
“class”. It is very easy to
make different objects of type point
, but with different
methods:
let fancy_point = {
move = (fun _ _ -> print_string "No move ");
lift = (fun () -> print_string "No lift ");
draw = (fun () -> print_string "No_draw\n");
}
(* Main program (with the fancy point) *)
let _ =
let p = if true
then fancy_point ()
else point()
in
p.draw ();
p.move 4 2; p.draw ();
p.lift (); p.draw ()
The bottom line is, first closures give you polymorphism for free. Conversely, closures can be emulated with subtype polymorphism:
class Computer {
public:
virtual int compute(int) = 0;
}
Closures and subtype polymorphism are kinda equivalent. In both
cases, it’s about using functions or procedures without specifying it at
compile time. In practice however, closures are simpler: they let you
parametrize over behaviour directly. The need to enclose that behaviour
in a class tend to break locality of code. That can render functions
like map
and filter
unusable. C++
algorithm
library for instance was a usability nightmare
untill C++11 added closures.
Let’s implement inheritance
Ocaml has a nice syntax to define a record in terms of another:
let foo3 = { foo1 with
v2 = 3.;
}
So, we could just use that for inheritance:
let class_fast_point super =
let rec this = { super with
move = (fun x y ->
super.move x y;
super.move x y);
} in this
let _ =
let p = class_fast_point (point ()) in
p.draw ();
p.move 4 2; p.draw ();
p.lift (); p.draw ()
Note that we derive objects (not classes), and note the explicit
reference to the base object (super
). Anyway, this is a
minor point: it is still a matter of taking the same group of methods,
except for the ones we explicitly override.
Alas, there is a fatal flaw in my program. When we look at its output, we see:
(0, 0)
(8, 4)
(9, 4)
The first two lines are fine: the point moves twice as fast, as
intended. The third line, however, is off. the lift
method
is defined in terms of the move
methods, so if the point
moves twice as fast, it ought to lift twice as high. Here, you can see
that lift
does not “see” that we changed the
move
method. In other words, our inheritance scheme
implements early binding.
Late binding needs more sugar:
type point = {
move : int -> int -> unit;
lift : unit -> unit;
draw : unit -> unit;
}
type class_point = {
move_ : class_point -> int -> int -> unit;
lift_ : class_point -> unit -> unit;
draw_ : class_point -> unit -> unit;
}
let rec make_point cl = {
move = cl.move_ cl;
lift = cl.lift_ cl;
draw = cl.draw_ cl;
}
let class_point
(x_ : int ref)
(y_ : int ref) =
{
move_ = (fun this x y ->
x_ := !x_ + x;
y_ := !y_ + y);
lift_ = (fun this () -> (this.move_ this) 1 0);
draw_ = (fun this () -> print_string
("(" ^ string_of_int !x_ ^
", " ^ string_of_int !y_ ^ ")\n"));
}
let point () = class_point (ref 0) (ref 0)
let fast_point super =
let rec this = { super with
move_ = (fun this x y ->
(super.move_ this) x y;
(super.move_ this) x y)
} in this
let _ =
let p = make_point (point ()) in
p.draw ();
p.move 4 2; p.draw ();
p.lift (); p.draw ()
let _ = print_newline ()
let _ =
let p = make_point (fast_point (point ())) in
p.draw ();
p.move 4 2; p.draw ();
p.lift (); p.draw ()
The added de-sugaring is not overwhelming:
- The name of the methods are changed to avoid name conflicts.
- The definition of the methods have
this
as their first argument. this.M
becomes(this.M_ this)
super.M
becomes(super.M_ this)
- We generate one
class_I
type and onemake_I
function for everyI
interface. - Every constructor call is embedded in a
make_I
call (this is not strictly sugar but it’s still light).
I could have lightened the sugar (no underscores, no class_point
class, no make_point function), but it would have added sugar in the
main program: p.M
method calls would become
(p.M p)
. Another way would be using lazy evaluation, but I
couldn’t make Ocaml lazy
enough.
Anyway, the output of the program now correctly demonstrates late binding:
(0, 0)
(4, 2)
(5, 2)
(0, 0)
(8, 4)
(10, 4)