@ Loup's

Impossible? Like that would stop me.

February 2011

Class-based Programming as Syntactic Sugar

Class based programming in statically typed languages like Java and C++ is syntactic sugar over functions, records, and closures. It means that a good deal of class based programming is naturally done even in C, and the rest of it can easily be emulated in ML.

This tutorial will show how.

Defining class based programming

I basically mean classes, inheritance, and subtype polymorphism, in statically type languages. I do not mean genericity which, while tremendously usefull, is orthogonal to class based programming (It wasn't implemented in Java for a long time, after all).

Classes

A class is a record. Like with a C struct, an instance of a class (let's call it "object") contains a bunch of "data" and a bunch of "methods". It is implied that a piece of data can be anything but a function. Methods are just functions (whose first argument, the object they come from, is usually implicit).

Typically, the methods are defined when the class itself is defined. The data, on the other hand, is typically defined at instantiation time, and is often modified after that.

Inheritance

This is a way of defining a class in terms of another class. A derived class is identical to the base class it is build from, except that:

There is also the notion of subtyping, but I'll sweep that under the rug, because subtyping is only interesting for…

subtype polymorphism

Subtype polymorphism is when you use an object of type A, but its actual methods might be those of a derived class of A. This is useful because one often want flexibility about the behaviour (methods) of an object, as well as about its content (data).

abstraction and encapsulation

I didn't mention them as necessary components for class based programming, but some programmers insist they are. They're not:

Classes alone

We will start by emulating class based programming without inheritance nor subtype polymorphism. Let's begin with the genuine thing (in C++). First, the interface of an object:

// point.hpp

class Point {
public:
  Point();
  void move(int x, int y);
  void lift();
  void draw(void) const;
private:
  int x_, y_; // This line doesn't belong in the interface
};

Then, its implementation:

// point.cpp
#include "point.hpp"
#include <cstdio>

Point::Point(): x_(0), y_(0) {}

void Point::move(int x, int y) {
  x_ += x;
  y_ += y;
}

void Point::lift() {
  move(1, 0);
}

void Point::draw(void) const {
  std::printf("(%d, %d)\n", x_, y_);
}

Finally, a use case of that object.

// main.cpp
#include point.hpp

int main(int argc, char* argv[])
{
  Point p;      p.draw();
  p.move(4, 2); p.draw();
  p.lift();     p.draw();
  return 0;
}

(This is a just an example, not acceptable production code. I just wanted more evocative names than fooBarBaz.) Now, here's how we could do the same thing in C. First the interface:

// point.h

typedef struct Point_t Point;

Point * make_point(void);
void move_point(Point* this, int x, int y);
void lift_point(Point* this);
void draw_point(const Point* this);
void unmake_point(Point* this);

Then its implementation:

// point.c
#include "point.h"
#include <stdio.h>
#include <stdlib.h>

struct Point_t { int x, y; };

Point * make_point(void) {
  Point * this = malloc(sizeof(Point));
  p->x = 0;
  p->y = 0;
  return this;
}

void move_point(Point* this, int x, int y) {
  this->x += x;
  this->y += y;
}

void lift_point(Point* this) {
  move_point(this, 1, 0);
}

void draw_point(const Point * this) {
  // Let's pretend it draws the point.
  printf("x = %d; y = %d\n", this->x, this->y);
}

void unmake_point(Point* this) {
  free(this);
}

Finally, a use case.

// main.c
#include "point.h"

int main(int argc, char* argv[])
{
  Point* p = make_point(); draw_point(p);
  move_point(p, 4, 2);     draw_point(p);
  lift_point(p);           draw_point(p);
  unmake_point(p);
  return 0;
}

Note that this is basically how modular programming is naturally done in C. An old example is FILE* in stdio.h. Yet those two listings are very similar. Indeed, most of their differences lie in syntactic sugar:

Some of the changes, however, are a bit deeper.

ML, thanks to its garbage collection, can yield purer sugar. Here's how classes could look like if I was to implement them in Ocaml. First the interface:

interface point = {
  move : int  -> int -> unit;
  lift : unit -> unit;
  draw : unit -> unit;
}

Then, the implementation:

class point
{
members:
  x_ : int ref;
  y_ : int ref;

constructors:
  point () = class point (ref 0) (ref 0);

methods:
  move = (fun x y ->
            x_ := !x_ + x;
            y_ := !y_ + y);
  lift = (fun () -> this.move 1 0);
  draw = (fun () -> print_string
            ("("  ^ string_of_int !x_ ^
             ", " ^ string_of_int !y_ ^ ")\n"));
}

Finally, a use case:

(* Main program (happens to be valid Ocaml) *)
let _ =
  let p = point () in
    p.draw ();
    p.move 4 2; p.draw ();
    p.lift ();  p.draw ()

Quick reference for C/C++ programmers who don't read Ocaml:

Note that the interface, class, members, constructors, and methods keywords are my own invention. Think of them as "Ocaml++" features.

A fun fact about the way I wrotes the methods, is that methods don't have to be functions (much like the actual methods in the actual Ocaml classes). I could have added answer: int; in my interface, and answer = 42; in my implementation with no problem.

Now let's see the de-sugared version, in pure Ocaml. It is so similar to the sugared version above that I think we shouldn't bother in real programs. First the interface:

type point = {
  move : int  -> int -> unit;
  lift : unit -> unit;
  draw : unit -> unit;
}

Then, the implementation:

let class_point
    (x_ : int ref)
    (y_ : int ref) =
  let rec this = {
    move = (fun x y ->
              x_ := !x_ + x;
              y_ := !y_ + y);
    lift = (fun () -> this.move 1 0);
    draw = (fun () -> print_string
              ("("  ^ string_of_int !x_ ^
               ", " ^ string_of_int !y_ ^ ")\n"));
  } in this

let point () = class_point (ref 0) (ref 0)

The use case doesn't change (it was valid ocaml already).

So now we have pure sugar. There are still some notable differences with the C++ program, though:

On the other hand, I got rid of allocation and destruction problems (thanks to Ocaml's garbage collection), and had my initialization list back (they're not essential to class based programming, but I like them for other reasons).

Let's implement polymorphism

We have polymorphism when two classes share the same type, but have different methods. In C++ and Java, one would use subtyping. No need to here.

Here, a class is just a record (a struct in C), and its methods are just fields of the record. There is no real distinction between a method and member variable: they are both values stored in the record. Functions and integers work the same:

(* numbers *)           (* functions *)

type foo = {            type bar {
  v1 : int;               f1 : int -> int;
  v2 : float;             f2 : float -> float;
}                       }

let foo1 = {            let bar1 = {
  v1 = 1                  f1 = (fun x -> x + 1);
  v2 = 1.                 f2 = (fun x -> x +. 1.);
}                       }

let foo2 = {            let bar1 = {
  v1 = 2                  f1 = (fun x -> x + 2);
  v2 = 2.                 f2 = (fun x -> x +. 2.);
}                       }

The same goes for the point "class". It is very easy to make different objects of type point, but with different methods:

let fancy_point = {
  move = (fun _ _ -> print_string "No move ");
  lift = (fun ()  -> print_string "No lift ");
  draw = (fun ()  -> print_string "No_draw\n");
}

(* Main program (with the fancy point) *)
let _ =
  let p = if true
          then fancy_point ()
          else point()
  in
    p.draw ();
    p.move 4 2; p.draw ();
    p.lift ();  p.draw ()

The bottom line is, first closures give you polymorphism for free. Conversely, closures can be emulated with subtype polymorphism:

class Computer {
public:
  virtual int compute(int) = 0;
}

Closures and subtype polymorphism are kinda equivalent. In both cases, it's about using functions or procedures without specifying it at compile time. In practice however, closures are simpler: they let you parametrize over behaviour directly. The need to enclose that behaviour in a class tend to break locality of code. That can render functions like map and filter unusable. C++ algorithm library for instance was a usability nightmare untill C++11 added closures.

Let's implement inheritance

Ocaml has a nice syntax to define a record in terms of another:

let foo3 = { foo1 with
             v2 = 3.;
           }

So, we could just use that for inheritance:

let class_fast_point super =
  let rec this = { super with
                     move = (fun x y ->
                               super.move x y;
                               super.move x y);
                 } in this

let _ =
  let p = class_fast_point (point ()) in
    p.draw ();
    p.move 4 2; p.draw ();
    p.lift ();  p.draw ()

Note that we derive objects (not classes), and note the explicit reference to the base object (super). Anyway, this is a minor point: it is still a matter of taking the same group of methods, except for the ones we explicitly override.

Alas, there is a fatal flaw in my program. When we look at its output, we see:

(0, 0)
(8, 4)
(9, 4)

The first two lines are fine: the point moves twice as fast, as intended. The third line, however, is off. the lift method is defined in terms of the move methods, so if the point moves twice as fast, it ought to lift twice as high. Here, you can see that lift does not "see" that we changed the move method. In other words, our inheritance scheme implements early binding.

Late binding needs more sugar:

type point = {
  move : int  -> int -> unit;
  lift : unit -> unit;
  draw : unit -> unit;
}

type class_point = {
  move_ : class_point -> int  -> int -> unit;
  lift_ : class_point -> unit -> unit;
  draw_ : class_point -> unit -> unit;
}

let rec make_point cl = {
  move = cl.move_ cl;
  lift = cl.lift_ cl;
  draw = cl.draw_ cl;
}

let class_point
    (x_ : int ref)
    (y_ : int ref) =
  {
    move_ = (fun this x y ->
               x_ := !x_ + x;
               y_ := !y_ + y);
    lift_ = (fun this () -> (this.move_ this) 1 0);
    draw_ = (fun this () -> print_string
               ("("  ^ string_of_int !x_ ^
                ", " ^ string_of_int !y_ ^ ")\n"));
  }

let point () = class_point (ref 0) (ref 0)


let fast_point super =
  let rec this = { super with
                     move_ = (fun this x y ->
                               (super.move_ this) x y;
                               (super.move_ this) x y)
                 } in this

let _ =
  let p = make_point (point ()) in
    p.draw ();
    p.move 4 2; p.draw ();
    p.lift ();  p.draw ()

let _ = print_newline ()

let _ =
  let p = make_point (fast_point (point ())) in
    p.draw ();
    p.move 4 2; p.draw ();
    p.lift ();  p.draw ()

The added de-sugaring is not overwhelming:

I could have lightened the sugar (no underscores, no classpoint class, no makepoint function), but it would have added sugar in the main program: p.M method calls would become (p.M p). Another way would be using lazy evaluation, but I couldn't make Ocaml lazy enough.

Anyway, the output of the program now correctly demonstrates late binding:

(0, 0)
(4, 2)
(5, 2)

(0, 0)
(8, 4)
(10, 4)