February 2011

How Class based Programming Sucks

It boils down to three mistakes:

Pervasive mutable state.
Indirect support for closures.
No algebraic data types.

Fixing those mistakes would make programs much better.

Pervasive mutable state

Pervasive mutable state is a huge mistake, and a mostly avoidable one. The question is how class based programming encourages it. Interestingly, languages aren’t the only culprits: the mainstream descriptions of class based programming, and anthropomorphism, also count for a lot.

Class based languages typically make mutable state the default. Immutability is easy most of the time, but it requires you to fight the default. The “const” keyword is good, but a “mutable” keyword would be better: programmers would type it only when they really need mutable state, while “const” is all too easy to omit.

Anthropomorphism makes us think about objects as agents instead of as values. This has a strong imperative slant: values don’t change; agents do. We don’t create a new Point next to the first. We make it move. We don’t create a new List with one more element. We add an element.

See, mutable state makes for shorter English sentences, and the agent concept invites analogies with our fellow humans. In the end, this first impression trumps the fact that avoiding mutable state where possible ultimately yields simpler programs.
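To make the contrast concrete, here is a minimal C++ sketch of both styles. The names `moved`, `with_element` and `MutablePoint` are hypothetical, chosen for illustration only: in value style, "moving" a Point computes a new Point and "adding" an element returns a longer copy of the list, while the agent style mutates the object in place.

```cpp
#include <vector>

// Value style: a Point never changes. "Moving" it yields a new Point.
struct Point {
  int x;
  int y;
};

Point moved(const Point & p, int dx, int dy) {
  return Point{p.x + dx, p.y + dy};
}

// "Adding" an element builds a new list. The caller's list is untouched,
// because xs is taken by value (it is a copy).
std::vector<int> with_element(std::vector<int> xs, int x) {
  xs.push_back(x);
  return xs;
}

// Agent style, for contrast: the same Point is mutated in place.
struct MutablePoint {
  int x;
  int y;
  void move(int dx, int dy) { x += dx; y += dy; }
};
```

Copying a `std::vector` costs time proportional to its length; functional languages usually rely on persistent structures instead, such as singly linked lists, where adding an element at the front shares the existing tail.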

Indirect support for closures

Given closures, class based programming is mere syntactic sugar. Classes, on the other hand, don’t give you the full power of closures. Supporting closures through classes alone is a big mistake, for several reasons:

As mathematical objects, functions are not special. They can be arguments for other functions. They can be returned as results. Function composition, for instance, does both: “g ∘ f”. In this respect, it is no different from “x + y”.

For programming languages, the consistent thing to do is to treat every value the same way. It shouldn’t matter whether the value is a function or an integer. Even if you take an “every value is an object” stance, there is no reason to treat methods and member variables differently: functions should be objects too.

Consistency is important because programming runs on math. If the math is consistent, if it has few special cases, it tends to simplify the programs built on it.
Closures make functions like map and filter actually usable. Such functions capture the most common patterns, so you can avoid loops most of the time. This means less boilerplate, and less mutable state to deal with.
Sometimes, you have no way around higher order functions. Without closures, however, you have to perform lambda lifting and defunctionalization by hand. This often breaks locality of code, and you have to write a whole class with a suitable constructor. You may even have to inherit from a base class or implement an interface. Subtype polymorphism is a very poor substitute for closures. This is why the C++ algorithm library is unusable.
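The difference is easy to see side by side. The following sketch (C++11 or later; `AboveThreshold`, `count_above_functor`, `count_above_lambda` and `compose` are hypothetical names) counts the elements above a threshold with `std::count_if`, first with a predicate class written out by hand, then with a closure. It also shows composition as a function that takes functions and returns one.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// The closure written out by hand: the captured variable becomes a
// constructor argument stored in a member, and the predicate's code
// lives in a separate class, away from its single use.
class AboveThreshold {
public:
  explicit AboveThreshold(int threshold) : threshold_(threshold) {}
  bool operator()(int x) const { return x > threshold_; }

private:
  int threshold_;
};

std::ptrdiff_t count_above_functor(const std::vector<int> & xs, int threshold) {
  return std::count_if(xs.begin(), xs.end(), AboveThreshold(threshold));
}

// The same predicate as a closure: it captures `threshold` and stays local.
std::ptrdiff_t count_above_lambda(const std::vector<int> & xs, int threshold) {
  return std::count_if(xs.begin(), xs.end(),
                       [threshold](int x) { return x > threshold; });
}

// Composition takes functions as arguments and returns a new function:
// compose(g, f)(x) == g(f(x)).
std::function<int(int)> compose(std::function<int(int)> g,
                                std::function<int(int)> f) {
  return [g, f](int x) { return g(f(x)); };
}
```

With `xs = {1, 5, 10, 20}`, both counting functions return 3 for a threshold of 4, and `compose(dbl, inc)(3)` with `inc(x) = x + 1` and `dbl(x) = x * 2` gives 8.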

Fortunately, this mistake is getting fixed. Java has anonymous inner classes, which at the very least don’t break locality of code. And since JavaScript, no one dares write a non-system language without closures. Even C++ will support literal functions: lambda expressions are part of the upcoming C++0x standard.

No algebraic data types

Most statically typed languages have mechanisms to define new compound types in terms of existing ones. Typically, a value of a compound type can be decomposed into values of the simpler types in it.

In C, for instance, the type “int*” is defined in terms of “int”. Values of type “int*” are pointers to integers, which can be dereferenced (decomposed), so you can access the underlying integer.

Another common way to combine types is the Cartesian product:

type product = (int * float)      (* Ocaml *)
type product = { i:int; f:float}  (* Ocaml (alternative) *)
struct product { int i; float f; }; /* C */

Then again, values of that type can be decomposed:

let (i, f) = my_prod in foo i f (* Ocaml *)
foo my_prod.i my_prod.f         (* Ocaml (alternative) *)
foo(my_prod.i, my_prod.f);      /* C */
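For comparison, C++17 offers a close analogue of the OCaml tuple pattern: structured bindings. In this sketch, `product` is modelled as a `std::tuple<int, float>`, and `foo` and `call_foo` are hypothetical stand-ins for the `foo` above.

```cpp
#include <tuple>

// The product type from above, as a C++ tuple: an int AND a float.
using product = std::tuple<int, float>;

// Hypothetical stand-in for foo: any function of an int and a float.
float foo(int i, float f) { return i * f; }

float call_foo(const product & my_prod) {
  auto [i, f] = my_prod; // C++17 structured binding decomposes the tuple
  return foo(i, f);
}
```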

The primary compound type of class based languages is the class, which is based on the Cartesian product. That is akin to conjunction: The “product” type above contains an integer and a floating point number.

What they lack is a nice way to express _dis_junction: a type whose values would contain, for instance, either an integer or a floating point number. Tagged unions would do that.
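Without language support, such a type has to be assembled by hand. Here is a minimal C++ sketch of an int-or-float tagged union (`IntOrFloat`, `make_int`, `make_float` and `to_double` are hypothetical names): the programmer keeps the tag and the payload in sync, and nothing stops a caller from reading the wrong member.

```cpp
// A hand-rolled tagged union: either an int or a float.
// The tag and the payload are kept consistent by hand only.
struct IntOrFloat {
  enum class Tag { Int, Float } tag;
  union {
    int i;
    float f;
  };
};

IntOrFloat make_int(int i) {
  IntOrFloat v;
  v.tag = IntOrFloat::Tag::Int;
  v.i = i;
  return v;
}

IntOrFloat make_float(float f) {
  IntOrFloat v;
  v.tag = IntOrFloat::Tag::Float;
  v.f = f;
  return v;
}

// "Pattern matching" by switching on the tag.
double to_double(const IntOrFloat & v) {
  switch (v.tag) {
    case IntOrFloat::Tag::Int:   return v.i;
    case IntOrFloat::Tag::Float: return v.f;
  }
  return 0.0; // unreachable as long as the tag is valid
}
```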

How, for instance, would you handle a function that either returns a result or simply fails? Null pointers and context-dependent conventions are not allowed: you need a general mechanism.

I tried something like that in C++ (this was production code, minus some comments):

#ifndef OPTION_H
#define OPTION_H
#include <exception>

class OptionBase
{
public:
  // Thrown when an empty Option is dereferenced.
  struct Empty : public std::exception {
    const char* what() const noexcept override {
      return "Option: no value";
    }
  };
};

template<typename T>
class Option : public OptionBase // public, so callers can catch Option<T>::Empty
{
public:
  Option() noexcept       : _data(nullptr)     {}
  Option(const T & data)  : _data(new T(data)) {}
  Option(const Option<T> & rhs) : _data(rhs.is_empty()
                                        ? nullptr
                                        : new T(*rhs)) {}
  ~Option() { delete _data; } // deleting a null pointer is a no-op

  bool is_empty() const noexcept { return _data == nullptr; }

  // Accessors and mutators: all of them throw Empty if there is no value
  const T & operator*()  const { chk_empty(); return *_data; }
  T       & operator*()        { chk_empty(); return *_data; }
  const T * operator->() const { chk_empty(); return _data;  }
  T       * operator->()       { chk_empty(); return _data;  }

  Option<T> & operator=(const Option<T> & rhs)
  {
    if (this != &rhs) {                                  // self assignment
      T * copy = rhs.is_empty() ? nullptr : new T(*rhs); // copy first...
      delete _data;                                      // ...then clean up
      _data = copy;
    }
    return *this;
  }

private:
  void chk_empty() const { if (is_empty()) throw Empty(); }
  T * _data;
};
#endif

This, of course, isn’t the reference implementation; that would be boost::optional. Anyone interested can look at its source code (more than 700 lines of actual code).
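For what it’s worth, C++17 later standardized `std::optional`, which grew out of boost::optional. A short sketch of the "result or failure" case with it, where `safe_div` is a hypothetical example function:

```cpp
#include <optional>

// Either an int result, or nothing at all: no null pointer, no magic value.
std::optional<int> safe_div(int a, int b) {
  if (b == 0) return std::nullopt; // failure case
  return a / b;                    // success case, wrapped implicitly
}
```

Like the Option class above, `value()` throws when there is nothing inside (here `std::bad_optional_access`), whereas `*` on an empty `std::optional` is undefined behaviour; `value_or(-1)` supplies a default instead.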

My point is, sum types make your life easier (here in Haskell):

data Maybe a = Just a
             | Nothing

There are other examples, of course. Need to represent the delivery status of a piece of mail?

data Mail = NotSent
          | ETA Int
          | Received
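The closest C++ counterpart is `std::variant` (C++17), which is a library type rather than a language feature. In this sketch, each Haskell constructor becomes its own struct, the `value` field holding the ETA’s `Int` is a hypothetical name, and `describe` does the "pattern matching" through `std::visit`. The `static_assert` in the last branch is one way to make the compiler complain if a new case is added but not handled.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// One struct per constructor of the Haskell Mail type.
struct NotSent {};
struct ETA { int value; }; // the Int carried by the ETA constructor
struct Received {};

using Mail = std::variant<NotSent, ETA, Received>;

std::string describe(const Mail & m) {
  return std::visit([](const auto & s) -> std::string {
    using S = std::decay_t<decltype(s)>;
    if constexpr (std::is_same_v<S, NotSent>) {
      return "not sent";
    } else if constexpr (std::is_same_v<S, ETA>) {
      return "ETA " + std::to_string(s.value);
    } else {
      // Fails to compile if Mail gains a case that is not handled above.
      static_assert(std::is_same_v<S, Received>, "unhandled Mail case");
      return "received";
    }
  }, m);
}
```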

Full-fledged algebraic data types are also very useful. Want to implement lists?

data List a = Cons a (List a)
            | Nil

Etc. With classes, you have to handle the flag, the content, and sometimes even the pointers separately and manually. As we have seen with the option type, this is rather clumsy. You could also use Church encoding, but this relies so much on closures that subtype polymorphism won’t cut it.
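Both workarounds can be sketched in C++. This is only an illustration, with hypothetical names throughout (`Cons`, `List`, `cons`, `length`, `MaybeInt`, `just`, `nothing`, `church_div`). The first part encodes lists by hand, with the pointer doubling as the tag: null plays the role of Nil. The second part Church-encodes a "Maybe int" as a function that takes one handler per case. Because `std::function` cannot be polymorphic in its result type, the result type `R` has to be fixed up front, which already shows how much the encoding leans on closures and generics.

```cpp
#include <cstddef>
#include <functional>
#include <memory>

// 1. A list encoded by hand: a null List<T> plays the role of Nil,
//    a non-null one points to a Cons cell.
template <typename T>
struct Cons;

template <typename T>
using List = std::shared_ptr<const Cons<T>>;

template <typename T>
struct Cons {
  T head;
  List<T> tail;
};

template <typename T>
List<T> cons(T head, List<T> tail) {
  return std::make_shared<Cons<T>>(Cons<T>{head, tail});
}

// Manual "pattern matching": test the tag, then reach through the pointer.
template <typename T>
std::size_t length(const List<T> & xs) {
  return xs ? 1 + length(xs->tail) : 0;
}

// 2. A Church-encoded "Maybe int": a value is a function that receives
//    a handler for Just and a default for Nothing, and picks one.
template <typename R>
using MaybeInt = std::function<R(std::function<R(int)>, R)>;

template <typename R>
MaybeInt<R> just(int x) {
  return [x](std::function<R(int)> on_just, R) { return on_just(x); };
}

template <typename R>
MaybeInt<R> nothing() {
  return [](std::function<R(int)>, R on_nothing) { return on_nothing; };
}

template <typename R>
MaybeInt<R> church_div(int a, int b) {
  return b == 0 ? nothing<R>() : just<R>(a / b);
}
```

For example, `church_div<int>(7, 2)` applied to an identity handler and a default of -1 yields 3, while `church_div<int>(1, 0)` yields -1.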

Note that this is not specifically about algebraic data types as implemented in Haskell or ML. This is about supporting pattern matching on user-defined data types. Qi, for instance, achieves this while relying on different underlying principles. Scala also gives excellent support through case classes: syntactic shortcuts for inheritance hierarchies that express disjunction. The result is a tiny bit more cumbersome than ML’s sum types, but also more capable.

Other mistakes?

I’m not aware of any other major mistakes of class based programming. There probably are some, but I suspect that even my favourite languages and practices would share them.

Conclusion

Note that fixing those three mistakes (pervasive mutable state, no closures, no pattern matching) turns Java into ML. Also remember that class based programming is syntactic sugar over records and closures.
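As an illustration of that last point, here is a minimal C++ sketch of an "object" built from a record of closures instead of a class. `Shape`, `make_rectangle`, `make_circle` and `kPi` are hypothetical names. Each constructor function captures its own private state, and calling the same `area` member behaves differently depending on how the value was built, which is the dynamic dispatch a class hierarchy would otherwise provide.

```cpp
#include <functional>

// A record of closures: the "methods" of an object, with no class keyword.
struct Shape {
  std::function<double()> area;
  std::function<double()> perimeter;
};

const double kPi = 3.141592653589793;

// Each "constructor" captures its private state inside the closures.
Shape make_rectangle(double w, double h) {
  return Shape{
    [w, h] { return w * h; },
    [w, h] { return 2 * (w + h); },
  };
}

Shape make_circle(double r) {
  return Shape{
    [r] { return kPi * r * r; },
    [r] { return 2 * kPi * r; },
  };
}
```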

The obvious conclusion is that ML simply dominates Java (and C#) as a language. Time to switch.