Defining Syntactic Sugar

I have seen programmers rejecting arguments containing “syntactic sugar” on the grounds that the term is too vague. So vague in fact that “syntactic sugar” could point to any difference between any two Turing complete languages.

We can do better.

Some compilation processes clearly do more than eliminating sugar. Some of them mess with the original structure so much that no one stand a chance at recogunizing anything. Whole program optimization, for instance.

On the other hand, some compilation processes clearly are sugar elimination. They just move keywords around. The let expression elimination in languages that have first class functions is a famous example:

-- `X` is an identifier,
   `Ex` and `E` are arbitrary expressions.

let X = Ex in E  -- before
(fun X -> E) Ex  -- after

Another example would be the dot notation used in most class based languages:

// `ARGS` is an arbitrary list of arguments, like `x, y, z`.
   `O` and `F` are arbitrary expression, where F is a function.
   (without first class functions, `F` is an identifier)

O.F(ARGS)  // before
F(O, ARGS) // after

This one could easily be added to C. Every time the compiler encounters some O.F(ARGS) expression, it first check if F is within the scope of obj, and if not treat the whole thing as an ordinary function call. If you further assume programs that don’t use function pointers, the check becomes unnecessary.

Those two transformations have 3 interesting properties:

They are local. The transformation of a relevant piece of code is completely independent from the rest of the program.
They don’t duplicate nor erase code. After the let expression elimination, you still find X, Ex, and E exactly once in your code. Same thing when reducing the method call into a function call.
They are independent from their variable parts. These transformations above don’t deconstruct the parameters of the let expression (X, Ex, and E) nor of the method call (O, F, and ARGS). Those parameters are treated as opaque blocks.

As a result, the programs needed to implement those transformations are very simple. Trivial pattern matching could handle this kind of code. The sed command might do it, if not for some obvious complications with text based transformations:

sed `s/let\(.[a-zA-Z]\)=\(.*\)in\(.*\)/(fun \1 -> \3) \2/`
sed `s/\(.*\)\.\(.*\)(\(.*\))/\2(\1, \3)/`

This of course won’t really work, but you get the idea. More reasonably, the text would first be transformed in abstract syntax tree, which will then be de-sugared. A part of the de-sugaring functions might look like this (in Haskell-like syntax):

-- I assume currying: functions have exactly 1 argument
desugar1 LetExpr x ex e = Funcall (Fun x e) ex

-- I don't assume currying: functions have a list of arguments
desugar2 MethodCall o f args = Funcall f (o : args)

So, here is my definition of syntax sugar:

Any compilation process that share the 3 properties above are syntax sugar elimination.
Any sequence of several syntax sugar eliminations is itself syntax sugar elimination.
Syntax sugar is any set of constructs (such as let expressions, or method calls) that can be eliminated by syntax sugar elimination.

I believe this is a fairly conservative definition, in the sense that any construct that is syntax sugar by my definition will be intuitively labelled “syntax sugar” by almost any programmer out there. (The reverse may not be true, however.)

For those looking for a more formal definition for “syntax sugar”, I recommend On the Expressive Power of Programming Languages by Mathias Felleisen (Thanks to Felicia Svilling for the tip).