Defining Syntactic Sugar
I have seen programmers rejecting arguments containing "syntactic sugar" on the grounds that the term is too vague. So vague in fact that "syntactic sugar" could point to any difference between any two Turing complete languages.
We can do better.
Some compilation processes clearly do more than eliminating sugar. Some of them mess with the original structure so much that no one stand a chance at recogunizing anything. Whole program optimization, for instance.
On the other hand, some compilation processes clearly are sugar
elimination. They just move keywords around. The
elimination in languages that have first class functions is a famous
-- `X` is an identifier, `Ex` and `E` are arbitrary expressions. let X = Ex in E -- before (fun X -> E) Ex -- after
Another example would be the dot notation used in most class based languages:
// `ARGS` is an arbitrary list of arguments, like `x, y, z`. `O` and `F` are arbitrary expression, where F is a function. (without first class functions, `F` is an identifier) O.F(ARGS) // before F(O, ARGS) // after
This one could easily be added to C. Every time the compiler
O.F(ARGS) expression, it first check if
within the scope of
obj, and if not treat the whole thing as an
ordinary function call. If you further assume programs that don't use
function pointers, the check becomes unnecessary.
Those two transformations have 3 interesting properties:
They are local. The transformation of a relevant piece of code is completely independent from the rest of the program.
They don't duplicate nor erase code. After the let expression elimination, you still find
Eexactly once in your code. Same thing when reducing the method call into a function call.
They are independent from their variable parts. These transformations above don't deconstruct the parameters of the
E) nor of the method call (
ARGS). Those parameters are treated as opaque blocks.
As a result, the programs needed to implement those transformations
are very simple. Trivial pattern matching could handle this kind of
sed command might do it, if not for some obvious
complications with text based transformations:
sed `s/let\(.[a-zA-Z]\)=\(.*\)in\(.*\)/(fun \1 -> \3) \2/` sed `s/\(.*\)\.\(.*\)(\(.*\))/\2(\1, \3)/`
This of course won't really work, but you get the idea. More reasonably, the text would first be transformed in abstract syntax tree, which will then be de-sugared. A part of the de-sugaring functions might look like this (in Haskell-like syntax):
-- I assume currying: functions have exactly 1 argument desugar1 LetExpr x ex e = Funcall (Fun x e) ex -- I don't assume currying: functions have a list of arguments desugar2 MethodCall o f args = Funcall f (o : args)
So, here is my definition of syntax sugar:
Any compilation process that share the 3 properties above are syntax sugar elimination.
Any sequence of several syntax sugar eliminations is itself syntax sugar elimination.
Syntax sugar is any set of constructs (such as
letexpressions, or method calls) that can be eliminated by syntax sugar elimination.
I believe this is a fairly conservative definition, in the sense that any construct that is syntax sugar by my definition will be intuitively labelled "syntax sugar" by almost any programmer out there. (The reverse may not be true, however.)