Metacompilers

Here are some experiments about metacompilation. For now, I have two toys: a re-implementation of MetaII, and a simple implementation of Parsing Expression Grammars on top of Parsec for Haskell.

Meta II for Lua

This is a straightforward re-implementation of META II from Shorre’s paper, using Lua instead of a custom assembler. You can download it here. Despite this being 60’s technology, it’s quite impressive: this compiler needs only 50 lines of Lua support code to run, and compiles itself in 30 lines.

There’s a similar implementation for C by Long Nguyen which predates mine, but I swear I didn’t looked at it to do mine (which was good for my ego and for learning, but stupid from a practical perspective).

If you know nothing about such metacompilers, I suggest you read this excellent tutorial by James M. Neighbors.

Of course, you need Lua. I have tested it with Lua 5.2.

Install

$ tar -xzf metaII-lua.tar.gz
$ cd metaII
$ ls
bootstrap.lua  makefile  runtime.lua
bootstrap.txt  meta.txt

Usage

The compiler is made up of two Lua files. The compiler itself, and runtime.lua. For now, your only working compiler is meta-bootstrap.lua. You use it like this:

$ lua meta-bootstrap.lua <meta.txt >meta.lua

(You can also compile meta-bootstrap.txt. Actually, that’s how I made meta-bootstrap.lua. I didn’t write it by hand. For those who see the infinite regress, I did write the first compiler by hand. I considered using Long Nguyen’s compiler instead, but preferred the hard path.)

The compiler you just made can also compile itself:

$ lua meta.lua <meta.txt >meta-diff.lua
$ diff meta.lua meta-diff.lua
$

Yep. No difference. For the lazy, the whole process is encoded in the makefile:

$ make
...
Success!
$ ls
bootstrap.lua  meta.lua  makefile       runtime.lua
bootstrap.txt  meta.txt  meta-diff.lua

Note: the compilers stemming from bootstrap.txt and meta.txt are the same, except bootstrap.txt adds comments to the code of the compiler. It may help you decipher the generated Lua code.

Example

Let’s start with what I consider the hello-world of compilation: infix arithmetic expressions. Write this in a new file (say, arithmetic.txt).

.syntax expr
  expr = exp2 *( '+' exp2 {'add'     }
               | '-' exp2 {'sub'     });
  exp2 = exp3 *( '*' exp3 {'mul'     }
               | '/' exp3 {'div'     });
  exp3 = .number          {'push ' $ }
       | '(' expr ')';
.end

The syntax is loosely based on Backus–Naur Form. What’s between {} are things we print as we recognise patterns. They can be seen as patterns that allways succeed. The $ prints the last recognised token. .number is a primitive pattern that recognises integers, and the * denotes repetition. The whole thing turns arithmetic expressions into some sort of reversed Polish notation. So, let’s compile this compiler:

$ lua meta.lua <arithmetic.txt >arithmetic.lua

Then, test it:

$ cat expr.txt
1 + 2 * (3 + 4) - 5
^D
$ lua arithmetic.lua <expr.txt
push 1
push 2
push 3
push 4
add
mul
add
push 5
sub

Ta daa!

NoMeta for Haskell

This basically implements Parsing Expressions Grammars as syntactic sugar over parser combinators. This one is build on top of Parsec. You can get it here.

The principle is a bit different from the other compiler: instead of defining a standalone language, this one works as a pre-processor for Haskell, augmenting its syntax like a naive implementation of the do notation.

Install

$ tar -xzf nometa-haskell-rule.tar.gz
$ cd nometa-haskell-rule
$ ls
funmeta-bootstrap.hs  funmeta.hs  makefile

funmeta-bootstrap.hs is pure hand-written Haskell code that use Parsec in a mostly readable way. funmeta.hs described the same compiler than its bootstrap version, but makes heavy use of the syntax it implements. It should be decipherable as it is.

Usage

Again, the whole bootstrap process is described in the makefile:

$ make
...
...
Success!

Basically, you compile funmeta-bootstrap.hs, then use the result to pre-process the mostly Haskell code in funmeta.hs. Compile the result, and pre-process funmeta.hs with that, and see that you get the same result than with the first time around.

Now you can use Parsec with a fancy syntax.

Limitations

As it stands, this compiler not worth the trouble. My syntax is more readable than the combinators themselves only by a hair. I do have several improvements in mind however:

Right now, I only have a syntax for anonymous rules, which must be surrounded by {| |}. I could instead use a syntax for grammars, which should remove some of this line noise.
Juxtaposition has been hijacked for rule succession. I still need function application however, and I currently use the f(x) syntax. Alas, this syntax goes against Haskell’s natural ways, and force me to place parentheses in unnatural places. I think I should use a f $ x syntax instead. It curries nicely, and means the same thing as it does in native Haskell.
The x in f(x) currently is native Haskell. However, many Haskell rules take rules as arguments, forcing me to write f({|foo|}) to take advantage of my syntax. The escape hatch should go the other way.
My syntax doesn’t take advantage of indentation the way the do notation does. It should.

NoMeta for Haskell (reloaded)

This one is much better. It deals with grammars instead of just rules. It does not however have a syntax for anonymous rules in regular haskell code (I don’t think we really need one: my syntax is only worth the trouble if we define many rules at once).

You can get this version here. It is installed and used just like the other.

Limitations

Syntax error messages appear not to point to the right location of the error in the source file.
My syntax still isn’t indentation sensitive.
The application operator would work better if it had highest priority, instead of lowest (my current choice). I wonder if it’s worth breaking Haskell’s convention about it, though.
The generated code have no line number annotations. And I don’t know how to use GHC’s -F option, which seems to be the way to do pre-processing.

This is basically a proof of concept. If anyone find it useful, please fork it, polish it, and put it on Hackage.