Metacompilers
Here are some experiments about metacompilation. For now, I have two toys: a re-implementation of MetaII, and a simple implementation of Parsing Expression Grammars on top of Parsec for Haskell.
Meta II for Lua
This is a straightforward re-implementation of META II from Shorre’s paper, using Lua instead of a custom assembler. You can download it here. Despite this being 60’s technology, it’s quite impressive: this compiler needs only 50 lines of Lua support code to run, and compiles itself in 30 lines.
There’s a similar implementation for C by Long Nguyen which predates mine, but I swear I didn’t looked at it to do mine (which was good for my ego and for learning, but stupid from a practical perspective).
If you know nothing about such metacompilers, I suggest you read this excellent tutorial by James M. Neighbors.
Of course, you need Lua. I have tested it with Lua 5.2.
Install
$ tar -xzf metaII-lua.tar.gz
$ cd metaII
$ ls
bootstrap.lua makefile runtime.lua
bootstrap.txt meta.txt
Usage
The compiler is made up of two Lua files. The compiler itself, and
runtime.lua
. For now, your only working compiler is
meta-bootstrap.lua
. You use it like this:
$ lua meta-bootstrap.lua <meta.txt >meta.lua
(You can also compile meta-bootstrap.txt
. Actually,
that’s how I made meta-bootstrap.lua
. I didn’t write it by
hand. For those who see the infinite regress, I did write the first
compiler by hand. I considered using Long Nguyen’s compiler instead, but
preferred the hard path.)
The compiler you just made can also compile itself:
$ lua meta.lua <meta.txt >meta-diff.lua
$ diff meta.lua meta-diff.lua
$
Yep. No difference. For the lazy, the whole process is encoded in the makefile:
$ make
...
Success!
$ ls
bootstrap.lua meta.lua makefile runtime.lua
bootstrap.txt meta.txt meta-diff.lua
Note: the compilers stemming from bootstrap.txt
and
meta.txt
are the same, except bootstrap.txt
adds comments to the code of the compiler. It may help you
decipher the generated Lua code.
Example
Let’s start with what I consider the hello-world of compilation:
infix arithmetic expressions. Write this in a new file (say,
arithmetic.txt
).
.syntax expr
expr = exp2 *( '+' exp2 {'add' }
| '-' exp2 {'sub' });
exp2 = exp3 *( '*' exp3 {'mul' }
| '/' exp3 {'div' });
exp3 = .number {'push ' $ }
| '(' expr ')';
.end
The syntax is loosely based on Backus–Naur Form. What’s between
{}
are things we print as we recognise patterns. They can
be seen as patterns that allways succeed. The $
prints the
last recognised token. .number
is a primitive pattern that
recognises integers, and the *
denotes repetition. The
whole thing turns arithmetic expressions into some sort of reversed
Polish notation. So, let’s compile this compiler:
$ lua meta.lua <arithmetic.txt >arithmetic.lua
Then, test it:
$ cat expr.txt
1 + 2 * (3 + 4) - 5
^D
$ lua arithmetic.lua <expr.txt
push 1
push 2
push 3
push 4
add
mul
add
push 5
sub
Ta daa!
NoMeta for Haskell
This basically implements Parsing Expressions Grammars as syntactic sugar over parser combinators. This one is build on top of Parsec. You can get it here.
The principle is a bit different from the other compiler: instead of defining a standalone language, this one works as a pre-processor for Haskell, augmenting its syntax like a naive implementation of the do notation.
Install
$ tar -xzf nometa-haskell-rule.tar.gz
$ cd nometa-haskell-rule
$ ls
funmeta-bootstrap.hs funmeta.hs makefile
funmeta-bootstrap.hs
is pure hand-written Haskell code
that use Parsec in a mostly readable way. funmeta.hs
described the same compiler than its bootstrap version, but makes heavy
use of the syntax it implements. It should be decipherable as it is.
Usage
Again, the whole bootstrap process is described in the makefile:
$ make
...
...
Success!
Basically, you compile funmeta-bootstrap.hs
, then use
the result to pre-process the mostly Haskell code in
funmeta.hs
. Compile the result, and pre-process
funmeta.hs
with that, and see that you get the
same result than with the first time around.
Now you can use Parsec with a fancy syntax.
Limitations
As it stands, this compiler not worth the trouble. My syntax is more readable than the combinators themselves only by a hair. I do have several improvements in mind however:
Right now, I only have a syntax for anonymous rules, which must be surrounded by
{| |}
. I could instead use a syntax for grammars, which should remove some of this line noise.Juxtaposition has been hijacked for rule succession. I still need function application however, and I currently use the
f(x)
syntax. Alas, this syntax goes against Haskell’s natural ways, and force me to place parentheses in unnatural places. I think I should use af $ x
syntax instead. It curries nicely, and means the same thing as it does in native Haskell.The
x
inf(x)
currently is native Haskell. However, many Haskell rules take rules as arguments, forcing me to writef({|foo|})
to take advantage of my syntax. The escape hatch should go the other way.My syntax doesn’t take advantage of indentation the way the
do
notation does. It should.
NoMeta for Haskell (reloaded)
This one is much better. It deals with grammars instead of just rules. It does not however have a syntax for anonymous rules in regular haskell code (I don’t think we really need one: my syntax is only worth the trouble if we define many rules at once).
You can get this version here. It is installed and used just like the other.
Limitations
Syntax error messages appear not to point to the right location of the error in the source file.
My syntax still isn’t indentation sensitive.
The application operator would work better if it had highest priority, instead of lowest (my current choice). I wonder if it’s worth breaking Haskell’s convention about it, though.
The generated code have no line number annotations. And I don’t know how to use GHC’s
-F
option, which seems to be the way to do pre-processing.
This is basically a proof of concept. If anyone find it useful, please fork it, polish it, and put it on Hackage.