Dead simple macroexpansion for niënor
Macros are the heart of niënor. As of right now, the code generator can expect only 22 AST keywords:
define-label!, define-constant, nalloc!, _alloc!, codegen-at!,
_push!, _with-locals!, _defun, _defun-vector, uxn-call!, funcall!,
if, _with-label, _λ, _tailcall-fast!, _tailcall!, _get!, _addrof,
_in-epilogue!, _symbol, _compiler-error, _begin
Note: There are some additional compiler macros and toplevel declarations the compiler can recognise, but there is an insignificant amount of them. Macros from the prelude mostly generate raw AST.
This means, that macros (and functions, mostly macros though) defined in the prelude make up most of the language. I've implemented a really simple macro expander based on a part of the scheme syntax-rules sub-language. This article will explain how it works, and give some general tips & tricks I've learned while using it.
Note: Macro expansion is dead simple. Writing macros is less "dead simple".
Basics
Macro expansion is infinitely recursive, if you're coming from a C background, beware of that! You can easily clog up the macroexpander by writing a macro that infinitely recurses into itself. There's no verbose mode (yet) that will help you debug it.
A macro is a set of rules, that determines how a S-expression will be
transposed to another S-expression. For every rule, you need to provide
a capturing sexp, and a result sexp. The first symbol (car)
of the capturing S-expression will always be captured literally. The
rest of the capturing sexp may be captured literally, if you
ask the macroexpander nicely.
A variable number of arguments can be matched using the improper list
syntax. Ellipsis syntax is not supported. This means a vararg can only
be matched on the end of a list. Examples may include:
(a b . rest),
((a . rest) (b c . rest2) (d . rest3) . rest). The lack of
ellipsis is a design choice, not a matter of laziness1.
Examples
Let's write a macro that turns every (hello world) to
(goodbye world). This will be a really simple,
single-definition macro ruleset. We'll capture both hello
and world symbols literally.
(define-macro-rule (world)
(hello world)
(goodbye world))- The first list
(world)is a set of symbols we want to capture literally when searching for rule-abiding expressions. - The second list
(hello world)is a match blueprint. It says that "for every list withhelloas it's car, andworldas it's singular argument, transpose to whatever the transposition argument specifies." Note that we're capturing world literally, so a list like(hello reader)would not get captured — we'll get to that later. - The third list
(goodbye world)is the transposition rule. Here both symbols get treated literally, asgoodbyewasn't defined anywhere in the macro, andworldwas defined to be treated literally.
We can check if it works using nienor -M.
$ cat temp.scm
(define-macro-rule (world)
(hello world)
(goodbye world))
(hello world)
$ nienor -M "`cat temp.scm`"
(goodbye world)Cool, right? Let's try something harder.
A let binding in a scheme environment is normally implemented using macros and lambdas. This wouldn't really fly in niënor, as closures generated by that would have to be manually freed at runtime. They're implemented in a bit more memory-saving fashion under the hood, but let's ignore the cost for a moment and implement them the regular way.
The idea basically is: (let ((key value)) body) has to
change to ((λ (key) body) value). This is how most schemes
get binding for free!
In our macro language it would look like this:
(define-macro-rule ()
(_let ((word value) . rest) . body)
((λ (word) (_let rest . body)) value))
(define-macro-rule (()) ; <- match the empty list as a literal
(_let () . body)
(begin . body))
(_let ((a b) ; test binding, a=b, c=d
(c d))
(+ a c))In the first rule, we capture the first arguments we could get, and
give them names. This means that we can use the expressions that
actually got matched with the names we gave to them. We also capture a
rest list, which is the rest of the bindings requested, and
a body list, which is the body of that let
expression — in our case, it would be ((+ a c)).
In the second rule, we define what should we do when we run out of
binding requests. When the binding list is empty (matched to
()), we return just the body.
Testing this yields:
$ nienor -m temp.scm | grep λ
((_λ (a) (((_λ (c) ((_begin ((_begin (a c (uxn-call! (2) add))))))) d))) b)
# ^- this basically is ((λ (a) ((λ (c) (+ a c)) d)) b)Note: This is not really let, but let*, as
the bindings happen one by one, not all at the same time. Writing a real
let has been left as an exercise to the reader.
Let's try something even harder!
Consider this structure:
(define-struct user
name
age
country)Let's write a macro that converts this expression:
(bind-users (u1 u2 u3)
names => "jan" "maria" "adam"
ages => 23 21 42
countries => 'PL 'RU 'US
do
(do-something-with u1 u2 u3))into this expression.
(let ((u1 (make-user* "jan" 23 'PL))
(u2 (make-user* "maria" 21 'RU))
(u3 (make-user* "adam" 42 'US)))
(do-something-with u1 u2 u3))It will be hard, as we don't get the values we're searching for in
lists, instead we'll have to parse it item by item to find separators.
The trick here is to use accumulators. Here, I'm using a
(_ . stuff) trick, to keep values I'm looking for in a
list. It's probably possible to do it in a simpler way, but it's my
preferred way of storing values in accumulators.
(define-macro-rule (names => _) ; 1st capture
(bind-users unames names => . rest)
(_bind-users0 (_ . unames) (_) . rest))
(define-macro-rule (_) ; reverse unames
(_bind-users0 (_ uname . unames) (_ . acc) . rest)
(_bind-users0 (_ . unames) (_ uname . acc) . rest))
(define-macro-rule ()
(_bind-users0 (_) (_ . acc) . rest)
(_bind-users1 acc (_) . rest))
(define-macro-rule (_) ; get names
(_bind-users1 unames (_ . names) name . rest)
(_bind-users1 unames (_ name . names) . rest))
(define-macro-rule (ages => _) ; find separator
(_bind-users1 unames names ages => . rest)
(_bind-users2 unames names (_) . rest))
(define-macro-rule (_) ; get ages
(_bind-users2 unames names (_ . ages) age . rest)
(_bind-users2 unames names (_ age . ages) . rest))
(define-macro-rule (countries => _) ; find separator
(_bind-users2 unames names ages countries => . rest)
(_bind-users3 unames names ages (_) . rest))
(define-macro-rule (_) ; get countries
(_bind-users3 unames names ages (_ . countries) country . rest)
(_bind-users3 unames names ages (_ country . countries) . rest))
(define-macro-rule (do _) ; get last separator
(_bind-users3 unames names ages countries do . rest)
(_bind-users-finish unames names ages countries . rest))
(define-macro-rule (_) ; finish by building a let
(_bind-users-finish unames (_ . names) (_ . ages) (_ . countries) . body)
(let (_bind-users-build-let unames names ages countries) . body))
(define-macro-rule ()
(_bind-users-build-let (uname . unames) (name . names) (age . ages) (country . countries))
((uname (make-user* name age country)) . (_bind-users-build-let unames names ages countries)))
(define-macro-rule (())
(_bind-users-build-let () () () ())
())
(define (main)
(bind-users (u1 u2 u3)
names => "jan" "maria" "adam"
ages => 23 21 42
countries => 'PL 'RU 'US
do
(do-something-with u1 u2 u3)))As you can see the lack of ellipsis really hurts when trying to write a bigger macro ruleset. It was quite painful. But it is possible!
Macroexpanding the above program yields:
$ nienor -m temp.scm | grep '_defun main'
(_defun main ()
((_begin
((_begin
((make-user* "adam" 42 (_symbol US))
(make-user* "maria" 21 (_symbol RU))
(make-user* "jan" 23 (_symbol PL))))
(_with-locals! (u3 u2 u1) # this is exactly what the (let ...) expression defined in the prelude
((do-something-with u1 u2 u3))))))) # expands toThis code pushes 3 make-user* calls, and then asks the compiler to generate code that binds 3 locals from the top 3 values of the stack.
(pretty-printed for readability, actual output is a bit less readable.)
Future work
An ellipsis would be nice.
Epilogue
No one should write macros as complicated as this last example. This simple macro expander & matcher works okay enough for what the niënor's prelude requires. Thank you for reading. If you're interested, you can download niënor and follow its development here.
yeah right↩︎