Virtual mischief
This is a post about a task I authored for a CTF competition at the 20th SFI.
How many virtual machine ecosystems are there? That's ridiculous! Fortunately we have developed a universal one that will fit everyone's use cases! Unfortunately, We don't have enough manpower to develop N implementations of it that could fight the infinite amounts of open source and proprietary jvm implementations. Can You help us do that? We have the docs right there, alongside some test data You could find useful.
The task description, followed by a download
link. The archive consists of a flag.bin
file and a
spec.pdf
file.
This task asks you to implement a virtual machine for an ad-hoc specification and use it to run a binary file with it.
cool, right?
Warning: I have never designed or really looked into a virtual machine before writing this task. Source code for this task was entirely based off of my small amount of informal and anecdotal knowledge about VMs and assemblers. Don't consider this a good source of knowledge.
The spec is quite odd, as it requires you to implement a virtual machine with 2 real stacks + 1 "CTF stack", no registers and with the ability to generate executable code for itself on the fly. Only recently I've realised that the 2 stack solution is close to what uxn is doing.
Hello, World!
After writing the spec a terrible realization hit me: I'd have to write an assembler to generate code for this. I had some time to spare though, so after injecting some caffeine I wrote it. In an evening. In emacs lisp. The last one was a quirky choice, but I have had some elisp experience at that time, and it made it possible to totally skip the "parse source and generate AST" part of writing an assembler. The assembler is terrible. It would be an insult to two-pass assemblers to call it one. It barely qualifies as a 1.5 pass system and even that could be a stretch. It's also very inefficient in doing this 1.5 passes, but hey - it works. After some fiddling I was able to compile a working "hello world" program, and then run it in my implementation of the VM. Great!
(defconst A//hello-world
`((define mainpush "Hello, World!\n")
(
:loop
(& jempt :ret)write)
(
(goto :loop)
:ret
(ret))
:_start (call main)))
So what's happening here? This is the entirety of a file called
hello-world.el
, so it's actually emacs lisp source - that
explains the defconst
in the beginning.
Next, I'm defining a main
"function" - it's just a
label, but it looks like a function in the source code, and
requires a return statement somewhere in the body. This is
checked because I have zero experience writing assembly, and I forgot to
(ret)
a lot.
;; this is the code that does this advanced check:
when (and ; when
(; the assembler is configured to error out on noreturn
A/error-on-noret eq 'define f) ; the function declaration does not specify that a given function does not return
(not (--any-p (and (listp it) (member 'ret it)) func))) ; there is no (... ret ...) in the function body
(error "Function %s does not return" name)) ; die! (
Next, I push "Hello, World!\n"
to the stack. The spec
asks for "an unbounded stack" with "a type large enough to hold a 32-bit
signed integer". I forgot to mention - the VM operates on 32-bit
numbers, so when pushing Hello, World! we actually push 4 times the
length of bytes. This is quite wasteful but wcyd.
Then I declare a label, it looks like a label in MS Windows batch files, and that's thanks to how keywords look in lisp - it's just a keyword (a fancy symbol).
keywordp :a)
ELISP> (t
symbolp :a)
ELISP> (t
keywordp 'a)
ELISP> (nil
Next we have a macro! Ooo, fancy - I've added a few macros to the
assembler so i had to write less code B). This
(& arg1 argn ...)
just pushes argn ...
before executing arg1
. jempt
pops a number
from the stack, checks if the stack is empty, and if so, jumps to a
place in the program signified by the number popped. It consumes the
argument, so we don't have to cleanup anything.
Then we call (write)
- this pops a number from the
stack, assumes it's just 0-255 ascii, and writes it to stdout (spec says
it will always be ascii).
Then there's a simple goto and a return condition we jump to from the
uppermentioned jempt
macro. Only call
and
ret
calls manipulate the 2nd (return) stack.
A :_start
label is required by the assembler, and this
is where a program execution starts. The 1st bytes of the code generated
are a literal goto
address of :_start
.
Simple, right?
Interesting, yet unused
Unfortunately I've not used this VM to the fullest in
flag.bin
, because I was afraid adding some debugging parts
to this task might be too hard. I did implement a random number
generator for it, but I also did not use it :/
;; https://www.stix.id.au/wiki/Fast_8-bit_pseudorandom_number_generator
define rand
(#x103 1) ; x++
(-> 1)
(& add #x103 1)
(<-S #x100 1) ; get a
(-> #x102 1) ; get c
(-> ; (a ^ c) <-+
(xor) #x103 1) ; get x |
(-> ; ... ^ x <-+
(xor) #x100 1) ; a = ... ---+
(<-S #x100 2) ; get a, b
(->
(add)#x101 1) ; b = b + a
(<-S 1)
(push #x101 1) ; get b
(-> ; b >> 1 <-+
(>>) #x102 1) ; get c |
(-> ; c + ... <-+
(add) #x100 1) ; get a |
(-> ; ... ^ a <-+
(xor) #x102 1) ; c = ... ---+
(<-S
#x102 1) ; return c
(->
(ret)
)
define init-rand ;; c b a
(#x100 3) ; rand data kept at [a=0x100, b=0x101, c=0x102, x=0x103]
(<-S
(call rand)
(pop) (ret))
This is how it looked like, <-
, ->
and <-S
are macros that moved values from the stack to
the memory and vice versa.
I also gave up on using the self-generating executable code idea I
had in mind because... I don't really know why to be honest. It's still
required by the specification and wmem
and
rmem
is used heavily, but the flag never jumps to freshly
written code.
During the development I wrote a small function that does that though, here's how it could have worked:
define write-byte-at ;; ptr b -> ptr + 1
(0 1) ; save b to 0x0
(<-S ; dup ptr to save it
(dup) 0 1) ; restore b
(-> ; write b to ptr
(wmem) 1)
(& add ; return ptr + 1
(ret))
define test-memory-rewriting
(
:p0 "hello\n")
(call print ; push a pointer to this function
(push :p)
0) ; \
(call write-byte-at 10) ; |
(call write-byte-at 0) ; |
(call write-byte-at 0) ; |
(call write-byte-at 0) ; / push \n
(call write-byte-at
0) ; \
(call write-byte-at ; |
(call write-byte-at ?i) 0) ; |
(call write-byte-at 0) ; |
(call write-byte-at 0) ; / push ?i
(call write-byte-at
0) ; \
(call write-byte-at ; |
(call write-byte-at ?H) 0) ; |
(call write-byte-at 0) ; |
(call write-byte-at 0) ; / push ?H
(call write-byte-at
11) ; write
(call write-byte-at 11) ; write
(call write-byte-at 11) ; write
(call write-byte-at 18) ; ret
(call write-byte-at
; pop :p from stack of 1st call
(pop) (ret))
On the 1st call, this function would print "hello\n"
,
rewrite it's own code and return.
On each subsequent call, it would print "Hi\n"
, and
return. It didn't overwrite anything else after it, as the body was far
longer in the normal version, than after rewriting.
The CTF stack
yeah... it won't yield the flag
The actual message was to something different in the final
flag.bin
, but the troll
function remained the
same.
epilogue
It was a lot of fun to write code for this thing and later see people solving it. It wasn't a hard task by any means, but I hope everyone solving it had some fun too.
You can check out the source code for the assembler and the flag as well as the writeup here.