Implementing greener threads by putting uxn in your uxn

Green threads are software-defined threads. To implement threads for uxn I considered two possible approaches:

  1. Hack the VM source code / create a device that implements low level threads. This would have the upside of being fast, but a big downside of working only on custom VM builds

OR

  1. do what the title suggests

Method

The simplest way to implement green threads in a VM like uxn, would be having multiple contexts, in uxnemu this could be achieved by modifying

typedef struct Uxn {
  Uint8 *ram, dev[0x100];
  Stack wst, rst;
} Uxn;

to

typedef struct Uxn {
  Uint8 *ram, dev[0x100];
  Stack wst, rst;
  Uint16 pc, ticker;
  /* I think it would also be wise to preserve the Zero Page,
     as variables are commonly stored there */
  Uint8 zp[0x100];
} Uxn;

This would allow simple context switching. Instead of a global pc (program counter), we would now have an instance-specific one, so code could be suspended and resumed in the same state. As I annotated, keeping the Zero-Page in the context would probably also be great, as software commonly stores variables there (at least I do that in niënor for function-local variables, so for my use case, I really need to keep that). Ticker is a variable that counts down from an arbitrary number to some other number that signals the context should be switched. I don't know how common this solution is, but it seems to work fine in Owl Lisp's VM, so I went with that.

Of course I didn't want to hack on the VM source (for the downside mentioned above), so to use the simplest solution, i'd need a uxn VM implemented in uxn — simple as.

I'll refer to this as the deep VM from now on.

This would make threads slow1, but implementation-independent. Threads would run in the deep VM, and non-threaded code would run in the regular VM.

This makes me think... if VM-defined threads are green threads, would deep VM green threads be... green-er threads?

Solution

Everything is based on this struct, it's 1:1 something I'd write when hacking on the C code.

(define-struct uxn
  ticker    ; from some arbitrary N to 1, when 0 - vm is finished
  result    ; this holds a result when ticker=0
  wst       ; working stack (malloc'd on (make-uxn*))
  rst       ; return stack (malloc'd as above)
  zp        ; Zero Page (malloc'd as above)
  pc        ; program counter
  (wp 8)    ; working stack pointer (single byte - 8 bits — that's nienor's syntax for packed structs, as struct elements are 16-bit, or, a short by default)
  (rp 8))   ; return stack pointer (single byte)

The "arbitrary value" I chose for the ticker to count down from is 1<<9 (512) — it really is arbitrary.

A thread is a struct of

(define-struct thread
  uxn    ; pointer to a uxn instance
  next)  ; next thread or 0

so, a linked list.

I'm targeting the code generated by niënor, so I didn't need to implement the whole VM, just the bare minimum required to run code generated by it.

The deep VM differs a bit from the original, as it treats a JMP2r with an empty return stack as a BRK also setting (uxn-result) of itself to the top of the working stack. This makes it possible to pass niënor functions to the VM, and make them yield values, for example:

(define (main)
  (add-thread (λ () (* 2 21)))
  (start-thread-controller)
  (print-number (uxn-result (thread-uxn *threads*))) ; => 42 (uxn-result of the 1st and only thread)
  (exit!))

Also it is very slow, and I'm planning to do some further optimizations for it to become actually usable, and not just fun & experimental.

Thread controller

I did not implement any way to pass values between threads yet, so the thread controller is really, really simple. It just runs all threads from the global pool, in order, skipping finished threads.

Examples & Issues

two counters

(define (counter)
  (loopn (i 0 5 1)
    (print-number* i)))

(define (main)
  (add-thread counter)
  (add-thread counter)
  (start-thread-controller)
  (putchar #\newline)
  (exit!))
$ ol -r nienor.scm t/test-threads.scm && uxnemu a.out
Assembled a.out in 12.49KB (19.50% used), 398 labels
0011223344
Run: Ended.

Ugh! first issue, see that 12.49KB used? Yeah, most of this is the VM. It's very heavy — I think this can be improved though, as there's a lot of code that gets generated for a (cond ...) statement, as it gets reduced to a big if-else tree. If this were to be a (case opcode (...) ...) compiled to a jump table, I think both speed AND size would improve, but I didn't implement that yet :).

two lines

(define-vector (draw)
  (_run-threads *threads* 0))

(define (main)
  (set-screen-size! 400 400)
  (set-colors! #x0f00 #x00f0 #x000f)
  (fill! 0 0 color-1 layer-0)

  (add-thread (λ ()
                (loopn (i 0 400 1)
                  (pixel! i i color-2 0 layer-0 0 0))))
  (add-thread (λ ()
                (loopn (j 0 400 1)
                  (pixel! j (- 400 j) color-3 0 layer-0 0 0))))

  (set-draw-handler! draw))
$ ol -r nienor.scm t/test-threads.scm && uxnemu a.out
Assembled a.out in 12.89KB (20.11% used), 408 labels
Run: Ended.

Here, I'm not calling (start-thread-controller) directly, as I want the threads to run in the screen vector. This allows them to slowly finish without blocking the main VM thread. But...

Wait... thats the real speed?!

Well, it seems that having an unoptimized stack based VM in a stack based VM isn't certainly the fastest way to do computations. Also what are those glitched pixels?

Well, it seems that doing slow computations is A-ok, but IO seems to be a problem (as always). The task of putting a pixel on the screen requires to do 3 global IO actions (set X, set Y, paint w/ color). If the context gets switched between one of these actions, weird things happen. I have not thought of a way to mitigate that, but it is probably fixable.

Future work

Epilogue

Thank you for reading. If I've sparked your interest, you can download niënor and follow its development here.


  1. Very slow↩︎