Introduction

I really like SKI calculus. The thing that initially introduced me to it was the Esolangs wiki. Also, I read the works of John Tromp on binary lambda calculus and binary combinatory logic (I really enjoyed these and I can recommend them to anyone who wishes to study Kolmogorov complexity’s limits). One evening in June 2020, I implemented the following combinator calculus beta-reduction machine for the purposes of a code golf submission:

#define J putchar
#define H O->a
#define G H->a
#define K O->b
typedef struct x{struct x*a,*b;int q;}Y;Y*O;z=1;A(q){O=calloc(6,4);O->q=q;}h(Y*O
){Y*u;O=H&&H->q==2?z=K:H&&G&&G->q==1?z=H->b:H&&G&&G->a&&!G->a->q?u=A(3),(u->a=A(
3))->a=G->b,(u->b=A(3))->a=H->b,u->a->b=u->b->b=K,z=u:(O->q==3?H=h(H),K=h(K):0,O
);}r(x){Y*O;x=getchar()-73;x=x+33?A(x?x!=10:2):!getchar(K=r(H=r(O=A(3))))+O;}q(Y
*O){O&&J(O->q["SKI "],O->q-3||J(41,q(K),q(H),J(40)));}main(){Y*O=r();for(;z;O=h(
O))z=0;q(O);}

It’s 476 bytes big and most of the space is being taken by the complex combinator expansion logic. This encouraged me to research the topic further. First, I started researching other combinators - I already was aware of lambda calculus and functional programming, so naturally the first combinator I took on was the fixed point combinator (Y). Later, I discovered the BCW system made by Haskell Curry and quickly wrote a transpiler from the BCW system to the SKI system:

#!/usr/bin/perl -p
s/B/((S(KS))K)/g;
s/C/((S((S(K((S(KS))K)))S))(KK))/g;
s/W/((SS)(SK))/g;

The BCWK system (borrowing the K from SKI calculus) is fairly reasonable and intuitive. First two combinators swap their arguments: B x y z = (((B x) y) z) = ((B x) (y z)) = B x (y z), C x y z = (((C x) y) z) = ((x z) y) = x z y. The last combinator, W, is just W x y = ((W x) y) = ((x y) y) = x y y.

Notation

Pondering these made me think about the notation that I use for writing down my SKI calculus programs. So far, I assumed that the input is a perfect binary tree of combinators. But since it’s a binary tree, i.e. each node can have only either two combinators, two nodes, a combinator and a node, or a node and a combinator inside it, we don’t need the closing parens anymore!. Let me explain. This is a mostly readable version of the evaluator above:

/* All the possible node types. */
#define NODE_BI 1
#define NODE_S 2
#define NODE_K 3
#define NODE_I 4

struct node_t {
    struct node_t * left, * right;
    char type;
};

struct node_t * new_node(void) { /* implementation detail */ }

// The reading function
struct node_t * read() {
    switch(getchar()) {
        case '(': {
            struct node_t * u = new_node(NODE_BI);
            u->left = read();
            u->right = read();
            getchar(); // skip ')'
            return u;
        }
        
        case 'I': return new_node(NODE_I);
        case 'S': return new_node(NODE_S);
        case 'K': return new_node(NODE_K);
    }
}

An important observation is yet to be made - the reading function doesn’t actually use the closing paren for anything. So we replace it with a different character. I chose an apostrophe - '. Assuming the input doesn’t contain any data except (, S, K and I, the following program will convert the new notation to the old notation:

c;main(){c=getchar();if(c=='`'){putchar('(');main();main();putchar(')');}else putchar(c);}

… and the other way round:

c;main(){for(;~(c=getchar());)if(c!=')')putchar(c=='('?'`':c);}

Provided that I already went through the combinators and the closing parens, the element of the encoding to pick on would of course be the opening parens. In many cases - like this one: (((SK)K)I) they just feel unnecessary and redundant. The tree structure is fairly obvious at the first glance and SKKI would be so much more readable. This would mean that tree node nesting binds to the left. Illustratively, α β γ becomes ((α β) γ), α β γ δ becomes (((α β) γ) δ), and so on. A(BCDE)F becomes ((A(((BC)D)E))F).e I decided to call this the free-form encoding of combinator calculus. The free-form notation can be easily translated to the binary free notation using this small bit of code:

#define R return
#define P putchar
#define T x->q
typedef struct a{struct a*x,*y;int q;}b,*c;c M(q){
c x=malloc(sizeof(b));x->x=x->y=0;T=q;R x;}c f();c
n(){int c=getchar();if(c==40)R f();if(c==41||c==-1
)R 0;if(isalpha(c))R M(c);R n();}c f(){c x=M(-1),y
,z;z=x->x=n();if(!z)R 0;while(y=n())z=z->y=y;R x;}
void q(c x){if(!x)R;if(~T)P(T);else{c z=x->x;while
(z->y){P(40);z=z->y;}z=x->x;q(z);while(z=z->y){q(z
);P(41);}}}main(){q(f());}

This, of course, assumes that the free-form notation derives from the binary tree notation. But what if we used our simplified notation instead? This would bring down the size even further - α β γ becomes ((α β) γ), α β γ δ becomes (((α β) γ) δ), and so on. Let’s sum up all the encoding laws we came up with so far:

  • in the binary tree notation, the closing paren is never necessary.
  • the free-form notation is just as powerful as the binary tree notation.

Having worked out these, now let’s bring the encoding to some binary format. First, we start by noticing that the three combinators and the tree node opening character are 4 different states in total, which means that we can encode them with 4 bits. The encoding would be, for example:

00 - '
01 - S
10 - K
11 - I

… but just as well we could drop the I combinator (as it’s simply just SKK), meaning that we have three states now, so we can encode our data using one or two binary digits:

0  - '
10 - S
11 - K

Because the drawbacks of each encoding aren’t instantaneously obvious, we can test them on a few example SKI calculus expressions. Example:

{
    (((S((S(KI))((S((S(KI))I))(K(K(KI))))))(K((S(KK))I)))(
    (S((S(KI))((S(K((S(KS))(S(KI)))))((S(KK))I))))(KI)))
    <=> [ replacing I with (SK)K ]
    (((S((S(K((SK)K)))((S((S(K((SK)K)))((SK)K)))(K(K(K((SK
    )K)))))))(K((S(KK))((SK)K))))((S((S(K((SK)K)))((S(K((S
    (KS))(S(K((SK)K))))))((S(KK))((SK)K)))))(K((SK)K))))
}
=> (convering to simplified form) {
    '''S''S'KI''S''S'KII'K'K'KI'K''S'KKI''S''S'KI''S'K''S'KS'S'KI''S'KKI'KI
    <=>
    '''S''S'K''SKK''S''S'K''SKK''SKK'K'K'K''SKK'K''S'KK''SKK''S''S'K''SKK''
    S'K''S'KS'S'K''SKK''S'KK''SKK'K''SKK
}
=> (conveting to binary) {
    0000000100000100101100000100000100101111001000100010110010000001001010110000010000010010110000010010000001001001000100101100000100101011001011
    <=>
    00010001001100101111001000100110010111100101111011011011001011110110010011110010111100100010011001011110010011001001110010011001011110010011110010111101100101111
}
=> (converting to hexadecimal) {
    233896F70BDC3C4AFCC09510647F2E47B62F78384995A33A1B538CB46296BD6AA11B51F11E8018A6D2744F76F956D1B48DB0651FF163CEB3
    <=>
    2E9E8A31763346A56A1C2472720F66E0C0F95635EE71C4861C282CC2856F5F64F8AFA8967B8B8AADC44313EE9C6B0E88B1F7057BD7F8BF144E24B8062EF65340DF7
}

The implications are fairly simple - adding new combinators is generally beneficial for the code size (I managed to replicate the same effect on a bunch of SKI calculus expressions). This was just about what I expected - all in all, dictionary compession is pretty much the baseline standard nowadays.

Implementing

In February 2021, I implemented a tiny (976 byte .exe file) SKI calculus evaluator with a GUI for Windows. The code follows.


; A tiny SKI calculus evaluator.
; Copyright (C) Kamila Szewczyk, 2020.
; ------------------------------------
; A bit less braindead version of the bundled 976-byte binary,
; it doesn't leak the memory like a sieve and starts a bit faster,
; at the expense of being larger out-of-the-box (I could pack it
; with my packer to around 1,66K).

format PE GUI 4.0

entry _start

include 'win32ax.inc'

; Node in memory:

; ESI     ESI+4     ESI+8
; v       v         v
; +--------------------------+
; | left  |  right  |  type  |
; +--------------------------+

node.left  equ 0
node.right equ 4
node.type  equ 8

; SKI nodes
TYPE_S  equ 0
TYPE_K  equ 1
TYPE_I  equ 2

; alloc_node node holding two other nodes.
TYPE_BI equ 3

section '.text' code readable executable writeable
    ; Program entry point.
    ; Create the dialog, hook up the dialog procedure,
    ; enter an event loop.
    proc _start
        ; Create a heap, store it's handle in asmh.
        invoke HeapCreate, 0, 0, 0
        mov DWORD [asmh], eax
        ; Get the handle of the current module
        invoke GetModuleHandleA, 0
        ; ... and use it to create a dialog box.
        ; 1 here is the resource identifier for the form.
        invoke CreateDialogParamA, eax, 1, 0, DialogProc, 0
        ; store the dialog handle in hDlg.
        mov DWORD [hDlg], eax
        ; show the dialog.
        invoke ShowWindow, eax, SW_SHOW
        ; window message loop.
    .message_loop:
        ; fetch the next message to msg.
        invoke GetMessage, msg, 0, 0, 0
        ; GetMessage returns 0 => quit
        test eax, eax
        je .quit
        ; if the return value != -1
        inc eax
        jne .isdlg
        ; return value == -1 => an error occured.
        ; ExitProcess(1)
        push 1
        jmp .die
    .isdlg:
        ; is it a dialog message?
        invoke IsDialogMessageA, hDlg, msg
        ; nope, ignore.
        test eax, eax
        jne .message_loop
        ; Otherwise, translate and dispatch it.
        invoke TranslateMessage, msg
        invoke DispatchMessage, msg
        jmp .message_loop
    .quit:
        ; ExitProcess(0)
        push 0
    .die:
        call [ExitProcess]
    endp

    ; Dialog procedure - handling incoming messages.
    proc DialogProc
        ; Stack frame construction.
        push ebp
        mov ebp, esp
        sub esp, 16
        mov edx, DWORD [ebp+12]
        mov eax, DWORD [ebp+8]
        mov ecx, DWORD [ebp+16]
        ; handle WM_CLOSE
        cmp edx, WM_CLOSE
        je .close_handler
        ; handle WM_COMMAND
        cmp edx, WM_COMMAND
        je .command_handler
        ; don't handle everything not being WM_DESTROY
        ; (return FALSE)
        cmp edx, WM_DESTROY
        jne .no_op
        ; ... so we're handling WM_DESTROY here.
        invoke PostQuitMessage, 0
        jmp .c_exit
    .close_handler:
        ; WM_CLOSE => pass around the WM_DESTROY message.
        invoke DestroyWindow, eax
    .c_exit:
        ; common WM_DESTROY and WM_CLOSE fallthrough.
        ; return TRUE.
        xor ebx, ebx
        inc ebx
        ; the only way out is to
        jmp .die
    .command_handler:
        ; 2 is the '&Quit' button ID.
        ; If anything other has been pressed, branch.
        cmp cx, 2
        jne .not_quit
        ; Quit button pressed -> die
        invoke DestroyWindow, eax
    .no_op:
        ; a RETURN FALSE stub for lacking handlers for
        ; WM_COMMAND cases and unknown message ID's.
        xor ebx, ebx
        jmp .die
    .not_quit:
        ; '&Quit' wasn't pressed, so maybe it was '&Evaluate'?
        ; return FALSE if LOWORD(ecx) != 1
        xor ebx, ebx
        dec cx
        jne .die
        ; '&Evaluate' pressed, handle that.
        ; Get the handle to the 3rd dialog item => the expression input
        invoke GetDlgItem, eax, 3
        ; stuff it in wnd
        mov DWORD [wnd], eax
        ; get the text length to allocate approperiate amount of space on the stack
        invoke GetWindowTextLengthA, eax
        ; Save the esp
        mov ecx, esp
        ; Reserve space for the null terminator.
        ; Basically, we're constructing a buffer on the stack
        lea edx, [eax+1]
        add eax, 17
        and eax, 0xFFFFFFF0
        sub ecx, eax
        mov esp, ecx
        ; While we're at it, null-terminate it.
        mov BYTE [esp], 0
        ; Read the control data, put it in the buffer.
        mov DWORD [ebp-12], ecx
        invoke GetWindowTextA, DWORD [wnd], ecx, edx
        mov ecx, DWORD [ebp-12]
        ; Evaluate it.
        call eval
        ; Reset the control text.
        invoke SetWindowText, DWORD [wnd], eax
    .die:
        ; Pop off the VLA
        lea esp, [ebp-8]
        ; Set the correct return value.
        mov eax, ebx
        ; Balance the stack
        pop ebx
        pop esi
        pop ebp
        ret 16
    endp

    ; Calculate the size of the tree, stringified.
    ; Takes the tree in eax.
    proc str_size
        ; Preserve and clear eax, make a copy of the
        ; pointer in ebx.
        push esi ebx
        xor esi, esi
        mov ebx, eax
    .loop:
        ; if node.type == TYPE_BI, then it has two children
        cmp DWORD [ebx+node.type], TYPE_BI
        jne .quit
        ; Apparently it does.
        ; left-recurse to get the lhs size
        mov eax, DWORD [ebx+node.left]
        call str_size
        ; eax contains the lhs size, so everything left
        ; is the rhs size. loop on the right node.
        mov ebx, DWORD [ebx+node.right]
        ; add two bytes for '(' and ')'
        lea esi, [esi+eax+2]
        jmp .loop
    .quit:
        ; The node doesn't have two children - return 1
        ; (a single byte for either S, K or I)
        lea eax, [esi+1]
        pop ebx esi
        ret
    endp

    ; Stringify the tree.
    ; Take it in eax. The buffer is static and
    ; it's the callers' duty to allocate it.
    proc stringify
        ; copy the node pointer to ebx
        push ebx
        mov ebx, eax
        ; first, take the node type.
        mov edx, DWORD [eax+node.type]
        ; because no matter where we branch the buffer will be used,
        ; preload it.
        mov eax, DWORD [buf]
        ; increment the current buffer pointer stored in the variable
        ; and hold own instance, which points to one byte before
        inc eax
        mov DWORD [buf], eax
        dec eax
        ; has two children?
        cmp edx, TYPE_BI
        jne .combinator
        ; alloc_node tree starts with '('
        mov BYTE [eax], '('
        ; Recurse on the lhs and rhs
        mov eax, DWORD [ebx+node.left]
        call stringify
        mov eax, DWORD [ebx+node.right]
        call stringify
        ; increment pointer, store the ')'.
        mov eax, DWORD [buf]
        mov BYTE [eax], ')'
        inc eax
        mov DWORD [buf], eax
        dec eax
        jmp .stop
    .combinator:
        ; stringify the combinator.
        ; use the lookup string for that.
        mov dl, BYTE [ski+edx]
        ; store back the letter.
        mov BYTE [eax], dl
        ; the pointer is already incremented, so we fall thru to return
    .stop:
        pop ebx
        ret
    endp

    ; a wrapper over HeapFree, which frees the pointer in eax.
    ; XXX: inline?
    proc free
        invoke HeapFree, DWORD [asmh], 0, eax
        ret
    endp

    ; free a tree recursively
    proc free_tree
        ; preserve ebx, make a copy of eax
        push ebx
        mov ebx, eax
        ; has children?
        cmp DWORD [eax+node.type], TYPE_BI
        jne .no_children
        ; recurse over children.
        mov eax, DWORD [eax+node.left]
        call free_tree
        mov eax, DWORD [ebx+node.right]
        call free_tree
    .no_children:
        ; take the copy, restore ebx, free the parent.
        mov eax, ebx
        pop ebx
        jmp free
    endp

    ; Allocate a new tree node.
    ; takes new nodes' type in eax.
    proc alloc_node
        ; preserve eax, because it will get trashed by HeapAlloc
        push ebx
        mov ebx, eax
        ; Call HeapAlloc, alloc 4 (left) + 4 (right) + 4 (type) B.
        ; Zero the memory so we don't have to set left and right to NULL.
        invoke HeapAlloc, DWORD [asmh], HEAP_ZERO_MEMORY, 4 + 4 + 4
        ; Set the type.
        mov DWORD [eax+node.type], ebx
        pop ebx
        ret
    endp

    ; read a node from the input buffer, and return it in eax.
    proc read_node
        ; preserve ebx
        push ebx
        ; load the code pointer
        mov eax, DWORD [code]
        ; increment it, store back
        inc eax
        mov DWORD [code], eax
        dec eax
        ; load a byte
        mov al, BYTE [eax]
        ; if al>'K' then al may be 'S'
        cmp al, 'K'
        je  .read_k
        jg  .maybe_s
        ; reading a tree
        cmp al, '('
        je  .read_bitree
        ; if it's not 'I', spew out an error.
        cmp al, 'I'
        jne .parse_error
        ; build an i-node
        push TYPE_I
        pop eax
        jmp .build_node
    .maybe_s:
        ; if it's not 'S', spew out an error.
        cmp al, 'S'
        jne .parse_error
        ; otherwise, clear eax (load TYPE_S)
        ; and build a new node.
        xor eax, eax
        jmp .build_node
    .read_bitree:
        ; load the approperiate type and allocate a node
        push TYPE_BI
        pop eax
        call alloc_node
        mov ebx, eax
        ; read the left node
        call read_node
        mov DWORD [ebx+node.left], eax
        ; eax = 0 => return NULL to broadcast an error.
        test eax, eax
        je .nullify
        ; read the right node
        call read_node
        mov DWORD [ebx+node.right], eax
        test eax, eax
        je .nullify
        ; no errors - increment the code pointer to skip the trailing `)`.
        inc DWORD [code]
        jmp .die
    .read_k:
        ; set eax to 1 (loading TYPE_K)
        ; and fall thru to construction of a new node.
        xor eax, eax
        inc eax
    .build_node:
        pop ebx
        jmp alloc_node
    .parse_error:
        ; in case of a parse error, display a message and fall thru to returning NULL.
        invoke MessageBoxA, 0, msge, 0, MB_OK
    .nullify:
        xor ebx, ebx
    .die:
        ; set the return value and quit
        mov eax, ebx
        pop ebx
        ret
    endp

    ; duplicate a tree in eax.
    proc dup_tree
        push esi ebx
        mov ebx, eax
        ; Make a new node with this node's type.
        mov eax, DWORD [eax+node.type]
        call alloc_node
        ; if type != TYPE_BI then return that instance.
        cmp DWORD [ebx+node.type], TYPE_BI
        jne .shallow
        ; else, clone recursively. copy the original
        ; ptr, because it will get overwritten
        mov esi, eax
        ; clone the left node
        mov eax, DWORD [ebx+node.left]
        call dup_tree
        mov DWORD [esi+node.left], eax
        ; clone the right node
        mov eax, DWORD [ebx+node.right]
        call dup_tree
        mov DWORD [esi+node.right], eax
        ; restore eax
        mov eax, esi
    .shallow:
        pop ebx esi
        ret
    endp

    proc eval_step
        push edi esi ebx
        mov ebx, eax
        ; has one child? if node.left == NULL
        mov eax, DWORD [eax+node.left]
        test eax, eax
        je .no_left
        ; if the first child's type is I
        cmp DWORD [eax+node.type], TYPE_I
        jne .not_inode
        ; got identity, so take the right node.
        mov esi, DWORD [ebx+node.right]
        ; free this node and the left node.
        jmp .clean
    .not_inode:
        ; it's not I. eax is now orig->left
        ; if orig->left->left->type == K
        mov edx, DWORD [eax+node.left]
        ; wait, maybe there is no left node
        test edx, edx
        je .no_left
        ; check the type.
        cmp DWORD [edx+node.type], TYPE_K
        ; branch if it's not K either.
        jne .not_knode
        ; free orig->right and orig->left->left
        ; keep and return orig->left->right
        mov esi, DWORD [eax+node.right]
        mov eax, DWORD [ebx+node.right]
        call free_tree
        mov eax, DWORD [ebx+node.left]
        mov eax, DWORD [eax+node.left]
        ; fallthru to free the orig->left->left node
    .clean:
        call free_tree
        mov eax, ebx
        call free
    .yield_saved:
        mov ebx, esi
        jmp .done
    .not_knode:
        ; if it's not a K or I node, then for sure it's either
        ; a node we have to evaluate recursively _or_ a S node.
        ; check for existence of X = orig->left->left->left
        mov edx, DWORD [edx]
        test edx, edx
        je .no_left
        ; X->type != TYPE_S?
        cmp DWORD [edx+node.type], TYPE_S
        jne .no_left
        ; ok, so definitely it's a S node.
        ; to get ((Sx)y)z = ((xz)(yz)), first build the outer binode.
        push TYPE_BI
        pop eax
        call alloc_node
        ; OK, save it in esi
        mov esi, eax
        ; build two another binodes, and put them as the left and right
        ; node of this tree.
        push 3
        pop eax
        call alloc_node
        mov DWORD [esi+node.left], eax
        push 3
        pop eax
        call alloc_node
        mov DWORD [esi+node.right], eax
        ; now the magic happens. do the following:
        ; (esi + node.left)->left = dup(orig->left->left->right)
        ; (esi + node.left)->right = dup(orig->right)
        ; (esi + node.right)->left = dup(orig->left->right)
        ; (esi + node.right)->right = dup(orig->right)
        ; I'm not sure if this many dup calls are required, but they
        ; help to shave off some space and trouble needed to free the
        ; correct elements of the trees. we're not really aiming for
        ; performance here, so it's alright.
        mov edi, DWORD [esi+node.left]
        mov eax, DWORD [ebx+node.left]
        mov eax, DWORD [eax+node.left]
        mov eax, DWORD [eax+node.right]
        call dup_tree
        mov DWORD [edi+node.left], eax
        mov eax, DWORD [ebx+node.right]
        mov edi, DWORD [esi+node.left]
        call dup_tree
        mov DWORD [edi+node.right], eax
        mov eax, DWORD [ebx+node.left]
        mov edi, DWORD [esi+node.right]
        mov eax, DWORD [eax+node.right]
        call dup_tree
        mov DWORD [edi+node.left], eax
        mov eax, DWORD [ebx+node.right]
        mov edi, DWORD [esi+node.right]
        call dup_tree
        mov DWORD [edi+node.right], eax
        ; free the original tree
        mov eax, ebx
        call free_tree
        jmp .yield_saved
    .no_left:
        ; maybe it's a binode, which we just need to evaluate
        ; deeper to get some observable result?
        cmp DWORD [ebx+node.type], TYPE_BI
        jne .done
        ; recurse twice. first set the left node, then the right node.
        call eval_step
        mov DWORD [ebx+node.left], eax
        mov eax, DWORD [ebx+node.right]
        call eval_step
        mov DWORD [ebx+node.right], eax
    .done:
        mov eax, ebx
        pop ebx esi edi
        ret
    endp

    ; the evaluation wrapper called by the DialogProc.
    ; takes the input buffer in ecx.
    eval:
        push esi ebx
        mov ebx, ecx
        ; store the input in the code buffer.
        mov DWORD [code], ecx
        ; read the expression.
        call read_node
        ; if read_node returns null, then an error occured
        test eax, eax
        je .read_fail
        ; call the evaluation procedure.
        call eval_step
        mov esi, eax
        ; find out the size of the buffer, stringified.
        call str_size
        ; allocate it a byte of extra space.
        inc eax
        invoke HeapAlloc, DWORD [asmh], 0, eax
        ; initialize the output buffer.
        mov DWORD [buf], eax
        ; save the output copy to ourselves to later return it.
        mov ebx, eax
        ; take back the saved buffer, stringify the input into it
        mov eax, esi
        call stringify
        ; NULL terminate
        mov eax, DWORD [buf]
        mov BYTE [eax], 0
        ; free the original tree
        mov eax, esi
        call free_tree
    .read_fail:
        ; in any case, return the value we've got.
        mov eax, ebx
        pop ebx esi
        ret

wnd:  dd 0
msg   MSG
hDlg: dd 0
asmh: dd 0
buf:  dd 0
code: dd 0
ski:  db 'SKI', 0
msge: db '?', 0

section '.rsrc' resource data readable
directory RT_DIALOG, dialogs
resource dialogs, 1, LANG_ENGLISH+SUBLANG_DEFAULT, demo
dialog demo,'SKI calculus',70,70,330,20,WS_CAPTION+WS_POPUP+WS_SYSMENU+DS_MODALFRAME
    dialogitem 'STATIC', '&Code: ', 4, 4, 5, 21, 9, WS_VISIBLE+WS_CHILD+WS_GROUP
    dialogitem 'BUTTON', '&Quit', 2, 269, 4, 50, 11, BS_PUSHBUTTON+WS_CHILD+WS_VISIBLE+WS_GROUP
    dialogitem 'BUTTON', '&Evaluate', 1, 218, 4, 50, 11, BS_DEFPUSHBUTTON+WS_CHILD+WS_VISIBLE+WS_GROUP
    dialogitem 'EDIT', '', 3, 28, 3, 187, 14, ES_LEFT+WS_CHILD+WS_VISIBLE+WS_BORDER+WS_GROUP+ES_AUTOHSCROLL
enddialog

section '.idata' import data readable writable
library kernel32, 'KERNEL32.DLL', \
        user32, 'USER32.DLL'

include 'api\kernel32.inc'
include 'api\user32.inc'

The code was initially published on Github alongside a larger, initial 1.6 kilobyte build. The evaluator being optimised for code size leaks memory like a sieve and doesn’t perform any validation of the input. But it works! And it’s small! Have you ever seen a sub-kilobyte, relatively usable functional programming language(*) evaluator?

* - of course, SKI calculus itself stretches the definition of a functional programming langauge, but considering that Haskell internally uses a typed lambda calculus-like IR, and lambda calculus terms are convertible to combinator calculus terms via abstraction ellimination, this might not be as big of an overstatement as you might have thought. SKI calculus is nowhere near being usable, but it’s still more usable than Iota calculus or any other more minimalistic functional language interpreter.

Making combinator calculus terms

Now, as I have learned how to evaluate, try various different encodings of combinator calculus, and make my own small terms, the only thing left is compiling a higher level language to SK(I) calculus, or more generally, just combinator calculus. The idea follows.

Custom combinators are defined using %[name] [code until the end of the line]. [name] can be any letter, including greek letters. For example - %Y (((SK)K)((S(K((SS)(S((SK)K)))))K)) defines the fixed point combinator as Y. Let’s define a few combinators for a good start:

%I ((SK)K)
%B ((S(KS))K)
%C ((S((S(KB))S))(KK))
%W ((SS)(SK))
%Y (((SK)K)((S(K((SS)(S((SK)K)))))K))

Now, we can define a few basic barewords that are recognized inside expressions. For example:

$true:K
$false:KI
$succ:(SB)
$add:((SI)(K succ))
$0:(KI)
$1:I
$2:(succ 1)
$3:(succ 2)
$4:(succ 3)
$5:(succ 4)
$6:(succ 5)

… and for a good measure, multiplication and predecessor for Church numerals:

$mul:((S((S(KS))((S(K(SI)))((S(KK)) add))))(K(K 0)))
$pred:((S(K((SI)(K 0))))((S((SI)(K((S(K((S(K(S((SI)(K 0)))))K)))((S((\
    S((SI)(KK)))((SI)(K 0))))((S(K succ))((SI)(K 0))))))))(K((S((SI)(KK)))(K 0)))))

\ at the end of the line means that the bare word declaration is spans also the next line. Now the language gets a lispy feel. For instance, (pred 2) compiles to a really long chain of S, K’s and parens.

Concerns

There are a few issues with this very simple design:

  • Numbers. Defining numbers, as awesome as it might feel, is really unweildy especially if it has to be done manually. For this reason, the language should introduce a bunch of macros that capture barewords based on regular expressions.
  • “Something went wrong during compilation, and I don’t know what.” - the classical issue with languages allowing crazy levels of metaprogramming. The language would somehow need to provide a way to “trace” all the expansions of barewords to find the casue of the error. Another (much better) solution would be adding types and constraints…
  • “My code doesn’t work and it’s a mess of S’s and K’s” - a type system and a bunch of constraints/concepts tied to each bareword would solve this problem. It would also be nice to have a builtin evaluator for these, anyway.

All of these concerns beg for a simple answer - “Get rid of the ‘powerful’ metaprogramming and replace it with consistent and predictable metaprogramming” and “Add a type system”.

The show ends here

Disappointing, right? Now that we had a vision of possibly a Scheme-like language being compiled to combinator calculus…

But this content is reserved for future posts. Stay tuned, and I’ll supply a writeup on the combinator calculus language, soon.