Combinator calculus - a lab notebook
Introduction
I really like SKI calculus. The thing that initially introduced me to it was the Esolangs wiki. Also, I read the works of John Tromp on binary lambda calculus and binary combinatory logic (I really enjoyed these and I can recommend them to anyone who wishes to study Kolmogorov complexity’s limits). One evening in June 2020, I implemented the following combinator calculus beta-reduction machine for the purposes of a code golf submission:
#define J putchar
#define H O->a
#define G H->a
#define K O->b
typedef struct x{struct x*a,*b;int q;}Y;Y*O;z=1;A(q){O=calloc(6,4);O->q=q;}h(Y*O
){Y*u;O=H&&H->q==2?z=K:H&&G&&G->q==1?z=H->b:H&&G&&G->a&&!G->a->q?u=A(3),(u->a=A(
3))->a=G->b,(u->b=A(3))->a=H->b,u->a->b=u->b->b=K,z=u:(O->q==3?H=h(H),K=h(K):0,O
);}r(x){Y*O;x=getchar()-73;x=x+33?A(x?x!=10:2):!getchar(K=r(H=r(O=A(3))))+O;}q(Y
*O){O&&J(O->q["SKI "],O->q-3||J(41,q(K),q(H),J(40)));}main(){Y*O=r();for(;z;O=h(
O))z=0;q(O);}
It’s 476 bytes big and most of the space is being taken by the complex combinator expansion logic. This encouraged me to research the topic further. First, I started researching other combinators - I already was aware of lambda calculus and functional programming, so naturally the first combinator I took on was the fixed point combinator (Y
). Later, I discovered the BCW system made by Haskell Curry and quickly wrote a transpiler from the BCW system to the SKI system:
#!/usr/bin/perl -p
s/B/((S(KS))K)/g;
s/C/((S((S(K((S(KS))K)))S))(KK))/g;
s/W/((SS)(SK))/g;
The BCWK system (borrowing the K from SKI calculus) is fairly reasonable and intuitive. First two combinators swap their arguments: B x y z = (((B x) y) z) = ((B x) (y z)) = B x (y z)
, C x y z = (((C x) y) z) = ((x z) y) = x z y
. The last combinator, W
, is just W x y = ((W x) y) = ((x y) y) = x y y
.
Notation
Pondering these made me think about the notation that I use for writing down my SKI calculus programs. So far, I assumed that the input is a perfect binary tree of combinators. But since it’s a binary tree, i.e. each node can have only either two combinators, two nodes, a combinator and a node, or a node and a combinator inside it, we don’t need the closing parens anymore!. Let me explain. This is a mostly readable version of the evaluator above:
/* All the possible node types. */
#define NODE_BI 1
#define NODE_S 2
#define NODE_K 3
#define NODE_I 4
struct node_t {
struct node_t * left, * right;
char type;
};
struct node_t * new_node(void) { /* implementation detail */ }
// The reading function
struct node_t * read() {
switch(getchar()) {
case '(': {
struct node_t * u = new_node(NODE_BI);
u->left = read();
u->right = read();
getchar(); // skip ')'
return u;
}
case 'I': return new_node(NODE_I);
case 'S': return new_node(NODE_S);
case 'K': return new_node(NODE_K);
}
}
An important observation is yet to be made - the reading function doesn’t actually use the closing paren for anything. So we replace it with a different character. I chose an apostrophe - '
. Assuming the input doesn’t contain any data except (
, S
, K
and I
, the following program will convert the new notation to the old notation:
c;main(){c=getchar();if(c=='`'){putchar('(');main();main();putchar(')');}else putchar(c);}
… and the other way round:
c;main(){for(;~(c=getchar());)if(c!=')')putchar(c=='('?'`':c);}
Provided that I already went through the combinators and the closing parens, the element of the encoding to pick on would of course be the opening parens. In many cases - like this one: (((SK)K)I)
they just feel unnecessary and redundant. The tree structure is fairly obvious at the first glance and SKKI
would be so much more readable. This would mean that tree node nesting binds to the left. Illustratively, α β γ
becomes ((α β) γ)
, α β γ δ
becomes (((α β) γ) δ)
, and so on. A(BCDE)F
becomes ((A(((BC)D)E))F)
.e I decided to call this the free-form encoding of combinator calculus. The free-form notation can be easily translated to the binary free notation using this small bit of code:
#define R return
#define P putchar
#define T x->q
typedef struct a{struct a*x,*y;int q;}b,*c;c M(q){
c x=malloc(sizeof(b));x->x=x->y=0;T=q;R x;}c f();c
n(){int c=getchar();if(c==40)R f();if(c==41||c==-1
)R 0;if(isalpha(c))R M(c);R n();}c f(){c x=M(-1),y
,z;z=x->x=n();if(!z)R 0;while(y=n())z=z->y=y;R x;}
void q(c x){if(!x)R;if(~T)P(T);else{c z=x->x;while
(z->y){P(40);z=z->y;}z=x->x;q(z);while(z=z->y){q(z
);P(41);}}}main(){q(f());}
This, of course, assumes that the free-form notation derives from the binary tree notation. But what if we used our simplified notation instead? This would bring down the size even further - α β γ
becomes ((α β) γ)
, α β γ δ
becomes (((α β) γ) δ)
, and so on. Let’s sum up all the encoding laws we came up with so far:
- in the binary tree notation, the closing paren is never necessary.
- the free-form notation is just as powerful as the binary tree notation.
Having worked out these, now let’s bring the encoding to some binary format. First, we start by noticing that the three combinators and the tree node opening character are 4 different states in total, which means that we can encode them with 4 bits. The encoding would be, for example:
00 - '
01 - S
10 - K
11 - I
… but just as well we could drop the I combinator (as it’s simply just SKK
), meaning that we have three states now, so we can encode our data using one or two binary digits:
0 - '
10 - S
11 - K
Because the drawbacks of each encoding aren’t instantaneously obvious, we can test them on a few example SKI calculus expressions. Example:
{
(((S((S(KI))((S((S(KI))I))(K(K(KI))))))(K((S(KK))I)))(
(S((S(KI))((S(K((S(KS))(S(KI)))))((S(KK))I))))(KI)))
<=> [ replacing I with (SK)K ]
(((S((S(K((SK)K)))((S((S(K((SK)K)))((SK)K)))(K(K(K((SK
)K)))))))(K((S(KK))((SK)K))))((S((S(K((SK)K)))((S(K((S
(KS))(S(K((SK)K))))))((S(KK))((SK)K)))))(K((SK)K))))
}
=> (convering to simplified form) {
'''S''S'KI''S''S'KII'K'K'KI'K''S'KKI''S''S'KI''S'K''S'KS'S'KI''S'KKI'KI
<=>
'''S''S'K''SKK''S''S'K''SKK''SKK'K'K'K''SKK'K''S'KK''SKK''S''S'K''SKK''
S'K''S'KS'S'K''SKK''S'KK''SKK'K''SKK
}
=> (conveting to binary) {
0000000100000100101100000100000100101111001000100010110010000001001010110000010000010010110000010010000001001001000100101100000100101011001011
<=>
00010001001100101111001000100110010111100101111011011011001011110110010011110010111100100010011001011110010011001001110010011001011110010011110010111101100101111
}
=> (converting to hexadecimal) {
233896F70BDC3C4AFCC09510647F2E47B62F78384995A33A1B538CB46296BD6AA11B51F11E8018A6D2744F76F956D1B48DB0651FF163CEB3
<=>
2E9E8A31763346A56A1C2472720F66E0C0F95635EE71C4861C282CC2856F5F64F8AFA8967B8B8AADC44313EE9C6B0E88B1F7057BD7F8BF144E24B8062EF65340DF7
}
The implications are fairly simple - adding new combinators is generally beneficial for the code size (I managed to replicate the same effect on a bunch of SKI calculus expressions). This was just about what I expected - all in all, dictionary compession is pretty much the baseline standard nowadays.
Implementing
In February 2021, I implemented a tiny (976 byte .exe file) SKI calculus evaluator with a GUI for Windows. The code follows.
; A tiny SKI calculus evaluator.
; Copyright (C) Kamila Szewczyk, 2020.
; ------------------------------------
; A bit less braindead version of the bundled 976-byte binary,
; it doesn't leak the memory like a sieve and starts a bit faster,
; at the expense of being larger out-of-the-box (I could pack it
; with my packer to around 1,66K).
format PE GUI 4.0
entry _start
include 'win32ax.inc'
; Node in memory:
; ESI ESI+4 ESI+8
; v v v
; +--------------------------+
; | left | right | type |
; +--------------------------+
node.left equ 0
node.right equ 4
node.type equ 8
; SKI nodes
TYPE_S equ 0
TYPE_K equ 1
TYPE_I equ 2
; alloc_node node holding two other nodes.
TYPE_BI equ 3
section '.text' code readable executable writeable
; Program entry point.
; Create the dialog, hook up the dialog procedure,
; enter an event loop.
proc _start
; Create a heap, store it's handle in asmh.
invoke HeapCreate, 0, 0, 0
mov DWORD [asmh], eax
; Get the handle of the current module
invoke GetModuleHandleA, 0
; ... and use it to create a dialog box.
; 1 here is the resource identifier for the form.
invoke CreateDialogParamA, eax, 1, 0, DialogProc, 0
; store the dialog handle in hDlg.
mov DWORD [hDlg], eax
; show the dialog.
invoke ShowWindow, eax, SW_SHOW
; window message loop.
.message_loop:
; fetch the next message to msg.
invoke GetMessage, msg, 0, 0, 0
; GetMessage returns 0 => quit
test eax, eax
je .quit
; if the return value != -1
inc eax
jne .isdlg
; return value == -1 => an error occured.
; ExitProcess(1)
push 1
jmp .die
.isdlg:
; is it a dialog message?
invoke IsDialogMessageA, hDlg, msg
; nope, ignore.
test eax, eax
jne .message_loop
; Otherwise, translate and dispatch it.
invoke TranslateMessage, msg
invoke DispatchMessage, msg
jmp .message_loop
.quit:
; ExitProcess(0)
push 0
.die:
call [ExitProcess]
endp
; Dialog procedure - handling incoming messages.
proc DialogProc
; Stack frame construction.
push ebp
mov ebp, esp
sub esp, 16
mov edx, DWORD [ebp+12]
mov eax, DWORD [ebp+8]
mov ecx, DWORD [ebp+16]
; handle WM_CLOSE
cmp edx, WM_CLOSE
je .close_handler
; handle WM_COMMAND
cmp edx, WM_COMMAND
je .command_handler
; don't handle everything not being WM_DESTROY
; (return FALSE)
cmp edx, WM_DESTROY
jne .no_op
; ... so we're handling WM_DESTROY here.
invoke PostQuitMessage, 0
jmp .c_exit
.close_handler:
; WM_CLOSE => pass around the WM_DESTROY message.
invoke DestroyWindow, eax
.c_exit:
; common WM_DESTROY and WM_CLOSE fallthrough.
; return TRUE.
xor ebx, ebx
inc ebx
; the only way out is to
jmp .die
.command_handler:
; 2 is the '&Quit' button ID.
; If anything other has been pressed, branch.
cmp cx, 2
jne .not_quit
; Quit button pressed -> die
invoke DestroyWindow, eax
.no_op:
; a RETURN FALSE stub for lacking handlers for
; WM_COMMAND cases and unknown message ID's.
xor ebx, ebx
jmp .die
.not_quit:
; '&Quit' wasn't pressed, so maybe it was '&Evaluate'?
; return FALSE if LOWORD(ecx) != 1
xor ebx, ebx
dec cx
jne .die
; '&Evaluate' pressed, handle that.
; Get the handle to the 3rd dialog item => the expression input
invoke GetDlgItem, eax, 3
; stuff it in wnd
mov DWORD [wnd], eax
; get the text length to allocate approperiate amount of space on the stack
invoke GetWindowTextLengthA, eax
; Save the esp
mov ecx, esp
; Reserve space for the null terminator.
; Basically, we're constructing a buffer on the stack
lea edx, [eax+1]
add eax, 17
and eax, 0xFFFFFFF0
sub ecx, eax
mov esp, ecx
; While we're at it, null-terminate it.
mov BYTE [esp], 0
; Read the control data, put it in the buffer.
mov DWORD [ebp-12], ecx
invoke GetWindowTextA, DWORD [wnd], ecx, edx
mov ecx, DWORD [ebp-12]
; Evaluate it.
call eval
; Reset the control text.
invoke SetWindowText, DWORD [wnd], eax
.die:
; Pop off the VLA
lea esp, [ebp-8]
; Set the correct return value.
mov eax, ebx
; Balance the stack
pop ebx
pop esi
pop ebp
ret 16
endp
; Calculate the size of the tree, stringified.
; Takes the tree in eax.
proc str_size
; Preserve and clear eax, make a copy of the
; pointer in ebx.
push esi ebx
xor esi, esi
mov ebx, eax
.loop:
; if node.type == TYPE_BI, then it has two children
cmp DWORD [ebx+node.type], TYPE_BI
jne .quit
; Apparently it does.
; left-recurse to get the lhs size
mov eax, DWORD [ebx+node.left]
call str_size
; eax contains the lhs size, so everything left
; is the rhs size. loop on the right node.
mov ebx, DWORD [ebx+node.right]
; add two bytes for '(' and ')'
lea esi, [esi+eax+2]
jmp .loop
.quit:
; The node doesn't have two children - return 1
; (a single byte for either S, K or I)
lea eax, [esi+1]
pop ebx esi
ret
endp
; Stringify the tree.
; Take it in eax. The buffer is static and
; it's the callers' duty to allocate it.
proc stringify
; copy the node pointer to ebx
push ebx
mov ebx, eax
; first, take the node type.
mov edx, DWORD [eax+node.type]
; because no matter where we branch the buffer will be used,
; preload it.
mov eax, DWORD [buf]
; increment the current buffer pointer stored in the variable
; and hold own instance, which points to one byte before
inc eax
mov DWORD [buf], eax
dec eax
; has two children?
cmp edx, TYPE_BI
jne .combinator
; alloc_node tree starts with '('
mov BYTE [eax], '('
; Recurse on the lhs and rhs
mov eax, DWORD [ebx+node.left]
call stringify
mov eax, DWORD [ebx+node.right]
call stringify
; increment pointer, store the ')'.
mov eax, DWORD [buf]
mov BYTE [eax], ')'
inc eax
mov DWORD [buf], eax
dec eax
jmp .stop
.combinator:
; stringify the combinator.
; use the lookup string for that.
mov dl, BYTE [ski+edx]
; store back the letter.
mov BYTE [eax], dl
; the pointer is already incremented, so we fall thru to return
.stop:
pop ebx
ret
endp
; a wrapper over HeapFree, which frees the pointer in eax.
; XXX: inline?
proc free
invoke HeapFree, DWORD [asmh], 0, eax
ret
endp
; free a tree recursively
proc free_tree
; preserve ebx, make a copy of eax
push ebx
mov ebx, eax
; has children?
cmp DWORD [eax+node.type], TYPE_BI
jne .no_children
; recurse over children.
mov eax, DWORD [eax+node.left]
call free_tree
mov eax, DWORD [ebx+node.right]
call free_tree
.no_children:
; take the copy, restore ebx, free the parent.
mov eax, ebx
pop ebx
jmp free
endp
; Allocate a new tree node.
; takes new nodes' type in eax.
proc alloc_node
; preserve eax, because it will get trashed by HeapAlloc
push ebx
mov ebx, eax
; Call HeapAlloc, alloc 4 (left) + 4 (right) + 4 (type) B.
; Zero the memory so we don't have to set left and right to NULL.
invoke HeapAlloc, DWORD [asmh], HEAP_ZERO_MEMORY, 4 + 4 + 4
; Set the type.
mov DWORD [eax+node.type], ebx
pop ebx
ret
endp
; read a node from the input buffer, and return it in eax.
proc read_node
; preserve ebx
push ebx
; load the code pointer
mov eax, DWORD [code]
; increment it, store back
inc eax
mov DWORD [code], eax
dec eax
; load a byte
mov al, BYTE [eax]
; if al>'K' then al may be 'S'
cmp al, 'K'
je .read_k
jg .maybe_s
; reading a tree
cmp al, '('
je .read_bitree
; if it's not 'I', spew out an error.
cmp al, 'I'
jne .parse_error
; build an i-node
push TYPE_I
pop eax
jmp .build_node
.maybe_s:
; if it's not 'S', spew out an error.
cmp al, 'S'
jne .parse_error
; otherwise, clear eax (load TYPE_S)
; and build a new node.
xor eax, eax
jmp .build_node
.read_bitree:
; load the approperiate type and allocate a node
push TYPE_BI
pop eax
call alloc_node
mov ebx, eax
; read the left node
call read_node
mov DWORD [ebx+node.left], eax
; eax = 0 => return NULL to broadcast an error.
test eax, eax
je .nullify
; read the right node
call read_node
mov DWORD [ebx+node.right], eax
test eax, eax
je .nullify
; no errors - increment the code pointer to skip the trailing `)`.
inc DWORD [code]
jmp .die
.read_k:
; set eax to 1 (loading TYPE_K)
; and fall thru to construction of a new node.
xor eax, eax
inc eax
.build_node:
pop ebx
jmp alloc_node
.parse_error:
; in case of a parse error, display a message and fall thru to returning NULL.
invoke MessageBoxA, 0, msge, 0, MB_OK
.nullify:
xor ebx, ebx
.die:
; set the return value and quit
mov eax, ebx
pop ebx
ret
endp
; duplicate a tree in eax.
proc dup_tree
push esi ebx
mov ebx, eax
; Make a new node with this node's type.
mov eax, DWORD [eax+node.type]
call alloc_node
; if type != TYPE_BI then return that instance.
cmp DWORD [ebx+node.type], TYPE_BI
jne .shallow
; else, clone recursively. copy the original
; ptr, because it will get overwritten
mov esi, eax
; clone the left node
mov eax, DWORD [ebx+node.left]
call dup_tree
mov DWORD [esi+node.left], eax
; clone the right node
mov eax, DWORD [ebx+node.right]
call dup_tree
mov DWORD [esi+node.right], eax
; restore eax
mov eax, esi
.shallow:
pop ebx esi
ret
endp
proc eval_step
push edi esi ebx
mov ebx, eax
; has one child? if node.left == NULL
mov eax, DWORD [eax+node.left]
test eax, eax
je .no_left
; if the first child's type is I
cmp DWORD [eax+node.type], TYPE_I
jne .not_inode
; got identity, so take the right node.
mov esi, DWORD [ebx+node.right]
; free this node and the left node.
jmp .clean
.not_inode:
; it's not I. eax is now orig->left
; if orig->left->left->type == K
mov edx, DWORD [eax+node.left]
; wait, maybe there is no left node
test edx, edx
je .no_left
; check the type.
cmp DWORD [edx+node.type], TYPE_K
; branch if it's not K either.
jne .not_knode
; free orig->right and orig->left->left
; keep and return orig->left->right
mov esi, DWORD [eax+node.right]
mov eax, DWORD [ebx+node.right]
call free_tree
mov eax, DWORD [ebx+node.left]
mov eax, DWORD [eax+node.left]
; fallthru to free the orig->left->left node
.clean:
call free_tree
mov eax, ebx
call free
.yield_saved:
mov ebx, esi
jmp .done
.not_knode:
; if it's not a K or I node, then for sure it's either
; a node we have to evaluate recursively _or_ a S node.
; check for existence of X = orig->left->left->left
mov edx, DWORD [edx]
test edx, edx
je .no_left
; X->type != TYPE_S?
cmp DWORD [edx+node.type], TYPE_S
jne .no_left
; ok, so definitely it's a S node.
; to get ((Sx)y)z = ((xz)(yz)), first build the outer binode.
push TYPE_BI
pop eax
call alloc_node
; OK, save it in esi
mov esi, eax
; build two another binodes, and put them as the left and right
; node of this tree.
push 3
pop eax
call alloc_node
mov DWORD [esi+node.left], eax
push 3
pop eax
call alloc_node
mov DWORD [esi+node.right], eax
; now the magic happens. do the following:
; (esi + node.left)->left = dup(orig->left->left->right)
; (esi + node.left)->right = dup(orig->right)
; (esi + node.right)->left = dup(orig->left->right)
; (esi + node.right)->right = dup(orig->right)
; I'm not sure if this many dup calls are required, but they
; help to shave off some space and trouble needed to free the
; correct elements of the trees. we're not really aiming for
; performance here, so it's alright.
mov edi, DWORD [esi+node.left]
mov eax, DWORD [ebx+node.left]
mov eax, DWORD [eax+node.left]
mov eax, DWORD [eax+node.right]
call dup_tree
mov DWORD [edi+node.left], eax
mov eax, DWORD [ebx+node.right]
mov edi, DWORD [esi+node.left]
call dup_tree
mov DWORD [edi+node.right], eax
mov eax, DWORD [ebx+node.left]
mov edi, DWORD [esi+node.right]
mov eax, DWORD [eax+node.right]
call dup_tree
mov DWORD [edi+node.left], eax
mov eax, DWORD [ebx+node.right]
mov edi, DWORD [esi+node.right]
call dup_tree
mov DWORD [edi+node.right], eax
; free the original tree
mov eax, ebx
call free_tree
jmp .yield_saved
.no_left:
; maybe it's a binode, which we just need to evaluate
; deeper to get some observable result?
cmp DWORD [ebx+node.type], TYPE_BI
jne .done
; recurse twice. first set the left node, then the right node.
call eval_step
mov DWORD [ebx+node.left], eax
mov eax, DWORD [ebx+node.right]
call eval_step
mov DWORD [ebx+node.right], eax
.done:
mov eax, ebx
pop ebx esi edi
ret
endp
; the evaluation wrapper called by the DialogProc.
; takes the input buffer in ecx.
eval:
push esi ebx
mov ebx, ecx
; store the input in the code buffer.
mov DWORD [code], ecx
; read the expression.
call read_node
; if read_node returns null, then an error occured
test eax, eax
je .read_fail
; call the evaluation procedure.
call eval_step
mov esi, eax
; find out the size of the buffer, stringified.
call str_size
; allocate it a byte of extra space.
inc eax
invoke HeapAlloc, DWORD [asmh], 0, eax
; initialize the output buffer.
mov DWORD [buf], eax
; save the output copy to ourselves to later return it.
mov ebx, eax
; take back the saved buffer, stringify the input into it
mov eax, esi
call stringify
; NULL terminate
mov eax, DWORD [buf]
mov BYTE [eax], 0
; free the original tree
mov eax, esi
call free_tree
.read_fail:
; in any case, return the value we've got.
mov eax, ebx
pop ebx esi
ret
wnd: dd 0
msg MSG
hDlg: dd 0
asmh: dd 0
buf: dd 0
code: dd 0
ski: db 'SKI', 0
msge: db '?', 0
section '.rsrc' resource data readable
directory RT_DIALOG, dialogs
resource dialogs, 1, LANG_ENGLISH+SUBLANG_DEFAULT, demo
dialog demo,'SKI calculus',70,70,330,20,WS_CAPTION+WS_POPUP+WS_SYSMENU+DS_MODALFRAME
dialogitem 'STATIC', '&Code: ', 4, 4, 5, 21, 9, WS_VISIBLE+WS_CHILD+WS_GROUP
dialogitem 'BUTTON', '&Quit', 2, 269, 4, 50, 11, BS_PUSHBUTTON+WS_CHILD+WS_VISIBLE+WS_GROUP
dialogitem 'BUTTON', '&Evaluate', 1, 218, 4, 50, 11, BS_DEFPUSHBUTTON+WS_CHILD+WS_VISIBLE+WS_GROUP
dialogitem 'EDIT', '', 3, 28, 3, 187, 14, ES_LEFT+WS_CHILD+WS_VISIBLE+WS_BORDER+WS_GROUP+ES_AUTOHSCROLL
enddialog
section '.idata' import data readable writable
library kernel32, 'KERNEL32.DLL', \
user32, 'USER32.DLL'
include 'api\kernel32.inc'
include 'api\user32.inc'
The code was initially published on Github alongside a larger, initial 1.6 kilobyte build. The evaluator being optimised for code size leaks memory like a sieve and doesn’t perform any validation of the input. But it works! And it’s small! Have you ever seen a sub-kilobyte, relatively usable functional programming language(*) evaluator?
* - of course, SKI calculus itself stretches the definition of a functional programming langauge, but considering that Haskell internally uses a typed lambda calculus-like IR, and lambda calculus terms are convertible to combinator calculus terms via abstraction ellimination, this might not be as big of an overstatement as you might have thought. SKI calculus is nowhere near being usable, but it’s still more usable than Iota calculus or any other more minimalistic functional language interpreter.
Making combinator calculus terms
Now, as I have learned how to evaluate, try various different encodings of combinator calculus, and make my own small terms, the only thing left is compiling a higher level language to SK(I) calculus, or more generally, just combinator calculus. The idea follows.
Custom combinators are defined using %[name] [code until the end of the line]
. [name]
can be any letter, including greek letters. For example - %Y (((SK)K)((S(K((SS)(S((SK)K)))))K))
defines the fixed point combinator as Y
. Let’s define a few combinators for a good start:
%I ((SK)K)
%B ((S(KS))K)
%C ((S((S(KB))S))(KK))
%W ((SS)(SK))
%Y (((SK)K)((S(K((SS)(S((SK)K)))))K))
Now, we can define a few basic barewords that are recognized inside expressions. For example:
$true:K
$false:KI
$succ:(SB)
$add:((SI)(K succ))
$0:(KI)
$1:I
$2:(succ 1)
$3:(succ 2)
$4:(succ 3)
$5:(succ 4)
$6:(succ 5)
… and for a good measure, multiplication and predecessor for Church numerals:
$mul:((S((S(KS))((S(K(SI)))((S(KK)) add))))(K(K 0)))
$pred:((S(K((SI)(K 0))))((S((SI)(K((S(K((S(K(S((SI)(K 0)))))K)))((S((\
S((SI)(KK)))((SI)(K 0))))((S(K succ))((SI)(K 0))))))))(K((S((SI)(KK)))(K 0)))))
\
at the end of the line means that the bare word declaration is spans also the next line. Now the language gets a lispy feel. For instance, (pred 2)
compiles to a really long chain of S
, K
’s and parens.
Concerns
There are a few issues with this very simple design:
- Numbers. Defining numbers, as awesome as it might feel, is really unweildy especially if it has to be done manually. For this reason, the language should introduce a bunch of macros that capture barewords based on regular expressions.
- “Something went wrong during compilation, and I don’t know what.” - the classical issue with languages allowing crazy levels of metaprogramming. The language would somehow need to provide a way to “trace” all the expansions of barewords to find the casue of the error. Another (much better) solution would be adding types and constraints…
- “My code doesn’t work and it’s a mess of S’s and K’s” - a type system and a bunch of constraints/concepts tied to each bareword would solve this problem. It would also be nice to have a builtin evaluator for these, anyway.
All of these concerns beg for a simple answer - “Get rid of the ‘powerful’ metaprogramming and replace it with consistent and predictable metaprogramming” and “Add a type system”.
The show ends here
Disappointing, right? Now that we had a vision of possibly a Scheme-like language being compiled to combinator calculus…
But this content is reserved for future posts. Stay tuned, and I’ll supply a writeup on the combinator calculus language, soon.