Combinator calculus - a lab notebook
Introduction⌗
I really like SKI calculus. The thing that initially introduced me to it was the Esolangs wiki. Also, I read the works of John Tromp on binary lambda calculus and binary combinatory logic (I really enjoyed these and I can recommend them to anyone who wishes to study Kolmogorov complexity’s limits). One evening in June 2020, I implemented the following combinator calculus beta-reduction machine for the purposes of a code golf submission:
It’s 476 bytes big and most of the space is being taken by the complex combinator expansion logic. This encouraged me to research the topic further. First, I started researching other combinators - I already was aware of lambda calculus and functional programming, so naturally the first combinator I took on was the fixed point combinator (Y
). Later, I discovered the BCW system made by Haskell Curry and quickly wrote a transpiler from the BCW system to the SKI system:
The BCWK system (borrowing the K from SKI calculus) is fairly reasonable and intuitive. First two combinators swap their arguments: B x y z = (((B x) y) z) = ((B x) (y z)) = B x (y z)
, C x y z = (((C x) y) z) = ((x z) y) = x z y
. The last combinator, W
, is just W x y = ((W x) y) = ((x y) y) = x y y
.
Notation⌗
Pondering these made me think about the notation that I use for writing down my SKI calculus programs. So far, I assumed that the input is a perfect binary tree of combinators. But since it’s a binary tree, i.e. each node can have only either two combinators, two nodes, a combinator and a node, or a node and a combinator inside it, we don’t need the closing parens anymore!. Let me explain. This is a mostly readable version of the evaluator above:
An important observation is yet to be made - the reading function doesn’t actually use the closing paren for anything. So we replace it with a different character. I chose an apostrophe - '
. Assuming the input doesn’t contain any data except (
, S
, K
and I
, the following program will convert the new notation to the old notation:
… and the other way round:
Provided that I already went through the combinators and the closing parens, the element of the encoding to pick on would of course be the opening parens. In many cases - like this one: (((SK)K)I)
they just feel unnecessary and redundant. The tree structure is fairly obvious at the first glance and SKKI
would be so much more readable. This would mean that tree node nesting binds to the left. Illustratively, α β γ
becomes ((α β) γ)
, α β γ δ
becomes (((α β) γ) δ)
, and so on. A(BCDE)F
becomes ((A(((BC)D)E))F)
.e I decided to call this the free-form encoding of combinator calculus. The free-form notation can be easily translated to the binary free notation using this small bit of code:
This, of course, assumes that the free-form notation derives from the binary tree notation. But what if we used our simplified notation instead? This would bring down the size even further - α β γ
becomes ((α β) γ)
, α β γ δ
becomes (((α β) γ) δ)
, and so on. Let’s sum up all the encoding laws we came up with so far:
- in the binary tree notation, the closing paren is never necessary.
- the free-form notation is just as powerful as the binary tree notation.
Having worked out these, now let’s bring the encoding to some binary format. First, we start by noticing that the three combinators and the tree node opening character are 4 different states in total, which means that we can encode them with 4 bits. The encoding would be, for example:
00 - '
01 - S
10 - K
11 - I
… but just as well we could drop the I combinator (as it’s simply just SKK
), meaning that we have three states now, so we can encode our data using one or two binary digits:
0 - '
10 - S
11 - K
Because the drawbacks of each encoding aren’t instantaneously obvious, we can test them on a few example SKI calculus expressions. Example:
{
(((S((S(KI))((S((S(KI))I))(K(K(KI))))))(K((S(KK))I)))(
(S((S(KI))((S(K((S(KS))(S(KI)))))((S(KK))I))))(KI)))
<=> [ replacing I with (SK)K ]
(((S((S(K((SK)K)))((S((S(K((SK)K)))((SK)K)))(K(K(K((SK
)K)))))))(K((S(KK))((SK)K))))((S((S(K((SK)K)))((S(K((S
(KS))(S(K((SK)K))))))((S(KK))((SK)K)))))(K((SK)K))))
}
=> (convering to simplified form) {
'''S''S'KI''S''S'KII'K'K'KI'K''S'KKI''S''S'KI''S'K''S'KS'S'KI''S'KKI'KI
<=>
'''S''S'K''SKK''S''S'K''SKK''SKK'K'K'K''SKK'K''S'KK''SKK''S''S'K''SKK''
S'K''S'KS'S'K''SKK''S'KK''SKK'K''SKK
}
=> (conveting to binary) {
0000000100000100101100000100000100101111001000100010110010000001001010110000010000010010110000010010000001001001000100101100000100101011001011
<=>
00010001001100101111001000100110010111100101111011011011001011110110010011110010111100100010011001011110010011001001110010011001011110010011110010111101100101111
}
=> (converting to hexadecimal) {
233896F70BDC3C4AFCC09510647F2E47B62F78384995A33A1B538CB46296BD6AA11B51F11E8018A6D2744F76F956D1B48DB0651FF163CEB3
<=>
2E9E8A31763346A56A1C2472720F66E0C0F95635EE71C4861C282CC2856F5F64F8AFA8967B8B8AADC44313EE9C6B0E88B1F7057BD7F8BF144E24B8062EF65340DF7
}
The implications are fairly simple - adding new combinators is generally beneficial for the code size (I managed to replicate the same effect on a bunch of SKI calculus expressions). This was just about what I expected - all in all, dictionary compession is pretty much the baseline standard nowadays.
Implementing⌗
In February 2021, I implemented a tiny (976 byte .exe file) SKI calculus evaluator with a GUI for Windows. The code follows.
The code was initially published on Github alongside a larger, initial 1.6 kilobyte build. The evaluator being optimised for code size leaks memory like a sieve and doesn’t perform any validation of the input. But it works! And it’s small! Have you ever seen a sub-kilobyte, relatively usable functional programming language(*) evaluator?
* - of course, SKI calculus itself stretches the definition of a functional programming langauge, but considering that Haskell internally uses a typed lambda calculus-like IR, and lambda calculus terms are convertible to combinator calculus terms via abstraction ellimination, this might not be as big of an overstatement as you might have thought. SKI calculus is nowhere near being usable, but it’s still more usable than Iota calculus or any other more minimalistic functional language interpreter.
Making combinator calculus terms⌗
Now, as I have learned how to evaluate, try various different encodings of combinator calculus, and make my own small terms, the only thing left is compiling a higher level language to SK(I) calculus, or more generally, just combinator calculus. The idea follows.
Custom combinators are defined using %[name] [code until the end of the line]
. [name]
can be any letter, including greek letters. For example - %Y (((SK)K)((S(K((SS)(S((SK)K)))))K))
defines the fixed point combinator as Y
. Let’s define a few combinators for a good start:
%I ((SK)K)
%B ((S(KS))K)
%C ((S((S(KB))S))(KK))
%W ((SS)(SK))
%Y (((SK)K)((S(K((SS)(S((SK)K)))))K))
Now, we can define a few basic barewords that are recognized inside expressions. For example:
$true:K
$false:KI
$succ:(SB)
$add:((SI)(K succ))
$0:(KI)
$1:I
$2:(succ 1)
$3:(succ 2)
$4:(succ 3)
$5:(succ 4)
$6:(succ 5)
… and for a good measure, multiplication and predecessor for Church numerals:
$mul:((S((S(KS))((S(K(SI)))((S(KK)) add))))(K(K 0)))
$pred:((S(K((SI)(K 0))))((S((SI)(K((S(K((S(K(S((SI)(K 0)))))K)))((S((\
S((SI)(KK)))((SI)(K 0))))((S(K succ))((SI)(K 0))))))))(K((S((SI)(KK)))(K 0)))))
\
at the end of the line means that the bare word declaration is spans also the next line. Now the language gets a lispy feel. For instance, (pred 2)
compiles to a really long chain of S
, K
’s and parens.
Concerns⌗
There are a few issues with this very simple design:
- Numbers. Defining numbers, as awesome as it might feel, is really unweildy especially if it has to be done manually. For this reason, the language should introduce a bunch of macros that capture barewords based on regular expressions.
- “Something went wrong during compilation, and I don’t know what.” - the classical issue with languages allowing crazy levels of metaprogramming. The language would somehow need to provide a way to “trace” all the expansions of barewords to find the casue of the error. Another (much better) solution would be adding types and constraints…
- “My code doesn’t work and it’s a mess of S’s and K’s” - a type system and a bunch of constraints/concepts tied to each bareword would solve this problem. It would also be nice to have a builtin evaluator for these, anyway.
All of these concerns beg for a simple answer - “Get rid of the ‘powerful’ metaprogramming and replace it with consistent and predictable metaprogramming” and “Add a type system”.
The show ends here⌗
Disappointing, right? Now that we had a vision of possibly a Scheme-like language being compiled to combinator calculus…
But this content is reserved for future posts. Stay tuned, and I’ll supply a writeup on the combinator calculus language, soon.