Introduction⌗

λ calculus is a system for expressing computation, terms of which are built using the following rules:

Abstraction: λx. X - an anonymous function definition.
Application: (X Y) - applying a function X to the argument Y.
Variables: x - a bound or free reference.

The following operations are commonly performed on λ terms:

α conversion: λx. A[x] → λy. A[y] (renaming bound variables in an expression).
β reduction: ((λx. A[x]) B) → A[x := B] (substitution).
η conversion: f → λx. (f x) (adding abstraction).
η reduction: λx. (f x) → f (removing abstraction).

Elementary examples⌗

Consider the following sequence of operations aiming at providing the normal form of the initial term:

(((λx. (λy. (λz. ((x z) (y z))))) (λx. (λy. x))) (λx. (λy. x)))

α => (((λx. (λy. (λz. ((x z) (y z))))) (λX₀. (λy. X₀))) (λx. (λy. x)))
β => ((λy. (λz. (((λX₀. (λy. X₀)) z) (y z)))) (λx. (λy. x)))
β => ((λy. (λz. ((λy. z) (y z)))) (λx. (λy. x)))
α => ((λy. (λX₁. ((λy. X₁) (y X₁)))) (λx. (λy. x)))
β => ((λy. (λX₁. X₁)) (λx. (λy. x)))
β => (λX₁. X₁)

(λX₁. X₁)

Some terms do not have a normal form. For example, the term Ω = (λx.x x) (λx.x x) can be β-reduced infinitely.

Preliminary considerations⌗

I started implementing the λ-calculus evaluator with the symbol table. The symbol table could contain many useful pre-defined terms, but for the purposes of this article, I will limit it to Church terms for true and false. Other terms could be easily added.

I begin my implementation with some variable shuffling to ensure that the default symbol table is loaded if an alternative has not been supplied. I also reset the values of system variables ⎕IO and ⎕ML:

lam←{
    ⎕IO⎕ML←1 1
    ∆d←⍪('true' ('Lam' (,'t') ('Lam' (,'f') ('Var' (,'t')))))
    ∆d,←('false' ('Lam' (,'t') ('Lam' (,'f') ('Var' (,'f')))))
    ⍺←∆d⋄∆t←⍺
}

The next step is defining functions to insert and remove a value from the symbol table. Insertion is trivial:

⍺←∆d⋄∆t←⍺⋄∆hi←{∆t,←⍺⍵}

Removing entries from the symbol table involves finding the index of the entry and then, using ↑ and ↓, ignoring the removed column when reassigning the table:

⍺←∆d⋄∆t←⍺⋄∆hi←{∆t,←⍺⍵}⋄∆hr←{∆t∘←((∆t↑[2]⍨¯1∘+),∆t↓[2]⍨⊢)⍵⍳⍨,1↑∆t}

I define a few helper functions. Raising an error and hashing an array:

err←{⍵⎕SIGNAL 8}
hash←{{1e10|⍺+31×⍵}/128+1(220⌶)⍵}

The array hash method uses the I-beam 220 (to serialise the array to a signed byte array), the array is turned into an unsigned array to improve the characteristics of the hash function. The hash holds an accumulator, to which the product of the next input and a small prime number (31) is added. The operation is performed modulo 1e10 to avoid overflows.

Next, I define a vector of standalone single character tokens that will be consumed by the lexer.

l←'()λ.='

To facilitate α conversion, I create a helper function to append a subscript index to a given variable:

sd←'₀₁₂₃₄₅₆₇₈₉'⋄ad←{⍺,sd[,1+10⊥⍣¯1⊢⍵]}

Lexical analysis⌗

Lexical analysis of λ calculus terms is remarkably simple. The only lexemes that need to be recognised are the standalone single character tokens described above, whitespace and line comments (to be ignored) and identifiers. As a recursive function, the lexer begins with a check for the input length and the single character token matcher:

lx←{0=≢⍵:⍬⋄6≠i←l⍳⊃⍵:(⊂0,⊂i⊃l),∇1↓⍵}

Then, whitespace and line comments are ignored (terminated by the ASCII code 10 or end of the input):

lx←{0=≢⍵:⍬⋄6≠i←l⍳⊃⍵:(⊂0,⊂i⊃l),∇1↓⍵⋄3≠(⎕UCS 10 32)⍳⊃⍵:∇1↓⍵⋄'#'=⊃⍵:∇⍵↓⍨1⍳⍨1,⍨⍵=⎕UCS 10}

The only remaining token type that needs to be recognised is the identifier. Define the characters that are allowed inside of an identifier:

lx←{0=≢⍵:⍬⋄6≠i←l⍳⊃⍵:(⊂0,⊂i⊃l),∇1↓⍵⋄3≠(⎕UCS 10 32)⍳⊃⍵:∇1↓⍵⋄'#'=⊃⍵:∇⍵↓⍨1⍳⍨1,⍨⍵=⎕UCS 10
    s←≢id←sd,⎕A,⎕C ⎕A}

Then, I compute the bitmask that decides whether individual characters in the input stream are valid identifier characters and count leading ones to determine the length of the identifier. If the identifier has length 0, an unexpected token has been observed.

lx←{0=≢⍵:⍬⋄6≠i←l⍳⊃⍵:(⊂0,⊂i⊃l),∇1↓⍵⋄3≠(⎕UCS 10 32)⍳⊃⍵:∇1↓⍵⋄'#'=⊃⍵:∇⍵↓⍨1⍳⍨1,⍨⍵=⎕UCS 10
    s←≢id←sd,⎕A,⎕C ⎕A⋄k←⊥⍨⌽s≥id⍳⍵⋄k=0:err'eltoken'⍵⋄(⊂1,k↑⍵),∇k↓⍵}

Syntactic analysis⌗

Begin the recursive descent parser code by defining useful abbreviations for single-character tokens:

pr←{L←0'λ'⋄P←0'('⋄E←0'='⋄C←0')'⋄D←0'.'}

If the input is in the form ID = ..., process a binding. Otherwise, read a term:

pr←{L←0'λ'⋄P←0'('⋄E←0'='⋄C←0')'⋄D←0'.'
    E≡⊃1↓⍵:bi ⍵⋄⊃⌽tr ⍵}

A binding simply processes the term following the aforementioned token sequence. The binding function alters the symbol table approperiately:

pr←{L←0'λ'⋄P←0'('⋄E←0'='⋄C←0')'⋄D←0'.'
    bi←{k←⊃⍵⋄lx v←tr 2↓⍵⋄0≠≢lx:err'estray'⋄_←∆hr k⋄_←∆hi k v⋄⍬}
    E≡⊃1↓⍵:bi ⍵⋄⊃⌽tr ⍵}

If the term begins with a λ sign, the term is an abstraction. Otherwise, the input is a sequence of one or more applications of atoms to each other.

pr←{L←0'λ'⋄P←0'('⋄E←0'='⋄C←0')'⋄D←0'.'
    tr←{L≡⊃⍵:ab 1↓⍵⋄↑{(L≢⊃⍺)∧(1≢⊃⊃⍺)∧(P≢⊃⍺):⍺⍵⋄lx t←at ⍺⋄lx ∇'App'⍵ t}/at ⍵}
    bi←{k←⊃⍵⋄lx v←tr 2↓⍵⋄0≠≢lx:err'estray'⋄_←∆hr k⋄_←∆hi k v⋄⍬}
    E≡⊃1↓⍵:bi ⍵⋄⊃⌽tr ⍵}

An abstraction accepts the token stream that starts with a list of identifiers separated with a dot from the abstraction’s body (a term). Multiple arguments in a lambda definition are converted to a cascading single argument lambda function, i.e. (λx y z. z) → (λx (λy (λz. z))).

pr←{L←0'λ'⋄P←0'('⋄E←0'='⋄C←0')'⋄D←0'.'
    ab←{i←⊥⍨⌽1=⊃¨⍵⋄i<1:err'elambda'⋄D≢⍵⊃⍨i+1:err'edot'
        lx tv←tr(1+i)↓⍵⋄nm←1↓¨i↑⍵⋄lx(nm{0=≢⍺:⍵⋄(¯1↓⍺)∇'Lam'(⊃⌽⍺)⍵}tv)}
    tr←{L≡⊃⍵:ab 1↓⍵⋄↑{(L≢⊃⍺)∧(1≢⊃⊃⍺)∧(P≢⊃⍺):⍺⍵⋄lx t←at ⍺⋄lx ∇'App'⍵ t}/at ⍵}
    bi←{k←⊃⍵⋄lx v←tr 2↓⍵⋄0≠≢lx:err'estray'⋄_←∆hr k⋄_←∆hi k v⋄⍬}
    E≡⊃1↓⍵:bi ⍵⋄⊃⌽tr ⍵}

Finally, an atom is defined either as a parenthesised term, a variable reference or another lambda function.

pr←{L←0'λ'⋄P←0'('⋄E←0'='⋄C←0')'⋄D←0'.'
    at←{P≡⊃⍵:{lx t←tr(1↓⍵)⋄C≢⊃lx:err'eparen'⋄(1↓lx) t}⍵
        1≡⊃⊃⍵:(1↓⍵)('Var'(1↓⊃⍵))⋄L≡⊃⍵:ab 1↓⍵⋄err'etoken'}
    ab←{i←⊥⍨⌽1=⊃¨⍵⋄i<1:err'elambda'⋄D≢⍵⊃⍨i+1:err'edot'
        lx tv←tr(1+i)↓⍵⋄nm←1↓¨i↑⍵⋄lx(nm{0=≢⍺:⍵⋄(¯1↓⍺)∇'Lam'(⊃⌽⍺)⍵}tv)}
    tr←{L≡⊃⍵:ab 1↓⍵⋄↑{(L≢⊃⍺)∧(1≢⊃⊃⍺)∧(P≢⊃⍺):⍺⍵⋄lx t←at ⍺⋄lx ∇'App'⍵ t}/at ⍵}
    bi←{k←⊃⍵⋄lx v←tr 2↓⍵⋄0≠≢lx:err'estray'⋄_←∆hr k⋄_←∆hi k v⋄⍬}
    E≡⊃1↓⍵:bi ⍵⋄⊃⌽tr ⍵}

Abstract syntax trees are trivially turned into a natural notation string as follows:

str←{'Lam'≡⊃⍵:∊'(λ'(2⊃⍵)'. ',(∇3⊃⍵),')'⋄'Var'≡⊃⍵:2⊃⍵⋄'App'≡⊃⍵:∊'('(∇2⊃⍵)' '(∇3⊃⍵)')'}

δ expansion⌗

δ expansion is applied by the evaluator after the parsing step to expand the underlying definitions of variables bound in the symbol table. Start by defining a function that determines what to replace the variable with when walking the syntax tree. Extra precaution needs to be taken, since variables may be shadowed by abstractions:

de←{lk←{(⊂⍵)∊⍺:'Var'⍵⋄i←(⊂⍵)⍳⍨,1↑∆t⋄i>⊃⌽⍴∆t:'Var'⍵⋄⊃∆t[2;i]}}

Finally, tree traversal is performed. The recursive traversal function keeps an accumulator of shadowed variables in scope that can not be replaced. Var nodes are passed to lk, App nodes are recursed into, Lam nodes are recursed into and append the variable they bind to the accumulator.

de←{lk←{(⊂⍵)∊⍺:'Var'⍵⋄i←(⊂⍵)⍳⍨,1↑∆t⋄i>⊃⌽⍴∆t:'Var'⍵⋄⊃∆t[2;i]}
    {⍺←⊂''⋄'Var'≡⊃⍵:(⍺ lk ⊃⌽⍵)⋄'App'≡⊃⍵:'App' (⍺∇2⊃⍵) (⍺∇3⊃⍵)
    'Lam'≡⊃⍵:(2↑⍵),⊂(⍺,⊂2⊃⍵)∇3⊃⍵}⍵}

α conversion⌗

Start by defining a table that tracks the amount of times a name has appeared in different contexts. Define a function that performs table lookup and increments a desired entry. The table lookup function automatically appends the subscript suffix to the identifier.

ac←{⍺←⍪''0⋄a←⍺⋄∆i←{i←(,1↑a)⍳⊂⍵⋄_←⍺{⋄i>⊃⌽⍴a:a,←⍵ 0⋄a[2;i]+←⍺⋄0}⍵⋄⍵ad,a[2;i]}}

Tree traversal is as trivial as previously demonstrated. Variables are delegated to ∆i, application is being recursed into, abstractons trigger table increments.

a←⍪''0⋄ac←{∆i←{i←(,1↑a)⍳⊂⍵⋄_←⍺{⋄i>⊃⌽⍴a:a,←⍵ 0⋄a[2;i]+←⍺⋄0}⍵⋄⍵ad,a[2;i]}
    {'Var'≡⊃⍵:'Var' (0 ∆i ⊃⌽⍵)⋄'App'≡⊃⍵:'App' (∇2⊃⍵) (∇3⊃⍵)
    'Lam'≡⊃⍵:(⊂'Lam'),((∇3⊃⍵) ,⍨⍥⊂ (1 ∆i 2⊃⍵))}⍵}

β reduction⌗

Variables are invariant under β reduction. Abstractions are recursed into.

br←{'Lam'≡⊃⍵:(2↑⍵),⊂∇3⊃⍵⋄'Var'≡⊃⍵:⍵⋄'App'≢⊃⍵:err'eint'}

When encountering an application, normal forms of the function and the argument are computed. If the function is not an abstraction, no operation is performed:

br←{'Lam'≡⊃⍵:(2↑⍵),⊂∇3⊃⍵⋄'Var'≡⊃⍵:⍵⋄'App'≢⊃⍵:err'eint'
    an bn←∇¨1↓⍵⋄'Lam'≢⊃an:⍵}

α conversion is performed to avoid name clashes:

br←{'Lam'≡⊃⍵:(2↑⍵),⊂∇3⊃⍵⋄'Var'≡⊃⍵:⍵⋄'App'≢⊃⍵:err'eint'
    an bn←∇¨1↓⍵⋄'Lam'≢⊃an:⍵⋄an bn←1↓ac 'App' an bn⋄av←2⊃an}

At this point, the only operation left is replacing all instances of the name 2⊃an with bn in 3⊃an. A simple recursive approach follows, minding shadowing:

br←{'Lam'≡⊃⍵:(2↑⍵),⊂∇3⊃⍵⋄'Var'≡⊃⍵:⍵⋄'App'≢⊃⍵:err'eint'
    an bn←∇¨1↓⍵⋄'Lam'≢⊃an:⍵⋄an bn←1↓ac 'App' an bn⋄av←2⊃an
    {v←'Var'≡⊃⍵⋄v∧av≡2⊃⍵:bn⋄v:⍵⋄v←'Lam'≡⊃⍵⋄v∧av≡2⊃⍵:⍵⋄v:(2↑⍵),⊂∇3⊃⍵
     'App'≡⊃⍵:'App' (∇2⊃⍵) (∇3⊃⍵)}3⊃an}

Wrapping up⌗

To evaluate an expression from a string, it needs to be lexed and parsed first. If the parser encounters a binding, there is nothing left to evaluate, so the function should abort early. Otherwise, δ expansion is performed on the syntax tree and β reduction is repeatedly applied to the input. This is accomplished by the following code:

rd←{h←⍬⋄i←{⍵∊h:1⋄h,←⍵⋄0}⋄in←de ⍵⋄r←br⍣{(i hash ⍺)∨⍺≡⍵}in⋄(in≡r)∨(hash r)∊¯1↓h:err'einf'⋄r}
ast←pr lx ⍵⋄ast≡⍬:_←⍬⋄str rd ast

Shortly after this blog post was published, Nick Nickolov wrote a simple evaluator for simplified Lambda Calculus. Make sure to check it out.

The source code⌗

The full source code to this article can be downloaded from Github.

 lam←{
     ⎕IO⎕ML←1 1
     ∆d←⍪('true' ('Lam' (,'t') ('Lam' (,'f') ('Var' (,'t')))))
     ∆d,←('false' ('Lam' (,'t') ('Lam' (,'f') ('Var' (,'f')))))
     ⍺←∆d⋄∆t←⍺⋄∆hi←{∆t,←⍺⍵}⋄∆hr←{∆t∘←((∆t↑[2]⍨¯1∘+),∆t↓[2]⍨⊢)⍵⍳⍨,1↑∆t}⋄err←{⍵⎕SIGNAL 8}
     hash←{{1e10|⍺+31×⍵}/128+1(220⌶)⍵}⋄sd←'₀₁₂₃₄₅₆₇₈₉'⋄l←'()λ.='⋄ad←{⍺,sd[,1+10⊥⍣¯1⊢⍵]}
     str←{'Lam'≡⊃⍵:∊'(λ'(2⊃⍵)'. ',(∇3⊃⍵),')'⋄'Var'≡⊃⍵:2⊃⍵⋄'App'≡⊃⍵:∊'('(∇2⊃⍵)' '(∇3⊃⍵)')'}
     lx←{0=≢⍵:⍬⋄6≠i←l⍳⊃⍵:(⊂0,⊂i⊃l),∇1↓⍵⋄3≠(⎕UCS 10 32)⍳⊃⍵:∇1↓⍵⋄'#'=⊃⍵:∇⍵↓⍨1⍳⍨1,⍨⍵=⎕UCS 10
         s←≢id←sd,⎕A,⎕C ⎕A⋄k←⊥⍨⌽s≥id⍳⍵⋄k=0:err'eltoken'⍵⋄(⊂1,k↑⍵),∇k↓⍵}
     pr←{L←0'λ'⋄P←0'('⋄E←0'='⋄C←0')'⋄D←0'.'
         at←{P≡⊃⍵:{lx t←tr(1↓⍵)⋄C≢⊃lx:err'eparen'⋄(1↓lx) t}⍵
             1≡⊃⊃⍵:(1↓⍵)('Var'(1↓⊃⍵))⋄L≡⊃⍵:ab 1↓⍵⋄err'etoken'}
         ab←{i←⊥⍨⌽1=⊃¨⍵⋄i<1:err'elambda'⋄D≢⍵⊃⍨i+1:err'edot'
             lx tv←tr(1+i)↓⍵⋄nm←1↓¨i↑⍵⋄lx(nm{0=≢⍺:⍵⋄(¯1↓⍺)∇'Lam'(⊃⌽⍺)⍵}tv)}
         tr←{L≡⊃⍵:ab 1↓⍵⋄↑{(L≢⊃⍺)∧(1≢⊃⊃⍺)∧(P≢⊃⍺):⍺⍵⋄lx t←at ⍺⋄lx ∇'App'⍵ t}/at ⍵}
         bi←{k←⊃⍵⋄lx v←tr 2↓⍵⋄0≠≢lx:err'estray'⋄_←∆hr k⋄_←∆hi k v⋄⍬}
         E≡⊃1↓⍵:bi ⍵⋄⊃⌽tr ⍵}
     a←⍪''0⋄ac←{∆i←{i←(,1↑a)⍳⊂⍵⋄_←⍺{⋄i>⊃⌽⍴a:a,←⍵ 0⋄a[2;i]+←⍺⋄0}⍵⋄⍵ad,a[2;i]}
         {'Var'≡⊃⍵:'Var' (0 ∆i ⊃⌽⍵)⋄'App'≡⊃⍵:'App' (∇2⊃⍵) (∇3⊃⍵)
          'Lam'≡⊃⍵:(⊂'Lam'),((∇3⊃⍵) ,⍨⍥⊂ (1 ∆i 2⊃⍵))}⍵}
     de←{lk←{(⊂⍵)∊⍺:'Var'⍵⋄i←(⊂⍵)⍳⍨,1↑∆t⋄i>⊃⌽⍴∆t:'Var'⍵⋄⊃∆t[2;i]}
         {⍺←⊂''⋄'Var'≡⊃⍵:(⍺ lk ⊃⌽⍵)⋄'App'≡⊃⍵:'App' (⍺∇2⊃⍵) (⍺∇3⊃⍵)
          'Lam'≡⊃⍵:(2↑⍵),⊂(⍺,⊂2⊃⍵)∇3⊃⍵}⍵}
     br←{'Lam'≡⊃⍵:(2↑⍵),⊂∇3⊃⍵⋄'Var'≡⊃⍵:⍵⋄'App'≢⊃⍵:err'eint'
         an bn←∇¨1↓⍵⋄'Lam'≢⊃an:⍵⋄an bn←1↓ac 'App' an bn⋄av←2⊃an
         {v←'Var'≡⊃⍵⋄v∧av≡2⊃⍵:bn⋄v:⍵⋄v←'Lam'≡⊃⍵⋄v∧av≡2⊃⍵:⍵⋄v:(2↑⍵),⊂∇3⊃⍵
          'App'≡⊃⍵:'App' (∇2⊃⍵) (∇3⊃⍵)}3⊃an}
     rd←{h←⍬⋄i←{⍵∊h:1⋄h,←⍵⋄0}⋄in←de ⍵⋄r←br⍣{(i hash ⍺)∨⍺≡⍵}in⋄(in≡r)∨(hash r)∊¯1↓h:err'einf'⋄r}
     ast←pr lx ⍵⋄ast≡⍬:_←⍬⋄str rd ast
 }

The evaluator correctly handles edge cases:

      lam '((λx.x x)(λx.x x))'
einf
      lam'((λx.x x)(λx.x x))'
      ∧
      lam'(((λx y z. (x z (y z))) (λx y. x)) (λx y. x))' ⍝ SKK = I
(λz. z)

Journey to the Center of the Lambda Calculus