Tuesday, 6 January 2009

Refactoring Towards the Y Combinator: Part 2

This is a multi-part series. Part 1. Part 2. Part 3. Part 4. Part 5. Part 6.

Lets get started with discovering the Y combinator. To start off, I came up with the most straightforward recursive implementation of the factorial function I could think of, restricting myself to anonymous methods (or more specifically, lambdas):

public delegate int Factorial(int x);

/// <summary>
/// This is the cheating way. Define a variable and use a closure on that variable.
/// Its cheating because it only works in an impure language.
/// </summary>
public class Solution01 : ISolution
{
    public Factorial MakeFactorialFunction()
    {            
        Factorial factorial = null;
        factorial = (int x) =>
        {
            return x < 2 ? 1 : x * factorial(x - 1);
        };
        
        return factorial;
    }
}

I define a delegate for the Factorial function – it takes an int, returns an int. I could have used Func<int,int> but believe me, as this gets more complicated, you’ll be glad I introduced the delegate. I also define a MakeFactorialFunction() method, which is intended to be used as so:

var s1 = new Solution01();
Factorial f = s1.MakeFactorialFunction();
int factorialOf10 = f(10);

You can mostly ignore the Solution01 and ISolution stuff, they are just artefacts from the testing harness I set up. Lets focus on the code that matters:

Factorial factorial = null;
factorial = (int x) =>
{
    return x < 2 ? 1 : x * factorial(x - 1);
};

return factorial;

In this code, I call factorial recursively by utilizing a closure. The body of the lambda expression refers to a variable that lies outside the scope of the lambda; that variable is ‘factorial’. You might have noticed, my code comment above states that I consider this implementation to be “cheating”. This is because I reassign the value of factorial. First it is null, then it is a lambda expression. Perhaps you are wondering why I bother reassign it? Why not just do it like this:

Factorial factorial = (int x) =>
{
    return x < 2 ? 1 : x * factorial(x - 1);
};

I am afraid this won’t work, the compiler identifies this as an error, “use of unassigned local variable ‘factorial’”. This is why I use the extra step of declaring factorial to be null. Now I said before that reassigning the factorial variable is cheating. Why? The Y combinator is much more relevant to the functional programming paradigm than it is to the imperative paradigm – functional programming has a concept called “purity”, which means no side effects, and no variables (no variables with varying values, that is) . For the purposes of this exercise, it will be helpful to limit myself to pure operations. The reassignment of the factorial variable is a impure operation. So how to get rid of it?

My next attempt:

public delegate int SelfCallingFactorial(int x, SelfCallingFactorial self);

/// <summary>
/// We can remove the impurity by having the factorial function accept a reference to itself, so it can call itself.
/// We can't just return this implementation because it doesn't match the Factorial signature, so we have to return
/// a lambda that calls it. Unfortunately, doing so will create a closure and as a result, as long as a reference
/// to the result of MakeFactorialFunction method is held, the Solution2 instance will not be garbage collected.
/// </summary>
public class Solution02 : ISolution
{
    public Factorial MakeFactorialFunction()
    {
        SelfCallingFactorial factorial = (int x, SelfCallingFactorial self) =>
        {
            return x < 2 ? 1 : x * self(x - 1, self);
        };

        return x =>
        {
            return factorial(x, factorial);
        };
    }
}

Okay, so what has happened? Rather than having the factorial function call itself by utilizing the variable, we could pass a reference to the factorial function as an argument to itself. I modify the Factorial delegate to accept an instance of itself, and call it a SelfCallingFactorial instead. You can see that delegate definition at the top. I then define ‘factorial’ similarly as before, but this time I have a ‘self’ parameter which I can invoke as the recursive step. When it is invoked, I have to pass two arguments: the value I want to calculate the factorial of, and another self reference. Try to think of the ‘self’ parameter as a manually implemented equivalent of the ‘this’ keyword in normal C#, but for functions. The ‘this’ keyword allows code inside an object to obtain a reference to that object, similarly, my ‘self’ parameter allows code inside a function to call that function!

The result of this trick is that I have successfully removed the impurity - there is no reassignment in this solution. However we are not finished yet: unfortunately, I cannot simply return ‘factorial’ after it has been defined, because it is a SelfCallingFactorial, and the MakeFactorialFunction method is supposed to return a Factorial. Instead I have to return a lambda that will execute my SelfCallingFactorial. This creates another closure, again, on the ‘factorial’ variable. As mentioned in the code comment, the instance of Solution02 used to create the factorial function will stay in scope as long as the factorial function stays in scope, because the returned factorial function holds a reference to the ‘factorial’ local variable defined in the MakeFactorialFunction() method. Fortunately, this is easy to fix:

/// <summary>
/// Creating the factorial method inside the returned lambda resolves the garbage collection problem. 
/// This solution does not involve a closure, so its pretty good. But...
/// </summary>
public class Solution03 : ISolution
{
    public Factorial MakeFactorialFunction()
    {
        return (int x1) =>
        {
            SelfCallingFactorial factorial = (int x, SelfCallingFactorial self) =>
            {
                return x < 2 ? 1 : x * self(x - 1, self);
            };

            return factorial(x1, factorial);
        };
    }
}

Solution03 is just a very minor revision on Solution02. All I had to do was move the code that defines the SelfCallingFactorial inside the returned lambda expression (if you look at the code above, you will see that it was previously outside). Now when a Solution03 instance is used to construct a FactorialFunction, that instance will be free to be garbage collected while the result, the factorial function, can continued to be used. Now please don’t get me wrong, I know that this garbage collection ‘issue’ is quite irrelevant – I am actually just using it as an excuse to get the code closer to my satisfaction. I am not exactly sure why I want it this way – it just seems neater.

So, where to go from Solution03? It seems pretty good, and we are still nowhere near the Y combinator. In the next instalment, I will start making the code a little more general, with the eventual goal that the same technique might be applied to other recursive algorithms (such as Fibonacci).

2 comments: