Tuesday, November 6, 2007

.NUTS: Yield Not, Waste Not

Everyone agrees that iterators are cool. They're so cool, they've got their own design pattern. Even cooler than that, they made the folks at Sun introduce some (gasp!) new syntax into Java. Not to be outdone, C# goes even further: not only can you use iterators easily, you can also roll your own generators using the yield keyword. How cool is that?

To Litter or Not to Litter
As it turns out, it's not as cool as it sounds. If you care about memory allocation and garbage, then using yield is a bad idea. Before I go any further, let's explore this concern itself. Why would anyone care about memory allocation in a language with a garbage collector? Isn't the whole purpose of the garbage collector to abstract away memory allocation and help avoid all those nasty pointer-related bugs? If you find yourself nodding along to these questions, then I suggest that you read Joel's Law of Leaky Abstractions.

The reasons to care about garbage are diverse. In my case, it's because I'm working with XNA. That means that if I make too much garbage at the wrong time, the garbage collector will fire up and mess up my frame rate. As Shawn Hargreaves explained, there are two ways to prevent this. Myself, I prefer what he calls the Right Frequency Path, which is why using yield is a bad idea for me.

There are other kinds of software, where garbage might not be as critical an issue as in games, but you should still be aware of it. Too much garbage can cripple the performance of any system.

Garbage Generators

So far I have only claimed that yield is wasteful. Let's take a look at why it's so. Consider this short program:
using System;
using System.Collections.Generic;
using System.Text;

namespace Sandbox
{
 static class IteratorTest
 {
     public static IEnumerable<int> FibonacciYield(int max)
     {
         if (max > 1)
         {
             int a = 1;
             int b = 1;

             yield return a;
             yield return b;

             int c = a + b;

             while (c < max)
             {
                 yield return c;
                 a = b;
                 b = c;
                 c = a + b;
             }
         }
     }
 }

 class Program
 {
     static void Main(string[] args)
     {
         int sum = 0;
         for (int l = 0; l < 100000; l++)
         {
             foreach (int i in IteratorTest.FibonacciYield(17))
             {
                 sum += i;
             }
         }
         Console.WriteLine(sum);
     }
 }
}

Running CLR Profiler on this program might surprise you. Although there is no explicit allocation in the code, plenty of memory winds up in the landfill. The culprit is a class with a weird name: <FibonacciYield>d__0. In this simple example, it is responsible for a whopping 3.4 MB garbage pile. Where did that come from?

That's the price one has to pay for the magic. After all, when you write your own generator with yield, you're returning IEnumerable<T>, without even instancing it, let alone implementing it. But the return value has to come from somewhere and that somewhere is a compiler-generated class. If you run Reflector on the program and then take a look at the method FibonacciYield, you'll get this:
public static IEnumerable<int> FibonacciYield(int max)
{
  <FibonacciYield>d__0 d__ = new <FibonacciYield>d__0(-2);
  d__.<>3__max = max;
  return d__;
}

Just as suspected, that's where and how allocation happens. But what about our mysterious compiler-generated class with an ugly name?

Under The Hood

The dissasembly of the <FibonacciYield>d__0 class shows not only an implementation of IEnumerable<int> interface, but also a full-fledged IEnumerator<int> implementation. It has all the stuff a generator should have: the Current property, the Reset method and the MoveNext method.

At first, the code might look intimidating -- all this came out of that short method?! -- but a closer look reveals certain patterns. For example, take a look at the fields:
private int <>1__state;
private int <>2__current;
public int <>3__max;
public int <a>5__1;
public int <b>5__2;
public int <c>5__3;
public int max;

The <1>__state and <2>__current fields have to do with how the yield keyword itself works and I'll get back to them in a few moments. The rest show an interesting pattern: the local variables in the original method turn into <a>5__1, <b>5__2 and <c>5__3 fields, whereas the max argument becomes <>3__max field. But what about the max field? Why have both max and <>3__max? And why convert local variables into fields in the first place?

To answer that, remember how the foreach loop works: by calling the MoveNext method of the IEnumerator<T> interface and inspecting its Current property, until MoveNext returns false. This means that the state of whatever you're doing in your generator has to be preserved between two calls to MoveNext. In our case, the local variables of the original method are a part of the state that has to be preserved.

Speaking of state, the magic is explained when you look at the implementation of the MoveNext method:
private bool MoveNext()
{
   switch (this.<>1__state)
   {
       case 0:
           this.<>1__state = -1;
           if (this.max <= 1)
           {
               break;
           }
           this.<a>5__1 = 1;
           this.<b>5__2 = 1;
           this.<>2__current = this.<a>5__1;
           this.<>1__state = 1;
           return true;

       case 1:
           this.<>1__state = -1;
           this.<>2__current = this.<b>5__2;
           this.<>1__state = 2;
           return true;

       case 2:
           this.<>1__state = -1;
           this.<c>5__3 = this.<a>5__1 + this.<b>5__2;
           while (this.<c>5__3 < this.max)
           {
               this.<>2__current = this.<c>5__3;
               this.<>1__state = 3;
               return true;
           Label_00C5:
               this.<>1__state = -1;
               this.<a>5__1 = this.<b>5__2;
               this.<b>5__2 = this.<c>5__3;
               this.<c>5__3 = this.<a>5__1 + this.<b>5__2;
           }
           break;

       case 3:
           goto Label_00C5;
   }
   return false;
}

This is the heart of the compiler-generated finite state machine designed to react correctly to subsequent calls to MoveNext. The initial state is 0. Every time the compiler encounters a yield return statement, it creates a new state and generates the code that:
  1. sets <>1__state to the new state
  2. sets <>2__current to the value in yield return
  3. returns true.
By the way, don't try to plug this code back into the sample program. The C# compiler will reject it, for reasons about which I will rant in another post.

The final mystery is resolved by looking at the GetEnumerator method implementation:
IEnumerator<int> IEnumerable<int>.GetEnumerator()
{
   IteratorTest.<FibonacciYield>d__0 d__;
   if (Interlocked.CompareExchange(ref this.<>1__state, 0, -2) == -2)
   {
       d__ = this;
   }
   else
   {
       d__ = new IteratorTest.<FibonacciYield>d__0(0);
   }
   d__.max = this.<>3__max;
   return d__;
}

After doing some tedious deciphering, this turns out to mean the following: if MoveNext That has already been called on this particular object, then create a fresh clone; if not, return this object. That's why we have both <>3__max and max: since arguments can be used as local variables within a method, the former is the original value supplied to FibonacciYield and the latter serves as a "local variable" field.

Making It Tidy

Now that we know how yield works, we can concentrate on avoiding memory allocation. Fortunately, C# allows you to write generators without implementing the IEnumerator<T> interface, as documented in help file:
In C#, it is not strictly necessary for a collection class to inherit from IEnumerable and IEnumerator in order to be compatible with foreach; as long as the class has the required GetEnumerator, MoveNext, Reset, and Current members, it will work with foreach. Omitting the interfaces has the advantage of allowing you to define the return type of Current to be more specific than object, thereby providing type-safety.

Omitting the interfaces, in this case, has the added advantage of being able to implement a generator as a struct, without it getting boxed later. Your final generator would look like this:
public struct ScroogeGenerator
{
    private int state;
    private int current;
    private int max;
    private int a;
    private int b;
    private int c;

    public ScroogeGenerator(int max)
    {
        this.state = 0;
        this.max = max;
        current = a = b = c = 0;
    }

    public int Current
    {
        get { return current; }
    }

    public bool MoveNext()
    {
        switch (state)
        {
            case 0:
                if (max <= 1) break;
                a = 1;
                b = 1;
                state = 1;
                current = a;
                return true;
            case 1:
                state = 2;
                current = b;
                return true;
            case 2:
                c = a + b;
                if (c >= max) break;
                a = b;
                b = c;
                current = c;
                return true;
        }
        return false;
    }

    public void Reset()
    {
        throw new Exception("The method or operation is not implemented.");
    }

    public ScroogeGenerator GetEnumerator() { return this; }
}

A few things to keep in mind:
  • When you write your own value-type generator, make sure you never cast it into an interface, or it will get boxed and make garbage. The best way to ensure that is not to implement any of the IEnumerable or IEnumerator interfaces.
  • The C# foreach loop does not call the Reset method. You can leave it unimplemented or empty if you'll use your generator in C# foreach loops only.
  • Don't reuse an instance of a value-type generator across different foreach loops. If you intend to do that, implement Reset and call it manually before reusing the generator.