
Tuesday, November 6, 2007

.NUTS: Yield Not, Waste Not

Everyone agrees that iterators are cool. They're so cool, they've got their own design pattern. Even cooler than that, they made the folks at Sun introduce some (gasp!) new syntax into Java. Not to be outdone, C# goes even further: not only can you use iterators easily, you can also roll your own generators using the yield keyword. How cool is that?

To Litter or Not to Litter
As it turns out, it's not as cool as it sounds. If you care about memory allocation and garbage, then using yield is a bad idea. Before I go any further, let's explore this concern itself. Why would anyone care about memory allocation in a language with a garbage collector? Isn't the whole purpose of the garbage collector to abstract away memory allocation and help avoid all those nasty pointer-related bugs? If you find yourself nodding along to these questions, then I suggest that you read Joel's Law of Leaky Abstractions.

The reasons to care about garbage are diverse. In my case, it's because I'm working with XNA. That means that if I make too much garbage at the wrong time, the garbage collector will fire up and mess up my frame rate. As Shawn Hargreaves explained, there are two ways to prevent this. Myself, I prefer what he calls the Right Frequency Path, which is why using yield is a bad idea for me.

There are other kinds of software, where garbage might not be as critical an issue as in games, but you should still be aware of it. Too much garbage can cripple the performance of any system.

Garbage Generators

So far I have only claimed that yield is wasteful. Let's take a look at why it's so. Consider this short program:
using System;
using System.Collections.Generic;
using System.Text;

namespace Sandbox
{
    static class IteratorTest
    {
        public static IEnumerable<int> FibonacciYield(int max)
        {
            if (max > 1)
            {
                int a = 1;
                int b = 1;

                yield return a;
                yield return b;

                int c = a + b;

                while (c < max)
                {
                    yield return c;
                    a = b;
                    b = c;
                    c = a + b;
                }
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            int sum = 0;
            for (int l = 0; l < 100000; l++)
            {
                foreach (int i in IteratorTest.FibonacciYield(17))
                {
                    sum += i;
                }
            }
            Console.WriteLine(sum);
        }
    }
}

Running CLR Profiler on this program might surprise you. Although there is no explicit allocation in the code, plenty of memory winds up in the landfill. The culprit is a class with a weird name: <FibonacciYield>d__0. In this simple example, it is responsible for a whopping 3.4 MB of garbage. Where did that come from?

That's the price one has to pay for the magic. After all, when you write your own generator with yield, you're returning an IEnumerable<T> without ever instantiating it, let alone implementing it. But the return value has to come from somewhere, and that somewhere is a compiler-generated class. If you run Reflector on the program and take a look at the FibonacciYield method, you'll get this:
public static IEnumerable<int> FibonacciYield(int max)
{
  <FibonacciYield>d__0 d__ = new <FibonacciYield>d__0(-2);
  d__.<>3__max = max;
  return d__;
}

Just as suspected, that's where and how allocation happens. But what about our mysterious compiler-generated class with an ugly name?

Under The Hood

The disassembly of the <FibonacciYield>d__0 class shows not only an implementation of the IEnumerable<int> interface, but also a full-fledged IEnumerator<int> implementation. It has all the stuff a generator should have: the Current property, the Reset method and the MoveNext method.

At first, the code might look intimidating -- all this came out of that short method?! -- but a closer look reveals certain patterns. For example, take a look at the fields:
private int <>1__state;
private int <>2__current;
public int <>3__max;
public int <a>5__1;
public int <b>5__2;
public int <c>5__3;
public int max;

The <>1__state and <>2__current fields have to do with how the yield keyword itself works, and I'll get back to them in a few moments. The rest show an interesting pattern: the local variables of the original method turn into the <a>5__1, <b>5__2 and <c>5__3 fields, whereas the max argument becomes the <>3__max field. But what about the max field? Why have both max and <>3__max? And why convert local variables into fields in the first place?

To answer that, remember how the foreach loop works: by calling the MoveNext method of the IEnumerator<T> interface and inspecting its Current property, until MoveNext returns false. This means that the state of whatever you're doing in your generator has to be preserved between two calls to MoveNext. In our case, the local variables of the original method are a part of the state that has to be preserved.
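To make that concrete, here is roughly what the compiler expands our foreach loop into. This is a sketch, not the exact generated code (the real expansion also handles a few pattern-matching corner cases), and it borrows the FibonacciYield method from the sample program above:

```csharp
using System;
using System.Collections.Generic;

static class ForeachExpansion
{
    static void Main()
    {
        int sum = 0;
        // What "foreach (int i in IteratorTest.FibonacciYield(17))"
        // boils down to: get an enumerator, pump MoveNext, read Current.
        IEnumerator<int> e = IteratorTest.FibonacciYield(17).GetEnumerator();
        try
        {
            while (e.MoveNext())    // advance the state machine one yield
            {
                int i = e.Current;  // the value parked by yield return
                sum += i;
            }
        }
        finally
        {
            e.Dispose();            // foreach also disposes the enumerator
        }
        Console.WriteLine(sum);     // prints 33 (1+1+2+3+5+8+13)
    }
}
```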

Speaking of state, the magic is explained when you look at the implementation of the MoveNext method:
private bool MoveNext()
{
   switch (this.<>1__state)
   {
       case 0:
           this.<>1__state = -1;
           if (this.max <= 1)
           {
               break;
           }
           this.<a>5__1 = 1;
           this.<b>5__2 = 1;
           this.<>2__current = this.<a>5__1;
           this.<>1__state = 1;
           return true;

       case 1:
           this.<>1__state = -1;
           this.<>2__current = this.<b>5__2;
           this.<>1__state = 2;
           return true;

       case 2:
           this.<>1__state = -1;
           this.<c>5__3 = this.<a>5__1 + this.<b>5__2;
           while (this.<c>5__3 < this.max)
           {
               this.<>2__current = this.<c>5__3;
               this.<>1__state = 3;
               return true;
           Label_00C5:
               this.<>1__state = -1;
               this.<a>5__1 = this.<b>5__2;
               this.<b>5__2 = this.<c>5__3;
               this.<c>5__3 = this.<a>5__1 + this.<b>5__2;
           }
           break;

       case 3:
           goto Label_00C5;
   }
   return false;
}

This is the heart of the compiler-generated finite state machine designed to react correctly to subsequent calls to MoveNext. The initial state is 0. Every time the compiler encounters a yield return statement, it creates a new state and generates the code that:
  1. sets <>1__state to the new state
  2. sets <>2__current to the value in yield return
  3. returns true.
By the way, don't try to plug this code back into the sample program. The C# compiler will reject it, for reasons about which I will rant in another post.

The final mystery is resolved by looking at the GetEnumerator method implementation:
IEnumerator<int> IEnumerable<int>.GetEnumerator()
{
   IteratorTest.<FibonacciYield>d__0 d__;
   if (Interlocked.CompareExchange(ref this.<>1__state, 0, -2) == -2)
   {
       d__ = this;
   }
   else
   {
       d__ = new IteratorTest.<FibonacciYield>d__0(0);
   }
   d__.max = this.<>3__max;
   return d__;
}

After some tedious deciphering, this turns out to mean the following: if the object has already been handed out as an enumerator (its state is no longer -2), create a fresh clone; otherwise, return the object itself. That's also why we have both <>3__max and max: since arguments can be used as local variables within a method, the former holds the original value supplied to FibonacciYield and the latter serves as a "local variable" field.
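You can watch this cloning logic earn its keep by enumerating the same IEnumerable twice. A small demonstration, again using the sample method from above (and note that the second foreach costs yet another allocation):

```csharp
using System;
using System.Collections.Generic;

static class CloneDemo
{
    static void Main()
    {
        IEnumerable<int> fib = IteratorTest.FibonacciYield(17);

        int first = 0, second = 0;
        foreach (int i in fib) { first += i; }   // recycles the original object
        foreach (int i in fib) { second += i; }  // state is no longer -2: GetEnumerator clones

        // Both loops see the full sequence, because the clone gets its max
        // copied over from the <>3__max field.
        Console.WriteLine(first == second);      // prints True
    }
}
```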

Making It Tidy

Now that we know how yield works, we can concentrate on avoiding the memory allocation. Fortunately, C# allows you to write generators without implementing the IEnumerator<T> interface, as documented in the help file:
In C#, it is not strictly necessary for a collection class to inherit from IEnumerable and IEnumerator in order to be compatible with foreach; as long as the class has the required GetEnumerator, MoveNext, Reset, and Current members, it will work with foreach. Omitting the interfaces has the advantage of allowing you to define the return type of Current to be more specific than object, thereby providing type-safety.

Omitting the interfaces, in this case, has the added advantage of being able to implement a generator as a struct, without it getting boxed later. Your final generator would look like this:
public struct ScroogeGenerator
{
    private int state;
    private int current;
    private int max;
    private int a;
    private int b;
    private int c;

    public ScroogeGenerator(int max)
    {
        this.state = 0;
        this.max = max;
        current = a = b = c = 0;
    }

    public int Current
    {
        get { return current; }
    }

    public bool MoveNext()
    {
        switch (state)
        {
            case 0:
                if (max <= 1) break;
                a = 1;
                b = 1;
                state = 1;
                current = a;
                return true;
            case 1:
                state = 2;
                current = b;
                return true;
            case 2:
                c = a + b;
                if (c >= max) break;
                a = b;
                b = c;
                current = c;
                return true;
        }
        return false;
    }

    public void Reset()
    {
        throw new NotImplementedException();
    }

    public ScroogeGenerator GetEnumerator() { return this; }
}

A few things to keep in mind:
  • When you write your own value-type generator, make sure you never cast it into an interface, or it will get boxed and make garbage. The best way to ensure that is not to implement any of the IEnumerable or IEnumerator interfaces.
  • The C# foreach loop does not call the Reset method. You can leave it unimplemented or empty if you'll use your generator in C# foreach loops only.
  • Don't reuse an instance of a value-type generator across different foreach loops. If you intend to do that, implement Reset and call it manually before reusing the generator.
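Plugging ScroogeGenerator into the original benchmark then looks like this -- only the foreach line changes, and CLR Profiler should now show no per-iteration garbage at all:

```csharp
using System;

class Program
{
    static void Main(string[] args)
    {
        int sum = 0;
        for (int l = 0; l < 100000; l++)
        {
            // ScroogeGenerator is a struct and its GetEnumerator returns
            // itself by value, so this foreach never touches the heap.
            foreach (int i in new ScroogeGenerator(17))
            {
                sum += i;
            }
        }
        Console.WriteLine(sum); // prints 3300000, same as the yield version
    }
}
```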

Monday, August 13, 2007

.NUTS: Enum Conundrum

Welcome to .NUTS

I've been avoiding .NET for quite a long time. It's not that I'm an anti-Microsoft zealot, it's just that I used to code in Java at work and I figured that one platform full of frustrating limitations was more than enough. Recently, however, I decided to learn XNA on my own free time and that means getting involved with .NET and C#. It turned out to be just as frustrating as I feared, but at least it provided me with some nice blog material. Thus is the .NUTS "column" born in my blog, to document all that is weird in the .NET world. Expect stuff that might make you giggle, goggle and groan.

Enums Revisited

I first encountered the concept of enumerated types when I learned Pascal. Before that I owned a Commodore, so I dabbled in Simons' BASIC and 6502 assembler. Therefore, I didn't approach enums the way a C programmer might -- as a cleaner way of defining constants -- but as a way of defining a completely new type that accepts only those identifiers I define. The difference is subtle, but important: it shapes your attitude.

In .NET, enums have, obviously, inherited a lot of baggage from COM, which, in turn, comes from C. Furthermore, that baggage has been squeezed to fit the .NET luggage rack, where everything is an object, even the value types. Of course, value types are not really first-class objects, because they have to be boxed first. All in all, .NET enums are a part of a very leaky abstraction.

Starting Point

My journey of discovery started with an innocent little detail inside an excellent post on Shawn Hargreaves' blog. If you haven't read it yet, I strongly recommend it. I found it quite enlightening, especially the following advice:
If you use an enum type as a dictionary key, internal dictionary operations will cause boxing. You can avoid this by using integer keys, and casting your enum values to ints before adding them to the dictionary.
I found it strange that ints don't get boxed, while enums do. However, I was in a hurry to cobble my prototype together, so I just wrote it off as another one of those language eccentricities one encounters occasionally.

A couple of weeks later, I mentioned this to a friend of mine, who has been working in .NET since it first came out. He was even more flabbergasted: "That's weird. I mean, Microsoft keeps harping on how generic collections allow you to avoid unnecessary boxing. Seeing as how enum is one of the most common types for dictionary keys, the whole thing doesn't make sense at all."

That got me even more intrigued, enough to overcome my laziness and make me start poking around. This is where things started getting really interesting.

Now, those of you who don't care about the why and want the solution, skip to the part called "Finish Line", down below, near the end of this post.

Still reading? Okay, let's roll up those sleeves...

Stage 1: Observation

The obvious first step was to confirm Shawn's claim and work from there. For that, I armed myself with CLR Profiler (thanks to Mike Stall's blog) and I hammered out the following piece of code:
using System;
using System.Collections.Generic;
using System.Text;

namespace Sandbox
{
    enum TestEnum
    {
        e10,
        e9,
        e8,
        e7,
        e6,
        e5,
        e4,
        e3,
        e2,
        e1
    }

    class Program
    {
        static void Main(string[] args)
        {
            Dictionary<TestEnum, int> dict = new Dictionary<TestEnum, int>();

            for (int l = 0; l < 100000; l++)
            {
                TestEnum x = (TestEnum) (l % 10);
                dict[x] = 100000 - (int) x;
            }

            for (TestEnum x = TestEnum.e10; x <= TestEnum.e1; x++)
            {
                Console.WriteLine(dict[x]);
            }
        }
    }
}
The results were as bad as Shawn predicted: the code allocated 4,825,246 bytes (more than 4.5 MB), out of which 74.61% were instances of Sandbox.TestEnum type. If an enum is winding up on the heap like that, it must be getting boxed.

The next thing I did was to change the Main method to look like this:
        static void Main(string[] args)
        {
            Dictionary<int, int> dict = new Dictionary<int, int>();

            for (int l = 0; l < 100000; l++)
            {
                int x = l % 10;
                dict[x] = 100000 - (int) x;
            }

            for (int x = 0; x < 10; x++)
            {
                Console.WriteLine(dict[x]);
            }
        }
The profiler reported 24,692 allocated bytes, with nothing sticking out as especially wasteful. It was clear that .NET handled ints and enums differently, even though they are both value types.

CLR Profiler has a very neat feature that lets you find out which parts of the code allocated the memory: right-click the offending bar in the histogram view and select the "Show Who Allocated" option. That opens the allocation graph view, which in my enum case pointed straight at the culprit.

Apparently, my enum was being boxed by ObjectEqualityComparer, inside the Insert method in the generic Dictionary class. Culprit identified, time to move on to the next stage.

Stage 2: Under the Hood

Contrary to my first impulse -- whip out the IL Disassembler -- I decided to first check whether the help files mentioned ObjectEqualityComparer. No such luck. ILDASM it is, then. The disassembly of Insert method in the generic Dictionary class revealed the following interesting snippet:
  IL_001e:  ldfld      class System.Collections.Generic.IEqualityComparer`1<!0> class System.Collections.Generic.Dictionary`2<!TKey,!TValue>::comparer
  IL_0023:  ldarg.1
  IL_0024:  callvirt   instance int32 class System.Collections.Generic.IEqualityComparer`1<!TKey>::GetHashCode(!0)
So my suspect, ObjectEqualityComparer, is an implementation of the IEqualityComparer interface, held in the comparer field inside my Dictionary object. Here's what the help file says about IEqualityComparer:
This interface allows the implementation of customized equality comparison for collections. That is, you can create your own definition of equality for type T, and specify that this definition be used with a collection type that accepts the IEqualityComparer generic interface. In the .NET Framework, constructors of the Dictionary generic collection type accept this interface.

A default implementation of this interface is provided by the Default property of the EqualityComparer generic class. The StringComparer class implements IEqualityComparer of type String.
A bit more digging in help files revealed this:
Dictionary requires an equality implementation to determine whether keys are equal. You can specify an implementation of the IEqualityComparer generic interface by using a constructor that accepts a comparer parameter; if you do not specify an implementation, the default generic equality comparer EqualityComparer.Default is used. If type TKey implements the System.IEquatable generic interface, the default equality comparer uses that implementation.
So how does the EqualityComparer.Default get defined? According to the help file, like this:
The Default property checks whether type T implements the System.IEquatable generic interface and if so returns an EqualityComparer that uses that implementation. Otherwise it returns an EqualityComparer that uses the overrides of Object.Equals and Object.GetHashCode provided by T.

In other words, the boxing happens because my enum doesn't implement the IEquatable interface. The funny thing is that the Int32 value type implements IEquatable<int>, as does every other primitive numeric type. Yet enums don't. When you stop to think about it, it sounds logical: the Enum value type can't implement IEquatable because that would theoretically allow you to compare any enum to any other enum and obtain bogus results. Not that I'm convinced by that argument, but it's really a moot point, seeing as how I can't change it myself. That leaves looking for a workaround, which brings us to the last stage.
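A quick reflection check confirms the asymmetry. This is a standalone snippet; the TestEnum here is just a stand-in for the one from the sample program:

```csharp
using System;

static class EquatableCheck
{
    enum TestEnum { e10, e9, e8 }  // stand-in for the article's enum

    static void Main()
    {
        // Int32 implements IEquatable<int>, so EqualityComparer<int>.Default
        // can call the strongly-typed Equals(int) -- no boxing.
        Console.WriteLine(
            typeof(IEquatable<int>).IsAssignableFrom(typeof(int)));           // True

        // Enums don't implement IEquatable<T>, so the default comparer
        // falls back to Object.Equals(object), which boxes both operands.
        Console.WriteLine(
            typeof(IEquatable<TestEnum>).IsAssignableFrom(typeof(TestEnum))); // False
    }
}
```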

Stage 3: The Lesser Evil

Once you're reduced to looking for a workaround, your job is to look for the least dirty and the least ugly solution. The first thing I tried was to make my enum implement the IEquatable interface. Naturally, there was no way to make C# swallow that syntax. Next I tried setting the EqualityComparer<TestEnum>.Default property, but unfortunately it's read-only. Finally, grudgingly, I conceded to write my own implementation of IEqualityComparer. The first attempt was to write it as a generic class, but when C# complained about not being able to explicitly cast an unknown value type to int (within my GetHashCode method) it finally drove the lesson home: .NET generics are definitely not the same thing as C++ templates.

The final solution looked like this:
    class TestEnumComparer : IEqualityComparer<TestEnum>
    {
        public static readonly TestEnumComparer Instance = new TestEnumComparer();

        #region IEqualityComparer<TestEnum> Members

        public bool Equals(TestEnum x, TestEnum y)
        {
            return (x == y);
        }

        public int GetHashCode(TestEnum obj)
        {
            return (int) obj;
        }

        #endregion
    }
The dictionary declaration would then look like this:
            Dictionary<TestEnum, int> dict = new Dictionary<TestEnum, int>(TestEnumComparer.Instance);

Finish Line

To sum it up: for each enum type you intend to use as a dictionary key, you'll have to write your own implementation of IEqualityComparer and pass it to the Dictionary constructor. That means typing some 18 lines of boilerplate per enum type -- you can squeeze it into less, I know -- but it's a relatively small price to pay for avoiding the memory waste caused by boxing your keys. I just wish it weren't that ugly, but at least it's simple enough. Stay tuned for more .NUTS in the future.