Monday, February 23, 2009

Reflection Recipes

Have you ever seen that cartoon in which Speedy Gonzales guides Daffy Duck across a minefield?
BOOM! There's one! BOOM! There's another! What do you mean you don't know where they are? You haven't missed one yet!

I've been plumbing the depths of .NET reflection lately and it felt just like Daffy's trip: I knew where I was and where I wanted to be, but I hit every landmine on my way there. It's not Microsoft's fault, really. .NET reflection is a powerful beast and covering every little "gotcha" is a heroic task, one that the guys in charge of MSDN almost pulled off. Nearly everything you'll need to survive is somewhere in the library. If you're trying to do something simple, you'll be fine; if you're trying to pull off something advanced, you'll need a map of the minefield. Here are some of the mines I've hit so far.

Attributes: Not What It Looks Like, Honey

It's easy to think of attributes as something that gets attached to other elements of your code when it's compiled. Unfortunately, it's also incorrect. Consider the following code:
[AttributeUsage(AttributeTargets.Method, AllowMultiple = false, Inherited = true)]
public class UniqueIDAttribute : Attribute
{
    private static int nextID = 1;

    public int ID { get; private set; }

    public UniqueIDAttribute()
    {
        ID = nextID++;
    }
}

public class Base
{
    [UniqueID]
    public virtual void MyMethod() { }
}

public class Derived : Base
{
    public override void MyMethod()
    {
        base.MyMethod();
    }
}

class Program
{
    static void Main(string[] args)
    {
        UniqueIDAttribute attr;
        attr = (UniqueIDAttribute) Attribute.GetCustomAttribute(
            typeof(Derived).GetMethod("MyMethod"),
            typeof(UniqueIDAttribute));
        Console.Write(attr.ID);
        Console.Write(" ");
        attr = (UniqueIDAttribute) Attribute.GetCustomAttribute(
            typeof(Base).GetMethod("MyMethod"),
            typeof(UniqueIDAttribute));
        Console.WriteLine(attr.ID);
    }
}

I was honestly surprised that the output of this was "1 2" instead of the "1 1" I expected. I thought that placing a custom attribute on a method would attach an instance of that attribute to that method. This is not feasible for a variety of good reasons; for example, how do you handle a case of an attribute placed on a member of a generic class? Most of these problems could be worked around with lots of effort, but Microsoft decided to do things the easy way and invest that effort somewhere else.

Here's what happens when you place an attribute on a code element: the compiler takes all the information necessary to construct that attribute and emits it together with that element. When you query the attribute at run time, the reflection code pulls out this information and uses it to construct the instance of the attribute at that moment. As an example, consider the following code:
[AttributeUsage(AttributeTargets.Method)]
public class SampleAttribute : Attribute
{
    public int Value { get; protected set; }
    public string Comment { get; set; }

    public SampleAttribute(int value)
    {
        Value = value;
    }
}

public class SomeClass
{
    [Sample(17, Comment = "Some text")]
    public void SomeMethod() { }
}

Now take a look at the IL for SomeMethod:
.method public hidebysig instance void  SomeMethod() cil managed
{
.custom instance void Sandbox.CSharp.SampleAttribute::.ctor(int32) = ( 
               01 00 11 00 00 00 01 00 54 0E 07 43 6F 6D 6D 65   // ........T..Comme
               6E 74 09 53 6F 6D 65 20 74 65 78 74 )             // nt.Some text
// Code size       2 (0x2)
.maxstack  8
IL_0000:  nop
IL_0001:  ret
} // end of method SomeClass::SomeMethod

As you can see, the method metadata contains custom attribute information that tells the runtime which constructor to call, what parameter to supply to it and which property to set to which value.
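
If you'd rather not read hex dumps, the same information can be pulled out at run time without constructing the attribute at all. Here's a minimal sketch using CustomAttributeData; the AttributeDataDemo class and its Describe method are names I made up for illustration, and the attribute and class are repeated from above so the snippet stands on its own.

```csharp
using System;
using System.Reflection;

[AttributeUsage(AttributeTargets.Method)]
public class SampleAttribute : Attribute
{
    public int Value { get; protected set; }
    public string Comment { get; set; }

    public SampleAttribute(int value)
    {
        Value = value;
    }
}

public class SomeClass
{
    [Sample(17, Comment = "Some text")]
    public void SomeMethod() { }
}

// Hypothetical helper, not part of the framework.
public static class AttributeDataDemo
{
    public static string Describe()
    {
        MethodInfo method = typeof(SomeClass).GetMethod("SomeMethod");
        foreach (CustomAttributeData data in CustomAttributeData.GetCustomAttributes(method))
        {
            // Skip anything that is not our attribute (compilers may add their own).
            if (data.Constructor.DeclaringType != typeof(SampleAttribute))
                continue;

            // The recorded constructor argument and the recorded property assignment.
            object ctorArg = data.ConstructorArguments[0].Value;
            CustomAttributeNamedArgument named = data.NamedArguments[0];
            return "ctor(" + ctorArg + "); "
                + named.MemberInfo.Name + " = " + named.TypedValue.Value;
        }
        return null;
    }
}
```

Calling AttributeDataDemo.Describe() should give back the constructor argument and the property assignment exactly as they were recorded, with no SampleAttribute instance ever being created.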

Having understood all that, it's no problem to understand why the CustomAttributeBuilder class exists or how to use it.
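
For the record, here's roughly how that looks: a sketch that attaches the same Sample(17, Comment = "Some text") to a dynamically created method. The AttributeBuilderDemo class and the assembly, module and type names are invented for the example. I'm also assuming the static AssemblyBuilder.DefineDynamicAssembly found in newer runtimes; on the framework version I've been using, AppDomain.CurrentDomain.DefineDynamicAssembly does the same job.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

[AttributeUsage(AttributeTargets.Method)]
public class SampleAttribute : Attribute
{
    public int Value { get; protected set; }
    public string Comment { get; set; }

    public SampleAttribute(int value)
    {
        Value = value;
    }
}

// Hypothetical helper, not part of the framework.
public static class AttributeBuilderDemo
{
    public static SampleAttribute BuildAndRead()
    {
        // Assumption: newer runtimes; older ones use AppDomain.CurrentDomain.DefineDynamicAssembly.
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("AttributeDemoAssembly"),
            AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("AttributeDemoModule");
        TypeBuilder type = module.DefineType("SomeDynamicClass", TypeAttributes.Public);
        MethodBuilder method = type.DefineMethod(
            "SomeMethod",
            MethodAttributes.Public,
            typeof(void),
            Type.EmptyTypes);
        method.GetILGenerator().Emit(OpCodes.Ret);

        // The same three pieces of information the compiler emits:
        // which constructor, which arguments, which properties get which values.
        CustomAttributeBuilder attribute = new CustomAttributeBuilder(
            typeof(SampleAttribute).GetConstructor(new Type[] { typeof(int) }),
            new object[] { 17 },
            new PropertyInfo[] { typeof(SampleAttribute).GetProperty("Comment") },
            new object[] { "Some text" });
        method.SetCustomAttribute(attribute);

        // Only now, when queried, does an actual SampleAttribute get constructed.
        Type liveType = type.CreateType();
        return (SampleAttribute) Attribute.GetCustomAttribute(
            liveType.GetMethod("SomeMethod"),
            typeof(SampleAttribute));
    }
}
```

The builder never holds an attribute instance; it only holds the recipe for making one, which is exactly what ends up in the metadata.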

Non-Virtual Interface: The Dark Secret

Have you ever wondered how it is that you can implement an interface method in a class, without marking it as virtual? It had been bothering me since I learned C#, but I never really had a compelling reason to solve that particular mystery. It was enough to know that it Just Worked. At least, it was until I tried to dynamically create a class and make it implement an interface too.

I'll stop here to make a quick aside for readers who know only C# (and/or Java) and might wonder why I think that interface methods should necessarily be virtual. A virtual method is late-bound: you don't know at compile time which implementation will be invoked when someone calls the method. Because of that, there's a layer of indirection between the method declaration and the method body; the class that defines the body of a virtual method -- either for the first time or by overriding it -- "configures" this layer of indirection to "point" to that particular body. An interface is nothing but a bunch of abstract (and therefore virtual) methods. That's why it seems counter-intuitive to implement an interface method without specifying the "virtual" keyword.

If you try to run the following code, you'll get an interesting exception:
public interface IMyInterface
{
    void MyMethod();
}

class Program
{
    static void Main(string[] args)
    {
        AssemblyBuilder assembly = AppDomain.CurrentDomain.DefineDynamicAssembly(
            new AssemblyName() { Name = "MyAssembly" },
            AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("MyModule");
        TypeBuilder myClass = module.DefineType(
            "MyClass",
            TypeAttributes.Class | TypeAttributes.Public,
            typeof(Object),
            new Type[] { typeof(IMyInterface) });
        MethodBuilder myMethod = myClass.DefineMethod(
            "MyMethod",
            MethodAttributes.Public | MethodAttributes.HideBySig,
            typeof(void),
            Type.EmptyTypes);
        ILGenerator il = myMethod.GetILGenerator();
        il.Emit(OpCodes.Ret);
        Type myClassType = myClass.CreateType();
    }
}

The exception you'll get is TypeLoadException and it'll be complaining about MyClass not implementing MyMethod. If you add the MethodAttributes.Virtual flag to the method attributes, the problem will disappear.

But what happens if you don't want MyMethod to be virtual? That's not really an option. For the reasons I explained above, the method must be virtual. What you might want is to make sure it cannot be overridden. To do that, you have to also add the MethodAttributes.Final flag to the method attributes. That will give you the equivalent of writing:
public class MyClass : IMyInterface
{
    public void MyMethod() { }
}
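
Put together, the fixed version of the failing program might look like the sketch below. SealedImplementationDemo and BuildMyClass are names I invented, and I'm assuming the static AssemblyBuilder.DefineDynamicAssembly of newer runtimes (AppDomain.CurrentDomain.DefineDynamicAssembly is the equivalent used above). I also threw in MethodAttributes.NewSlot, which the C# compiler emits for such methods as well; it isn't needed to make the exception go away.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public interface IMyInterface
{
    void MyMethod();
}

// Hypothetical helper, not part of the framework.
public static class SealedImplementationDemo
{
    public static Type BuildMyClass()
    {
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("MyAssembly"),
            AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("MyModule");
        TypeBuilder myClass = module.DefineType(
            "MyClass",
            TypeAttributes.Class | TypeAttributes.Public,
            typeof(Object),
            new Type[] { typeof(IMyInterface) });

        // Virtual lets the method fill the interface slot;
        // Final keeps derived classes from overriding it.
        MethodBuilder myMethod = myClass.DefineMethod(
            "MyMethod",
            MethodAttributes.Public | MethodAttributes.HideBySig |
            MethodAttributes.Virtual | MethodAttributes.Final |
            MethodAttributes.NewSlot,
            typeof(void),
            Type.EmptyTypes);
        ILGenerator il = myMethod.GetILGenerator();
        il.Emit(OpCodes.Ret);

        return myClass.CreateType();
    }
}
```

With those flags in place, CreateType goes through, and an instance obtained from Activator.CreateInstance(SealedImplementationDemo.BuildMyClass()) can be cast to IMyInterface and called like any other implementation.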

The funny thing is that you can't declare a method as virtual sealed, yet that's exactly what the compiler does under the hood. I guess it's just one of those trade-offs: either you use a slightly leaky abstraction or you risk antagonizing people by making them learn the underlying concepts thoroughly.
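
You don't have to take my word for it, or even fire up a disassembler. A quick sketch like this one asks reflection what the compiler actually did; PlainClass and VirtualSealedCheck are names made up for the comparison.

```csharp
using System;
using System.Reflection;

public interface IMyInterface
{
    void MyMethod();
}

public class MyClass : IMyInterface
{
    // No "virtual" keyword in sight.
    public void MyMethod() { }
}

public class PlainClass
{
    // Same signature, but no interface involved.
    public void MyMethod() { }
}

// Hypothetical helper, not part of the framework.
public static class VirtualSealedCheck
{
    public static string Describe(Type type)
    {
        MethodInfo method = type.GetMethod("MyMethod");
        return type.Name + ": IsVirtual=" + method.IsVirtual
            + ", IsFinal=" + method.IsFinal;
    }
}
```

Describe(typeof(MyClass)) reports both flags as True, while Describe(typeof(PlainClass)) reports both as False.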

Naming Your Type: Basic Hygiene

Isn't it funny how something as useful as the string type can also cause so much trouble? Sooner or later everyone has to learn to sanitize their strings or risk having their software let the little Bobby Tables wreak havoc on their data.

I must admit, though, that I didn't expect to have to sanitize the name when creating a type at run time. I found out, quite unexpectedly, that TypeBuilder seems to be allergic to types whose names contain a comma. Not that it blows up right away. You can use a name riddled with commas and still be fine when you create the type. What you can't do is use that type as a parent type. If you do, TypeBuilder will blow up with a rather unhelpful COMException saying "Record not found on lookup."

If you google for it, you'll eventually find out that there's a bug in reflection, supposedly triggered by strings that contain any of the characters []*&+,\ in the type name. My experiments indicate that the only problematic character seems to be the comma, but I'm getting rid of all of those characters anyway.

It's a nasty little bug and it's not documented in MSDN. I figured I should warn people about it, since it's such a comma mistake after all.
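
Getting rid of those characters doesn't take much. Here's a minimal sketch of the kind of helper I mean; TypeNameSanitizer is a made-up name and the underscore is an arbitrary choice of replacement.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the framework.
public static class TypeNameSanitizer
{
    // The characters reported as troublesome: [ ] * & + , \
    private const string SuspectCharacters = "[]*&+,\\";

    public static string Sanitize(string name)
    {
        StringBuilder result = new StringBuilder(name.Length);
        foreach (char c in name)
        {
            // Replace every suspect character with an underscore.
            result.Append(SuspectCharacters.IndexOf(c) >= 0 ? '_' : c);
        }
        return result.ToString();
    }
}
```

For example, Sanitize("Dictionary[String,Int32]") yields "Dictionary_String_Int32_". Keep in mind that two different names can collapse into the same sanitized name, so if your type names have to be unique you'll need to handle that yourself.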

Metadata Tokens: Remembering Past Lives

There are two stages in the life of a dynamic class: before and after the call to CreateType. In the first stage, you're using the TypeBuilder to define type features, such as fields, properties, methods, nested types, etc. When you're done defining your type, you call the CreateType method and the magic reflection fairy turns your TypeBuilder into a real live Type. But your little Pinocchio can either be a marionette or a real boy, and never the twain shall meet.

In practice, this means that a FieldBuilder you get from calling DefineField on your TypeBuilder can only be used to define the field. Even though the FieldBuilder inherits from FieldInfo, you cannot use it to get or set the field value on the instances of the type obtained by calling CreateType on the TypeBuilder. If you try to do so, you'll get a NotSupportedException, informing you that the "invoked member is not supported in a dynamic module". Same thing applies to other builder classes.

Looking around the documentation, you'll see that the usual workaround seems to be to look up the member by its name in the newly created type. That can get ugly, though. For example, consider what you would have to do to look up one of several methods with the same name. The infuriating thing is that you already have the MethodBuilder for it, which is the equivalent of a MethodInfo in its TypeBuilder, so there should be a relatively painless way of getting its counterpart in a live Type.

It turns out that there is a rather painless way of doing it and it's not even a dirty hack. Every type and every member has a metadata token that identifies it uniquely within its module. If you have the metadata token for a method, you can resolve it into a MethodBase by calling ResolveMethod on the method's Module. The usual way of obtaining a metadata token is via the MetadataToken property defined in the MemberInfo class. That won't work on a builder class, though. You'll get an InvalidOperationException complaining that the "operation is not valid due to the current state of the object".
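
To see the round trip on an ordinary, already-compiled method (where the MetadataToken property does work), consider this small sketch. TokenDemo is a made-up class, and I'm deliberately using a non-generic method, since the single-argument ResolveMethod overload isn't meant for tokens that depend on generic context.

```csharp
using System;
using System.Reflection;

// Hypothetical helper, not part of the framework.
public static class TokenDemo
{
    public static MethodBase RoundTrip(MethodInfo method)
    {
        // The token identifies the method within its module...
        int token = method.MetadataToken;

        // ...so the same module can turn it back into a method.
        return method.Module.ResolveMethod(token);
    }
}
```

Feeding it typeof(TokenDemo).GetMethod("RoundTrip") hands back a MethodBase with the same name and the same metadata token as the method you started with.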

In order to get the metadata token from a builder class, you have to use a corresponding method in the ModuleBuilder class. For example, if you have a MethodBuilder and want to get its metadata token, you have to call GetMethodToken on its ModuleBuilder. When you put it all together, converting a MethodBuilder for a TypeBuilder into its counterpart MethodInfo for the resulting live Type can be done like this:
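public static MethodInfo GetLiveMethod(MethodBuilder method)
{
    // Ask the ModuleBuilder for the builder's metadata token, then resolve
    // that token in the same module to get the live MethodInfo.
    return (MethodInfo) method.Module.ResolveMethod(
        ((ModuleBuilder) method.Module).GetMethodToken(method).Token);
}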

Clean, concise and probably more efficient than a lookup by name.

Generic Parameters: Working Around The Taint

When I was checking out Ruby, I ran into this neat concept of taint. It stuck with me, probably because I read about it in The Pragmatic Programmer's Guide, whose style is rather memorable. And, of course, because the concept is so neat.

What, then, is taint? Besides being "the opposite of tis", 'tis a way of keeping little Bobby Tables out of mischief. The idea is simple: anything coming from the outside user is considered tainted and anything derived from a tainted value is also considered tainted; if taint checking is active, the tainted values cannot be used for anything dangerous, such as SQL queries or host operating system commands.

What does this have to do with .NET reflection? The generic parameters for dynamic types and dynamic methods behave in a similar fashion. Specifically, if you call the otherwise innocent method MakeGenericType on an existing generic Type and pass it one of those generic parameters, you'll wind up with a Type with limited capabilities. Calls like GetField will fail on such a Type.

The workaround is actually documented in MSDN, which is why I saved this particular landmine for the end. If you've already read this far, chances are you're going to read and remember this solution before you actually run into the problem.

Here's what you have to do instead of calling GetField on the "tainted" Type. First get the FieldInfo from the generic type itself, as in:
FieldInfo myField = typeof(MyGenericType<>).GetField("MyField");

Then, get the "tainted" type:
Type constructedType = typeof(MyGenericType<>).MakeGenericType(myGenericParamBuilder);

Finally, do the workaround magic:
myField = TypeBuilder.GetField(constructedType, myField);

Rinse and repeat until you run out of "taint".
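
Here are the three steps stitched into one piece, as a sketch. MyGenericType<T> with its MyField is my stand-in for whatever generic type you're dealing with, and TaintDemo, Holder and U are names invented for the example. As before, I'm assuming the static AssemblyBuilder.DefineDynamicAssembly of newer runtimes in place of AppDomain.CurrentDomain.DefineDynamicAssembly.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Stand-in for an existing generic type with a public field.
public class MyGenericType<T>
{
    public T MyField;
}

// Hypothetical helper, not part of the framework.
public static class TaintDemo
{
    public static FieldInfo ResolveField()
    {
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("TaintDemoAssembly"),
            AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("TaintDemoModule");

        // A dynamic generic type, Holder<U>, whose parameter U is the "taint".
        TypeBuilder holder = module.DefineType("Holder", TypeAttributes.Public);
        GenericTypeParameterBuilder[] typeParams = holder.DefineGenericParameters("U");

        // Step 1: get the field from the generic type definition.
        FieldInfo myField = typeof(MyGenericType<>).GetField("MyField");

        // Step 2: construct MyGenericType<U>; this is the "tainted" type.
        Type constructedType = typeof(MyGenericType<>).MakeGenericType(typeParams[0]);

        // Step 3: let TypeBuilder map the field onto the constructed type.
        return TypeBuilder.GetField(constructedType, myField);
    }
}
```

The FieldInfo you get back is what you would hand to ILGenerator.Emit with OpCodes.Ldfld or OpCodes.Stfld when generating code inside Holder<U>.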

Further KABOOM Avoidance

.NET reflection is really powerful and flexible, but it can be a real pain in the backside if you're not careful. Now that C# 3 is here, you can opt to use Expression<TDelegate> instead of the old DynamicMethod, but the old-style TypeBuilder is not likely to go anywhere soon. I hope that this map will help you get across the minefield without getting blown up too often.