Saturday, June 28, 2008

.NET Linq Deferred Execution

The new Linq (Language Integrated Query) feature shipped with .NET 3.5 is fantastic, because it lets us write more concise and meaningful queries. Like the anonymous method pitfall we discussed in the previous post, Linq has a related behavior called deferred execution, and it can cause problems if you are not aware of it. Let's look at the following code:
using System;
using System.Linq;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Test data
        string[] names = new string[] { "NameA", "NameB" };
        string _name = "NameA";

        // Query by anonymous delegate
        IEnumerable<string> searchName1 = names.Where(
            delegate(string name)
            {
                return name == _name;
            });
        // Query by lambda expression
        IEnumerable<string> searchName2 = names.Where(name => name == _name);

        // Change the search keyword
        _name = "NameB";

        // Redo the queries
        IEnumerable<string> searchName3 = names.Where(
            delegate(string name)
            {
                return name == _name;
            });
        IEnumerable<string> searchName4 = names.Where(name => name == _name);

        Console.WriteLine("{0} \t {1} \t {2} \t {3}",
            searchName1.First(), searchName2.First(), searchName3.First(), searchName4.First());

        Console.Read();
    }
}
What's the result? You might expect "NameA NameA NameB NameB", but if you run the console application you get "NameB NameB NameB NameB" instead.

Why? Because Linq's Where operator uses deferred execution. For example, the statement
IEnumerable<string> searchName2 = names.Where(name => name == _name);
only attaches the lambda expression to the Where query. The lambda is not invoked until we actually read data from the result, and it captures the variable _name itself rather than the value _name has at that moment. So searchName1, searchName2, searchName3 and searchName4 all return the same element: each predicate runs inside First(), after _name has already been changed to "NameB".
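The timing is easier to see if the predicate counts its own calls. This is a minimal sketch with hypothetical names (DeferredTiming, PredicateCalls, key): the counter is still zero after Where returns and only goes up once First() starts pulling elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredTiming
{
    // Counts how many times the Where predicate has actually run.
    public static int PredicateCalls;

    // Returns "<calls before First> <calls after First> <element found>".
    public static string Run()
    {
        string[] names = { "NameA", "NameB" };
        string key = "NameA";
        PredicateCalls = 0;

        // Building the query only stores the predicate; nothing is evaluated yet.
        IEnumerable<string> query = names.Where(name =>
        {
            PredicateCalls++;
            return name == key;
        });
        int before = PredicateCalls;   // still 0

        key = "NameB";                 // the lambda captured the variable, not its value

        string found = query.First();  // predicate runs now and reads key == "NameB"
        int after = PredicateCalls;    // 2: "NameA" rejected, then "NameB" accepted

        return string.Format("{0} {1} {2}", before, after, found);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 0 2 NameB
    }
}
```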

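How can we avoid this issue? One way is to read the data right after the query is defined, for example by looping through the IEnumerable collection; keep in mind that every later enumeration runs the query again. The other way is to call ToList() or ToArray(), which force immediate execution of a Linq query and store the results in a new collection.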
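Applied to the example above, ToList() fixes the problem: each query is materialized before _name changes. A minimal sketch (the class name ImmediateExecution is hypothetical); the lambda still captures _name, but by the time the variable changes the first list has already been filled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ImmediateExecution
{
    public static string Run()
    {
        string[] names = { "NameA", "NameB" };
        string _name = "NameA";

        // ToList() enumerates right away, so the predicate sees "NameA" now.
        List<string> searchName1 = names.Where(name => name == _name).ToList();

        _name = "NameB";

        // This query is materialized after the change, so it sees "NameB".
        List<string> searchName2 = names.Where(name => name == _name).ToList();

        return searchName1.First() + " " + searchName2.First();
    }

    static void Main()
    {
        Console.WriteLine(Run()); // NameA NameB
    }
}
```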

The following Linq methods use deferred execution (among others, such as Select and OrderBy):
Except, Take, TakeWhile, Skip, SkipWhile, Where
The following methods do not; they execute immediately:
Any, Average, Contains, Count, First, FirstOrDefault, Last, LastOrDefault, Single,
SingleOrDefault, Sum, Max, Min, ToList, ToArray, ToDictionary, ToLookup
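Deferred execution also means a query sees changes made to its source before it is enumerated, while an immediate operator captures the state at the moment it is called. A minimal sketch with a hypothetical class name, using Where/Take (deferred) and Count (immediate) on the same List<int>:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredVersusImmediate
{
    public static string Run()
    {
        List<int> numbers = new List<int> { 1, 2, 3 };

        // Deferred: Where and Take only describe the query.
        IEnumerable<int> firstTwoEvens = numbers.Where(n => n % 2 == 0).Take(2);

        // Immediate: Count() walks the list right now and returns a plain int.
        int evenCount = numbers.Count(n => n % 2 == 0);

        numbers.Add(4); // change the source after both calls

        // The deferred query sees the new element; the stored count does not.
        string evens = string.Join(",", firstTwoEvens.Select(n => n.ToString()).ToArray());
        return evens + " | " + evenCount;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 2,4 | 1
    }
}
```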
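How do we remember all this? A useful tip is to look at the return type of the method. If it returns a sequence, such as IEnumerable<TSource> (or IEnumerable<TResult> for Select), execution is deferred; if it returns a single value or a concrete collection such as List<T> or Dictionary<TKey, TValue>, it runs immediately. Why? The deferred methods are implemented as iterators (with yield return, or with hand-written classes that behave the same way), and an iterator's body does not run until someone calls MoveNext() on it. That's the root cause of deferred execution.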
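To see why yield return causes deferral, here is a hypothetical Where-style filter, MyWhere, written as an iterator. It is a sketch, not the actual Enumerable.Where source; the extra log parameter exists only to show when the method body starts running. Calling MyWhere runs none of its body, and First() starts the body and stops at the first match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LazyWhereDemo
{
    // Hypothetical filter implemented with yield return.
    // Nothing in this body runs until the caller starts enumerating the result.
    public static IEnumerable<T> MyWhere<T>(IEnumerable<T> source, Func<T, bool> predicate, List<string> log)
    {
        log.Add("iteration started");
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item; // hand back one element, then pause here
            }
        }
    }

    // Returns "<log entries before First> <log entries after First> <first match>".
    public static string Run()
    {
        List<string> log = new List<string>();
        IEnumerable<int> query = MyWhere(new[] { 1, 2, 3 }, n => n > 1, log);
        int loggedBeforeEnumeration = log.Count; // 0: the iterator body has not started
        int first = query.First();               // starts the body, stops at the first match
        return loggedBeforeEnumeration + " " + log.Count + " " + first;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 0 1 2
    }
}
```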

Thursday, June 05, 2008

C# Anonymous Delegate Pitfall

What is the output of the following code?
using System;
using System.Collections.Generic;
using System.Text;

namespace AnonymousDelegate
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("AnonymousDelegate1:");
            AnonymousDelegate1();
            Console.WriteLine();

            Console.WriteLine("AnonymousDelegate2:");
            AnonymousDelegate2();
            Console.WriteLine();

            Console.WriteLine("AnonymousDelegate3:");
            AnonymousDelegate3();
            Console.Read();
        }

        static void AnonymousDelegate1()
        {
            List<Action> actions = new List<Action>();
            for (int i = 0; i < 3; i++)
            {
                actions.Add(delegate() { Console.WriteLine(i); });
            }
            foreach (Action action in actions)
            {
                action();
            }
        }

        static void AnonymousDelegate2()
        {
            List<Action> actions = new List<Action>();
            for (int i = 0; i < 3; i++)
            {
                int temp = i;
                actions.Add(delegate() { Console.WriteLine(temp); });
            }
            foreach (Action action in actions)
            {
                action();
            }
        }

        static void AnonymousDelegate3()
        {
            List<Action> actions = new List<Action>();
            int temp;
            for (int i = 0; i < 3; i++)
            {
                temp = i;
                actions.Add(delegate() { Console.WriteLine(temp); });
            }
            foreach (Action action in actions)
            {
                action();
            }
        }
    }
}
The result may surprise you:
AnonymousDelegate1:
3
3
3

AnonymousDelegate2:
0
1
2

AnonymousDelegate3:
2
2
2
Why are the results different for each test? It comes down to how the C# compiler handles anonymous delegates (anonymous methods).

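For an anonymous delegate, the compiler generates an ordinary method with a compiler-chosen name and creates a delegate that points to it, just like a traditional delegate. When the anonymous method uses no variables from outside its body, the generated method is a static method of the enclosing class (at least with the C# 2.0/3.0 compilers). There's no concept of an anonymous method in the intermediate language (IL) of a .NET assembly; every method has a name.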
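A small sketch of that equivalence, with hypothetical names (AnonymousVersusNamed, Square): the anonymous method and the named method produce delegates of the same type that behave the same. Printing Method.Name shows the name the compiler picked; its exact form differs between compiler versions, so no specific output is claimed for that line.

```csharp
using System;

static class AnonymousVersusNamed
{
    // A hand-written named method doing the same work as the anonymous one.
    static int Square(int x)
    {
        return x * x;
    }

    public static bool Run()
    {
        // Anonymous method: no name in the source code.
        Func<int, int> anonymous = delegate(int x) { return x * x; };

        // Traditional delegate bound to a named method.
        Func<int, int> named = new Func<int, int>(Square);

        // Both are ordinary Func<int, int> instances and give the same result.
        return anonymous(7) == named(7) && anonymous.GetType() == named.GetType();
    }

    static void Main()
    {
        Func<int, int> anonymous = delegate(int x) { return x * x; };
        // Prints the compiler-generated method name (format varies by compiler).
        Console.WriteLine(anonymous.Method.Name);
        Console.WriteLine(Run()); // True
    }
}
```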

Things get more interesting when an anonymous method uses a variable declared outside of it; this is called a closure. The compiler wraps the anonymous method and the captured variable in a sealed class: the anonymous method becomes an instance method of that class, and the variable becomes a public field. We can confirm this by examining the generated IL code.
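The transformation can be imitated by hand. In this sketch the class CounterClosure is a hypothetical stand-in for the compiler-generated class: the captured local becomes a public field, and the method body becomes an instance method. In both versions the delegate and the enclosing method share one piece of storage, so the enclosing method sees the increments made through the delegate.

```csharp
using System;

// Hypothetical hand-written equivalent of what the compiler generates for
//     int counter = 0;
//     Action increment = delegate() { counter++; };
sealed class CounterClosure
{
    public int counter;     // the captured local becomes a public field

    public void Increment() // the anonymous method body becomes an instance method
    {
        counter++;
    }
}

static class ClosureDemo
{
    // Compiler-generated version: the delegate shares counter with this method.
    public static int WithAnonymousMethod()
    {
        int counter = 0;
        Action increment = delegate() { counter++; };
        increment();
        increment();
        return counter; // 2: this method sees the changes made through the delegate
    }

    // Hand-written version: every read and write of counter goes through the object.
    public static int WithExplicitClass()
    {
        CounterClosure closure = new CounterClosure();
        closure.counter = 0;
        Action increment = new Action(closure.Increment);
        increment();
        increment();
        return closure.counter; // 2 as well
    }

    static void Main()
    {
        Console.WriteLine("{0} {1}", WithAnonymousMethod(), WithExplicitClass()); // 2 2
    }
}
```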



In the IL there are three nested classes (<>c__DisplayClass2, <>c__DisplayClass5 and <>c__DisplayClass9), one for each of the three anonymous methods. All three classes have the same structure:
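private sealed class CompilerGeneratedClass
{
    public int i;              // the captured variable (named temp in the second and third tests)
    public void ActionMethod() // the body of the anonymous method
    {
        Console.WriteLine(this.i);
    }
}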
That makes sense, but why are the results different? Let's check the IL code for AnonymousDelegate1.



The IL shows that only one instance of the compiler-generated class is created, before the loop starts, and its field i is used directly as the for loop counter, in both the condition check and the increment.

In the IL code for AnonymousDelegate2, by contrast, the class is instantiated three times, once per iteration of the for loop. To make this easier to read, here is the IL translated back to C# for all three test methods:
// Generated by the compiler (shown as if nested in Program).
// In AnonymousDelegate2 and AnonymousDelegate3 the field is really named temp;
// i is used everywhere below to keep the translation short.
private sealed class AnonymousClass
{
    public int i;
    public void ActionMethod()
    {
        Console.WriteLine(this.i);
    }
}

static void AnonymousDelegate1()
{
    // One object for the whole loop; its field doubles as the loop counter.
    AnonymousClass anonymousClass = new AnonymousClass();
    List<Action> actions = new List<Action>();
    anonymousClass.i = 0;
    for (; anonymousClass.i < 3; anonymousClass.i++)
    {
        Action action = new Action(anonymousClass.ActionMethod);
        actions.Add(action);
    }
    foreach (Action action in actions)
    {
        action.Invoke();
    }
}

static void AnonymousDelegate2()
{
    List<Action> actions = new List<Action>();
    for (int i = 0; i < 3; i++)
    {
        // A new object on every iteration, so each delegate keeps its own value.
        AnonymousClass anonymousClass = new AnonymousClass();
        anonymousClass.i = i;
        Action action = new Action(anonymousClass.ActionMethod);
        actions.Add(action);
    }
    foreach (Action action in actions)
    {
        action.Invoke();
    }
}

static void AnonymousDelegate3()
{
    // One object for the whole method; each iteration overwrites the same field.
    AnonymousClass anonymousClass = new AnonymousClass();
    List<Action> actions = new List<Action>();
    for (int i = 0; i < 3; i++)
    {
        anonymousClass.i = i;
        Action action = new Action(anonymousClass.ActionMethod);
        actions.Add(action);
    }
    foreach (Action action in actions)
    {
        action.Invoke();
    }
}
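Now the answer is clear. The compiler generates different code depending on the scope of the captured variable: it creates one instance of the closure class each time execution enters the scope in which that variable is declared. In AnonymousDelegate1 and AnonymousDelegate3 the captured variable (i or temp) is declared once for the whole loop or method, so all three delegates share one object; in AnonymousDelegate2, temp is declared inside the loop body, so each iteration gets its own object. This is confusing at first, and it's easy to write buggy code if we don't understand it.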
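To check the scope rule without reading IL, here is a testable variant of the three tests. The names are hypothetical, and each delegate is a Func<int> that returns its captured value instead of printing it, so the results can be compared directly.

```csharp
using System;
using System.Collections.Generic;

static class ClosureScope
{
    // Invokes every collected delegate and records the value it returns.
    static List<int> RunAll(List<Func<int>> funcs)
    {
        List<int> results = new List<int>();
        foreach (Func<int> f in funcs)
        {
            results.Add(f());
        }
        return results;
    }

    // i is declared once for the whole for statement: one shared closure object.
    public static List<int> LoopVariable()
    {
        List<Func<int>> funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            funcs.Add(delegate() { return i; });
        }
        return RunAll(funcs); // 3, 3, 3
    }

    // temp is declared inside the loop body: a fresh closure object per iteration.
    public static List<int> VariableInsideLoop()
    {
        List<Func<int>> funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int temp = i;
            funcs.Add(delegate() { return temp; });
        }
        return RunAll(funcs); // 0, 1, 2
    }

    // temp is declared at method scope: one closure object for the whole method.
    public static List<int> VariableOutsideLoop()
    {
        List<Func<int>> funcs = new List<Func<int>>();
        int temp;
        for (int i = 0; i < 3; i++)
        {
            temp = i;
            funcs.Add(delegate() { return temp; });
        }
        return RunAll(funcs); // 2, 2, 2
    }

    static string Format(List<int> values)
    {
        return string.Join(" ", values.ConvertAll(v => v.ToString()).ToArray());
    }

    static void Main()
    {
        Console.WriteLine(Format(LoopVariable()));        // 3 3 3
        Console.WriteLine(Format(VariableInsideLoop()));  // 0 1 2
        Console.WriteLine(Format(VariableOutsideLoop())); // 2 2 2
    }
}
```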

Keep in mind that a real class is created, and a public field stores the captured variable, for every anonymous method that forms a closure. If you're not sure how a captured variable will be handled, copy it into a local variable declared in the narrowest possible scope and use that copy inside the anonymous method, as AnonymousDelegate2 does. This guarantees that each anonymous method works with its own variable, but be aware that it uses more resources: the class itself is generated once at compile time, but more instances of it are allocated at run time (one per iteration instead of one per method call).
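The same advice also applies to the query from the Linq deferred execution post. This is a minimal sketch, assuming a hypothetical helper SearchFor: because key is a method parameter, each call captures its own variable. Changing _name afterwards therefore no longer affects queries that were already built, even though their execution is still deferred.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NarrowScopeQuery
{
    // Hypothetical helper: key is a parameter, so each call captures
    // its own copy of the search value.
    static IEnumerable<string> SearchFor(string[] names, string key)
    {
        return names.Where(name => name == key);
    }

    public static string Run()
    {
        string[] names = { "NameA", "NameB" };
        string _name = "NameA";

        IEnumerable<string> searchName1 = SearchFor(names, _name);
        _name = "NameB";
        IEnumerable<string> searchName3 = SearchFor(names, _name);

        // Execution is still deferred, but each lambda reads its own key.
        return searchName1.First() + " " + searchName3.First();
    }

    static void Main()
    {
        Console.WriteLine(Run()); // NameA NameB
    }
}
```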