Wednesday, July 16, 2008

The Cost Of Linq And IQueryable

The Language Integrated Query (Linq) is part of the .NET 3.5. We can easily do Linq queries against IEnumerable objects using very concise syntax by passing a delegate or the new Lambda expression. You can even create your own Linq Provider and implement the IQueryable interface, building something like Linq to Google or Linq to MySql.

It's super great, right? Basically yes, but remember a lot of new features from Microsoft have a cost. In my previous post we discussed the Linq deferred execution issue. Today let's do some tests to see how expensive the Linq queries are. Below is the simple test code. We build a simple key/Value collection (array). Because all collections including array have implemented IEnumerable interface, we can query the objects inside the array with Linq. Furthermore, we can convert the array object to be IQueryable and do query over IQueryable.
using System;
using System.Linq;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    // A simple class presenting key/value pair
    class KeyValue
    {
        public KeyValue(string key, int value)
        {
            Key = key;
            Value = value;
        }
        public string Key { get; set; }
        public int Value { get; set; }
    }

    // Simple linear search 
    static KeyValue LinearSearch(KeyValue[] keyValues, string key)
    {
        foreach (KeyValue item in keyValues)
        {
            if (item.Key == key)
                return item;
        }
        return null;
    }

    /// <summary>
    /// Compare search performance with three methods:
    /// 1. Manually go through each object inside collection
    /// 2. Linq search by IEmuerable<T>.Where(lambda_expression);
    /// 3. Convert to IQueryable and then IQueryable<T>.Where(lambda_expression)
    /// </summary>
    /// <param name="keyValues">The array to be search on</param>
    /// <param name="key">Search object by KeyValue.Key</param>
    /// <param name="repeat">Number for repeating the search</param>
    static void SearchTest(KeyValue[] keyValues, string key, int repeat)
    {
        Stopwatch watch = new Stopwatch();

        // Test linear search
        watch.Start();
        for (int i = 0; i < repeat; i++)
            LinearSearch(keyValues, key);
        watch.Stop();
        Console.WriteLine("Linear search over {0} objects (repeating {1} times) takes {2} ms",
            keyValues.Count(), repeat, watch.ElapsedMilliseconds);
        watch.Reset(); watch.Start();

        // Test Linq search
        for (int i = 0; i < repeat; i++)
            keyValues.FirstOrDefault(item => item.Key == key);
        watch.Stop();
        Console.WriteLine("Linq search over {0} objects (repeating {1} times) takes {2} ms",
            keyValues.Count(), repeat, watch.ElapsedMilliseconds);

        // Test IQueryable search
        IQueryable<KeyValue> queryables = keyValues.AsQueryable();
        watch.Reset(); watch.Start();
        for (int i = 0; i < repeat; i++)
            queryables.FirstOrDefault(item => item.Key == key);
        watch.Stop();
        Console.WriteLine("Linq IQueryable search over {0} objects (repeating {1} times) takes {2} ms",
            keyValues.Count(), repeat, watch.ElapsedMilliseconds);
        Console.WriteLine();
    }


    static void Main()
    {
        // Build test data
        KeyValue[] keyValues1 =
            Enumerable.Range(0, 1000).Select(i => new KeyValue(i.ToString(), i)).ToArray();
        KeyValue[] keyValues2 =
            Enumerable.Range(0, 2000).Select(i => new KeyValue(i.ToString(), i)).ToArray();
        KeyValue[] keyValues3 =
            Enumerable.Range(0, 3000).Select(i => new KeyValue(i.ToString(), i)).ToArray();

        // Start Test
        SearchTest(keyValues1, "5000", 1000);
        SearchTest(keyValues2, "10000", 1000);
        SearchTest(keyValues3, "20000", 1000);

        Console.Read();
    }
}
Result:


As expected Linq to objects is slightly slower than the regular linear search, and we can assume Linq uses linear search internally. But IQueryable search is much slower than others. Why that happened? Not like Linq to object where Lambda expression is passing as a function delegate, Lambda expression passed to the IQueryable is converted to Expression tree which is a quite expensive.

The interesting thing is that explicit delegate passed to IQueryable won't be converted to Expression tree. In following code there's no noticeable performance difference between the two queries.
    KeyValue keyValueObj = keyValues.FirstOrDefault(
delegate(KeyValue obj) { return obj.Key == key; });
IQueryable<KeyValue> queryables = keyValues.AsQueryable();
keyValueObj = queryables.FirstOrDefault(
delegate(KeyValue obj) { return obj.Key == key; });
Conclusion: Linq to objects are working fine for .NET collections, but be cautious to make a collection as IQueryable (collection.AsQueryable() method) if you are using Lambda expression inside the IQueryable queries, because there's performance degradation for it.