Sunday, November 27, 2016

The Performance Cost of Boxing in .NET

I recently had to do some performance optimizations against a sorted dictionary that yielded some interesting results...

Background: I am used to using Tuples a lot, simply because they are easy to use and normally quite efficient. Please remember that Tuples were changed from structs to classes back in .NET 4.0.

Problem: A struct decreased performance!

I had a SortedDictionary that was using a Tuple as a key, so I thought "hey, I'll just change that tuple to a struct and reduce the memory usage." ...bad news, that made performance WORSE!

Why would using a struct make performance worse? It's actually quite simple and obvious when you think about it: it was causing comparisons to repeatedly box the primitive data structure, thus allocating more memory on the heap and triggering more garbage collections.

Solution: Use a struct with an IComparer.

I then created a custom struct and used that; it was must faster, but it was still causing boxing because of the non-generic IComparable interface. So finally I added a generic IComparer and passed that into my dictionary constructor; my dictionary then ran fast and efficient, causing a total of ZERO garbage collections!

See for yourself:

The Moral of the Story

Try to be aware of what default implementations are doing, and always remember that boxing to object can add up fast. Also, pay attention to the Visual Studio Diagnostics Tools window; it can be very informative!

Here is how many lines of code it took to achieve a 5x performance increase:

private struct MyStruct
{
    public MyStruct(int i, string s) { I = i; S = s; }
    public readonly int I;
    public readonly string S;
}
 
private class MyStructComparer : IComparer<MyStruct>
{
    public int Compare(MyStruct x, MyStruct y)
    {
        var c = x.I.CompareTo(y.I);
        return c != 0 ? c : StringComparer.Ordinal.Compare(x.S, y.S);
    }
}

Test Program

I have written some detailed comments in the Main function about what each test is doing and how it will affect performance. Let's take a look...

Saturday, November 26, 2016

10x faster than Delegate.DynamicInvoke

This is a follow up to my previous blog posts, Optimizing Dynamic Method Invokes in .NET, and Dynamically Invoke Methods Quickly, with InvokeHelpers.EfficientInvoke. Basically, I have re-implemented this for Tact.NET in a way that makes it smaller, faster, and compatible with the .NET Standard.

So, how much faster is this new way of doing things? EfficientInvoker.Invoke is over 10x faster than Delegate.DynamicInvoke, and 10x faster than MethodInfo.Invoke.

Check out the source on GitHub:

Simple Explanation

Here is an example of a method and a class that we might want to invoke dynamically...

public class Tester
{
    public bool AreEqual(int a, int b)
    {
        return a == b;
    }
}

...and then here is the code that the EfficientInvoker will generate at runtime to call that method:

public static object GeneratedFunction(object target, object[] args)
{
    return (object)((Tester)target).AreEqual((int)args[0], (int)args[1]);
}

See, it's simple!

Monday, October 31, 2016

IOC Container for Tact.NET

Autofac now supports .NET Core, but other IOC frameworks such as Ninject and Unity have yet to port over. While I was looking into IOC frameworks for .NET Core, I had a bad idea...I, for fun, wrote my own Container in Tact.NET!

So, why did I do this? Honestly, it was just a fun academic exercise! I do think that this is a pretty good container, and I intend to use it in some of my personal projects. Would I recommend that YOU use this? Probably not yet, but I would invite you take a look and offer feedback!

Container Design

I have broken the container into two interfaces: IContainer for registration, and IResolver for consumption. There is a abstract class, ContainerBase, that can be inherited to easily create a container that matches other frameworks; for example, I intend to create a IDependencyResolver for ASP.NET.

You may notice that the IContainer does not have any lifetime management methods, that is because ALL of them are implemented as extension methods...

Registrations

To create a new lifetime manager you have to implement the IRegistration interface. There are already implementations for quite a few:

Resolution Handlers

The last part of the container is the IResolutionHandler interface. These resolution handlers are used when an exact registration match is not found during dependency resolution. For example, the EnumerableResolutionHandler will use ResolveAll to get a collection of whatever type is being requested. Another very different example, the ThrowOnFailResolutionHandler will cause an exception to be thrown when no match can be found.

I think that is a pretty good start, but I am hoping that this will continue to grow with time.

Enjoy,
Tom

Introducing Tact.NET

I have decided to create a project to consolidate all of the utilities that I have authored over the years on this blog...

Tact.NET - A tactful collection of utilities for .NET development.

I believe that this will give me the opportunity to update old code more often, as well as develop new utilities faster. For example, the there is already an ILog interface that optimizes (thank you, Jeremy) the Caller Information Extensions I wrote about about back in May.

Everything that I add to this project will be build for the .NET Standard Library, and should support .NET Core on all platforms. Expect to see NuGet packages soon, and hopefully many more blog posts to come.

Enjoy,
Tom

Friday, September 30, 2016

Host HTTP and WebSockets on the Same Port in ASP.NET

How do you support WebSocket connections on the same port as your HTTP server in .NET? It turns out that this is not that hard, and you even have several choices to do so...

Option 1: TCP Proxy

You could use a TCP proxy port to divide traffic between the two targets. For example, all traffic would come in to port 80, but then HTTP traffic would be routed internally to 8001 and WS traffic to 8002. This is useful because it would allow you to use multiple technologies (in the example on their site, both NancyFX and Fleck) to host your web servers.

Option 2: SignalR

SignalR is great because of all the backwards compatibility that it supports for both the server and client. However, it is not the most lightweight framework. If you choose to use it, then it will absolutely support both HTTP and WS.

Option 3: Owin.WebSocket *My Favorite*

Owin supports WebSockets, and that is exactly what SignalR uses to host its WS connections. The awesome Mr. Bryce Godfrey extracted the Owin code from SignalR into a much smaller library called Owin.WebSocket.

The ONLY thing that I did not like about this implementation was that is uses inheritance to define endpoints, whereas I much prefer the ability to use delegates and lambdas. Because of this, I created Owin.WebSocket.Fleck, which allows you to use the Fleck API to map your WebSockets to an Owin context. A pull request is open to merge this into the main repository.

Enjoy,
Tom

Sunday, September 25, 2016

.NET Asynchronous Parallel Batch Processor

Last year, I wrote about how to handle dynamically sized batches of data in an asynchronous manner. That original implementation used an abstract base class, and only supported a single background processing thread. I recently updated that implementation to support lambdas rather than requiring inheritance, and support a dynamic number of background threads.

...basically, this is a ConcurrentQueue that supports taking a lambda and thread count to asynchronously process enqueued items.

Unit Tests

public class ParallelProcessorTests
{
    [Fact]
    public async Task NoDisposeTimeout()
    {
        var results = new ConcurrentQueue<int>();
 
        using (var processor = new ParallelProcessor<int>(2, async (i, token) =>
        {
            await Task.Delay(200, token).ConfigureAwait(false);
            results.Enqueue(i);
        }, disposeTimeoutMs: 0))
        {
            processor.Enqueue(1);
            processor.Enqueue(2);
            processor.Enqueue(3);
            processor.Enqueue(4);
            processor.Enqueue(5);
 
            await Task.Delay(300).ConfigureAwait(false);
        }
 
        Assert.Equal(2, results.Count);
    }
 
    [Fact]
    public void MaxParallelizationLimit()
    {
        const int parallelism = 3;
        var results = new ConcurrentQueue<Tuple<int, int>>();
        var active = 0;
 
        using (var processor = new ParallelProcessor<int>(parallelism, async (i, token) =>
        {
            Interlocked.Increment(ref active);
            await Task.Delay(200, token).ConfigureAwait(false);
            var currentActive = Interlocked.Decrement(ref active) + 1;
 
            var tuple = Tuple.Create(currentActive, i);
            results.Enqueue(tuple);
        }))
        {
            processor.Enqueue(1);
            processor.Enqueue(2);
            processor.Enqueue(3);
            processor.Enqueue(4);
            processor.Enqueue(5);
        }
 
        Assert.Equal(5, results.Count);
 
        var maxParallelism = results.Max(t => t.Item1);
        Assert.Equal(parallelism, maxParallelism);
    }
 
    [Fact]
    public void BatchProcessor()
    {
        var results = new List<Tuple<long, List<int>>>();
        var sw = Stopwatch.StartNew();
 
        using (var processor = new BatchParallelProcessor<int>(1, 2, async (ints, token) =>
        {
            await Task.Delay(100, token).ConfigureAwait(false);
            var tuple = Tuple.Create(sw.ElapsedMilliseconds, ints);
            results.Add(tuple);
        }))
        {
            processor.Enqueue(1);
            processor.Enqueue(2);
            processor.Enqueue(3);
            processor.Enqueue(4);
            processor.Enqueue(5);
        }
 
        Assert.Equal(3, results.Count);
 
        Assert.Equal(2, results[0].Item2.Count);
        Assert.Equal(1, results[0].Item2[0]);
        Assert.Equal(2, results[0].Item2[1]);
 
        Assert.True(results[0].Item1 < results[1].Item1);
        Assert.Equal(2, results[1].Item2.Count);
        Assert.Equal(3, results[1].Item2[0]);
        Assert.Equal(4, results[1].Item2[1]);
 
        Assert.True(results[1].Item1 < results[2].Item1);
        Assert.Equal(1, results[2].Item2.Count);
        Assert.Equal(5, results[2].Item2[0]);
    }
}

Sunday, August 21, 2016

Dynamically Invoke Methods Quickly, with InvokeHelpers.EfficientInvoke

In my previous blog post, I talked about Optimizing Dynamic Method Invokes in .NET. In this post, we will use that information to create a static helper method that is twice as fast as MethodInfo.Invoke.

Basically, we create and cache a delegate in a concurrent dictionary, and then cast both it and it's arguments to dynamics and invoke them directly. The concurrent dictionary introduces overhead, but it still more than twice as fast as calling MethodInfo.Invoke. Please note that this method is highly optimized to reduce the use of hash code look ups, property getters, closure allocations, and if checks.

let's take a look at the code...

InvokeHelpers.EfficientInvoke

public static class InvokeHelpers
{
    private const string TooManyArgsMessage = "Invokes for more than 10 args are not yet implemented";
 
    private static readonly Type VoidType = typeof(void);
 
    private static readonly ConcurrentDictionary<Tuple<string, object>, DelegatePair> DelegateMap 
        = new ConcurrentDictionary<Tuple<string, object>, DelegatePair>();
 
    public static object EfficientInvoke(object obj, string methodName, params object[] args)
    {
        var key = Tuple.Create(methodName, obj);
        var delPair = DelegateMap.GetOrAdd(key, CreateDelegate);
            
        if (delPair.HasReturnValue)
        {
            switch (delPair.ArgumentCount)
            {
                case 0: return delPair.Delegate();
                case 1: return delPair.Delegate((dynamic)args[0]);
                case 2: return delPair.Delegate((dynamic)args[0], (dynamic)args[1]);
                case 3: return delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2]);
                case 4: return delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3]);
                case 5: return delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4]);
                case 6: return delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5]);
                case 7: return delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6]);
                case 8: return delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7]);
                case 9: return delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7], (dynamic)args[8]);
                case 10: return delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7], (dynamic)args[8], (dynamic)args[9]);
                default: throw new NotImplementedException(TooManyArgsMessage);
            }
        }
 
        switch (delPair.ArgumentCount)
        {
            case 0: delPair.Delegate(); break;
            case 1: delPair.Delegate((dynamic)args[0]); break;
            case 2: delPair.Delegate((dynamic)args[0], (dynamic)args[1]); break;
            case 3: delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2]); break;
            case 4: delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3]); break;
            case 5: delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4]); break;
            case 6: delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5]); break;
            case 7: delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6]); break;
            case 8: delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7]); break;
            case 9: delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7], (dynamic)args[8]); break;
            case 10: delPair.Delegate((dynamic)args[0], (dynamic)args[1], (dynamic)args[2], (dynamic)args[3], (dynamic)args[4], (dynamic)args[5], (dynamic)args[6], (dynamic)args[7], (dynamic)args[8], (dynamic)args[9]); break;
            default: throw new NotImplementedException(TooManyArgsMessage);
        }
 
        return null;
    }
 
    private static DelegatePair CreateDelegate(Tuple<string, object> key)
    {
        var method = key.Item2
            .GetType()
            .GetMethod(key.Item1);
 
        var argTypes = method
            .GetParameters()
            .Select(p => p.ParameterType)
            .Concat(new[] { method.ReturnType })
            .ToArray();
 
        var newDelType = Expression.GetDelegateType(argTypes);
        var newDel = Delegate.CreateDelegate(newDelType, key.Item2, method);
 
        return new DelegatePair(newDel, argTypes.Length - 1, method.ReturnType != VoidType);
    }
 
    private class DelegatePair
    {
        public DelegatePair(dynamic del, int argumentCount, bool hasReturnValue)
        {
            Delegate = del;
            ArgumentCount = argumentCount;
            HasReturnValue = hasReturnValue;
        }
 
        public readonly dynamic Delegate;
        public readonly int ArgumentCount;
        public readonly bool HasReturnValue;
    }
}

Now let's take a look at some performance tests...

Real Time Web Analytics