Friday, August 10, 2012

Async Mutex – Control flow in asynchronous functionality

 

Hi All,

In this post I am going to continue talking about asynchronous execution. Having read about async, one might think that mutual exclusion and flow-control structures are no longer required, but this is not the case. Async execution needs similar structures, with an async twist to them. In this post I am going to present a simple case where an async mutex is needed and used.

The scenario is quite simple. Through async calls you acquire some data structure, calculate some value and add it to the data structure (which affects all subsequent calculations). In my example I would like to calculate the first 7 elements of the Fibonacci series. For the sake of the argument I will assume that each operation must be performed asynchronously.

(Sorry C# guys, this code is in JavaScript).

var data = [0, 1];

function retrieveData(callback) {
    var x = data[data.length - 1];
    var y = data[data.length - 2];
    setTimeout(function() {
        callback(x, y);
    }, 0);
}

function calculateSum(x, y, callback) {
    var sum = x + y;
    setTimeout(function() {
        callback(sum);
    }, 0);
}

function addSum(sum, callback) {
    data.push(sum);
    setTimeout(callback, 0);
}

// Adds count Fibonacci elements to data
function fibonacci(count) {
    var i;
    for (i = 0; i < count; i++) {
        retrieveData(function(x, y) {
            calculateSum(x, y, function(sum) {
                addSum(sum, function() {
                    console.log(data);
                });
            });
        });
    }
}

So what do we have here…  The next Fibonacci number is calculated through three async calls. The first call to retrieveData takes the last two numbers on the array and returns them through an async call to a callback. Think of it as retrieving a record from a remote repository or database. Next we have an async call that does the “logic”. It calculates the sum of the two numbers through the calculateSum function. The sum itself is then passed to a callback which adds it back to the original array, again through an async call. When I run this code on V8 I get the following result:


fibonacci(5);


5 x [0, 1, 1, 1, 1, 1, 1]
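
Before analyzing why, it helps to reproduce the stale reads in isolation. The sketch below is illustrative only: it replaces setTimeout with an explicit task queue (schedule and drain are made-up names) so the ordering is easy to see:

```javascript
var tasks = [];                                  // stand-in for the event loop's queue
function schedule(fn) { tasks.push(fn); }
function drain() { while (tasks.length > 0) tasks.shift()(); }

var data = [0, 1];

function step(done) {
    var x = data[data.length - 1];               // the read happens NOW, synchronously...
    var y = data[data.length - 2];
    schedule(function() { done(x + y); });       // ...the write only runs later
}

for (var i = 0; i < 3; i++) {
    step(function(sum) { data.push(sum); });
}

// All three steps have already read the same pair (1, 0) before any push ran.
drain();
console.log(data.join(", ")); // 0, 1, 1, 1, 1
```

Every call to step reads the array before any of the deferred writes execute, which is exactly the pattern in the loop above.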


5 times the array you see above. But why? Let's look at the loop and trace what happens. The loop runs 5 times without interruption (we are not multi-threaded), and each iteration schedules a call to retrieveData on the event loop. Then the function fibonacci exits and retrieveData runs five times in a row. At this point all five execution chains read the same two values from the array – 0 and 1. Next, calculateSum and addSum run (each five times in a row). The end result is now clear. What we need is a way to tell the event loop that it may run the functions asynchronously, but only one execution chain may be in flight at a time. To this end I have implemented an async mutex:

App.AsyncMutex = function() {
    this.queue = [];
    this.ownerToken = null;
    this.token = 0;
};

App.AsyncMutex.prototype.enter = function(callback) {
    if (this.ownerToken !== null) {
        this.queue.push(callback);
    } else {
        this.token++;
        this.ownerToken = this.token;
        callback(this.token);
    }
};

App.AsyncMutex.prototype.leave = function(token) {
    if (this.ownerToken === null || this.ownerToken !== token) {
        throw new Error("Owner token mismatch. Expected " + this.ownerToken + " but received " + token);
    }

    if (this.queue.length > 0) {
        this.token++;
        this.ownerToken = this.token;
        var callback = this.queue.shift();
        callback(this.token);
    } else {
        this.ownerToken = null;
    }
};




The async mutex implementation is very simple. To protect some code, wrap it in a call to mutex.enter. In the callback you receive a token, and when you are done with the protected code you call mutex.leave(token). The mutex ensures that only one executor is within the protected code at a time and adds all the others to a queue. Once the current executor is done, the next executor in the queue is run.
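
To see the queueing in isolation, here is a small standalone demo. It inlines a trimmed copy of the mutex above (without the App namespace or the error check) so the snippet runs on its own:

```javascript
function AsyncMutex() {
    this.queue = [];
    this.ownerToken = null;
    this.token = 0;
}

AsyncMutex.prototype.enter = function(callback) {
    if (this.ownerToken !== null) {
        this.queue.push(callback);      // busy: wait in line
    } else {
        this.token++;
        this.ownerToken = this.token;
        callback(this.token);           // free: enter immediately
    }
};

AsyncMutex.prototype.leave = function(token) {
    if (this.queue.length > 0) {
        this.token++;
        this.ownerToken = this.token;
        this.queue.shift()(this.token); // hand ownership to the next waiter
    } else {
        this.ownerToken = null;
    }
};

var mux = new AsyncMutex();
var order = [];

mux.enter(function(tokenA) {
    order.push("A entered");
    mux.enter(function(tokenB) {        // B is queued: A is still inside
        order.push("B entered");
        mux.leave(tokenB);
    });
    order.push("A still owns the mutex");
    mux.leave(tokenA);                  // only now does B run
});

console.log(order.join(" | "));
// A entered | A still owns the mutex | B entered
```

B's callback runs only when A calls leave, even though B asked to enter earlier; that is the "one executor at a time" guarantee.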


To use the mutex we need only a tiny code change:

function fibonacci(count) {
    var i, mux = new App.AsyncMutex();
    for (i = 0; i < count; i++) {
        mux.enter(function(token) {
            retrieveData(function(x, y) {
                calculateSum(x, y, function(sum) {
                    addSum(sum, function() {
                        console.log(data);
                        mux.leave(token);
                    });
                });
            });
        });
    }
}

Run the code again and the result is:
[0, 1, 1] 

[0, 1, 1, 2]

[0, 1, 1, 2, 3]

[0, 1, 1, 2, 3, 5]

[0, 1, 1, 2, 3, 5, 8]

Exactly as we want it to be.

 

 

Thank you for reading.

Boris.

 


Friday, August 3, 2012

Simply Explained – Async

 

Hi All,

The blog is back after a somewhat long break. The break was due to two major changes in my life, the first is that I moved to a new apartment and had to deal with lots of new apartment stuff. The second is that I made a career change of 180 degrees (in terms of technological stack) and moved from Windows-.NET-WPF to Linux-Javascript-NodeJS. This means that you will probably see a change in content in future posts.

Anyway…

Modern applications are becoming more demanding of CPU resources. CPU manufacturers used to meet this demand by increasing the clock speed of the processor and by improving its internal architecture. In the last decade or so this trend has shifted to building multi-core processors and relying on multi-threaded designs to speed up applications. In the .NET world, dealing with threads was usually left to experts, as there are many pitfalls such as race conditions and mutual exclusion to consider. In the last two versions of the .NET Framework this entire area was dramatically improved with many new classes, the TPL, and language integration in C# 5. You can read more about it in Pavel’s blog here and in this nice post that summarizes some of the key features of the multi-threading libraries in .NET 4.

In parallel to the trend of multi-threaded design, another trend evolved (or rather was forced): the event-loop design. It is not new by any means; in fact, Windows has used this concept since its very first versions. The basic idea is that there is a queue of events and a process that de-queues the events one by one and handles them. The events may come from external sources such as IO, or internally from the process itself. In this post I will demonstrate how to implement a simple factorial method in each of three paradigms.
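
The queue-and-dequeue idea can be sketched in a few lines of JavaScript. This is illustrative only: post and runLoop are made-up names, and a real event loop also blocks waiting for IO rather than simply draining a queue and exiting.

```javascript
// A toy event loop: handlers go into a FIFO queue and run one at a time.
var queue = [];
var log = [];

function post(handler) {
    queue.push(handler);             // an "event" arrives
}

function runLoop() {
    while (queue.length > 0) {
        var handler = queue.shift(); // de-queue the next event...
        handler();                   // ...and handle it to completion
    }
}

post(function() { log.push("first"); });
post(function() { log.push("second"); });
post(function() { log.push("third"); });

runLoop();
console.log(log.join(" -> ")); // first -> second -> third
```

Because each handler runs to completion before the next one starts, a long-running handler starves everything behind it in the queue, which is exactly the problem the factorial example runs into.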

Simple Code

In this paradigm we don’t do anything special; we just write the code, ignoring the implications of a long-running method. The code would look something like this:

The simple way.
Code Snippet
private BigInteger Factorial(int n)
{
    BigInteger result = 1;
    for (int i = 2; i <= n; i++)
    {
        result = result * i;
    }

    return result;
}

 

This is how you would use it.
Code Snippet
myLabel.Content = Factorial(50000).ToString();

The main problem here is that while this code runs, nothing else does. The UI will appear to be stuck, since it is not being refreshed and no interaction with the process is possible.

What you would probably do in this case is jump straight to the multi-threaded paradigm.

Multi-Threaded

In this paradigm we want to call our calculation on a background thread and once it is done let it report the result on the UI thread (updating the label). The code would be something like this:

Running the calculation in the background.
Code Snippet
Thread thread = new Thread(new ParameterizedThreadStart(ThreadedFactorial));
thread.IsBackground = true;
thread.Start(50000);

 

The code that runs on the background thread.
Code Snippet
public void ThreadedFactorial(object n)
{
    BigInteger result = Factorial(Convert.ToInt32(n));
    this.Dispatcher.Invoke(new Action<string>(PrintResult), new object[] { result.ToString() });
}

 

The result is reported on the UI thread.
Code Snippet
public void PrintResult(string result)
{
    myLabel.Content = result;
}

This implementation solves the issues of the simple one and is still quite simple itself. The problems begin when resources are shared between threads. You have to use locks, semaphores, and other thread-synchronization constructs to make things work correctly. When the code grows you find yourself thinking “but what if the context switch happens between the ‘if’ statement and the assignment?” and end up adding a lock around the entire method. We’ve all been there. (And I won’t even start on how hard it is to debug concurrency issues.)

One more thing to take into account is that some environments don’t have threads at all (unlike .NET). Your browser, for example, probably runs your JavaScript on a single thread (the JavaScript, not the browser itself).

The next section shows how to write the same functionality on a single-threaded system.

Event-Loop

The dispatcher is a “magical” object. It allows you not only to marshal calls from different threads but also to schedule calls on the same thread (the UI thread in our case). Note that the second parameter of Dispatcher.BeginInvoke is a DispatcherPriority, which lets you place your method execution at a specific priority within the event loop of the UI thread. If you can break the calculation (or any other task) into small enough pieces, you can schedule those pieces one after the other on the event loop. This ensures that other events, such as repaints and IO input, are still handled by the process, since they are interleaved with your calculation. The code may look something like this:

 

Calling the factorial method. Note the last parameter is the callback that should be run once the method is finished. Passing the callback as the last parameter is a common practice in this paradigm.
Code Snippet
PartialFactorial(1, 1, 50000, new Action<string>(PrintResult));

 

The partial factorial method.
Code Snippet
public void PartialFactorial(BigInteger result, int current, int end, Action<string> callback)
{
    BigInteger tempResult = result;
    int i;
    for (i = current; i <= current + 100 && i <= end; i++)
    {
        tempResult = tempResult * i;
    }

    if (i <= end)
    {
        this.Dispatcher.BeginInvoke(new Action<BigInteger, int, int, Action<string>>(PartialFactorial), System.Windows.Threading.DispatcherPriority.Background, new object[] { tempResult, i, end, callback });
    }
    else
    {
        callback(tempResult.ToString());
    }
}

There are a couple of things to notice here. First, I perform only about 100 multiplications and then schedule the rest by invoking the same method with the new parameters. The scheduling is done with the Background priority, which runs only after all the IO and rendering work has finished. If the calculation is complete, I call the callback provided by the original caller. (I could schedule the callback call as well.)

The main advantage here is that you don’t need to deal with any threads as everything is done on a single thread. There are several disadvantages that I should mention:

  1. The code is a bit awkward. One can get used to it, but it will always seem a bit cluttered compared to the clean code of the multi-threaded approach. This is mainly due to the specific example given here; many processes can be broken into very small work units without making the flow so hard to follow.
  2. Call stacks are lost. Every time you schedule a method on the event loop you are essentially starting a new call stack. This is for better and for worse (no stack overflow as in recursion).
  3. The framework has to support this approach. In the example above, one of the most time-consuming operations was actually converting the BigInteger to a string (if you talk binary, you know why it takes so long). There was no simple way to break this method into smaller pieces, so the UI is stuck while inside it. Frameworks like NodeJS, which are built on the event-loop principle, make sure that no method blocks for that long.
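
For comparison, the same chunking trick in JavaScript might look like the sketch below. This is an assumption-laden translation, not code from the post: it uses BigInt (which needs a modern engine) in place of BigInteger, and setTimeout in place of the Dispatcher:

```javascript
function partialFactorial(result, current, end, callback) {
    // Do a chunk of up to ~100 multiplications synchronously...
    for (var i = current; i <= current + 100 && i <= end; i++) {
        result = result * BigInt(i);
    }
    if (i <= end) {
        // ...then yield to the event loop and schedule the rest.
        setTimeout(function() {
            partialFactorial(result, i, end, callback);
        }, 0);
    } else {
        callback(result.toString());
    }
}

var out;
partialFactorial(1n, 1, 10, function(s) { out = s; });
console.log(out); // "3628800" (10! fits in one chunk, so the callback runs synchronously)
```

For end values larger than one chunk, the callback fires only after the scheduled timeouts run, so out would still be undefined at this point in the script.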

 

I ran a small performance test on all three methods. The event-loop approach was a little slower than the other two, but not dramatically so: for a factorial of 50000 the first two methods took ~7.5 seconds to complete, while the event-loop method took ~8 seconds. I also tried different chunk sizes within the PartialFactorial method; values above 300 made the UI cough a little, while with a value of 100 there were no visible penalties.

Thank you for reading,

Boris