I am a great fan of
Staged Event Driven Architecture. Few of my architectures and designs are heavily influenced by this approach and thus have resulted in a proven high performance and low latent systems. Here I'm going to discuss about how an application can be designed using SEDA approach. Later, when I get time, I'll write about designing systems with different queuing systems like AMQP, etc.
In multithreaded scenarios we might encounter a situation when the consumer threads have to wait until there is a minimum number of jobs available in the job queue/pool they are working on.
Consider the case when a consumer thread is instructed to
do the work after fetching at least
m jobs from a job pool. A typical approach would be to
- Lock a mutex
- Check if there are at least m jobs in the pool.
- If so, fetch them and do the work.
- Else, sleep for sometime and check again.
There are few unknowns in this approach:
- How long should we sleep?
- If we sleep too much, we'll do less work.
- If we don't sleep enough, we might waste CPU cycles by waking up too early.
- In a publisher-consumer model, there is no way for the publishers to invoke the consumers only via mutex.
This is when conditional variables come really handy. With conditional variables, the consumers can wait/sleep until they are signaled/waken by the publishers. It is also possible for a publisher to wake up only one consumer or to wake up all the consumers at once.
Assuming we have a 2 staged queues, the consumer thread can wait on the stage1 if there is nothing to be popped from it like this:
The corresponding publisher can wake up the consumers like this:
With this approach, there can be multiple stages and there can be thread pools for each stage. These stages can be tuned perfectly to match the performance or throughput or latency criteria that we need to meet. For instance, in the above example, only 1 job was popped off by the consumer. If the time to process the job is less than the time to pop the job, then that stage can be tuned to shed out at least
m jobs (in a list or so).
Check
this code for more elaborate example.