The async DSL is, like the thread DSL, used for concurrency. Underneath the async DSL lies the excellent Java Concurrency API, where we leverage the ExecutorService for submitting concurrent tasks. By default a thread pool with 5 threads is created, but you can pass in the core pool size if you like. For instance: from(x).async(20).to(y)
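To give a feel for what "submitting concurrent tasks to an ExecutorService" means, here is a minimal plain-JDK sketch. It is not Camel's internal code; the class name PoolSketch, the method processAll and the "processed ..." result strings are made up for illustration. It only shows a fixed pool (5 threads here, mirroring the default mentioned above) receiving one task per message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; this is not how Camel is implemented.
class PoolSketch {

    // Submits one task per message to a fixed-size pool and collects
    // the results in submission order.
    static List<String> processAll(List<String> messages, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String message : messages) {
                // Each task may run on any of the pool's threads.
                futures.add(pool.submit(() -> "processed " + message));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that task is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a.txt", "b.txt", "c.txt"), 5));
    }
}
```

In the Camel route you never write this yourself; the DSL sets up the pool for you, and the number you pass in plays the role of poolSize here.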
Suppose we have this simple route where we poll a folder for new files, process the files and afterwards move the files to a backup folder when complete.
from("file://inbox?move=../backup-${date:now:yyyyMMdd}")
.to("bean:calculateBean");
The route is synchronous and there is only a single consumer running at any given time. This scenario is well known and it doesn't affect thread safety as we only have one active thread involved at any given time.
Now imagine that the inbox folder is filled with files faster than we can process them. So we want to speed up this process. How can we do this?
Well, we could try adding a 2nd route with the same route path. That doesn't work so well, though, as we then have competing consumers for the same files. It requires that we use file locking so we won't have two consumers competing for the same file. By default Camel supports this with the file locking option on the file component. But what if the component doesn't support this, or it's not possible to add a 2nd consumer for the same endpoint? And yes, it's a bit of a hack, and the route logic is duplicated. And what if we need more? Then we need to add a 3rd, a 4th and so on.
What if the processing of the file itself is the bottleneck? That is, the calculateBean is slow. So how can we process messages with this bean concurrently?
Yes, we can use the async DSL. If we insert it in the route we get:
from("file://inbox?move=../backup-${date:now:yyyyMMdd}")
.async(10)
.to("bean:calculateBean");
So by inserting async(10) we have instructed Camel that from this point forward in the route it should use a thread pool with up to 10 concurrent threads. When the file consumer delivers a message to the async, the async takes it from there, and the file consumer can return and continue to poll the next file. By leveraging this fact we can still use a single file consumer to poll new files, and polling a directory just to grab the file handle is very fast. We won't have problems with file locking, sorting, filtering and whatnot. And at the same time the file messages can be processed concurrently by the calculate bean.
Here at the end, let's take a closer look at what happens with the synchronous thread and the asynchronous thread. The synchronous thread hands over the exchange to the new async thread, and as such the synchronous thread completes. The asynchronous thread then routes and processes the message. When this thread finishes, it takes care of the file completion strategy and moves the file into the backup folder. This is important to note: the on completion is done by the async thread. This ensures the file is not moved before it has been processed successfully. Suppose the calculate bean could not process one of the files. If it was the sync thread that did the on completion strategy, the file would have been moved too early into the backup folder. By handing this over to the async thread, we do it after we have processed the message completely.
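The hand-over described above can be sketched in plain Java. This is only a simulation under my own assumptions, not Camel code: the class HandOffSketch, the process method, the rule that files starting with "bad" fail, and the event strings are all hypothetical. The point it illustrates is that the completion step ("moved ...") runs on the worker thread, after processing, and is skipped when processing fails.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of the sync-to-async hand-over; not Camel's implementation.
class HandOffSketch {

    // Stand-in for the calculate bean: fails for files whose name starts with "bad".
    static void process(String file) {
        if (file.startsWith("bad")) {
            throw new IllegalArgumentException("cannot process " + file);
        }
    }

    // The calling thread plays the file consumer: it only hands each file over
    // to the pool. The worker thread processes the file and, only on success,
    // performs the completion step (simulated here by recording "moved ...").
    static List<String> run(List<String> files, int poolSize) {
        List<String> events = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (String file : files) {
            pool.execute(() -> {
                try {
                    process(file);
                    events.add("processed " + file);
                    events.add("moved " + file); // completion done by the worker
                } catch (IllegalArgumentException e) {
                    events.add("failed " + file); // no move on failure
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a.txt", "bad.txt", "c.txt"), 10));
    }
}
```

The order of events across different files can vary from run to run, but for any single file "processed" always comes before "moved", because both are recorded by the same worker thread.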
For more information about the new Async API in Camel check out my previous blog entry and the Camel documentation.
Update: async was renamed to threads in Camel 2.0 final.