Thursday, July 23, 2015

Scatter Gather - Using Java 8 CompletableFuture and Rx-Java Observable

I wanted to explore a simple scatter-gather scenario using Java 8 CompletableFuture and using Rx-Java Observable.


The scenario is simple: spawn 10 tasks, each returning a string, and ultimately collect the results into a list.

Sequential

A sequential version of this would be the following:

public void testSequentialScatterGather() throws Exception {
 List<String> list =
   IntStream.range(0, 10)
     .boxed()
     .map(this::generateTask)
     .collect(Collectors.toList());

 logger.info(list.toString());
}

private String generateTask(int i) {
 Util.delay(2000);
 return i + "-" + "test";
}
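
Util.delay is not shown in the post. A minimal stand-in, assuming it simply blocks the calling thread for the given number of milliseconds, might look like the sketch below (the class name DelayUtil is hypothetical). With a 2000 ms delay per task, the sequential version above needs roughly 10 x 2 = 20 seconds, since each call blocks before the next one starts.

```java
import java.util.concurrent.TimeUnit;

class DelayUtil {
    // Hypothetical stand-in for the post's Util.delay:
    // block the calling thread for the given number of milliseconds.
    static void delay(long millis) {
        try {
            TimeUnit.MILLISECONDS.sleep(millis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe the interruption.
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while sleeping", e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        delay(100);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("Slept for roughly " + elapsedMs + " ms");
    }
}
```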

With CompletableFuture

A method can be made to return a CompletableFuture using the supplyAsync utility method. I am using the variation of this method which accepts an explicit Executor, and I am deliberately throwing an exception for one of the inputs:

private CompletableFuture<String> generateTask(int i,
  ExecutorService executorService) {
 return CompletableFuture.supplyAsync(() -> {
  Util.delay(2000);
  if (i == 5) {
   throw new RuntimeException("Run, it is a 5!");
  }
  return i + "-" + "test";
 }, executorService);
}

Now to scatter the tasks:

List<CompletableFuture<String>> futures =
  IntStream.range(0, 10)
    .boxed()
    .map(i -> this.generateTask(i, executors).exceptionally(t -> t.getMessage()))
    .collect(Collectors.toList());
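
One detail worth checking when using exceptionally this way: the Throwable handed to the callback for a failed supplyAsync task typically arrives wrapped in a CompletionException, so calling getMessage() on it directly may include the wrapped exception's class name. The sketch below is a hedged variation rather than the post's code; the class ExceptionallyDemo and the helper describe are hypothetical, and the task runs on the common pool without a delay to keep the example short.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class ExceptionallyDemo {
    // Hypothetical helper: unwrap a CompletionException so the message
    // of the original failure is reported.
    static String describe(Throwable t) {
        Throwable cause = (t instanceof CompletionException && t.getCause() != null)
                ? t.getCause()
                : t;
        return cause.getMessage();
    }

    // Mirrors the shape of the post's generateTask, minus the delay and the explicit executor.
    static CompletableFuture<String> task(int i) {
        return CompletableFuture.supplyAsync(() -> {
            if (i == 5) {
                throw new RuntimeException("Run, it is a 5!");
            }
            return i + "-test";
        });
    }

    public static void main(String[] args) {
        // A successful task passes through exceptionally untouched.
        System.out.println(task(4).exceptionally(ExceptionallyDemo::describe).join()); // prints 4-test
        // A failed task is replaced by the message of the underlying exception.
        System.out.println(task(5).exceptionally(ExceptionallyDemo::describe).join()); // prints Run, it is a 5!
    }
}
```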

Scattering the tasks yields a list of CompletableFuture. Obtaining a list of String from it is a little tricky; here I am using one of the solutions suggested on Stack Overflow:

CompletableFuture<List<String>> result = CompletableFuture.allOf(futures.toArray(new CompletableFuture[futures.size()]))
  .thenApply(v -> futures.stream()
       .map(CompletableFuture::join)
       .collect(Collectors.toList()));

The CompletableFuture.allOf method is used here purely to compose the next action to take once all the scattered tasks have completed. At that point the futures are streamed again and their results are collected into a list of strings.
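
To make the role of allOf concrete, here is a self-contained sketch of the same gather step. The class AllOfScatterGather, its method names, the shortened delay, and the pool sized to the task count are all assumptions made for illustration. The point it tries to show is that every task is already submitted to the executor before anything is joined, so the join calls inside thenApply only read results that are already available.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class AllOfScatterGather {
    // Hypothetical task: sleeps for delayMs, then returns "<i>-test".
    static CompletableFuture<String> task(int i, long delayMs, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return i + "-test";
        }, pool);
    }

    // Scatter `count` tasks, then gather their results in submission order.
    static List<String> scatterGather(int count, long delayMs) {
        // Assumption for this sketch: one thread per task, so all tasks can run at once.
        ExecutorService pool = Executors.newFixedThreadPool(count);
        try {
            // Scatter: each supplyAsync call submits its task right away.
            List<CompletableFuture<String>> futures = IntStream.range(0, count)
                    .mapToObj(i -> task(i, delayMs, pool))
                    .collect(Collectors.toList());

            // Gather: thenApply runs only after every future has completed,
            // so join() here returns without waiting.
            CompletableFuture<List<String>> result = CompletableFuture
                    .allOf(futures.toArray(new CompletableFuture[0]))
                    .thenApply(v -> futures.stream()
                            .map(CompletableFuture::join)
                            .collect(Collectors.toList()));

            // Blocking here is only for the demo; thenAccept would keep it asynchronous.
            return result.join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        List<String> results = scatterGather(10, 200);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(results);
        System.out.println("Elapsed: " + elapsedMs + " ms");
    }
}
```

If the tasks ran one after another, ten 200 ms tasks would take about two seconds; comparing that with the elapsed time printed by main gives a rough check that they overlap. The actual degree of overlap depends on the size of the pool passed to supplyAsync.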

The final result can then be presented asynchronously:

result.thenAccept(l -> {
 logger.info(l.toString());
});


With Rx-java Observable

Scatter-gather with Rx-java is relatively cleaner than the CompletableFuture version, as Rx-java provides better ways to compose the results together. Again, here is the method which performs the scattered task:

private Observable<String> generateTask(int i, ExecutorService executorService) {
    return Observable
            .<String>create(s -> {
                Util.delay(2000);
                if (i == 5) {
                    throw new RuntimeException("Run, it is a 5!");
                }
                s.onNext(i + "-test");
                s.onCompleted();
            })
            .onErrorReturn(e -> e.getMessage())
            .subscribeOn(Schedulers.from(executorService));
}

and to scatter the tasks:

List<Observable<String>> obs =
        IntStream.range(0, 10)
            .boxed()
            .map(i -> generateTask(i, executors)).collect(Collectors.toList());

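Once more I have a list of Observables, and what I need is a list of results. Observable provides a merge method which, combined with toList, does just that. Note that merge emits items as they arrive, so the resulting list is not guaranteed to be in submission order: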

Observable<List<String>> merged = Observable.merge(obs).toList();

which can be subscribed to and the results printed when available:

merged.subscribe(l -> logger.info(l.toString()));

6 comments:

  1. Thanks for the Rx example. One thing that's different from CompletableFuture: how do I wait on the parent thread until all items are processed?
    With CompletableFuture I can just call join() on the combined future.
    In Rx, I couldn't find the equivalent.

    Replies
    1. You can make an Observable blocking using the toBlocking method, though.

  2. I don't understand the benefit of using `allOf` in your code. The individual tasks are invoked sequentially using `join`, so task `i + 1` is not invoked until task `i` is complete. I didn't find anything in the documentation that allows for invoking all the subtasks in parallel. The closest thing seems to be `ForkJoinPool.invokeAll`, which takes a bunch of `Callable`s.

    Replies
    1. Thanks Abhijit. I have not seen the internal implementation of `allOf`, but the API description is that the tasks are actually running concurrently in the appropriate thread pool and are not waiting for join to be called before executing. In my mind, allOf provides a way to combine the results and get another CompletableFuture when the individual tasks are complete, thereby providing a hook to continue once they are done.

    2. Thank you, and I stand corrected. I did a test run with a sample of 2 tasks, and those do indeed seem to be invoked in parallel, though I have yet to check the degree of concurrency (meaning how the framework decides how many tasks to trigger concurrently if there are many).

      https://github.com/abhijitsarkar/java/blob/master/java8-practice/src/main/java/name/abhijitsarkar/java/service/FinanceService.java

    3. Awesome. It would depend on the underlying thread pool that you are using. In my example I am submitting using the `supplyAsync` method of CompletableFuture and passing in an explicit executor with a fixed underlying thread pool. Of course, not all of them would necessarily run in parallel; that would depend on the number of cores that you have and how processor-heavy your code is (vs., say, IO-heavy).
