Tuesday, December 27, 2011

Debugging with Maven

For Non-Forked Process
To debug a Maven project, use the "mvnDebug" command-line tool instead of "mvn". For example, to debug a Tomcat-based project, I would run:
mvnDebug tomcat:run

This enables debugging on the JVM and makes it wait until a debugger connects.

By default, mvnDebug listens for JDWP connections on port 8000.

The next step is to connect to this VM from the debugger in Eclipse.
Go to Run->Debug Configurations->Remote Java Application and create a new "launch configuration" that points at the host running Maven (typically localhost) and port 8000.



Click Debug; Eclipse should connect to the Maven VM, and Tomcat should start up at this point.


For Forked Process
The above approach will not work for forked processes such as JUnit tests, because Maven by default forks tests into a separate JVM.
There are two workarounds for debugging JUnit tests.
The first is to prevent forking of the JUnit tests with the forkMode parameter (newer Surefire versions deprecate forkMode in favor of forkCount=0):
mvnDebug test -Dtest=GtdProjectDaoIntegrationTest -DforkMode=never

The second workaround is to use the "maven.surefire.debug" property:
mvn -Dmaven.surefire.debug test -Dtest=GtdProjectDaoIntegrationTest

By default, this makes the forked test JVM wait for a debugger on port 5005. A variation is to specify the debugger port explicitly, along with additional JDWP options:
mvn -Dmaven.surefire.debug="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 -Xnoagent -Djava.compiler=NONE" test
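
When the debugger refuses to attach, it can help to confirm that the JVM in question was really started with the JDWP options above. The following is a minimal sketch, not part of Maven or Surefire: the class name JdwpCheck and its method are made up for illustration, and it assumes the options show up in the JVM input arguments either in the legacy -Xrunjdwp form used above or in the newer -agentlib:jdwp form.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, for illustration only.
class JdwpCheck {

    // Returns the first JVM argument that enables JDWP, if there is one.
    static Optional<String> findJdwpArgument(List<String> jvmArguments) {
        for (String argument : jvmArguments) {
            if (argument.startsWith("-agentlib:jdwp") || argument.startsWith("-Xrunjdwp")) {
                return Optional.of(argument);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with (excluding the arguments to main)
        List<String> jvmArguments = ManagementFactory.getRuntimeMXBean().getInputArguments();
        Optional<String> jdwp = findJdwpArgument(jvmArguments);
        System.out.println(jdwp.isPresent()
                ? "Debug agent enabled: " + jdwp.get()
                : "No JDWP argument found");
    }
}
```

Calling findJdwpArgument from a test run with -Dmaven.surefire.debug should report the agent argument, while a plain "mvn test" run should not.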


References
http://maven.apache.org/plugins/maven-surefire-plugin/examples/debugging.html

Monday, December 19, 2011

Concurrency - Executors and Spring Integration

This is a follow-up to the previous blog entry, "Concurrency - Sequential and Raw Thread".

Thread Pool/Executors Based Implementation
A better approach than the raw thread version is a thread pool based one, where the pool size is chosen for the system the task runs on: Number of CPUs / (1 - Blocking Coefficient of Task). The blocking coefficient is the fraction of time a task spends blocked (for example on I/O), a value between 0 and 1. Venkat Subramaniam's book, Programming Concurrency on the JVM, has more details.
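
As a rough illustration of the sizing formula, here is a minimal sketch. The class and method names are made up for this example, and the blocking coefficient is something you would have to estimate or measure for your own task.

```java
// Hypothetical helper illustrating: threads = CPUs / (1 - blocking coefficient)
class PoolSizing {

    static int poolSize(int cores, double blockingCoefficient) {
        if (cores < 1 || blockingCoefficient < 0.0 || blockingCoefficient >= 1.0) {
            throw new IllegalArgumentException(
                    "cores must be >= 1 and blockingCoefficient must be in [0, 1)");
        }
        // Round to the nearest integer to absorb floating point noise
        return (int) Math.round(cores / (1.0 - blockingCoefficient));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // A purely CPU-bound task (coefficient 0) gets one thread per core
        System.out.println("CPU bound: " + poolSize(cores, 0.0));
        // A task that is blocked 90% of the time gets ten threads per core
        System.out.println("Mostly blocked: " + poolSize(cores, 0.9));
    }
}
```

For a task like the dummy report part generator, which spends almost all of its time sleeping, the coefficient is close to 1, so the formula suggests a pool much larger than the number of cores.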




First I defined a custom task to generate a report part from a report part request; this is implemented as a Callable:
public class ReportPartRequestCallable implements Callable<ReportPart> {
    private final ReportRequestPart reportRequestPart;
    private final ReportPartGenerator reportPartGenerator;

    public ReportPartRequestCallable(ReportRequestPart reportRequestPart, ReportPartGenerator reportPartGenerator) {
        this.reportRequestPart = reportRequestPart;
        this.reportPartGenerator = reportPartGenerator;
    }

    @Override
    public ReportPart call() {
        return this.reportPartGenerator.generateReportPart(reportRequestPart);
    }
}

public class ExecutorsBasedReportGenerator implements ReportGenerator {
    private static final Logger logger = LoggerFactory.getLogger(ExecutorsBasedReportGenerator.class);

    private ReportPartGenerator reportPartGenerator;

    private ExecutorService executors = Executors.newFixedThreadPool(10);

    @Override
    public Report generateReport(ReportRequest reportRequest) {
        List<Callable<ReportPart>> tasks = new ArrayList<Callable<ReportPart>>();
        List<ReportRequestPart> reportRequestParts = reportRequest.getRequestParts();
        for (ReportRequestPart reportRequestPart : reportRequestParts) {
            tasks.add(new ReportPartRequestCallable(reportRequestPart, reportPartGenerator));
        }

        List<Future<ReportPart>> responseForReportPartList;
        List<ReportPart> reportParts = new ArrayList<ReportPart>();
        try {
            responseForReportPartList = executors.invokeAll(tasks);
            for (Future<ReportPart> reportPartFuture : responseForReportPartList) {
                reportParts.add(reportPartFuture.get());
            }

        } catch (Exception e) {
            logger.error(e.getMessage(), e);
            throw new RuntimeException(e);
        }
        return new Report(reportParts);
    }

 ......
}

Here a thread pool of size 10 is created with the Executors.newFixedThreadPool(10) call. A callable task is created for each of the report request parts and handed over to the thread pool through the ExecutorService abstraction:
responseForReportPartList = executors.invokeAll(tasks);
This call blocks until all the tasks have completed and returns a List of Futures. Calling get() on each Future then returns the result, or throws an ExecutionException if the task failed.

This is clearly a much better implementation than the raw thread version: the number of threads stays constrained to a manageable number under load.
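
To try the invokeAll pattern outside the report classes, here is a small self-contained sketch. The class name, the section names and the 50 ms sleep are stand-ins chosen for this example, not code from the sample project. It also shuts the pool down, which a per-call pool like this one needs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the report generator, for illustration only.
class InvokeAllSketch {

    // Runs one task per section on a fixed size pool; results keep the input order.
    static List<String> generateParts(List<String> sections, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<Callable<String>> tasks = new ArrayList<>();
        for (String section : sections) {
            tasks.add(() -> {
                Thread.sleep(50); // stand-in for the slow I/O call
                return "Report for " + section;
            });
        }
        List<String> parts = new ArrayList<>();
        try {
            // invokeAll blocks until every task is done, so get() does not wait here
            for (Future<String> future : pool.invokeAll(tasks)) {
                parts.add(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for report parts", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A report part task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [Report for HEADER, Report for SECTION1, Report for FOOTER]
        System.out.println(generateParts(Arrays.asList("HEADER", "SECTION1", "FOOTER"), 2));
    }
}
```

The Futures come back in the same order as the submitted tasks, which is why the parts can be added to the list without any extra bookkeeping.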


Spring Integration Based Implementation
The approach that I personally like the most is using Spring Integration. The reason is that with Spring Integration I focus on the components doing the different tasks and leave it up to Spring Integration to wire the flow together, using an XML based or annotation based configuration. Here I will be using an XML based configuration:

The components in my case are:
1. The component to generate the report part, given the report part request, which I had shown earlier.
2. A component to split the report request to report request parts:
public class DefaultReportRequestSplitter implements ReportRequestSplitter{
 @Override
 public List<ReportRequestPart> split(ReportRequest reportRequest) {
  return reportRequest.getRequestParts();
 }
}

3. A component to assemble/aggregate the report parts into a whole report:
public class DefaultReportAggregator implements ReportAggregator{

    @Override
    public Report aggregate(List<ReportPart> reportParts) {
        return new Report(reportParts);
    }

}

And that is all the Java code that is required with Spring Integration; the rest is wiring. Here I have used a Spring Integration configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<beans ....

    <int:channel id="report.partsChannel"/>
    <int:channel id="report.reportChannel"/>
    <int:channel id="report.partReportChannel">
        <int:queue capacity="50"/>
    </int:channel>  
    <int:channel id="report.joinPartsChannel"/>


 <int:splitter id="splitter" ref="reportsPartSplitter" method="split" 
        input-channel="report.partsChannel" output-channel="report.partReportChannel"/>
    
    <task:executor id="reportPartGeneratorExecutor" pool-size="10" queue-capacity="50" />
    
 <int:service-activator id="reportsPartServiceActivator"  ref="reportPartReportGenerator" method="generateReportPart" 
            input-channel="report.partReportChannel" output-channel="report.joinPartsChannel">
    <int:poller task-executor="reportPartGeneratorExecutor" fixed-delay="500">
    </int:poller>
 </int:service-activator>

    <int:aggregator ref="reportAggregator" method="aggregate" 
            input-channel="report.joinPartsChannel" output-channel="report.reportChannel" ></int:aggregator> 

    <int:gateway id="reportGeneratorGateway" service-interface="org.bk.sisample.springintegration.ReportGeneratorGateway" 
           default-request-channel="report.partsChannel" default-reply-channel="report.reportChannel"/>
    
    <bean name="reportsPartSplitter" class="org.bk.sisample.springintegration.processors.DefaultReportRequestSplitter"></bean>
    <bean name="reportPartReportGenerator" class="org.bk.sisample.processors.DummyReportPartGenerator"/>
    <bean name="reportAggregator" class="org.bk.sisample.springintegration.processors.DefaultReportAggregator"/>
    <bean name="reportGenerator" class="org.bk.sisample.springintegration.SpringIntegrationBasedReportGenerator"/>

</beans>

SpringSource Tool Suite provides a great way of visualizing this file, and the resulting diagram matches perfectly with my original view of the user flow.

In the Spring Integration version of the code, I have defined the different components to handle the different parts of the flow:
1. A splitter to convert a report request to report request parts:
<int:splitter id="splitter" ref="reportsPartSplitter" method="split" 
        input-channel="report.partsChannel" output-channel="report.partReportChannel"/>

2. A service activator component to generate a report part from a report part request:

<int:service-activator id="reportsPartServiceActivator"  ref="reportPartReportGenerator" method="generateReportPart" 
            input-channel="report.partReportChannel" output-channel="report.joinPartsChannel">
    <int:poller task-executor="reportPartGeneratorExecutor" fixed-delay="500">
    </int:poller>
 </int:service-activator>
3. An aggregator to join the report parts back into a report. It is intelligent enough to correlate the parts split from the original report request, without any explicit coding required for it:
<int:aggregator ref="reportAggregator" method="aggregate" 
            input-channel="report.joinPartsChannel" output-channel="report.reportChannel" ></int:aggregator> 


What is interesting in this code is that, as in the executors based sample, the number of threads servicing each of these components is completely configurable through the XML file, by using appropriate channels to connect the components together and by using task executors whose thread pool size is set as an attribute of the executor.

In this code, I have defined a queue channel where the report request parts come in:

<int:channel id="report.partReportChannel">
        <int:queue capacity="50"/>
    </int:channel>  


It is serviced by the service activator component, using a task executor with a thread pool of size 10 and a queue capacity of 50:

<task:executor id="reportPartGeneratorExecutor" pool-size="10" queue-capacity="50" />
    
 <int:service-activator id="reportsPartServiceActivator"  ref="reportPartReportGenerator" method="generateReportPart" 
            input-channel="report.partReportChannel" output-channel="report.joinPartsChannel">
    <int:poller task-executor="reportPartGeneratorExecutor" fixed-delay="500">
    </int:poller>
 </int:service-activator>
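
To get a feel for what a pool of 10 threads with a queue capacity of 50 means, here is a rough plain java.util.concurrent analogue. This is a sketch under assumptions, not how Spring Integration is implemented: it assumes the executor behaves like a fixed size ThreadPoolExecutor with a bounded queue that rejects work once both are full, and the class and method names are made up for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical analogue of a bounded task executor, for illustration only.
class BoundedExecutorSketch {

    // Fixed size pool (core == max) in front of a bounded work queue
    static ThreadPoolExecutor newBoundedExecutor(int poolSize, int queueCapacity) {
        return new ThreadPoolExecutor(poolSize, poolSize, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>(queueCapacity));
    }

    // Submits tasks that block until released and counts how many are accepted
    // before the executor starts rejecting work.
    static int countAcceptedBeforeRejection(int poolSize, int queueCapacity) {
        ThreadPoolExecutor executor = newBoundedExecutor(poolSize, queueCapacity);
        CountDownLatch release = new CountDownLatch(1);
        int accepted = 0;
        try {
            while (true) {
                executor.execute(() -> {
                    try {
                        release.await(); // keep the worker busy
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                accepted++;
            }
        } catch (RejectedExecutionException e) {
            return accepted;
        } finally {
            release.countDown(); // let the queued tasks finish
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        // 10 running tasks plus 50 queued ones: prints 60
        System.out.println(countAcceptedBeforeRejection(10, 50));
    }
}
```

Under these assumptions at most 10 tasks run at once and at most 50 more wait in the queue, so the amount of in-flight work stays bounded no matter how quickly requests arrive.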


All this through configuration!


The entire codebase for this sample is available at this github location: https://github.com/bijukunjummen/si-sample

Saturday, December 17, 2011

Concurrency - Sequential and Raw Thread

I worked on a project a while back, where the report flow was along these lines:



  1. The user requests a report
  2. The report request is translated into smaller parts/sections
  3. A report generator generates the report for each part, based on the type of the part/section
  4. The constituent report parts are reassembled into a final report and given back to the user

My objective is to show how I progressed from a bad implementation to a fairly good implementation.

Some of the basic building blocks that I have are best demonstrated by a unit test.
This is a test helper which generates a sample report request, with constituent report request parts:
public class FixtureGenerator {
    public static ReportRequest generateReportRequest(){
        List<ReportRequestPart> requestParts = new ArrayList<ReportRequestPart>();
        Map<String, String> attributes = new HashMap<String, String>();
        attributes.put("user","user");
        Context context = new Context(attributes );
    
        ReportRequestPart part1 = new ReportRequestPart(Section.HEADER, context);
        ReportRequestPart part2 = new ReportRequestPart(Section.SECTION1, context);
        ReportRequestPart part3 = new ReportRequestPart(Section.SECTION2, context);
        ReportRequestPart part4 = new ReportRequestPart(Section.SECTION3, context);
        ReportRequestPart part5 = new ReportRequestPart(Section.FOOTER, context);   
        
        requestParts.add(part1);        
        requestParts.add(part2);
        requestParts.add(part3);
        requestParts.add(part4);
        requestParts.add(part5);
        
        ReportRequest reportRequest  = new ReportRequest(requestParts );
        return reportRequest;
    }

}
And the test for the report generation:
// Test class name assumed; the reportGenerator and logger fields are elided
public class SequentialReportGeneratorTest {
 ......
 @Test
 public void testSequentialReportGeneratorTime(){
  long startTime = System.currentTimeMillis();
  Report report = this.reportGenerator.generateReport(FixtureGenerator.generateReportRequest());
  long timeForReport = System.currentTimeMillis()-startTime;
  assertThat(report.getSectionReports().size(), is (5));
  logger.error(String.format("Sequential Report Generator : %s ms", timeForReport));
 }
}

The component which generates a part of the report is a dummy implementation with a 2 second delay to simulate an I/O intensive call:
public class DummyReportPartGenerator implements ReportPartGenerator{

 @Override
 public ReportPart generateReportPart(ReportRequestPart reportRequestPart) {
  try {
   //Deliberately introduce a delay
   Thread.sleep(2000);
  } catch (InterruptedException e) {
   // Restore the interrupt flag so that callers can still see the interruption
   Thread.currentThread().interrupt();
  }
  return new ReportPart(reportRequestPart.getSection(), "Report for " + reportRequestPart.getSection());
 }
}

Sequential Implementation
Given this base set of classes, my first naive sequential implementation is the following:
public class SequentialReportGenerator implements ReportGenerator {
 private ReportPartGenerator reportPartGenerator;

 @Override
 public Report generateReport(ReportRequest reportRequest){
  List<ReportRequestPart> reportRequestParts = reportRequest.getRequestParts();
  List<ReportPart> reportSections = new ArrayList<ReportPart>();
  for (ReportRequestPart reportRequestPart: reportRequestParts){
   reportSections.add(reportPartGenerator.generateReportPart(reportRequestPart));
  }
  return new Report(reportSections);
 }
 
 
......
}

Obviously, for a report request with 5 parts, each taking 2 seconds to fulfil, the report takes about 10 seconds to be returned to the user.

It begs to be made concurrent.

Raw Thread Based Implementation
The first concurrent implementation, not good but better than sequential, is the following: a thread is spawned for every report request part, the caller waits for the report parts to be generated (using the thread.join() method), and then aggregates the pieces once all the threads have finished.

public class RawThreadBasedReportGenerator implements ReportGenerator {
    private static final Logger logger = LoggerFactory.getLogger(RawThreadBasedReportGenerator.class);

    private ReportPartGenerator reportPartGenerator;

    @Override
    public Report generateReport(ReportRequest reportRequest) {
        List<ReportRequestPart> reportRequestParts = reportRequest.getRequestParts();
        List<Thread> threads = new ArrayList<Thread>();
        List<ReportPartRequestRunnable> runnablesList = new ArrayList<ReportPartRequestRunnable>();
        for (ReportRequestPart reportRequestPart : reportRequestParts) {
            ReportPartRequestRunnable reportPartRequestRunnable = new ReportPartRequestRunnable(reportRequestPart, reportPartGenerator);
            runnablesList.add(reportPartRequestRunnable);
            Thread thread = new Thread(reportPartRequestRunnable);
            threads.add(thread);
            thread.start();
        }

        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                logger.error(e.getMessage(), e);
            }
        }

        List<ReportPart> reportParts = new ArrayList<ReportPart>();

        for (ReportPartRequestRunnable reportPartRequestRunnable : runnablesList) {
            reportParts.add(reportPartRequestRunnable.getReportPart());
        }

        return new Report(reportParts);

    }    
    .....
}

The danger with this approach is that a new thread is created for every report part, so in a real world scenario, if 100 simultaneous requests come in with each request spawning 5 threads, this can end up creating 500 costly threads in the VM!

So thread creation has to be constrained in some way. I will go through two more approaches where threads are controlled, in the next blog entry.

Friday, December 16, 2011

jquery maphilight

jQuery maphilight is a fantastic jQuery plugin to overlay an image with highlights, based on information from an image map.
I recently used it for one of my work projects and it worked out beautifully - highly recommended for overlay highlights.

Saturday, December 3, 2011

SimpleDateFormat and TimeZone

Recently I was stumped by a simple concept: I needed to format a timestamp in the Europe/London timezone as yyyyMMdd. I had code along these lines to do this:
SimpleDateFormat formatter = new SimpleDateFormat("yyyyMMdd");
Calendar date = Calendar.getInstance(TimeZone.getTimeZone("Europe/London"));
date.set(Calendar.YEAR, 2011);
date.set(Calendar.MONTH, Calendar.NOVEMBER); // months are zero-based, so NOVEMBER is 10
date.set(Calendar.DAY_OF_MONTH, 15);
date.set(Calendar.HOUR_OF_DAY, 3);
int aDateName = Integer.valueOf(formatter.format(date.getTime()));
System.out.println(aDateName);

I was expecting it to print 20111115 as the output.

However, the output was 20111114 (when executing from the US EST timezone). This is because getTime() converts the Calendar to a java.util.Date, which is just an instant in time and carries no timezone of its own; 03:00 on November 15 in London is 22:00 on November 14 in EST. SimpleDateFormat then formats that instant in the default timezone of the JVM where the code runs, unless its own timezone attribute is set. The fix is to set the timezone on the SimpleDateFormat, at the point where the date is converted back to a string.

This is what fixed the code for me:

.....
formatter.setTimeZone(TimeZone.getTimeZone("Europe/London"));
.....
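
Putting the pieces together, here is a self-contained sketch that shows both results side by side. The class and method names are made up for the example. It sets each formatter's timezone explicitly rather than relying on the machine's default, and it clears the calendar first so the minutes and seconds are fixed.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, for illustration only.
class TimeZoneFormatting {

    // 03:00 on November 15, 2011, London time
    static Date threeAmInLondon() {
        Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("Europe/London"));
        calendar.clear(); // drop the current time-of-day fields
        calendar.set(2011, Calendar.NOVEMBER, 15, 3, 0, 0);
        return calendar.getTime(); // an instant, with no timezone attached
    }

    // Formats the instant as yyyyMMdd in the given timezone
    static String format(Date instant, String zoneId) {
        // Locale.ROOT keeps the digits the same on every machine
        SimpleDateFormat formatter = new SimpleDateFormat("yyyyMMdd", Locale.ROOT);
        formatter.setTimeZone(TimeZone.getTimeZone(zoneId));
        return formatter.format(instant);
    }

    public static void main(String[] args) {
        Date instant = threeAmInLondon();
        System.out.println(format(instant, "Europe/London"));    // prints 20111115
        System.out.println(format(instant, "America/New_York")); // prints 20111114
    }
}
```

The same instant lands on different calendar days depending on the formatter's timezone, which is exactly the behaviour that stumped me.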