Wednesday, April 25, 2012

Java and slow XML Parsing

If XML parsing with SAX or DOM appears slow, it is very likely because the parser is trying to download external entities, typically a DTD referenced from the DOCTYPE declaration, while it parses. A simple fix is to implement a custom EntityResolver that returns a local copy of the external resource, or an empty dummy one if the content of the external resource does not matter.
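As a rough illustration of the "local reference" variant, here is a minimal sketch of a resolver that redirects known system IDs to local copies and falls back to an empty document for everything else. The class name LocalEntityResolver, its map method and the URLs in the usage note are hypothetical, made up for this example.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: redirects known external entities to local copies.
public class LocalEntityResolver implements EntityResolver {

    // System ID as written in the document -> URI of a local copy.
    private final Map<String, String> localCopies = new HashMap<String, String>();

    // Registers a local copy for one system ID; returns this for chaining.
    public LocalEntityResolver map(String systemId, String localUri) {
        localCopies.put(systemId, localUri);
        return this;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String localUri = localCopies.get(systemId);
        if (localUri != null) {
            // Point the parser at the local copy instead of the remote URL.
            InputSource source = new InputSource(localUri);
            source.setPublicId(publicId);
            return source;
        }
        // Unknown entity: hand back an empty document rather than
        // letting the parser fetch it over the network.
        return new InputSource(new StringReader(""));
    }
}
```

It would be installed with something like reader.setEntityResolver(new LocalEntityResolver().map("http://example.com/note.dtd", "file:///opt/dtds/note.dtd")), where both the URL and the path are placeholders.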

For example, I had to write a small method to check that an XML file is well-formed, so I could safely ignore external references. This is the dummy EntityResolver implementation I used:

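import java.io.File;
import java.io.StringReader;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

private static boolean isWellFormed(File xmlFile) {
  try {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(false);
    factory.setNamespaceAware(true);

    SAXParser parser = factory.newSAXParser();
    XMLReader reader = parser.getXMLReader();
    // Answer every external entity (DTDs included) with an empty document,
    // so the parser never goes to the network.
    reader.setEntityResolver(new EntityResolver() {
      public InputSource resolveEntity(String publicId, String systemId) {
        return new InputSource(new StringReader(""));
      }
    });
    // Pass the file as a system ID so the parser opens it itself and honors
    // the encoding declared in the XML prolog (a FileReader would not).
    reader.parse(new InputSource(xmlFile.toURI().toString()));
    return true;
  } catch (Exception e) {
    // Any parse or I/O failure is treated as "not well-formed".
    return false;
  }
}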
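To check that the resolver is really what the parser consults, a small self-contained demo along these lines can help. It is only a sketch: the class name ResolverDemo, the requested list and the example.invalid URL are made up for illustration, and it parses strings instead of files to stay short. It records every system ID the parser asks for, which also shows which external resources were behind the slowdown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class ResolverDemo {

    // System IDs the parser asked the resolver for (hypothetical diagnostic aid).
    static final List<String> requested = new ArrayList<String>();

    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setEntityResolver(new EntityResolver() {
                public InputSource resolveEntity(String publicId, String systemId) {
                    requested.add(systemId);                      // remember what was asked for
                    return new InputSource(new StringReader("")); // empty stand-in
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The DTD URL is deliberately unreachable; the resolver keeps the parser off the network.
        String doctype = "<!DOCTYPE note SYSTEM \"http://example.invalid/note.dtd\">";
        System.out.println(isWellFormed(doctype + "<note><to>a</to></note>")); // balanced tags
        System.out.println(isWellFormed(doctype + "<note><to>a</note>"));      // mismatched tags
        System.out.println(requested);
    }
}
```

The first document should be reported as well-formed and the second as not. The JDK's default error handler may also print a "[Fatal Error]" line to stderr for the second one.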
