com.nexwave.nquindexer
Class SaxDocFileParser

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by com.nexwave.nquindexer.SaxDocFileParser
All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler
Direct Known Subclasses:
SaxHTMLIndex

public class SaxDocFileParser
extends org.xml.sax.helpers.DefaultHandler

Generic parser for populating a DocFileInfo object.

Version:
2.0 2010-08-14
Author:
N. Quaine, Kasun Gajasinghe

Field Summary
protected  DocFileInfo fileDesc
           
protected  String projectDir
           
protected  StringBuffer strbf
           
 
Constructor Summary
SaxDocFileParser()
          Constructor
 
Method Summary
 void characters(char[] ch, int start, int length)
           
 void endElement(String uri, String localName, String qName)
           
 int init(String inputDir)
          Initializer
 void parseDocument(File file)
           
 void processingInstruction(String target, String data)
           
 String RemoveValidationPI(File file)
          Removes validation-related declarations, such as the XML declaration and DTD references, from HTML files.
 InputSource resolveEntity(String publicId, String systemId)
           
 DocFileInfo runExtractData(File file)
          Parses the file to extract all the words for indexing and some data characterizing the file.
 void startElement(String uri, String localName, String qName, Attributes attributes)
           
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

fileDesc

protected DocFileInfo fileDesc

projectDir

protected String projectDir

strbf

protected StringBuffer strbf

Constructor Detail

SaxDocFileParser

public SaxDocFileParser()
Constructor

Method Detail

init

public int init(String inputDir)
Initializer


runExtractData

public DocFileInfo runExtractData(File file)
Parses the file to extract all the words for indexing and some data characterizing the file.

Parameters:
file - the document to parse, given by its full path
Returns:
a DocFileInfo object filled with data describing the file

parseDocument

public void parseDocument(File file)

startElement

public void startElement(String uri,
                         String localName,
                         String qName,
                         Attributes attributes)
                  throws SAXException
Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.DefaultHandler
Throws:
SAXException

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws SAXException
Specified by:
characters in interface ContentHandler
Overrides:
characters in class org.xml.sax.helpers.DefaultHandler
Throws:
SAXException

endElement

public void endElement(String uri,
                       String localName,
                       String qName)
                throws SAXException
Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.DefaultHandler
Throws:
SAXException
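
The startElement, characters and endElement callbacks usually cooperate to collect text while the parser walks the document. The following is a minimal sketch of that pattern, not the actual SaxDocFileParser code: the class TextCollector, its title handling and its parse helper are hypothetical, and only the strbf field name is borrowed from this page.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for illustration; not the real SaxDocFileParser.
class TextCollector extends DefaultHandler {
    // Plays the role of the protected strbf field: buffers body text.
    protected final StringBuffer strbf = new StringBuffer();
    private final StringBuffer titleBuf = new StringBuffer();
    private boolean inTitle;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        if ("title".equalsIgnoreCase(qName)) {
            inTitle = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so always append.
        if (inTitle) {
            titleBuf.append(ch, start, length);
        } else {
            strbf.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equalsIgnoreCase(qName)) {
            inTitle = false;
        } else {
            // Keep words from adjacent elements from running together.
            strbf.append(' ');
        }
    }

    String title() {
        return titleBuf.toString().trim();
    }

    String text() {
        return strbf.toString().replaceAll("\\s+", " ").trim();
    }

    // Parses an in-memory document with the JDK's default SAX parser.
    static TextCollector parse(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

Calling TextCollector.parse on a small XHTML string and then reading title() and text() gives the title and the whitespace-normalized body words separately, which is the kind of split an indexer needs.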

processingInstruction

public void processingInstruction(String target,
                                  String data)
                           throws SAXException
Specified by:
processingInstruction in interface ContentHandler
Overrides:
processingInstruction in class org.xml.sax.helpers.DefaultHandler
Throws:
SAXException

resolveEntity

public InputSource resolveEntity(String publicId,
                                 String systemId)
                          throws SAXException,
                                 IOException
Specified by:
resolveEntity in interface EntityResolver
Overrides:
resolveEntity in class org.xml.sax.helpers.DefaultHandler
Throws:
SAXException
IOException
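
A common reason to override resolveEntity in a DefaultHandler is to stop the parser from fetching an external DTD named in the DOCTYPE. Whether SaxDocFileParser does exactly this is an assumption. The sketch below shows the general technique with a hypothetical class OfflineEntityHandler, which answers every entity request with an empty source.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not the real SaxDocFileParser.
class OfflineEntityHandler extends DefaultHandler {
    int resolved = 0;                            // number of entity requests seen
    final StringBuilder text = new StringBuilder();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        resolved++;
        // An empty source makes the parser read an empty entity
        // instead of opening the location given by systemId.
        return new InputSource(new StringReader(""));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // SAXParser.parse registers the handler as the entity resolver too.
    static OfflineEntityHandler parse(String xml) throws Exception {
        OfflineEntityHandler handler = new OfflineEntityHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

With this approach a document that declares an XHTML DOCTYPE can be parsed without network access. The trade-off is that named entities defined only in the DTD are no longer known to the parser.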

RemoveValidationPI

public String RemoveValidationPI(File file)
Removes validation-related declarations, such as the XML declaration and DTD references, from HTML files.

Parameters:
file - the HTML file to process
Returns:
the content of the file as a String, with the validation declarations removed
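
As a rough illustration of what removing the XML declaration and the DOCTYPE can look like, here is a minimal sketch. It is not the actual implementation: the class ValidationStripper and its strip method are hypothetical, it works on a String rather than a File to stay self-contained, and it assumes the DOCTYPE has no internal subset in square brackets.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the real RemoveValidationPI.
class ValidationStripper {
    // Matches an XML declaration such as <?xml version="1.0"?>.
    private static final Pattern XML_DECL =
            Pattern.compile("<\\?xml[^>]*\\?>");
    // Matches a simple DOCTYPE without an internal subset.
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the content with both kinds of declaration removed.
    static String strip(String content) {
        String withoutDecl = XML_DECL.matcher(content).replaceAll("");
        return DOCTYPE.matcher(withoutDecl).replaceAll("");
    }
}
```

The cleaned string can then be handed to a SAX parser through an InputSource wrapping a StringReader, so the parser never sees the DTD reference.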


Copyright © 2013. All Rights Reserved.