com.nexwave.nquindexer
Class SaxDocFileParser
java.lang.Object
org.xml.sax.helpers.DefaultHandler
com.nexwave.nquindexer.SaxDocFileParser
- All Implemented Interfaces:
- ContentHandler, DTDHandler, EntityResolver, ErrorHandler
- Direct Known Subclasses:
- SaxHTMLIndex
public class SaxDocFileParser
- extends org.xml.sax.helpers.DefaultHandler
Generic parser for populating a DocFileInfo object.
- Version:
- 2.0 2010-08-14
- Author:
- N. Quaine, Kasun Gajasinghe
Method Summary
void characters(char[] ch, int start, int length)
void endElement(String uri, String localName, String qName)
int init(String inputDir)
- Initializer
void parseDocument(File file)
void processingInstruction(String target, String data)
String RemoveValidationPI(File file)
- Removes validation-related markup from HTML files, such as the XML declaration and DTD references.
InputSource resolveEntity(String publicId, String systemId)
DocFileInfo runExtractData(File file)
- Parses the file to extract all the words for indexing and some data characterizing the file.
void startElement(String uri, String localName, String qName, Attributes attributes)
Methods inherited from class org.xml.sax.helpers.DefaultHandler
- endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
fileDesc
protected DocFileInfo fileDesc
projectDir
protected String projectDir
strbf
protected StringBuffer strbf
SaxDocFileParser
public SaxDocFileParser()
- Constructor
init
public int init(String inputDir)
- Initializer
runExtractData
public DocFileInfo runExtractData(File file)
- Parses the file to extract all the words for indexing and
some data characterizing the file.
- Parameters:
file
- the full path of the document to parse
- Returns:
- a DocFileInfo object filled with data describing the file
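The flow described for runExtractData (parse a file, collect its words, return an object describing it) can be illustrated with a minimal, self-contained sketch. It is not the actual SaxDocFileParser code: FileInfoSketch and ExtractSketch are hypothetical names, the fields of the real DocFileInfo are not documented here, and splitting on non-word characters is an assumed tokenizing rule.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for DocFileInfo; the real class may hold different fields.
class FileInfoSketch {
    String path = "";
    String title = "";
    final List<String> words = new ArrayList<>();
}

// Hypothetical handler, not the actual SaxDocFileParser implementation.
class ExtractSketch extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final StringBuilder title = new StringBuilder();
    private boolean inTitle;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("title".equalsIgnoreCase(qName)) {
            inTitle = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so always append.
        text.append(ch, start, length);
        if (inTitle) {
            title.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equalsIgnoreCase(qName)) {
            inTitle = false;
        }
        text.append(' '); // keep words from adjacent elements apart
    }

    // Parses a file and returns the collected data.
    static FileInfoSketch extract(File file) {
        return run(new InputSource(file.toURI().toString()), file.getPath());
    }

    // Same as extract(File), but reads the markup from a string.
    static FileInfoSketch extractFromXml(String xml) {
        return run(new InputSource(new StringReader(xml)), "(in-memory)");
    }

    private static FileInfoSketch run(InputSource source, String path) {
        ExtractSketch handler = new ExtractSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not parse " + path, e);
        }
        FileInfoSketch info = new FileInfoSketch();
        info.path = path;
        info.title = handler.title.toString().trim();
        // Assumed rule: lower-case, split on non-word characters (ASCII-oriented).
        for (String word : handler.text.toString().toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                info.words.add(word);
            }
        }
        return info;
    }
}
```

Under these assumptions, ExtractSketch.extract(new File("page.html")) would return the title and word list for a well-formed XHTML file; "page.html" is only a placeholder name.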
parseDocument
public void parseDocument(File file)
startElement
public void startElement(String uri,
String localName,
String qName,
Attributes attributes)
throws SAXException
- Specified by:
startElement
in interface ContentHandler
- Overrides:
startElement
in class org.xml.sax.helpers.DefaultHandler
- Throws:
SAXException
characters
public void characters(char[] ch,
int start,
int length)
throws SAXException
- Specified by:
characters
in interface ContentHandler
- Overrides:
characters
in class org.xml.sax.helpers.DefaultHandler
- Throws:
SAXException
endElement
public void endElement(String uri,
String localName,
String qName)
throws SAXException
- Specified by:
endElement
in interface ContentHandler
- Overrides:
endElement
in class org.xml.sax.helpers.DefaultHandler
- Throws:
SAXException
processingInstruction
public void processingInstruction(String target,
String data)
throws SAXException
- Specified by:
processingInstruction
in interface ContentHandler
- Overrides:
processingInstruction
in class org.xml.sax.helpers.DefaultHandler
- Throws:
SAXException
resolveEntity
public InputSource resolveEntity(String publicId,
String systemId)
throws SAXException,
IOException
- Specified by:
resolveEntity
in interface EntityResolver
- Overrides:
resolveEntity
in class org.xml.sax.helpers.DefaultHandler
- Throws:
SAXException
IOException
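No description is given for this override. A common reason to override resolveEntity in a SAX handler is to keep the parser from fetching an external DTD while parsing. The sketch below shows that technique under the assumption that this is the intent; NoDtdFetchSketch is a hypothetical class, not the library's implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler showing one common use of resolveEntity.
class NoDtdFetchSketch extends DefaultHandler {
    int resolved = 0;                       // how often the parser asked for an entity
    final StringBuilder text = new StringBuilder();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        resolved++;
        // An empty source makes the parser read nothing instead of opening systemId.
        return new InputSource(new StringReader(""));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // Parses the given markup with this handler acting as entity resolver.
    static NoDtdFetchSketch parse(String xml) {
        NoDtdFetchSketch handler = new NoDtdFetchSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not parse input", e);
        }
        return handler;
    }
}
```

Because the substituted DTD is empty, any named entities it would have declared (for example &nbsp; in XHTML) become undefined, so this approach suits documents that do not rely on them.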
RemoveValidationPI
public String RemoveValidationPI(File file)
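- Removes validation-related markup from HTML files, such as the XML declaration and DTD references.
- Parameters:
file
- the HTML file to process
- Returns:
- a String (per the method signature), presumably the file content with the validation markup removed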
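As a rough illustration of stripping validation markup, the sketch below removes an XML declaration and a DOCTYPE from a document's text. It is an assumption about the kind of processing involved, not the actual RemoveValidationPI code; StripValidationSketch and the regular expressions are hypothetical, and a DOCTYPE with an internal subset or a '>' inside a quoted literal is not handled.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.regex.Pattern;

// Hypothetical helper, not the library's implementation.
class StripValidationSketch {
    // Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
    private static final Pattern XML_DECL = Pattern.compile("<\\?xml\\s[^>]*\\?>");
    // Matches a simple DOCTYPE such as <!DOCTYPE html PUBLIC "..." "...">
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE[^>]*>", Pattern.CASE_INSENSITIVE);

    // Returns the content without its XML declaration and DOCTYPE.
    static String strip(String content) {
        String withoutDecl = XML_DECL.matcher(content).replaceFirst("");
        return DOCTYPE.matcher(withoutDecl).replaceFirst("").trim();
    }

    // Reads the file as UTF-8 (an assumed encoding) and strips it the same way.
    static String strip(File file) throws IOException {
        byte[] bytes = Files.readAllBytes(file.toPath());
        return strip(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Returning the cleaned text rather than rewriting the file leaves the source document untouched, which matches a method that returns a String.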
Copyright © 2013. All Rights Reserved.