nu.validator.htmlparser.impl
Class Tokenizer

java.lang.Object
  extended by nu.validator.htmlparser.impl.Tokenizer
All Implemented Interfaces:
org.xml.sax.Locator

public final class Tokenizer
extends java.lang.Object
implements org.xml.sax.Locator

An implementation of http://www.whatwg.org/specs/web-apps/current-work/multipage/section-tokenisation.html. This class implements the Locator interface. This is not an incidental implementation detail: users of this class are encouraged to make use of the Locator nature. By default, the tokenizer may report data that XML 1.0 bans. The tokenizer can be configured to treat these conditions as fatal or to coerce the infoset to something that XML 1.0 allows.
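
Because the tokenizer is itself the Locator, the position it reports presumably moves on as tokenization proceeds. The following is a minimal sketch, not part of this class, of how a caller might freeze the current position for later use. It relies only on the standard org.xml.sax.helpers.LocatorImpl copy constructor. The class name LocatorSnapshots and its method are hypothetical.

```java
import org.xml.sax.Locator;
import org.xml.sax.helpers.LocatorImpl;

/** Hypothetical helper: copies the current position out of a live Locator. */
final class LocatorSnapshots {
    private LocatorSnapshots() {
    }

    /**
     * Returns a detached copy of the public ID, system ID, line and column
     * currently reported by the given locator. Later changes to the live
     * locator (for example, a Tokenizer advancing) do not affect the copy.
     */
    static Locator snapshot(Locator live) {
        return new LocatorImpl(live);
    }
}
```

A token handler that holds a reference to the tokenizer could call LocatorSnapshots.snapshot(tokenizer) at the moment a token arrives and store the result alongside the token.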

Version:
$Id: Tokenizer.java 210 2008-03-20 22:08:54Z hsivonen $
Author:
hsivonen

Constructor Summary
Tokenizer(TokenHandler tokenHandler)
          The constructor.
 
Method Summary
 void addCharacterHandler(CharacterHandler characterHandler)
           
 int getColumnNumber()
           
 XmlViolationPolicy getCommentPolicy()
          Returns the commentPolicy.
 XmlViolationPolicy getContentNonXmlCharPolicy()
          Returns the contentNonXmlCharPolicy.
 XmlViolationPolicy getContentSpacePolicy()
          Returns the contentSpacePolicy.
 int getLineNumber()
           
 java.lang.String getPublicId()
           
 java.lang.String getSystemId()
           
 boolean isAllowRewinding()
          Returns the allowRewinding.
 boolean isCheckingNormalization()
          Queries whether NFC normalization checking is on.
 boolean isMappingLangToXmlLang()
          Returns the mappingLangToXmlLang.
 void setAllowRewinding(boolean allowRewinding)
          Sets the allowRewinding.
 void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy)
          Sets the bogusXmlnsPolicy.
 void setCheckingNormalization(boolean enable)
          Turns NFC checking on or off.
 void setCommentPolicy(XmlViolationPolicy commentPolicy)
          Sets the commentPolicy.
 void setContentModelFlag(ContentModelFlag contentModelFlag, java.lang.String contentModelElement)
          Sets the content model flag and the associated element name.
 void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
          Sets the contentNonXmlCharPolicy.
 void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
          Sets the contentSpacePolicy.
 void setErrorHandler(org.xml.sax.ErrorHandler eh)
          Sets the error handler.
 void setHeuristics(Heuristics heuristics)
          Sets the encoding sniffing heuristics.
 void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
          Sets the html4ModeCompatibleWithXhtml1Schemata.
 void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
          Sets the mappingLangToXmlLang.
 void setNamePolicy(XmlViolationPolicy namePolicy)
           
 void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy)
          Sets the xmlnsPolicy.
 void tokenize(org.xml.sax.InputSource is)
          Runs the tokenization.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Tokenizer

public Tokenizer(TokenHandler tokenHandler)
The constructor.

Parameters:
tokenHandler - the handler for receiving tokens
Method Detail

isAllowRewinding

public boolean isAllowRewinding()
Returns the allowRewinding.

Returns:
the allowRewinding

setAllowRewinding

public void setAllowRewinding(boolean allowRewinding)
Sets the allowRewinding.

Parameters:
allowRewinding - the allowRewinding to set

setCheckingNormalization

public void setCheckingNormalization(boolean enable)
Turns NFC checking on or off.

Parameters:
enable - true to turn checking on, false to turn it off

addCharacterHandler

public void addCharacterHandler(CharacterHandler characterHandler)

isCheckingNormalization

public boolean isCheckingNormalization()
Queries whether NFC normalization checking is on.

Returns:
true if checking is on

setErrorHandler

public void setErrorHandler(org.xml.sax.ErrorHandler eh)
Sets the error handler.

See Also:
XMLReader.setErrorHandler(org.xml.sax.ErrorHandler)
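
As an illustration of what could be passed to setErrorHandler, here is a minimal sketch of an org.xml.sax.ErrorHandler that records every report instead of printing it. The class name CollectingErrorHandler and the message format are hypothetical. Rethrowing from fatalError is one possible design choice, not something this class is documented to require.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical handler that collects reports as formatted strings. */
final class CollectingErrorHandler implements ErrorHandler {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        record("warning", e);
    }

    @Override
    public void error(SAXParseException e) {
        record("error", e);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        record("fatal", e);
        // Rethrow so that the caller sees fatal conditions as exceptions.
        throw e;
    }

    /** Stores "level at line:column: message" for later inspection. */
    private void record(String level, SAXParseException e) {
        messages.add(level + " at " + e.getLineNumber() + ":"
                + e.getColumnNumber() + ": " + e.getMessage());
    }

    /** Returns the reports collected so far, in arrival order. */
    List<String> messages() {
        return messages;
    }
}
```

A caller would install it with tokenizer.setErrorHandler(new CollectingErrorHandler()) before calling tokenize, then read messages() afterwards.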

getCommentPolicy

public XmlViolationPolicy getCommentPolicy()
Returns the commentPolicy.

Returns:
the commentPolicy

setCommentPolicy

public void setCommentPolicy(XmlViolationPolicy commentPolicy)
Sets the commentPolicy.

Parameters:
commentPolicy - the commentPolicy to set

getContentNonXmlCharPolicy

public XmlViolationPolicy getContentNonXmlCharPolicy()
Returns the contentNonXmlCharPolicy.

Returns:
the contentNonXmlCharPolicy

setContentNonXmlCharPolicy

public void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
Sets the contentNonXmlCharPolicy.

Parameters:
contentNonXmlCharPolicy - the contentNonXmlCharPolicy to set

getContentSpacePolicy

public XmlViolationPolicy getContentSpacePolicy()
Returns the contentSpacePolicy.

Returns:
the contentSpacePolicy

setContentSpacePolicy

public void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
Sets the contentSpacePolicy.

Parameters:
contentSpacePolicy - the contentSpacePolicy to set

setXmlnsPolicy

public void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy)
Sets the xmlnsPolicy.

Parameters:
xmlnsPolicy - the xmlnsPolicy to set

setNamePolicy

public void setNamePolicy(XmlViolationPolicy namePolicy)

setBogusXmlnsPolicy

public void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy)
Sets the bogusXmlnsPolicy.

Parameters:
bogusXmlnsPolicy - the bogusXmlnsPolicy to set

setHtml4ModeCompatibleWithXhtml1Schemata

public void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
Sets the html4ModeCompatibleWithXhtml1Schemata.

Parameters:
html4ModeCompatibleWithXhtml1Schemata - the html4ModeCompatibleWithXhtml1Schemata to set

tokenize

public void tokenize(org.xml.sax.InputSource is)
              throws org.xml.sax.SAXException,
                     java.io.IOException
Runs the tokenization. This is the main entry point.

Parameters:
is - the input source
Throws:
org.xml.sax.SAXException - on fatal error (if configured to treat XML violations as fatal) or if the token handler threw
java.io.IOException - if the stream threw
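
The input source passed to tokenize is a standard org.xml.sax.InputSource. The sketch below shows one way to build it from an in-memory string. The helper class InputSources is hypothetical, and the sketch assumes the tokenizer reads the character stream when one is set, as SAX parsers conventionally do. In that case no encoding sniffing would be needed, because the input is already decoded.

```java
import java.io.StringReader;

import org.xml.sax.InputSource;

/** Hypothetical helper for building input sources from strings. */
final class InputSources {
    private InputSources() {
    }

    /**
     * Wraps already-decoded markup in an InputSource and labels it with a
     * system ID, which a Locator can later report back to the caller.
     */
    static InputSource fromString(String markup, String systemId) {
        InputSource is = new InputSource(new StringReader(markup));
        is.setSystemId(systemId);
        return is;
    }
}
```

With a configured tokenizer, the call would then look like tokenizer.tokenize(InputSources.fromString("<p>Hello</p>", "memory:example")), wrapped in handling for SAXException and IOException.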

setContentModelFlag

public void setContentModelFlag(ContentModelFlag contentModelFlag,
                                java.lang.String contentModelElement)
Sets the content model flag and the associated element name.

Parameters:
contentModelFlag - the flag
contentModelElement - the element causing the flag to be set

getPublicId

public java.lang.String getPublicId()
Specified by:
getPublicId in interface org.xml.sax.Locator
See Also:
Locator.getPublicId()

getSystemId

public java.lang.String getSystemId()
Specified by:
getSystemId in interface org.xml.sax.Locator
See Also:
Locator.getSystemId()

getLineNumber

public int getLineNumber()
Specified by:
getLineNumber in interface org.xml.sax.Locator
See Also:
Locator.getLineNumber()

getColumnNumber

public int getColumnNumber()
Specified by:
getColumnNumber in interface org.xml.sax.Locator
See Also:
Locator.getColumnNumber()
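
The four Locator methods above can be combined into a human-readable position. The following is a minimal sketch of such a formatter. The class name LocationFormatter and the output format are hypothetical. It relies on the general SAX convention that a Locator returns -1 for an unavailable line or column and null for an unavailable system ID.

```java
import org.xml.sax.Locator;

/** Hypothetical helper that renders a Locator as "systemId:line:column". */
final class LocationFormatter {
    private LocationFormatter() {
    }

    /**
     * Formats the locator's current position. Parts that the locator
     * reports as unavailable are left out or replaced by a placeholder.
     */
    static String describe(Locator locator) {
        String systemId = locator.getSystemId();
        StringBuilder sb = new StringBuilder(
                systemId != null ? systemId : "(unknown)");
        int line = locator.getLineNumber();
        if (line >= 0) {
            sb.append(':').append(line);
            int column = locator.getColumnNumber();
            if (column >= 0) {
                sb.append(':').append(column);
            }
        }
        return sb.toString();
    }
}
```

Since Tokenizer implements Locator, an instance can be passed directly, as in LocationFormatter.describe(tokenizer).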

isMappingLangToXmlLang

public boolean isMappingLangToXmlLang()
Returns the mappingLangToXmlLang.

Returns:
the mappingLangToXmlLang

setMappingLangToXmlLang

public void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
Sets the mappingLangToXmlLang.

Parameters:
mappingLangToXmlLang - the mappingLangToXmlLang to set

setHeuristics

public void setHeuristics(Heuristics heuristics)
Sets the encoding sniffing heuristics.

Parameters:
heuristics - the heuristics to set


Copyright © 2011. All Rights Reserved.