org.ccil.cowan.tagsoup
Class Parser

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by org.ccil.cowan.tagsoup.Parser
All Implemented Interfaces:
ScanHandler, org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler, org.xml.sax.ext.LexicalHandler, org.xml.sax.XMLReader

public class Parser
extends org.xml.sax.helpers.DefaultHandler
implements ScanHandler, org.xml.sax.XMLReader, org.xml.sax.ext.LexicalHandler

The SAX parser class.
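
Because Parser implements org.xml.sax.XMLReader, it can presumably be driven like any other SAX reader. The following is a minimal sketch written against the XMLReader interface only, so it compiles without TagSoup on the classpath. The class ElementLister and its method are hypothetical names introduced for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of TagSoup.
final class ElementLister {
    private ElementLister() {}

    /** Parses the markup with the given reader and returns element local names in document order. */
    static List<String> listElements(XMLReader reader, String markup) throws Exception {
        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // localName is only populated when namespace processing is on
                names.add(localName);
            }
        });
        // A character stream avoids any encoding detection
        reader.parse(new InputSource(new StringReader(markup)));
        return names;
    }
}
```

With TagSoup available, the reader argument would be `new org.ccil.cowan.tagsoup.Parser()`; any other XMLReader works the same way for well-formed input.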


Field Summary
static java.lang.String autoDetectorProperty
          Specifies the AutoDetector (for encoding detection) this Parser uses.
static java.lang.String bogonsEmptyFeature
          A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.
static java.lang.String CDATAElementsFeature
          A value of "true" indicates that the parser will treat CDATA elements specially.
static java.lang.String defaultAttributesFeature
          A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.
static java.lang.String externalGeneralEntitiesFeature
          Reports whether this parser processes external general entities (it doesn't).
static java.lang.String externalParameterEntitiesFeature
          Reports whether this parser processes external parameter entities (it doesn't).
static java.lang.String ignorableWhitespaceFeature
          A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback.
static java.lang.String ignoreBogonsFeature
          A value of "true" indicates that the parser will ignore unknown elements.
static java.lang.String isStandaloneFeature
          May be examined only during a parse, after the startDocument() callback has been completed; read-only.
static java.lang.String lexicalHandlerParameterEntitiesFeature
          A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).
static java.lang.String lexicalHandlerProperty
          Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name).
static java.lang.String namespacePrefixesFeature
          A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available.
static java.lang.String namespacesFeature
          A value of "true" indicates namespace URIs and unprefixed local names for element and attribute names will be available.
static java.lang.String resolveDTDURIsFeature
          A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting.
static java.lang.String restartElementsFeature
          A value of "true" indicates that the parser will attempt to restart the restartable elements.
static java.lang.String rootBogonsFeature
          A value of "true" indicates that the parser will allow unknown elements to be the root element.
static java.lang.String scannerProperty
          Specifies the Scanner object this Parser uses.
static java.lang.String schemaProperty
          Specifies the Schema object this Parser uses.
static java.lang.String stringInterningFeature
          Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern.
static java.lang.String translateColonsFeature
          A value of "true" indicates that the parser will translate colons into underscores in names.
static java.lang.String unicodeNormalizationCheckingFeature
          Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation.
static java.lang.String useAttributes2Feature
          Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface.
static java.lang.String useEntityResolver2Feature
          Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used.
static java.lang.String useLocator2Feature
          Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface.
static java.lang.String validationFeature
          Controls whether the parser is reporting all validity errors. (We don't report any validity errors.)
static java.lang.String XML11Feature
          Returns "true" if the parser supports both XML 1.1 and XML 1.0.
static java.lang.String xmlnsURIsFeature
          Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace.
 
Constructor Summary
Parser()
           
 
Method Summary
 void adup(char[] buff, int offset, int length)
          Reports an attribute name without a value.
 void aname(char[] buff, int offset, int length)
          Reports an attribute name; a value will follow.
 void aval(char[] buff, int offset, int length)
          Reports an attribute value.
 void cdsect(char[] buff, int offset, int length)
          Reports the content of a CDATA section (not a CDATA element).
 void cmnt(char[] buff, int offset, int length)
          Reports a comment.
 void comment(char[] ch, int start, int length)
           
 void decl(char[] buff, int offset, int length)
          Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.
 void endCDATA()
           
 void endDTD()
           
 void endEntity(java.lang.String name)
           
 void entity(char[] buff, int offset, int length)
          Reports an entity reference or character reference.
 void eof(char[] buff, int offset, int length)
          Reports EOF.
 void etag_basic(char[] buff, int offset, int length)
           
 boolean etag_cdata(char[] buff, int offset, int length)
           
 void etag(char[] buff, int offset, int length)
          Reports an end-tag.
 org.xml.sax.ContentHandler getContentHandler()
           
 org.xml.sax.DTDHandler getDTDHandler()
           
 int getEntity()
          Returns the value of the last entity or character reference reported.
 org.xml.sax.EntityResolver getEntityResolver()
           
 org.xml.sax.ErrorHandler getErrorHandler()
           
 boolean getFeature(java.lang.String name)
           
 java.lang.Object getProperty(java.lang.String name)
           
 void gi(char[] buff, int offset, int length)
          Reports the general identifier (element type name) of a start-tag.
 void parse(org.xml.sax.InputSource input)
           
 void parse(java.lang.String systemid)
           
 void pcdata(char[] buff, int offset, int length)
          Reports character content.
 void pi(char[] buff, int offset, int length)
          Reports the data part of a processing instruction.
 void pitarget(char[] buff, int offset, int length)
          Reports the target part of a processing instruction.
 void setContentHandler(org.xml.sax.ContentHandler handler)
           
 void setDTDHandler(org.xml.sax.DTDHandler handler)
           
 void setEntityResolver(org.xml.sax.EntityResolver resolver)
           
 void setErrorHandler(org.xml.sax.ErrorHandler handler)
           
 void setFeature(java.lang.String name, boolean value)
           
 void setProperty(java.lang.String name, java.lang.Object value)
           
 void stagc(char[] buff, int offset, int length)
          Reports the close of a start-tag.
 void stage(char[] buff, int offset, int length)
          Reports the close of an empty-tag.
 void startCDATA()
           
 void startDTD(java.lang.String name, java.lang.String publicid, java.lang.String systemid)
           
 void startEntity(java.lang.String name)
           
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

namespacesFeature

public static final java.lang.String namespacesFeature
A value of "true" indicates namespace URIs and unprefixed local names for element and attribute names will be available.

See Also:
Constant Field Values

namespacePrefixesFeature

public static final java.lang.String namespacePrefixesFeature
A value of "true" indicates that XML qualified names (with prefixes) and attributes (including xmlns* attributes) will be available. We don't support this value.

See Also:
Constant Field Values

externalGeneralEntitiesFeature

public static final java.lang.String externalGeneralEntitiesFeature
Reports whether this parser processes external general entities (it doesn't).

See Also:
Constant Field Values

externalParameterEntitiesFeature

public static final java.lang.String externalParameterEntitiesFeature
Reports whether this parser processes external parameter entities (it doesn't).

See Also:
Constant Field Values

isStandaloneFeature

public static final java.lang.String isStandaloneFeature
May be examined only during a parse, after the startDocument() callback has been completed; read-only. The value is true if the document specified standalone="yes" in its XML declaration, and otherwise is false. (It's always false.)

See Also:
Constant Field Values

lexicalHandlerParameterEntitiesFeature

public static final java.lang.String lexicalHandlerParameterEntitiesFeature
A value of "true" indicates that the LexicalHandler will report the beginning and end of parameter entities (it won't).

See Also:
Constant Field Values

resolveDTDURIsFeature

public static final java.lang.String resolveDTDURIsFeature
A value of "true" indicates that system IDs in declarations will be absolutized (relative to their base URIs) before reporting. (This returns true but doesn't actually do anything.)

See Also:
Constant Field Values

stringInterningFeature

public static final java.lang.String stringInterningFeature
Has a value of "true" if all XML names (for elements, prefixes, attributes, entities, notations, and local names), as well as Namespace URIs, will have been interned using java.lang.String.intern. This supports fast testing of equality/inequality against string constants, rather than forcing slower calls to String.equals(). (We always intern.)

See Also:
Constant Field Values
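
As an illustration of why interning matters, the sketch below shows the identity comparison that interned names make safe. The class InternedNameCheck is a hypothetical name, and the snippet uses only java.lang.String; it does not touch the parser.

```java
// Hypothetical demonstration class, not part of TagSoup.
final class InternedNameCheck {
    private InternedNameCheck() {}

    /** Returns true if interning turns a freshly built string into the same object as the literal. */
    static boolean demo() {
        // Built at run time, so it is a distinct object from the literal "body"
        String built = new StringBuilder("bo").append("dy").toString();
        boolean distinctBefore = built != "body";
        // intern() returns the pooled instance, which is the literal itself
        boolean sameAfter = built.intern() == "body";
        return distinctBefore && sameAfter;
    }
}
```

A handler that relies on this would compare a reported name against a string constant with `==`, which is only valid while the feature reports "true".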

useAttributes2Feature

public static final java.lang.String useAttributes2Feature
Returns "true" if the Attributes objects passed by this parser in ContentHandler.startElement() implement the org.xml.sax.ext.Attributes2 interface. (They don't.)

See Also:
Constant Field Values

useLocator2Feature

public static final java.lang.String useLocator2Feature
Returns "true" if the Locator objects passed by this parser in ContentHandler.setDocumentLocator() implement the org.xml.sax.ext.Locator2 interface. (They don't.)

See Also:
Constant Field Values

useEntityResolver2Feature

public static final java.lang.String useEntityResolver2Feature
Returns "true" if, when setEntityResolver is given an object implementing the org.xml.sax.ext.EntityResolver2 interface, those new methods will be used. (They won't be.)

See Also:
Constant Field Values

validationFeature

public static final java.lang.String validationFeature
Controls whether the parser is reporting all validity errors. (We don't report any validity errors.)

See Also:
Constant Field Values

unicodeNormalizationCheckingFeature

public static final java.lang.String unicodeNormalizationCheckingFeature
Controls whether the parser reports Unicode normalization errors as described in section 2.13 and Appendix B of the XML 1.1 Recommendation. (We don't normalize.)

See Also:
Constant Field Values

xmlnsURIsFeature

public static final java.lang.String xmlnsURIsFeature
Controls whether, when the namespace-prefixes feature is set, the parser treats namespace declaration attributes as being in the http://www.w3.org/2000/xmlns/ namespace. (It doesn't.)

See Also:
Constant Field Values

XML11Feature

public static final java.lang.String XML11Feature
Returns "true" if the parser supports both XML 1.1 and XML 1.0. (Always false.)

See Also:
Constant Field Values

ignoreBogonsFeature

public static final java.lang.String ignoreBogonsFeature
A value of "true" indicates that the parser will ignore unknown elements.

See Also:
Constant Field Values

bogonsEmptyFeature

public static final java.lang.String bogonsEmptyFeature
A value of "true" indicates that the parser will give unknown elements a content model of EMPTY; a value of "false", a content model of ANY.

See Also:
Constant Field Values

rootBogonsFeature

public static final java.lang.String rootBogonsFeature
A value of "true" indicates that the parser will allow unknown elements to be the root element.

See Also:
Constant Field Values

defaultAttributesFeature

public static final java.lang.String defaultAttributesFeature
A value of "true" indicates that the parser will return default attribute values for missing attributes that have default values.

See Also:
Constant Field Values

translateColonsFeature

public static final java.lang.String translateColonsFeature
A value of "true" indicates that the parser will translate colons into underscores in names.

See Also:
Constant Field Values

restartElementsFeature

public static final java.lang.String restartElementsFeature
A value of "true" indicates that the parser will attempt to restart the restartable elements.

See Also:
Constant Field Values

ignorableWhitespaceFeature

public static final java.lang.String ignorableWhitespaceFeature
A value of "true" indicates that the parser will transmit whitespace in element-only content via the SAX ignorableWhitespace callback. Normally this is not done, because HTML is an SGML application and SGML suppresses such whitespace.

See Also:
Constant Field Values

CDATAElementsFeature

public static final java.lang.String CDATAElementsFeature
A value of "true" indicates that the parser will treat CDATA elements specially. Normally true, since the input is by default HTML.

See Also:
Constant Field Values

lexicalHandlerProperty

public static final java.lang.String lexicalHandlerProperty
Used to see some syntax events that are essential in some applications: comments, CDATA delimiters, selected general entity inclusions, and the start and end of the DTD (and declaration of document element name). The Object must implement org.xml.sax.ext.LexicalHandler.

See Also:
Constant Field Values

scannerProperty

public static final java.lang.String scannerProperty
Specifies the Scanner object this Parser uses.

See Also:
Constant Field Values

schemaProperty

public static final java.lang.String schemaProperty
Specifies the Schema object this Parser uses.

See Also:
Constant Field Values

autoDetectorProperty

public static final java.lang.String autoDetectorProperty
Specifies the AutoDetector (for encoding detection) this Parser uses.

See Also:
Constant Field Values
Constructor Detail

Parser

public Parser()
Method Detail

getFeature

public boolean getFeature(java.lang.String name)
                   throws org.xml.sax.SAXNotRecognizedException,
                          org.xml.sax.SAXNotSupportedException
Specified by:
getFeature in interface org.xml.sax.XMLReader
Throws:
org.xml.sax.SAXNotRecognizedException
org.xml.sax.SAXNotSupportedException

setFeature

public void setFeature(java.lang.String name,
                       boolean value)
                throws org.xml.sax.SAXNotRecognizedException,
                       org.xml.sax.SAXNotSupportedException
Specified by:
setFeature in interface org.xml.sax.XMLReader
Throws:
org.xml.sax.SAXNotRecognizedException
org.xml.sax.SAXNotSupportedException
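
Since both getFeature and setFeature can throw SAXNotRecognizedException or SAXNotSupportedException, callers often wrap them. The following is a sketch of one such wrapper, written against XMLReader so that it does not depend on TagSoup; FeatureSetter and trySetFeature are hypothetical names.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of TagSoup.
final class FeatureSetter {
    private FeatureSetter() {}

    /** Tries to set a feature; returns true only if it was accepted and reads back as requested. */
    static boolean trySetFeature(XMLReader reader, String name, boolean value) {
        try {
            reader.setFeature(name, value);
            return reader.getFeature(name) == value;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // Unknown feature name, or a value the reader refuses
            return false;
        }
    }
}
```

With a Parser instance, the name argument would be one of the constants listed under Field Detail, for example `Parser.ignoreBogonsFeature`.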

getProperty

public java.lang.Object getProperty(java.lang.String name)
                             throws org.xml.sax.SAXNotRecognizedException,
                                    org.xml.sax.SAXNotSupportedException
Specified by:
getProperty in interface org.xml.sax.XMLReader
Throws:
org.xml.sax.SAXNotRecognizedException
org.xml.sax.SAXNotSupportedException

setProperty

public void setProperty(java.lang.String name,
                        java.lang.Object value)
                 throws org.xml.sax.SAXNotRecognizedException,
                        org.xml.sax.SAXNotSupportedException
Specified by:
setProperty in interface org.xml.sax.XMLReader
Throws:
org.xml.sax.SAXNotRecognizedException
org.xml.sax.SAXNotSupportedException
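
One common use of setProperty is installing a LexicalHandler, as described under lexicalHandlerProperty. The sketch below collects comment text that way. It uses the standard SAX2 property identifier; this page does not show the value of the lexicalHandlerProperty constant, so treating the two as equal is an assumption, and with TagSoup it would be safer to pass `Parser.lexicalHandlerProperty` directly. CommentCollector is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper, not part of TagSoup.
final class CommentCollector {
    // Standard SAX2 property identifier for the lexical handler
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    private CommentCollector() {}

    /** Parses the markup and returns the text of every comment reported to the lexical handler. */
    static List<String> collectComments(XMLReader reader, String markup) throws Exception {
        final List<String> comments = new ArrayList<>();
        // DefaultHandler2 implements both ContentHandler and LexicalHandler
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                comments.add(new String(ch, start, length));
            }
        };
        reader.setContentHandler(handler);
        reader.setProperty(LEXICAL_HANDLER, handler);
        reader.parse(new InputSource(new StringReader(markup)));
        return comments;
    }
}
```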

setEntityResolver

public void setEntityResolver(org.xml.sax.EntityResolver resolver)
Specified by:
setEntityResolver in interface org.xml.sax.XMLReader

getEntityResolver

public org.xml.sax.EntityResolver getEntityResolver()
Specified by:
getEntityResolver in interface org.xml.sax.XMLReader

setDTDHandler

public void setDTDHandler(org.xml.sax.DTDHandler handler)
Specified by:
setDTDHandler in interface org.xml.sax.XMLReader

getDTDHandler

public org.xml.sax.DTDHandler getDTDHandler()
Specified by:
getDTDHandler in interface org.xml.sax.XMLReader

setContentHandler

public void setContentHandler(org.xml.sax.ContentHandler handler)
Specified by:
setContentHandler in interface org.xml.sax.XMLReader

getContentHandler

public org.xml.sax.ContentHandler getContentHandler()
Specified by:
getContentHandler in interface org.xml.sax.XMLReader

setErrorHandler

public void setErrorHandler(org.xml.sax.ErrorHandler handler)
Specified by:
setErrorHandler in interface org.xml.sax.XMLReader

getErrorHandler

public org.xml.sax.ErrorHandler getErrorHandler()
Specified by:
getErrorHandler in interface org.xml.sax.XMLReader

parse

public void parse(org.xml.sax.InputSource input)
           throws java.io.IOException,
                  org.xml.sax.SAXException
Specified by:
parse in interface org.xml.sax.XMLReader
Throws:
java.io.IOException
org.xml.sax.SAXException

parse

public void parse(java.lang.String systemid)
           throws java.io.IOException,
                  org.xml.sax.SAXException
Specified by:
parse in interface org.xml.sax.XMLReader
Throws:
java.io.IOException
org.xml.sax.SAXException

adup

public void adup(char[] buff,
                 int offset,
                 int length)
          throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an attribute name without a value.

Specified by:
adup in interface ScanHandler
Throws:
org.xml.sax.SAXException
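
The ScanHandler callbacks implemented by this class (adup, aname, aval, and the others below) all take the same (buff, offset, length) triple. By analogy with the SAX characters() callback, a cautious reading is that only the indicated slice is meaningful and that the array may be reused after the call returns; this page does not state that, so it is an assumption. Under that assumption, a handler would copy the slice before keeping it, roughly as in this sketch (SliceCopy is a hypothetical name):

```java
// Hypothetical helper, not part of TagSoup.
final class SliceCopy {
    private SliceCopy() {}

    /** Copies the slice buff[offset .. offset+length) into an independent String. */
    static String copy(char[] buff, int offset, int length) {
        // new String(char[], int, int) copies the characters, so later
        // changes to buff do not affect the returned value
        return new String(buff, offset, length);
    }
}
```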

aname

public void aname(char[] buff,
                  int offset,
                  int length)
           throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an attribute name; a value will follow.

Specified by:
aname in interface ScanHandler
Throws:
org.xml.sax.SAXException

aval

public void aval(char[] buff,
                 int offset,
                 int length)
          throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an attribute value.

Specified by:
aval in interface ScanHandler
Throws:
org.xml.sax.SAXException

entity

public void entity(char[] buff,
                   int offset,
                   int length)
            throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an entity reference or character reference.

Specified by:
entity in interface ScanHandler
Throws:
org.xml.sax.SAXException

eof

public void eof(char[] buff,
                int offset,
                int length)
         throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports EOF.

Specified by:
eof in interface ScanHandler
Throws:
org.xml.sax.SAXException

etag

public void etag(char[] buff,
                 int offset,
                 int length)
          throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports an end-tag.

Specified by:
etag in interface ScanHandler
Throws:
org.xml.sax.SAXException

etag_cdata

public boolean etag_cdata(char[] buff,
                          int offset,
                          int length)
                   throws org.xml.sax.SAXException
Throws:
org.xml.sax.SAXException

etag_basic

public void etag_basic(char[] buff,
                       int offset,
                       int length)
                throws org.xml.sax.SAXException
Throws:
org.xml.sax.SAXException

decl

public void decl(char[] buff,
                 int offset,
                 int length)
          throws org.xml.sax.SAXException
Parsing the complete XML Document Type Definition is way too complex, but for many simple cases we can extract something useful from it.

doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
DeclSep     ::= PEReference | S
intSubset   ::= (markupdecl | DeclSep)*
markupdecl  ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
ExternalID  ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral

Specified by:
decl in interface ScanHandler
Throws:
org.xml.sax.SAXException

gi

public void gi(char[] buff,
               int offset,
               int length)
        throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the general identifier (element type name) of a start-tag.

Specified by:
gi in interface ScanHandler
Throws:
org.xml.sax.SAXException

cdsect

public void cdsect(char[] buff,
                   int offset,
                   int length)
            throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the content of a CDATA section (not a CDATA element).

Specified by:
cdsect in interface ScanHandler
Throws:
org.xml.sax.SAXException

pcdata

public void pcdata(char[] buff,
                   int offset,
                   int length)
            throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports character content.

Specified by:
pcdata in interface ScanHandler
Throws:
org.xml.sax.SAXException

pitarget

public void pitarget(char[] buff,
                     int offset,
                     int length)
              throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the target part of a processing instruction.

Specified by:
pitarget in interface ScanHandler
Throws:
org.xml.sax.SAXException

pi

public void pi(char[] buff,
               int offset,
               int length)
        throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the data part of a processing instruction.

Specified by:
pi in interface ScanHandler
Throws:
org.xml.sax.SAXException

stagc

public void stagc(char[] buff,
                  int offset,
                  int length)
           throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the close of a start-tag.

Specified by:
stagc in interface ScanHandler
Throws:
org.xml.sax.SAXException

stage

public void stage(char[] buff,
                  int offset,
                  int length)
           throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports the close of an empty-tag.

Specified by:
stage in interface ScanHandler
Throws:
org.xml.sax.SAXException

cmnt

public void cmnt(char[] buff,
                 int offset,
                 int length)
          throws org.xml.sax.SAXException
Description copied from interface: ScanHandler
Reports a comment.

Specified by:
cmnt in interface ScanHandler
Throws:
org.xml.sax.SAXException

getEntity

public int getEntity()
Description copied from interface: ScanHandler
Returns the value of the last entity or character reference reported.

Specified by:
getEntity in interface ScanHandler

comment

public void comment(char[] ch,
                    int start,
                    int length)
             throws org.xml.sax.SAXException
Specified by:
comment in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

endCDATA

public void endCDATA()
              throws org.xml.sax.SAXException
Specified by:
endCDATA in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

endDTD

public void endDTD()
            throws org.xml.sax.SAXException
Specified by:
endDTD in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

endEntity

public void endEntity(java.lang.String name)
               throws org.xml.sax.SAXException
Specified by:
endEntity in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

startCDATA

public void startCDATA()
                throws org.xml.sax.SAXException
Specified by:
startCDATA in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

startDTD

public void startDTD(java.lang.String name,
                     java.lang.String publicid,
                     java.lang.String systemid)
              throws org.xml.sax.SAXException
Specified by:
startDTD in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

startEntity

public void startEntity(java.lang.String name)
                 throws org.xml.sax.SAXException
Specified by:
startEntity in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException


Licence: Academic Free License 3.0 and/or GPL 2.0