java.lang.Object
gnu.javax.swing.text.html.parser.support.low.Constants
gnu.javax.swing.text.html.parser.support.low.ReaderTokenizer
gnu.javax.swing.text.html.parser.support.Parser
The parser reads HTML content from a Reader and calls various notifying methods (which should be overridden in a subclass) when tags or data are encountered.
Some HTML elements need no opening or closing tags. The parser must invoke the tag handling methods even when such tags are not explicitly specified and must be supposed from the information stored in the DTD. For example, parsing the document
<table><tr><td>a<td>b<td>c</tr>
will invoke the handling methods in exactly the same order
(and with the same parameters) as if parsing the document:
<html><head></head><body><table><tbody><tr><td>a</td><td>b</td><td>c</td></tr></tbody></table></body></html>
(the supposed tags are those absent from the first document). The parser also supports
obsolete elements of HTML syntax.
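The effect of supposed tags can be observed with a short experiment. The sketch below does not use this class directly: it assumes the standard JDK classes javax.swing.text.html.parser.ParserDelegator and HTMLEditorKit.ParserCallback, which expose the same callback model, and the class name ImpliedTagDemo and its events method are hypothetical. The exact sequence of events depends on the DTD in use, so the JDK result may differ in detail from the expansion shown above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: records the parser callbacks as strings.
class ImpliedTagDemo {
    static List<String> events(String html) {
        final List<String> out = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // The JDK parser marks supposed tags with the IMPLIED attribute.
                boolean implied = attrs.isDefined(IMPLIED);
                out.add("<" + tag + ">" + (implied ? " (implied)" : ""));
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                out.add("</" + tag + ">");
            }

            @Override
            public void handleText(char[] data, int pos) {
                out.add("text:" + new String(data));
            }
        };
        try (StringReader reader = new StringReader(html)) {
            // Third argument: ignore charset declarations in the input.
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String event : events("<table><tr><td>a<td>b<td>c</tr>")) {
            System.out.println(event);
        }
    }
}
```

Running main prints one line per callback; start tags that the parser supposed rather than read are flagged with "(implied)".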
Fields inherited from class gnu.javax.swing.text.html.parser.support.low.ReaderTokenizer | |
advanced , backupMode |
Fields inherited from class gnu.javax.swing.text.html.parser.support.low.Constants | |
AP , BEGIN , COMMENT_END , COMMENT_OPEN , COMMENT_TRIPLEDASH_END , DOUBLE_DASH , END , ENTITY , ENTITY_NAMED , ENTITY_NUMERIC , EOF , EQ , EXCLAMATION , NUMTOKEN , OTHER , QUOT , SCRIPT , SCRIPT_CLOSE , SCRIPT_OPEN , SGML , SLASH , STYLE , STYLE_CLOSE , STYLE_OPEN , TAG , TAG_CLOSE , WS , bDIGIT , bLETTER , bLINEBREAK , bNAME , bQUOTING , bSINGLE_CHAR_TOKEN , bSPECIAL , bWHITESPACE |
Fields inherited from interface javax.swing.text.html.parser.DTDConstants | |
ANY , CDATA , CONREF , CURRENT , DEFAULT , EMPTY , ENDTAG , ENTITIES , ENTITY , FIXED , GENERAL , ID , IDREF , IDREFS , IMPLIED , MD , MODEL , MS , NAME , NAMES , NMTOKEN , NMTOKENS , NOTATION , NUMBER , NUMBERS , NUTOKEN , NUTOKENS , PARAMETER , PI , PUBLIC , RCDATA , REQUIRED , SDATA , STARTTAG , SYSTEM |
Methods inherited from class gnu.javax.swing.text.html.parser.support.low.ReaderTokenizer | |
error , getEndOfLineSequence , getNextToken , getTokenAhead , getTokenAhead , mark , reset , reset |
Methods inherited from class gnu.javax.swing.text.html.parser.support.low.Constants | |
endMatches |
Methods inherited from class java.lang.Object | |
clone , equals , finalize , getClass , hashCode , notify , notifyAll , toString , wait , wait , wait |
protected boolean strict
The value of this field determines whether or not the Parser will be strict in enforcing SGML compatibility. The default value is false, meaning that the parser should do everything it can to parse the input and get at least some information even from incorrectly written HTML.
public Parser(DTD a_dtd)
Creates a new Parser that uses the given DTD. The only standard way to get an instance of DTD is to construct it manually, filling in all required fields.
- Parameters:
a_dtd
- The DTD to use. The parser behaviour after passing null as an argument is not documented and may vary between implementations.
protected void CDATA(boolean clearBuffer) throws ParseException
Read parseable character data, add to buffer.
- Parameters:
clearBuffer
- If true, the buffer is filled by the CDATA section; otherwise the section is appended to the existing content of the buffer.
- Throws:
ParseException
-
protected void Comment() throws ParseException
Process Comment. This method skips till --> without taking SGML constructs into consideration. The supported SGML constructs are handled separately.
protected void Script() throws ParseException
Read a script. The text, returned without any changes, is terminated only by the closing tag SCRIPT.
protected void Style() throws ParseException
Read a style definition. The text, returned without any changes, is terminated only by the closing tag STYLE.
protected void _handleText()
A hook for operations preceding the call to handleText. Handles the text in the string buffer. In non-preformatted mode, all line breaks immediately following a start tag and immediately before an end tag are discarded, \r, \n and \t are replaced by spaces, multiple spaces are replaced by a single one, and the result is moved into an array that is passed to handleText().
protected final void append(Token t)
Add the image of this token to the buffer.
- Parameters:
t
- A token to append.
protected final void consume(pattern p)
Consume pattern that must match.
- Parameters:
p
- A pattern to consume.
protected void endTag(boolean omitted)
The method is called when an HTML end (closing) tag is found or when the parser concludes that one should be present at the current position. The method is called immediately before handleEndTag().
- Parameters:
omitted
- True if the tag is not actually present in the document, but is supposed by the parser (like </html> at the end of the document).
public void error(String msg)
Invokes the error handler. The default method in this implementation delegates the call to handleError, also providing the current line.
public void error(String msg, Token atToken)
Invokes the error handler.
- Overrides:
- error in class ReaderTokenizer
public void error(String msg, String invalid)
Invokes the error handler. The default method in this implementation delegates the call to error (msg+": '"+invalid+"'").
public void error(String parm1, String parm2, String parm3)
Invokes the error handler. The default method in this implementation delegates the call to error (parm1+" "+ parm2+" "+ parm3).
public void error(String parm1, String parm2, String parm3, String parm4)
Invokes the error handler. The default method in this implementation delegates the call to error (parm1+" "+ parm2+" "+ parm3+" "+ parm4).
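The delegation between the error overloads can be illustrated with a minimal, self-contained sketch. The class ErrorDelegationSketch and its reported list are hypothetical; the single-argument method simply records the message in place of the real call to handleError, and only the message formats are taken from the descriptions above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the error reporting part of the parser.
class ErrorDelegationSketch {
    // Collected messages; the real parser would pass them to handleError.
    final List<String> reported = new ArrayList<>();

    void error(String msg) {
        reported.add(msg);
    }

    void error(String msg, String invalid) {
        // Documented format: msg + ": '" + invalid + "'"
        error(msg + ": '" + invalid + "'");
    }

    void error(String parm1, String parm2, String parm3) {
        // Documented format: the parts joined by single spaces.
        error(parm1 + " " + parm2 + " " + parm3);
    }

    void error(String parm1, String parm2, String parm3, String parm4) {
        error(parm1 + " " + parm2 + " " + parm3 + " " + parm4);
    }
}
```

Because every overload funnels into the single-argument method, a subclass that wants to change how errors are reported only needs to override that one method.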
public SimpleAttributeSet getAttributes()
Get the attributes of the current tag.
- Returns:
- The attribute set, representing the attributes of the current tag.
protected void handleComment(char[] comment)
Handle HTML comment. The default method returns without action.
- Parameters:
comment
-
protected void handleEOFInComment()
This is additionally called when the HTML content terminates without closing an HTML comment. This can only happen if the HTML document contains errors (for example, the closing --> is missing).
protected void handleEmptyTag(TagElement tag) throws ChangedCharSetException
Handle a tag with no content, like <br>. The method is called for the elements that, in accordance with the current DTD, have empty content.
- Parameters:
tag
- The tag being handled.
- Throws:
ChangedCharSetException
-
protected void handleEndTag(TagElement tag)
The method is called when an HTML closing tag (like </table>) is found or when the parser concludes that one should be present at the current position.
- Parameters:
tag
- The tag
protected void handleStartTag(TagElement tag)
The method is called when an HTML opening tag (like <table>) is found or when the parser concludes that one should be present at the current position.
- Parameters:
tag
- The tag
protected void handleText(char[] text)
Handle the text section. For a non-preformatted section, the parser replaces \t, \r and \n by spaces and then multiple spaces by a single space. Additionally, all whitespace around tags is discarded.
For pre-formatted text (inside TEXTAREA and PRE), the parser preserves all tabs and spaces, but removes one bounding \r, \n or \r\n, if it is present. Additionally, it replaces each occurrence of \r or \r\n by a single \n.
- Parameters:
text
- A section text.
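The two whitespace rules described for handleText can be sketched as plain string operations. This is an illustration only, not the parser's code: the class and method names are hypothetical, "whitespace around tags" is approximated by trimming the section, and "one bounding line break" is assumed to mean at most one at each end.

```java
// Hypothetical illustration of the documented whitespace rules.
class TextNormalizationSketch {

    // Non-preformatted text: \t, \r, \n become spaces, runs of spaces collapse.
    static String normalize(String text) {
        String spaced = text.replace('\t', ' ').replace('\r', ' ').replace('\n', ' ');
        String collapsed = spaced.replaceAll(" {2,}", " ");
        // Assumption: the section borders on tags, so the edges are trimmed.
        return collapsed.trim();
    }

    // Pre-formatted text: keep tabs and spaces, unify line breaks to \n.
    static String normalizePreformatted(String text) {
        // Replace \r\n first so that it does not turn into two line breaks.
        String s = text.replace("\r\n", "\n").replace('\r', '\n');
        // Assumption: at most one bounding line break is removed per end.
        if (s.startsWith("\n")) {
            s = s.substring(1);
        }
        if (s.endsWith("\n")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }
}
```

For example, normalize("  a\t\tb\r\n c ") yields "a b c", while normalizePreformatted("\r\nx\ty\rz\n") keeps the tab and yields "x\ty\nz".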
protected void handleTitle(char[] title)
Handle HTML <title> tag. This method is invoked when both title starting and closing tags are already behind. The passed argument contains the concatenation of all title text sections.
- Parameters:
title
- The title text.
protected TagElement makeTag(Element element)
Constructs the tag from the given element. In this implementation, this is defined, but never called.
- Returns:
- the tag
protected TagElement makeTag(Element element, boolean isSupposed)
Constructs the tag from the given element.
- Parameters:
isSupposed
- true if the tag is not actually present in the HTML input, but the parser supposes that it should occur at the current location.
- Returns:
- the tag
protected void markFirstTime(Element element)
This is called when the tag representing the given element occurs for the first time in the document.
- Parameters:
element
-
protected Token mustBe(int kind)
Consume the token that was checked before and hence MUST be present.
- Parameters:
kind
- The kind of token to consume.
protected void noValueAttribute(String element, String attribute)
Handle attribute without value. The default method uses the only allowed attribute value from DTD. If the attribute is unknown or allows several values, the HTML.NULL_ATTRIBUTE_VALUE is used. The attribute with this value is added to the attribute set.
- Parameters:
element
- The name of the element.
attribute
- The name of the attribute without value.
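The default rule for valueless attributes can be sketched as follows. The map of allowed values is a hypothetical stand-in for the DTD lookup and the class name is invented for illustration; HTML.NULL_ATTRIBUTE_VALUE and SimpleAttributeSet are the standard javax.swing.text classes.

```java
import java.util.List;
import java.util.Map;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.html.HTML;

// Hypothetical illustration of the documented default behaviour.
class NoValueAttributeSketch {

    // allowedValues stands in for the DTD: attribute name -> permitted values.
    static void noValueAttribute(Map<String, List<String>> allowedValues,
                                 String attribute,
                                 SimpleAttributeSet target) {
        List<String> allowed = allowedValues.get(attribute);
        // Use the only allowed value if there is exactly one; otherwise
        // (unknown attribute or several values) fall back to the null marker.
        Object value = (allowed != null && allowed.size() == 1)
                ? allowed.get(0)
                : HTML.NULL_ATTRIBUTE_VALUE;
        target.addAttribute(attribute, value);
    }
}
```

With a map that allows only "checked" for the attribute checked, the attribute set receives checked=checked; an attribute that is missing from the map, or that allows several values, is stored with HTML.NULL_ATTRIBUTE_VALUE.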
protected Token optional(int kind)
Consume the optional token, if present.
- Parameters:
kind
- The kind of token to consume.
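The difference between mustBe and optional can be sketched with a toy token queue. Everything here is hypothetical: tokens are reduced to their integer kind, optional is assumed to return null when the token is absent, and mustBe is assumed to fail with an exception, whereas the real parser may report the problem through error instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration: a token stream reduced to a queue of token kinds.
class TokenConsumeSketch {
    private final Deque<Integer> kinds = new ArrayDeque<>();

    TokenConsumeSketch(int... tokenKinds) {
        for (int kind : tokenKinds) {
            kinds.addLast(kind);
        }
    }

    // Consume the next token only if it has the requested kind.
    Integer optional(int kind) {
        if (!kinds.isEmpty() && kinds.peekFirst() == kind) {
            return kinds.pollFirst();
        }
        return null; // assumption: absence is signalled by null
    }

    // Consume a token that the caller has already checked to be present.
    Integer mustBe(int kind) {
        Integer token = optional(kind);
        if (token == null) {
            // Assumption: a missing mandatory token is treated as a hard failure.
            throw new IllegalStateException("Expected token of kind " + kind);
        }
        return token;
    }

    int remaining() {
        return kinds.size();
    }
}
```

In this sketch optional leaves the queue untouched when the kinds differ, so the caller can try another alternative, while mustBe is reserved for positions where a previous look-ahead has already confirmed the token.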
public void parse(Reader reader) throws IOException
Parse the HTML text, calling various methods in response to the occurrence of the corresponding HTML constructions.
- Parameters:
reader
- The reader to read the source HTML from.
- Throws:
IOException
- If the reader throws one.
public String parseDTDMarkup() throws IOException
Parses DTD markup declaration. Currently returns null without action.
- Returns:
- null.
- Throws:
IOException
-
public boolean parseMarkupDeclarations(StringBuffer strBuff) throws IOException
Parse SGML insertion ( <! ... > ). When an SGML insertion is found, this method is called, passing the SGML in the string buffer as a parameter. The default method returns false without action and can be overridden to implement user-defined SGML support. For more information about SGML insertions in HTML documents, the author suggests reading the SGML tutorial at http://www.w3.org/TR/WD-html40-970708/intro/sgmltut.html. We also recommend Goldfarb C.F. (1991) The SGML Handbook, Oxford University Press, 688 p., ISBN 0198537379.
- Parameters:
strBuff
-
- Returns:
- true if this is a valid DTD markup declaration.
- Throws:
IOException
-
protected void readAttributes(String element)
Read the element attributes, adding them into attribute set.
- Parameters:
element
- The element name (needed to access attribute information in dtd).
protected String resolveNamedEntity(String a_tag)
Return the string corresponding to the given named entity. The name is passed with the preceding &, but without the ending semicolon.
protected char resolveNumericEntity(String a_tag)
Return the char corresponding to the given numeric entity. The name is passed with the preceding &#, but without the ending semicolon.
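The argument conventions of the two entity resolvers can be illustrated with a small sketch. It is not the parser's implementation: the class EntitySketch and its four-entry table are hypothetical (the real parser consults the DTD), unknown names are assumed to be returned unchanged, and only decimal numeric entities are handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the documented argument formats.
class EntitySketch {
    // Tiny stand-in table; the real parser looks entities up in the DTD.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
    }

    // Input looks like "&amp": leading '&', no trailing semicolon.
    static String resolveNamedEntity(String entity) {
        String name = entity.substring(1);
        String value = NAMED.get(name);
        // Assumption: an unknown entity is returned as it was written.
        return value != null ? value : entity;
    }

    // Input looks like "&#65": leading "&#", no trailing semicolon.
    static char resolveNumericEntity(String entity) {
        // Assumption: decimal notation only.
        return (char) Integer.parseInt(entity.substring(2));
    }
}
```

For example, resolveNamedEntity("&lt") returns "<" and resolveNumericEntity("&#65") returns 'A'.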
protected void restart()
Reset all fields to the initial default state, preparing the parser for parsing the next document.
protected void startTag(TagElement tag) throws ChangedCharSetException
The method is called when an HTML opening tag (like <table>) is found or when the parser concludes that one should be present at the current position. The method is called immediately before handleStartTag.
- Parameters:
tag
- The tag