My first proposal on SoC project "Add support for the StAX (JSR-173) cursor API" 2007-03-31 - By wei duan
Hello, everyone,
I'm a student applying for the SoC <http://code.google.com/soc/> project "Add support for the StAX (JSR-173) cursor API to Xerces-J" <http://wiki.apache.org/general/SummerOfCode2007>. Michael suggested that I discuss my proposal on the mailing list, so I would like to introduce my thoughts and plan for this student project. Any comments are welcome. : )
The abstract description of the project is: "To design and implement the cursor-based XMLStreamReader <http://java.sun.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html> (and filtering <http://java.sun.com/javase/6/docs/api/javax/xml/stream/StreamFilter.html> support). It should be possible to accomplish this using XNI <http://xerces.apache.org/xerces2-j/xni.html> by building the XMLStreamReader <http://java.sun.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html> on top of an XMLPullParserConfiguration <http://xerces.apache.org/xerces2-j/javadocs/xni/org/apache/xerces/xni/parser/XMLPullParserConfiguration.html>."
Besides XNI, there are several other ways to implement the StAX interface. One is to parse the XML document as raw text from scratch: reading characters, building tokens, interpreting the tokens, and so on. Another is to implement a converter on top of the existing DOM or SAX interfaces. However, after reading the Xerces source code, I found that both the SAX and DOM implementations are based on XNI, so it is natural to build StAX on XNI as well.
To implement XMLStreamReader, two important preconditions must be met:
1. XML event information can be received.
2. The pull style parsing process can be simulated.
When I looked through the XNI interfaces, I found that they meet both preconditions. The handler interfaces in XNI, such as XMLDocumentHandler and XMLDTDHandler, receive XML events including startDocument and endDocument, which can be mapped to the corresponding StAX events. The XMLPullParserConfiguration <http://xerces.apache.org/xerces2-j/javadocs/xni/org/apache/xerces/xni/parser/XMLPullParserConfiguration.html> interface in XNI represents a parser configuration that can be used as the configuration for a "pull" parser, so the pull parsing process of StAX can be simulated by calling the "boolean parse(boolean)" method of XMLPullParserConfiguration.
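For reference, the pull behaviour that the new reader would have to reproduce can be seen with the StAX implementation that ships with Java SE 6 and later. The following is only a small sketch of the cursor API from the caller's side; it does not use Xerces or XNI, and the class name CursorDemo and the sample document are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CursorDemo {
    /** Pulls every event from the document and returns the event type codes in order. */
    static List<Integer> eventTypes(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<Integer> types = new ArrayList<>();
        // A new reader is positioned on START_DOCUMENT before the first call to next().
        types.add(reader.getEventType());
        // The caller decides when to advance; the parser never calls back into the caller.
        while (reader.hasNext()) {
            types.add(reader.next());
        }
        reader.close();
        return types;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Prints the integer codes defined in XMLStreamConstants.
        System.out.println(eventTypes("<a><!--note-->text</a>"));
    }
}
```

Each code returned by next() is one of the XMLStreamConstants values, which is what the XNI callbacks would need to be mapped to.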
Then I looked through the current Xerces implementation and found that the AbstractXMLDocumentParser class implements the XMLDocumentHandler, XMLDTDHandler, and XMLDTDContentModelHandler interfaces. Both AbstractDOMParser and AbstractSAXParser extend AbstractXMLDocumentParser. So I think I can implement an AbstractStAXParser that extends AbstractXMLDocumentParser to receive XML events.
For example, code in current AbstractSAXParser:
    public void comment(XMLString text, Augmentations augs) throws XNIException {
        try {
            // SAX2 extension
            if (fLexicalHandler != null) {
                fLexicalHandler.comment(text.ch, 0, text.length);
            }
        }
        catch (SAXException e) {
            throw new XNIException(e);
        }
    } // comment(XMLString)
And in my AbstractStAXParser, it may be implemented like this:
    public class AbstractStAXParser extends AbstractXMLDocumentParser {
        public int m_curEventType;
        public String m_characters;
        …
        public void comment(XMLString text, Augmentations augs) throws XNIException {
            // Record the event type and copy the comment text out of the XMLString.
            m_curEventType = XMLStreamConstants.COMMENT;
            m_characters = new String(text.ch, text.offset, text.length);
        }
        …
    }
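The copy made in comment() matters if, as the XNI documentation appears to say, the XMLString handed to a callback is only a view into a buffer that the scanner may reuse later. A minimal sketch of that situation follows; Slice is a hypothetical stand-in for XMLString (same three fields, but not a Xerces class), and the buffer contents are invented for illustration.

```java
import java.util.Arrays;

/** Hypothetical stand-in for XNI's XMLString: a window into a shared char buffer. */
class Slice {
    final char[] ch;
    final int offset;
    final int length;

    Slice(char[] ch, int offset, int length) {
        this.ch = ch;
        this.offset = offset;
        this.length = length;
    }
}

class CopyDemo {
    /** Copies the window into an independent String, honouring the offset. */
    static String copy(Slice s) {
        return new String(s.ch, s.offset, s.length);
    }

    public static void main(String[] args) {
        char[] buffer = "<!--note-->".toCharArray();
        Slice comment = new Slice(buffer, 4, 4); // the four characters of "note"
        String saved = copy(comment);
        Arrays.fill(buffer, 'x');                // simulate the scanner reusing its buffer
        System.out.println(saved);               // the saved copy is unaffected
    }
}
```

Keeping a reference to the Slice itself instead of copying would leave the reader exposed to whatever the buffer contains by the time getText() is called.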
Meanwhile, XMLPullParserConfiguration <http://xerces.apache.org/xerces2-j/javadocs/xni/org/apache/xerces/xni/parser/XMLPullParserConfiguration.html> will be used to control the parsing process. XML11Configuration is an implementation of the XMLPullParserConfiguration interface in Xerces. I think I can implement a StAXParserConfiguration that extends XML11Configuration, covering both XML 1.0 and XML 1.1. At runtime, the AbstractStAXParser will be set as the handler of the StAXParserConfiguration instance.
As for XMLStreamReader, it could be implemented like this:
    public class StAXXMLStreamReader implements XMLStreamReader {
        public StAXParserConfiguration m_configuration;
        public StAXParser m_parser;
        …
        public int getEventType() {
            return m_parser.m_curEventType;
        }
        public int next() throws XMLStreamException {
            try {
                // Parse only the next piece of the document, not the whole document.
                m_configuration.parse(false);
            }
            catch (IOException e) {
                throw new XMLStreamException(e);
            }
            return m_parser.m_curEventType;
        }
        …
    }
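One point the reader sketch above leaves open is what happens if a single parse(false) call triggers more than one handler callback, or none at all; whether that can happen would need to be checked against the Xerces scanner. Assuming it can, the reader could buffer events in a queue instead of keeping only the latest one. The following is a minimal, self-contained sketch of that idea. EventSink, ChunkSource and QueueingReader are hypothetical stand-ins invented for illustration; they are not Xerces or StAX types.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the XNI handler callbacks. */
interface EventSink {
    void event(int type, String text);
}

/** Hypothetical stand-in for a pull configuration. */
interface ChunkSource {
    /** Parses one chunk, firing zero or more callbacks; returns true while input remains. */
    boolean step(EventSink sink);
}

class QueueingReader implements EventSink {
    private static final class Event {
        final int type;
        final String text;

        Event(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    private final ChunkSource source;
    private final Deque<Event> pending = new ArrayDeque<>();
    private boolean more = true;
    private Event current;

    QueueingReader(ChunkSource source) {
        this.source = source;
    }

    /** Push side: called by the source during step(); events are only recorded. */
    @Override
    public void event(int type, String text) {
        pending.add(new Event(type, text));
    }

    /** Pull side: keeps stepping the source until an event is buffered or input ends. */
    boolean hasNext() {
        while (pending.isEmpty() && more) {
            more = source.step(this);
        }
        return !pending.isEmpty();
    }

    /** Advances to the next buffered event and returns its type. */
    int next() {
        if (!hasNext()) {
            throw new NoSuchElementException("no more events");
        }
        current = pending.remove();
        return current.type;
    }

    /** Text of the current event; only valid after next() has been called. */
    String getText() {
        return current.text;
    }
}
```

With this shape, next() always returns exactly one event, regardless of how many callbacks a single step produced.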
These are some of my rough thoughts. If you have any comments or questions, I would be glad to discuss them with you.
Thanks, Wei