First proposal on SoC project "Add support for the StAX (JSR-173) cursor API 2007-04-04 - By wei duan
Hi Michael,

Thanks for your reply. As for your comments:

1. XML11Configuration. After looking through it, I think I can actually use it directly for the StAXStreamReader implementation.
2. XMLPullParserConfiguration uses org.apache.xerces.impl.XMLDocumentScannerImpl as its document scanner to support pull parsing. To avoid Alek's problem, one solution is for me to learn the implementation details of XMLDocumentScannerImpl and see where it stops during the parsing process. If the scanner doesn't meet my expectations, for example by reporting two StAX events in one step, then I can prepare a buffer to store the extra event and its information.

Thanks, Wei
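
The buffering idea in point 2 could look roughly like the sketch below. It is only an illustration under stated assumptions: `PullStep` and `BufferedEventSource` are hypothetical names, and `PullStep.step` stands in for one call to `parse(false)` on the pull parser configuration, with the handler callbacks calling `enqueue` for every event they see. Only the `XMLStreamConstants` event codes come from the StAX API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLStreamConstants;

/** Hypothetical stand-in for one scanner step, e.g. one parse(false) call. */
interface PullStep {
    /** Runs one step, reporting events to the sink; returns false when the document is done. */
    boolean step(BufferedEventSource sink);
}

/** Queues event types so that next() hands out exactly one event per call. */
class BufferedEventSource {
    private final Deque<Integer> pending = new ArrayDeque<>();
    private final PullStep scanner;
    private boolean more = true;

    BufferedEventSource(PullStep scanner) {
        this.scanner = scanner;
    }

    /** Called from the handler callbacks (comment(), startElement(), ...). */
    void enqueue(int eventType) {
        pending.addLast(eventType);
    }

    boolean hasNext() {
        fill();
        return !pending.isEmpty();
    }

    int next() {
        fill();
        Integer event = pending.pollFirst();
        if (event == null) {
            throw new NoSuchElementException("no more events");
        }
        return event;
    }

    /** Steps the scanner only when the buffer is empty and input remains. */
    private void fill() {
        while (pending.isEmpty() && more) {
            more = scanner.step(this);
        }
    }
}
```

If a single scanner step reports both a start tag and the following end tag, the second event simply waits in the queue until the next call to `next()`, so the caller still sees one StAX event at a time.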
On 4/2/07, Michael Glavassevich <mrglavas@(protected)> wrote:
>
> Hi Wei,
>
> Welcome to the list and thanks for sharing your thoughts with everyone. I
> think you've got the general idea. Some initial comments below...
>
> "wei duan" <weidua@(protected)> wrote on 04/01/2007 09:05:53 AM:
>
> > Hello, everyone,
> >
> > I'm a student applying for the SoC project "Add support for the
> > StAX (JSR-173) cursor API to Xerces-J". Michael suggested I could
> > discuss my proposal on the mailing list, so I would like to
> > introduce my thoughts and plan for this student project. Any comments
> > are welcome. : )
> >
> > The abstract description of the project is: "To design and
> > implement the cursor-based XMLStreamReader (and filtering
> > support). It should be possible to accomplish this using XNI by
> > building the XMLStreamReader on top of an XMLPullParserConfiguration."
> >
> > Besides XNI, there are several ways to implement the StAX interface.
> > For example, parse the XML document as raw text and start from scratch,
> > including parsing characters, building tokens, interpreting
> > tokens, and so on. Or implement a converter from the existing DOM or
> > SAX interfaces.
>
> The student who we had for GSoC last year implemented those already.
> They're useful when you're starting from a SAX or DOM source though you
> really need a native solution to get decent performance if you're parsing
> the document from a stream.
>
> > However, after reading the Xerces source code, I found that
> > both the SAX and DOM implementations are based on XNI, so it's very
> > natural to build StAX on XNI.
> >
> > To implement XMLStreamReader, two important preconditions should be
> > confirmed:
> > 1. XML event information can be received.
> > 2. The pull-style parsing process can be simulated.
> >
> > When I looked through the XNI interfaces, I found that XNI actually
> > meets these two preconditions. The handler interfaces in XNI such as
> > XMLDocumentHandler and XMLDTDHandler receive XML events including
> > startDocument and endDocument, which can be easily mapped to the
> > corresponding StAX events. The XMLPullParserConfiguration interface in
> > XNI represents a parser configuration that can be used as the
> > configuration for a "pull" parser, so the pull parsing process of
> > StAX can be simulated by calling the "boolean parse(boolean)" method in
> > XMLPullParserConfiguration.
> >
> > Then I looked through the current Xerces implementation. I found that
> > the AbstractXMLDocumentParser class implements the XMLDocumentHandler,
> > XMLDTDHandler, and XMLDTDContentModelHandler interfaces. Both
> > AbstractDOMParser and AbstractSAXParser extend
> > AbstractXMLDocumentParser. So I think I can implement an
> > AbstractStAXParser extending AbstractXMLDocumentParser to get XML
> > events.
> >
> > For example, code in the current AbstractSAXParser:
> >
> >     public void comment(XMLString text, Augmentations augs)
> >         throws XNIException {
> >         try {
> >             // SAX2 extension
> >             if (fLexicalHandler != null) {
> >                 fLexicalHandler.comment(text.ch, 0, text.length);
> >             }
> >         }
> >         catch (SAXException e) {
> >             throw new XNIException(e);
> >         }
> >     } // comment(XMLString)
> >
> > And in my AbstractStAXParser, it may be implemented like this:
> >
> >     public class AbstractStAXParser extends AbstractXMLDocumentParser {
> >         public int m_curEventType;
> >         public String m_characters;
> >         ...
> >         public void comment(XMLString text, Augmentations augs)
> >             throws XNIException {
> >             m_curEventType = XMLStreamConstants.COMMENT;
> >             m_characters = new String(text.ch, text.offset, text.length);
>
> Where possible you should try to avoid creating new strings unless a call
> to the API demands one.
>
> >         }
> >         ...
> >     }
> >
> > Meanwhile, XMLPullParserConfiguration will be used to control the
> > parsing process. XML11Configuration is the implementation of the
> > XMLPullParserConfiguration interface in Xerces.
>
> It's one of several parser configurations which implement
> XMLPullParserConfiguration. Given the way we implemented XInclude
> (dispatching a child pipeline to read the entire include before returning
> to the parent), XML11Configuration is probably a better choice than
> XIncludeAwareParserConfiguration (the current default config for SAX and
> DOM).
>
> > I think I can implement a StAXParserConfiguration which extends
> > XML11Configuration for XML 1.0 and XML 1.1.
>
> I'm not sure why you would need a new parser configuration. I think
> XML11Configuration would work just fine. Is there something that you think
> is missing from it?
>
> > At runtime, the AbstractStAXParser will be set as the handlers of the
> > StAXParserConfiguration instance.
> >
> > As for XMLStreamReader, it can be implemented like this:
> >
> >     public class StAXXMLStreamReader implements XMLStreamReader {
> >         public StAXParserConfiguration m_configuration;
> >         public StAXParser m_parser;
> >         ...
> >         public int getEventType()
> >         {
> >             return m_parser.m_curEventType;
> >         }
> >         public int next()
> >         {
> >             m_configuration.parse(false);
> >             return m_parser.m_curEventType;
> >         }
> >         ...
> >     }
>
> I don't think the StAX and XNI method signatures overlap with each other.
> You could probably merge this into the other class and avoid the
> indirection.
>
> public class AbstractStAXParser
>     extends AbstractXMLDocumentParser
>     implements XMLStreamReader {
>     ...
> }
>
> > Above are some of my rough thoughts, so if you have any
> > comments or questions, I would like to discuss them with you.
> >
> > Thanks, Wei
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: mrglavas@(protected)
> E-mail: mrglavas@(protected)
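
Michael's two suggestions in the quoted message (avoid building a String in every callback, and let one class play both the handler and the reader role) might be combined along the lines of the sketch below. It is a rough illustration, not Xerces code: `TextSpan` is a hypothetical stand-in for the XNI `XMLString` with its `ch`, `offset`, and `length` fields, and `LazyTextEvents` is a made-up class name. The accessor names mirror `getTextCharacters()`, `getTextStart()`, `getTextLength()`, and `getText()` from `XMLStreamReader`. The sketch also assumes the scanner's character buffer stays valid until the next parse step, which would need to be checked against the scanner implementation.

```java
import javax.xml.stream.XMLStreamConstants;

/** Hypothetical stand-in for the XNI XMLString (char[] ch, int offset, int length). */
class TextSpan {
    final char[] ch;
    final int offset;
    final int length;

    TextSpan(char[] ch, int offset, int length) {
        this.ch = ch;
        this.offset = offset;
        this.length = length;
    }
}

/** Acts as handler and reader at once; a String is built only when getText() is called. */
class LazyTextEvents {
    private int eventType;
    private char[] ch;
    private int offset;
    private int length;
    private String cached;

    /** Mirrors the comment() callback from the proposal, without allocating a String. */
    void comment(TextSpan text) {
        eventType = XMLStreamConstants.COMMENT;
        ch = text.ch;
        offset = text.offset;
        length = text.length;
        cached = null; // drop any String built for the previous event
    }

    int getEventType() {
        return eventType;
    }

    /** Exposes the underlying buffer without copying. */
    char[] getTextCharacters() {
        return ch;
    }

    int getTextStart() {
        return offset;
    }

    int getTextLength() {
        return length;
    }

    /** The only place a String is created, and at most once per event. */
    String getText() {
        if (ch == null) {
            throw new IllegalStateException("no text event has been reported yet");
        }
        if (cached == null) {
            cached = new String(ch, offset, length);
        }
        return cached;
    }
}
```

A caller that only needs the characters can use `getTextCharacters()` with the start and length accessors and never pay for a String, while callers of `getText()` get one that is created on first use and reused afterwards.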