determining the encoding of an external subset via XNI 2003-03-10 - By neilg@(protected)
Hi all,
In an attempt to generate some more discussion surrounding the issue I raised in the message below, here are some ways by which we might move forward. For those who didn't see the previous thread, the Cole's Notes version of the problem is that, as XNI is currently designed, there doesn't seem to be any way of determining what the parser autodetected the encoding of the DTD external subset to be--or any way of determining anything about that encoding at all if the external subset doesn't happen to contain a text decl.
Here are all the options that I've thought of:
1. We could modify the XMLDTDHandler#startExternalSubset callback so that, instead of looking like
public void startExternalSubset(XMLResourceIdentifier identifier, Augmentations augs)
it looks like
public void startExternalSubset(XMLResourceIdentifier identifier, String encoding, Augmentations augs)
This would make that callback much more symmetric to the startDocument callback of the XMLDocumentHandler interface; unfortunately it has the tremendous drawback of not being terribly backwards compatible.
2. We could add a new callback to the XMLDTDHandler interface, something like:
public void externalSubsetEncoding(String encoding)
which we would advertise as occurring after the startExternalSubset callback and before the textDecl call. While this would be far more backward compatible, there's no precedent for anything like it in XNI; also, the callback would only be useful for external subsets, since in all other contexts we already have methods for conveying encoding information.
3. We could use the Augmentations parameter of the startExternalSubset callback. This would preserve backward compatibility, but certainly couldn't be accused of being beautiful; also, it would mark the first time we've used Augmentations in Xerces for something at the level of a scanner. So far, we've only employed that functionality in the context of schema validation.
4. We could amend the XMLLocator interface by adding a method like
public String getEncoding()
along the lines of the SAX Locator2 interface. This again would only be really useful in this single context, since XNI goes out of its way everywhere else to make explicit provision for passing encoding information; i.e., it doesn't seem to accord well with the overall design of the API.
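To make option 3 a bit more concrete, here is a rough sketch of how the scanner might attach the autodetected encoding to the Augmentations it passes to startExternalSubset, and how a handler might read it back. Everything in it is an assumption made for illustration: the Augmentations and DTDHandler types are simplified stand-ins rather than the real org.apache.xerces.xni interfaces (the real callback takes an XMLResourceIdentifier, not a String), and the ENCODING_KEY constant and the fireStartExternalSubset helper are hypothetical names that do not exist in Xerces.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the XNI types; NOT the real org.apache.xerces.xni API.
class ExternalSubsetEncodingSketch {

    /** Stand-in for XNI's Augmentations: a keyed bag of extra items. */
    static final class Augmentations {
        private final Map<String, Object> items = new HashMap<>();

        Object putItem(String key, Object item) { return items.put(key, item); }

        Object getItem(String key) { return items.get(key); }
    }

    /** Stand-in for the single XMLDTDHandler callback under discussion. */
    interface DTDHandler {
        // The real callback takes an XMLResourceIdentifier; a String is used here for brevity.
        void startExternalSubset(String systemId, Augmentations augs);
    }

    /** Hypothetical key name; no such constant exists in Xerces. */
    static final String ENCODING_KEY = "EXTERNAL_SUBSET_ENCODING";

    /** Scanner side: attach the autodetected encoding, if any, before firing the callback. */
    static void fireStartExternalSubset(DTDHandler handler, String systemId, String detectedEncoding) {
        Augmentations augs = new Augmentations();
        if (detectedEncoding != null) {
            augs.putItem(ENCODING_KEY, detectedEncoding);
        }
        handler.startExternalSubset(systemId, augs);
    }

    /** Handler side: read the encoding back, tolerating scanners that never set it. */
    static final class RecordingHandler implements DTDHandler {
        String encoding; // stays null when no encoding was supplied

        @Override
        public void startExternalSubset(String systemId, Augmentations augs) {
            Object value = (augs == null) ? null : augs.getItem(ENCODING_KEY);
            encoding = (value instanceof String) ? (String) value : null;
        }
    }

    public static void main(String[] args) {
        RecordingHandler handler = new RecordingHandler();
        fireStartExternalSubset(handler, "example.dtd", "UTF-16");
        System.out.println(handler.encoding);
    }
}
```

Existing handlers that ignore the Augmentations argument would keep working unchanged under this approach, which is the backward-compatibility point made above; the cost is that the encoding becomes an untyped, string-keyed value rather than a first-class parameter.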
I'll readily admit that none of these solutions is particularly attractive. Thoughts, preferences, or more appealing solutions are thus even more than usually welcome!
Cheers,
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
E-mail: neilg@(protected)

----- Forwarded by Neil Graham/Toronto/IBM on 03/10/2003 06:03 PM -----

From: Neil Graham/Toronto/IBM@(protected)
Date: 03/04/2003 11:13 PM
To: xerces-j-dev@(protected)
cc:
Subject: another encoding issue
Hi all,
How does one determine the autodetected encoding of a DTD external subset?
Right now, our DTD scanner takes this information from the entity manager in a (non-XNI) startEntity(name, resourceIdentifier, encoding) call but drops the encoding information on the floor for entities whose names are [dtd].
It sure would have been handy if the XMLDTDHandler#startExternalSubset(XMLResourceIdentifier, Augmentations) had also included an encoding parameter...
Thoughts?
Cheers,
Neil