Valid XML characters 2003-01-03 - By Dima Gutzeit
<DIV>Thanks for your answer.</DIV> <DIV> </DIV> <DIV>Could you please provide me with the "legal" Unicode range for XML, so I would know what to filter out?</DIV> <DIV> </DIV>
<DIV><BR>Joseph Kesselman wrote:
<BR>>Subject: Re: Valid XML characters
<BR>>From: Joseph Kesselman &lt;KESHLAM@(protected)&gt;
<BR>>To: xerces-j-user@(protected).apache.org
<BR>>Date: Thu, 2 Jan 2003 23:27:28 -0500
<BR>>
<BR>>On Thursday, 12/26/2002 at 07:23 ZE2, "Dima Gutzeit" &lt;DIMA@(protected)&gt; wrote:
<BR>>> Sometimes when parsing XML files I get an error message (exception) about
<BR>>> "invalid Unicode characters". Is there any way to filter those before parsing?
<BR>>
<BR>>There's no way to do that within the parser. "If it contains illegal
<BR>>characters, it isn't XML" and the error messages are entirely correct.
<BR>>
<BR>>You could, of course, write your own stream filter and pass the data
<BR>>through that, then use its output as the input to the parser. That's
<BR>>fairly straightforward Java coding. The problem would be deciding what
<BR>>you're going to do with those characters when you see them -- if you just
<BR>>discard them you may be changing the meaning of the document, and if you
<BR>>turn them into some sort of private escape sequence only applications
<BR>>which understand that convention will be able to do anything with them.
<BR>>Fixing the source documents really is the cleanest answer.
<BR>>
<BR>>For what it's worth: It has been proposed that future versions of XML
<BR>>*may* relax the forbidden-character restrictions, but there's still no
<BR>>firm consensus on whether that change would be desirable or what version
<BR>>of XML it might find its way into.
<BR>>
<BR>>______________________________________
<BR>>Joe Kesselman / IBM Research
<BR>>
<BR>>---------------------------------------------------------------------
<BR>>To unsubscribe, e-mail: xerces-j-user-unsubscribe@(protected)
<BR>>For additional commands, e-mail: xerces-j-user-help@(protected)
<BR>> </DIV> <br><P><FONT color=#0000ff><FONT face="Comic Sans MS">Regards,<BR>Dima Gutzeit </FONT><BR>---------------------------------<BR>MailVision LTD. <BR>R&amp;D Team. <BR>Phone: 972 - 4 - 8508020<BR>Fax: 972 - 3 - 9285149 <BR>http://www.mailvision.com </FONT></P><br>
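<DIV>For reference, the XML 1.0 specification defines the legal characters (the Char production) as #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. The stream filter suggested in the quoted reply could be sketched in Java roughly as follows. The class name XmlCharFilterReader and its helper methods are hypothetical and not part of Xerces-J, and silently dropping characters is only one possible policy; as the reply warns, it may change the meaning of the document.</DIV>

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch: a Reader that drops characters XML 1.0 does not allow. */
class XmlCharFilterReader extends FilterReader {
    private int lowSurrogate = -1; // validated low half of a pair, returned on the next read
    private int lookahead = -1;    // char read while checking a pair, not yet examined

    XmlCharFilterReader(Reader in) {
        super(in);
    }

    /** XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    @Override
    public int read() throws IOException {
        if (lowSurrogate >= 0) {
            int c = lowSurrogate;
            lowSurrogate = -1;
            return c;
        }
        while (true) {
            int c = (lookahead >= 0) ? lookahead : in.read();
            lookahead = -1;
            if (c < 0) {
                return -1; // end of stream
            }
            if (Character.isHighSurrogate((char) c)) {
                int next = in.read();
                if (next < 0) {
                    return -1; // lone high surrogate at end of stream: drop it
                }
                if (Character.isLowSurrogate((char) next)) {
                    lowSurrogate = next; // a well-formed pair encodes #x10000-#x10FFFF
                    return c;
                }
                lookahead = next; // lone high surrogate: drop it, examine the next char
                continue;
            }
            if (isXmlChar(c)) {
                return c; // lone low surrogates and control characters fail this test
            }
        }
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = 0;
        while (n < len) {
            int c = read();
            if (c < 0) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would bypass the filter state, so this sketch disables it
    }

    /** Convenience helper for small inputs: runs a whole string through the filter. */
    static String filter(String s) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new XmlCharFilterReader(new StringReader(s))) {
            for (int c = r.read(); c >= 0; c = r.read()) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

<DIV>Assuming the parser is fed through an org.xml.sax.InputSource, the filtered stream could be supplied as new InputSource(new XmlCharFilterReader(reader)). The filter only sees raw characters, so numeric character references to forbidden characters would still be rejected by the parser.</DIV>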