high value unicode characters 2004-04-07 - By Joshua Santelli
Hello,
We're using Xerces SAX2Print, version 2.5.0 (xerces-c_2_5_0-solaris_27-cc_62), and have run into a problem with a few "high value" Unicode characters. What we would like to do is validate the file and convert it to UTF-8. The SAX2Print process completes with no error, but there appear to be some strange characters after the high value Unicode characters (𝖢, 𝖧 and 𝒫) in the output.
The command is: # SAX2Print -v=always -x=UTF-8 test1.xml
The error that I get using SAX2Print on the output XML file is:
Fatal Error at file test1-out.xml, line 5, char 35 Message: Got an unexpected trailing surrogate character
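For reference, here is a rough sketch of what I would expect the correct encoding of these characters to look like. It is plain Python, not part of Xerces, and it assumes the three characters are U+1D5A2, U+1D5A7 and U+1D4AB (my reading of 𝖢, 𝖧 and 𝒫). All three lie outside the Basic Multilingual Plane, so each is a surrogate pair in UTF-16 and should become a single four-byte sequence in UTF-8. The helper name `surrogate_pair` is made up for illustration.

```python
def surrogate_pair(cp):
    """Split a supplementary code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


# Assumed code points for the three characters in the test file.
for cp in (0x1D5A2, 0x1D5A7, 0x1D4AB):
    high, low = surrogate_pair(cp)
    utf8 = chr(cp).encode("utf-8")
    utf8_hex = " ".join(f"{b:02X}" for b in utf8)
    print(f"U+{cp:05X}  UTF-16: {high:04X} {low:04X}  UTF-8: {utf8_hex}")
```

If the output file contains anything other than one four-byte sequence per character at those positions, that might explain the trailing surrogate error, though I have not confirmed that this is what is happening.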
Any idea what is going wrong here?
Thanks in advance, josh
=========================
<?xml version="1.0"?>
<!DOCTYPE test SYSTEM "test.dtd">
<test>
<testPara>
<head>1. high value Unicode characters and some punctuation as entities</head>
<p>Assuming 𝖢𝖧, Hindman [ht1] showed that the existence of certain ultrafilters on the power set of the natural numbers is equivalent to Hindman’s Theorem. Adapting this work to a countable setting formalized in RCA<sub>0</sub>, this article proves the equivalence of the existence of certain ultrafilters on countable Boolean algebras and an iterated form of Hindman’s Theorem, which is closely related to Milliken’s Theorem.</p>
</testPara>
<testPara>
<head>2. high value Unicode char and some Greek as entities</head>
<p>This article is a continuation of our search for tautologies that are hard even for strong propositional proof systems like EF, cf. [Kra-wphp,Kra-tau]. The particular tautologies we study, the τ-formulas, are obtained from any 𝒫/poly map g; they express that a string is outside of the range of g. Maps g considered here are particular pseudorandom generators. The ultimate goal is to deduce the hardness of the τ-formulas for at least EF from some general, plausible computational hardness hypothesis.</p>
</testPara>
</test>
=========================
<!ELEMENT test (testPara+) >
<!ELEMENT testPara (head, p) >
<!ELEMENT head (#PCDATA) >
<!ELEMENT p (#PCDATA | b | i | sub)* >
<!ELEMENT b (#PCDATA) >
<!ELEMENT i (#PCDATA) >
<!ELEMENT sub (#PCDATA) >
=========================