[Chandler-dev] Re: [Cosmo-dev] dealing with characters that XML 1.0 doesn't allow

Phillip J. Eby pje at telecommunity.com
Thu May 24 15:37:14 PDT 2007


At 03:24 PM 5/24/2007 -0700, Grant Baillie wrote:

>On 24 May, 2007, at 15:16, Phillip J. Eby wrote:
>
>>At 02:58 PM 5/24/2007 -0700, Heikki Toivonen wrote:
>>>Morgen Sagen wrote:
>>> > #3 feels like the right thing to do.  One suggestion is to encode all
>>> > non-allowed by XML characters using %XX where the XX are hex digits.
>>> > Should we take that route?
>>>
>>>I said this on IRC, posting here to keep everyone in the loop.
>>>
>>>I think characters not allowed should be encoded with the standard XML
>>>way, for example &#169;.
>>
>>Characters that are not allowed can't be encoded in this way -
>>that's what it means that they're not allowed.  The resulting XML
>>is not well-formed, by definition.
>
>In XML 1.1, those characters are merely "Restricted" ... the grammar
>seems to have changed some. I think that's the reasoning behind the
>summarization in
><http://lists.xml.org/archives/xml-dev/200701/msg00011.html>:
>
>>1-1F except CR, TAB, NL:
>>Can't occur in XML 1.0.  Can occur in XML 1.1 and must be escaped.

Note, however, that XML 1.1 still doesn't allow NULs...  so even if 
we go with 1.1 and XML escaping, we still have to do something about 
NUL, and any other disallowed characters.
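
To make the distinction concrete, here is a rough Python sketch of the character classes involved. The ranges are taken from the Char productions of the XML 1.0 and 1.1 specs. The function names are made up for illustration, and percent_escape is only one possible reading of the %XX suggestion above (it also escapes "%" itself so the result can be reversed), not anything Cosmo or Chandler actually implements.

```python
import xml.etree.ElementTree as ET


def allowed_in_xml10(ch):
    """True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def allowed_in_xml11(ch):
    """True if ch matches the XML 1.1 Char production (NUL is still excluded)."""
    cp = ord(ch)
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def percent_escape(text, allowed=allowed_in_xml10):
    """Hypothetical %XX scheme: escape '%' and every disallowed character
    as the %XX form of its UTF-8 bytes."""
    out = []
    for ch in text:
        if ch == "%" or not allowed(ch):
            # surrogatepass lets lone surrogates be encoded instead of raising
            raw = ch.encode("utf-8", "surrogatepass")
            out.append("".join("%%%02X" % b for b in raw))
        else:
            out.append(ch)
    return "".join(out)


def is_well_formed(doc):
    """Ask the stdlib parser (expat, XML 1.0 only) whether doc parses."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False
```

With these definitions, is_well_formed("<a>&#169;</a>") is true but is_well_formed("<a>&#1;</a>") is false, since a character reference to a disallowed character is itself an error in XML 1.0. allowed_in_xml11("\x01") is true while allowed_in_xml11("\x00") is false, so NUL needs some out-of-band escaping under either version.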


