[Issue 1] Re: [Ietf-calsify] draft-ietf-calsify-rfc2445bis-01.txt / UTF-8
Bernard Desruisseaux
bernard.desruisseaux at oracle.com
Wed Aug 23 08:06:13 PDT 2006
Hi Bill,
The question is "Do we actually care?".
I would say that as long as the iCalendar stream is a valid
UTF-8 document we are ok.
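
To make that concrete, here is a minimal Python sketch of folding that
only ever breaks between UTF-8 sequences. It is an illustration under
assumptions, not text from the draft: the names fold_utf8 and unfold are
hypothetical, the 75-octet limit is taken from the discussion quoted
below, and line terminators are left out for brevity.

```python
def fold_utf8(line: str, limit: int = 75) -> list[bytes]:
    """Fold one content line into physical lines of at most `limit` octets,
    never breaking inside a UTF-8 sequence (hypothetical helper)."""
    if limit < 5:
        raise ValueError("limit must leave room for a 4-octet UTF-8 sequence")
    data = line.encode("utf-8")
    chunks: list[bytes] = []
    start = 0
    while start < len(data) or not chunks:
        # Continuation lines start with one space, which counts toward the limit.
        room = limit if not chunks else limit - 1
        end = min(start + room, len(data))
        # Back up while the break would land on a continuation octet (10xxxxxx).
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunk = data[start:end]
        chunks.append(chunk if not chunks else b" " + chunk)
        start = end
    return chunks


def unfold(chunks: list[bytes]) -> str:
    """Drop the leading space of each continuation line, then decode."""
    return (chunks[0] + b"".join(c[1:] for c in chunks[1:])).decode("utf-8")


# 40 copies of "e" followed by U+0301 COMBINING ACUTE ACCENT.
text = "SUMMARY:" + "e\u0301" * 40
lines = fold_utf8(text)
for raw in lines:
    raw.decode("utf-8")  # every physical line decodes on its own
```

In this example the fold happens to fall between an "e" and its
combining accent, so the second line begins with a space followed by
U+0301. Each line is still valid UTF-8 by itself, but the characters
are only interpreted as intended after unfolding.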
Cheers,
Bernard
Bill McQuillan wrote:
> I think Bernard has pointed out one issue that occurred to me also--since
> the information is UTF8 text, an application that does not understand an
> iCalendar object could still read it, for instance a text editor. The
> question becomes how would it react to a line break in the middle of a
> composed character.
>
> After browsing the Unicode Standard, I found this sentence in Annex #14,
> Line Breaking Properties:
>
> Combining character sequences are treated as units for the purpose of
> line breaking.
>
> If the text editor assumes this property, there will likely be some loss of
> information.
>
> On Tue, 2006-08-22, Bernard Desruisseaux wrote:
>> [
>> For those not familiar with "combining character sequence" here's
>> how it is defined by Unicode: A character sequence consisting of
>> either a base character followed by a sequence of one or more
>> combining characters, or a sequence of one or more combining
>> characters.
>> ]
>
>> Let me try to put this another way. We need to decide and justify:
>
>> 1- Whether we want to allow "multi-octet characters" to be split
>> across lines.
>
>> Answer: No.
>> Why : Otherwise the resulting text would end up being invalid
>> in the specified encoding.
>
>> 2- Whether we want to allow "combining character sequences" to be
>> split across lines.
>
>> Answer: Yes.
>> Why : (1) I'm assuming that it is valid for a "combining
>> character" to be preceded by the LF character (but I
>> don't know this for a fact...), and thus the resulting
>> text would still be valid in the specified encoding
>> (but would sure "look" different).
>
>> (2) A "combining character sequence" could probably be
>> longer than 75 octets in *theory*! I'm sure we would
>> never see this in practice, though...
>
>> With this approach:
>
>> - You could open an iCalendar object specified in the charset 'X'
>> in any application that supports 'X' without errors.
>
>> - You would still need to unfold the iCalendar object to be able
>> to "interpret" all the characters properly.
>
>> What do you think?
>
>> Cheers,
>> Bernard
>
>> Mark Crispin wrote:
>>> Unfortunately, your answers are circular.
>>>
>>> I understand that you assert
>>> (a) it is not alright to fold in the middle of a UTF-8 sequence
>>> ("multi-octet sequence" is ambiguous and imprecise)
>>> but that
>>> (b) it is alright to fold between a character and a combining character.
>>>
>>> However, you also give assertion (a) as the answer to the "why" question
>>> for both (a) and (b).
>>>
>>> Why is it not alright to fold in the middle of a UTF-8 sequence?
>>>
>>> Why is it alright to fold between a character and a combining character?
>>>
>>>
>>> What is wrong with the assertion:
>>> A proper interpretation of the text is impossible until
>>> all folding is removed and the strings are catenated.
>>> Therefore, folding may appear anywhere, even in the
>>> middle of a UTF-8 sequence.
>>> or, alternatively:
>>> A proper interpretation of a subtext is impossible unless
>>> all UTF-8 sequences and combining characters appear in
>>> that subtext. Therefore, folding may not occur in the middle
>>> of a UTF-8 sequence, nor may it separate the UTF-8 sequences
>>> of any (and all) combining characters from the character
>>> being combined.
>>>
>>> Why is one, or the other, of the above two assertions inferior to your
>>> pair of assertions?
>>>
>>> I'm sorry for being such a troublemaker, and to be honest I really don't
>>> know which of these is best. But someone's got to do it. Whatever
>>> decision is made, we need to justify why that decision and not the
>>> alternatives.
>>>
>>> -- Mark --
>>>
>>> http://staff.washington.edu/mrc
>>> Science does not emerge from voting, party politics, or public debate.
>>> Si vis pacem, para bellum.
>