[cosmo-dev] More Unicode

Travis Vachon travis at osafoundation.org
Mon Dec 3 10:57:59 PST 2007


On Dec 2, 2007, at 8:36 AM, Brian Moseley wrote:

> On Nov 30, 2007 2:56 PM, Travis Vachon <travis at osafoundation.org>  
> wrote:
>
>> In any case, my gut feel
>> is that it would be easier to fix the client for Unicode support than
>> to change the server to blacklist characters.
>
> how would you do that?

We can modify our UTF-8 encoding logic to detect UTF-16 surrogate pairs
and translate each pair into a single code point, instead of encoding
each half of the pair as if it were a character of its own.
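
A rough sketch of the pair handling described above, in TypeScript. The function name `utf8Encode` is made up for illustration and is not the existing client code, and replacing an unpaired surrogate with U+FFFD is an assumption about how we would want to handle bad input.

```typescript
// Hypothetical helper: encode a JavaScript string (a sequence of UTF-16
// code units) as an array of UTF-8 byte values.
function utf8Encode(text: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < text.length; i++) {
    let cp = text.charCodeAt(i);

    // A high surrogate followed by a low surrogate is one code point
    // at U+10000 or above, so combine the two code units.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < text.length) {
      const low = text.charCodeAt(i + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (low - 0xdc00);
        i++; // the low surrogate has been consumed
      }
    }

    // Assumption: an unpaired surrogate is replaced with U+FFFD.
    if (cp >= 0xd800 && cp <= 0xdfff) {
      cp = 0xfffd;
    }

    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(
        0xe0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return bytes;
}
```

For example, U+1D11E arrives in JavaScript as the two code units `\uD834\uDD1E`, and this sketch emits the four bytes F0 9D 84 9E for it rather than two separate three-byte sequences.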

To really be complete, however, we'd need to do this for all output  
(that is, all text in server requests) generated on the client. This  
would probably be a pretty significant amount of work, so on second  
thought I think it might be worth looking at just how much work we'd  
need on the server.

>
>
> re validation - we only have to validate the syntax of usernames when
> writing them into the database. we don't have to validate them when
> they are used in queries. the only code change we'd have to make is
> adding a regex to the ui validators and to the User model. explaining
> this restriction to users would probably be more complicated.

Yeah, explaining the restriction to users and enforcing it client side  
is something required by both of our proposals.

It's important to note that this problem isn't limited to usernames.
Currently, characters with code points at U+10000 and above work fine
in the desktop client and sync to the server. Bringing this data up in
the web ui and saving it corrupts the data. I think that even if we
decide to limit usernames to U+0000-U+FFFF (the BMP) we'll still need
to fix this bug, which will require all of the difficult work that we'd
need to do for supporting U+10000 and above in usernames.


All that said, I think I'm leaning toward favoring the following:

1) Restrict usernames to U+0000-U+FFFF server side, add client side  
logic to explain and enforce restriction
2) Create 1.0 bug to handle U+10000 and above characters in data  
correctly.
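
For the client-side half of (1), a minimal sketch of the check, assuming usernames reach the validator as ordinary JavaScript strings. The name `isBmpOnly` is hypothetical. It relies on the fact that any character at U+10000 or above shows up in a JavaScript string as a surrogate pair, so finding any code unit in U+D800-U+DFFF means the username falls outside the allowed range.

```typescript
// Matches any UTF-16 surrogate code unit. No "g" flag, so test() keeps
// no state between calls.
const SURROGATE = /[\uD800-\uDFFF]/;

// Hypothetical validator: true when every character in the username is
// in U+0000-U+FFFF and none is a surrogate.
function isBmpOnly(username: string): boolean {
  return !SURROGATE.test(username);
}
```

This also rejects unpaired surrogates, which are not valid characters anyway.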


-Travis

