Update 关于 Mac OS X 的 filesystem encoding

跟上一个问题有关。在 stackoverflow 上问了问题,得到了如下答复 [1]:

The problem is that MacOS X’s default filesystem changes all filenames you give it to an unusual normalization form which does not use precomposed characters.

根据 Python unicodedata 的文档 [2]:

The Unicode standard defines various normalization forms of a Unicode string, based on the definition of canonical equivalence and compatibility equivalence. In Unicode, several characters can be expressed in various way. For example, the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence U+0327 (COMBINING CEDILLA) U+0043 (LATIN CAPITAL LETTER C).

其中的 U+00C7 是 ‘Ç’

对应的 python 源代码我也更新了。

请参考:
[1] http://stackoverflow.com/questions/6803621/how-to-prevent-the-command-line-argument-from-being-encoded/6809703#6809703
[2] http://docs.python.org/library/unicodedata.html#unicodedata.normalize

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: