NYCJUG/FunWithMultibytes
Fun with Chinese Characters in J and Emacs
Recently, I received an e-mail with embedded Chinese characters. I was pleased to see that they appear to be handled correctly in the Emacs editor, which is my environment for running jconsole. However, these characters are necessarily stored in a multi-byte format, which can be confusing if not accounted for properly.
   chns=. 0 : 0
在 2012年3月8日 下午1:00,Devon McCormick <devon_mcc…@spcapitaliq.com>写道:
)
   $chns
92
   3{.chns
在
Attempt to get characters before the comma.
   (]{.~','i.~]) chns
在 2012年3月8日 下午1:00,Devon McCormick <devon_mcc…@spcapitaliq.com>写道:
Huh? Did I build the tacit expression correctly?
   13 : 'y{.~y i. '','''
] {.~ ',' i.~ ]
This looks correct. Maybe the comma isn’t a comma. Let's take everything up to some other character instead; perhaps the character “D” is an ordinary single-byte character.
   'D' (] {.~ i.~) chns
在 2012年3月8日 下午1:00,
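As a side note on why the first attempt returned the whole string rather than signalling an error: dyadic i. returns the length of its left argument when the item is not found, so {. then takes everything. A minimal sketch with made-up ASCII strings (not taken from the e-mail) shows both cases:

```j
   'ab,cd' i. ','               NB. index of the first ASCII comma
2
   'abcd' i. ','                NB. not found: result is the length of the list
4
   (] {.~ ',' i.~ ]) 'ab,cd'    NB. found: take the 2 characters before it
ab
   (] {.~ ',' i.~ ]) 'abcd'     NB. not found: take all 4, i.e. the whole string
abcd
```

So getting the entire line back is a strong hint that no ASCII comma occurs anywhere in chns.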
Now for some multi-byte funniness: if we look at only the last one or two bytes before the first "D", the display isn't very helpful. However, when we look at a complete multi-byte unit, which in this case is three bytes, Emacs displays the comma character we're expecting.
   _1{.'D' (] {.~ i.~) chns
\214
   _2{.'D' (] {.~ i.~) chns
\274\214
   _3{.'D' (] {.~ i.~) chns
,
The numeric values of the three bytes comprising this Chinese comma:
   a. i. _3{.'D' (] {.~ i.~) chns
239 188 140
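These three bytes are the UTF-8 encoding of the full-width comma, U+FF0C (65292 decimal), which is a different character from the ASCII comma at 44. The following is only a sketch of one way to work with it, assuming the foreign u: conversions behave as documented (7 u: to decode UTF-8, 3 u: to get code points); the names fw and s are made up for illustration and the sample string is not from the e-mail. It compares byte and character counts, and then finds the comma by searching for its byte sequence with E. instead of looking for a single byte with i.

```j
   fw=. 239 188 140 { a.        NB. the three bytes shown above
   s=. 'ab', fw, 'cd'           NB. hypothetical sample string, 7 bytes
   # s                          NB. length in bytes
7
   # 7 u: s                     NB. length in characters after decoding UTF-8
5
   3 u: 7 u: fw                 NB. code point of the decoded character
65292
   a. i. ','                    NB. the ASCII comma, for comparison
44
   fw E. s                      NB. 1 marks where the byte sequence starts
0 0 1 0 0 0 0
   s {.~ (fw E. s) i. 1         NB. everything before the full-width comma
ab
```

Searching with E. stays at the byte level, so the result can be used directly with {. on the original byte string; decoding with 7 u: first is the alternative when character positions are what matter.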