[REBOL] UTF-8

From: alain::goye::free::fr at: 17-Oct-2004 19:27

Hi all,

I got interested in manipulating Unicode with REBOL and tried the UTF-8 script by Jan Skibinski. There seems to be an error in its encode function: it did not convert my test case correctly. The first letter of the Khmer alphabet, whose code point is U+1780, should become #{E19E80} in UTF-8, according to my understanding (based on http://www.zvon.org/tmRFC/RFC2279/Output/chapter2.html).

In case it is helpful to someone, this version should work, though it is not optimized and was tested only with k=2 on U+1780 :-)

    encode: func [
        k [integer!]
        ucs [string!]
        /local c f m x result [string!]
    ][
        result: make string! length? ucs
        f: pick fetch k
        parse/all ucs [any [c: k skip (
            either 128 > x: f c [
                ; value below 128: a single byte is enough
                insert tail result x
            ][
                result: tail result
                m: 64
                until [
                    ; low 6 bits become a continuation byte (10xxxxxx)
                    insert result to char! x and 63 or 128
                    ; drop those 6 bits; stop once the rest fits in the lead byte
                    (m: m / 2) > x: x and -64 / 64
                ]
                ; lead byte: remaining bits combined with the length marker from udata
                insert result to char! x or pick udata 1 + length? result
            ]
        )]]
        head result
    ]
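For anyone who wants to double-check the expected bytes for U+1780 independently of the REBOL script, here is a minimal sketch in Python (chosen only because its built-in encoder gives a reference to compare against). The function name `utf8_encode_code_point` is hypothetical and is not part of Jan Skibinski's script; the sketch also assumes the modern code point range up to U+10FFFF rather than the longer 5- and 6-byte forms that RFC 2279 allowed.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper only)."""
    # Assumption: only scalar values up to U+10FFFF, surrogates excluded.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# Khmer letter KA, U+1780 = binary 0001 011110 000000
print(utf8_encode_code_point(0x1780).hex().upper())  # E19E80
```

U+1780 needs three bytes: the top 4 bits (0001) go into the lead byte E1, the middle 6 bits (011110) into 9E, and the low 6 bits (000000) into 80, which matches #{E19E80}.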