World: r4wp
[#Red] Red language group
DocKimbel 17-Apr-2013 [7090x5] | From a routine, if str is a red-string! pointer, this is the dispatch code you would need to use:
    s: GET_BUFFER(str)
    switch GET_UNIT(str) [
        Latin-1 [...conversion code...]
        UCS-2   [...conversion code...]
        UCS-4   [...conversion code...]
    ] |
Beginning of internal string buffer is given by: string/rs-head str (returns a byte-ptr!) | |
Should be GET_UNIT(s) above, sorry for the typo. | |
Another typo: should be Latin1. | |
Anyway, you don't need any conversion for Latin1, so you just have to do it for the other two formats. | |
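The dispatch above picks a decoding rule from the character width (1, 2 or 4 bytes). A minimal C sketch of that rule, under stated assumptions: the helper name read_codepoint is hypothetical, and storage is assumed little-endian (as on x86), so a UCS-2 unit stores its low byte first. This matches the `(as-integer p/2) << 8 + p/1` expression that appears later in this thread. One general UTF-8 fact is worth keeping in mind: only the ASCII range (below 0x80) passes through byte-for-byte. Latin-1 code points 0x80 to 0xFF, such as é (0xE9), still become two UTF-8 bytes (0xC3 0xA9).

```c
#include <stdint.h>

/* Hypothetical helper: read the code point at p, where unit is the width of
   one character (1 = Latin-1, 2 = UCS-2, 4 = UCS-4).
   Assumes little-endian storage. */
static uint32_t read_codepoint(const uint8_t *p, int unit)
{
    switch (unit) {
    case 1:  /* Latin-1: the byte value is the code point */
        return p[0];
    case 2:  /* UCS-2: low byte first */
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
    case 4:  /* UCS-4: four bytes, low byte first */
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    default: /* unknown width: return U+FFFD, the replacement character */
        return 0xFFFD;
    }
}
```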
Kaj 17-Apr-2013 [7095x2] | Sticking to Latin1 is not much use these days. Much data, such as web sites, is in Unicode. It would be fine if it worked like R2, as a transparent passthrough, but Red eats your Unicode and won't give it back from its internal format. |
How does stdout support deal with that? Is there no conversion to the platform format there? | |
PeterWood 17-Apr-2013 [7097x2] | I'd be happy to look at a UCS-2 to UTF-8 conversion function but I don't have the time to do it at the moment. |
I'm pretty sure that would be enough for Kaj's immediate needs. | |
Kaj 17-Apr-2013 [7099x2] | Yes |
I see there are specialised, platform-specific print functions only for printing the internal format. They look like a base for the general-purpose conversions, though. | |
PeterWood 17-Apr-2013 [7101x3] | I've written a quick function that will take a Red char (UCS-4) and output the equivalent UTF-8 as bytes stored in a struct!. It can be used as the basis for converting a Red string to UTF-8. What is needed is to extract char! values from the Red string, call the function, and then append the UTF-8 to a c-string!. |
The function only covers the BMP at the moment. | |
You can find it at: https://github.com/PeterWAWood/Red-System-Libs/blob/master/UTF-8/ucs4-utf8.reds | |
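For reference, BMP-only UTF-8 encoding needs at most three bytes per code point. The following C sketch is not Peter's Red/System code. The name utf8_encode_bmp is hypothetical, and the function returns 0 for code points above U+FFFF instead of handling them:

```c
#include <stdint.h>

/* Hypothetical BMP-only encoder: writes cp (U+0000..U+FFFF) as UTF-8 into
   out and returns the byte count (1 to 3), or 0 if cp is outside the BMP.
   Surrogate values are not rejected, to keep the sketch short. */
static int utf8_encode_bmp(uint32_t cp, unsigned char out[3])
{
    if (cp > 0xFFFF)
        return 0;                                  /* needs 4 bytes: not handled */
    if (cp < 0x80) {                               /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                              /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));   /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For example, é (U+00E9) comes out as 0xC3 0xA9, and the euro sign (U+20AC) comes out as 0xE2 0x82 0xAC.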
AdrianS 18-Apr-2013 [7104] | It's so nice to see C written that way, Peter. |
Pekr 18-Apr-2013 [7105] | Yes, finally a C that makes sense :-) Well, nothing against C, I am glad it is still around and going to stay... |
PeterWood 18-Apr-2013 [7106x2] | I've just committed a slightly improved version that returns a c-string! instead of a structure. |
For me, the big issue in turning the function into the UTF-8 string that Kaj wants is: how do I allocate a c-string! using the Red memory manager rather than malloc? Any suggestions appreciated. | |
DocKimbel 18-Apr-2013 [7108x2] | It would be best to do the conversions on the fly; that is why I want to wait for I/O to be done before implementing such conversion routines. Anyway, to do it now, you need to allocate a new string. The best way to do it is:
    str: as red-string! stack/push*
    str/header: TYPE_STRING
    str/head: 0
    str/node: alloc-bytes size
The new string! value will be put on the stack, so any other call to a Red internal native or action might destroy it. Also, keep in mind that the GC is not there yet, so intensive I/O might quickly eat up all your RAM. |
Oh, you meant a c-string!, not a string!, so it's even easier, just use: alloc-bytes size | |
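Both snippets leave `size` open. One option, sketched in C with hypothetical helper names (utf8_len, utf8_size), is to count the exact UTF-8 bytes in a first pass and add one byte for the terminating NUL of a c-string. A cheaper alternative is a worst-case bound: 2 bytes per character for Latin-1, 3 for UCS-2 or 4 for UCS-4, plus one. That bound wastes some memory, which matters while there is no GC.

```c
#include <stddef.h>
#include <stdint.h>

/* UTF-8 length of one code point (no validation of surrogates or of
   values above U+10FFFF). */
static size_t utf8_len(uint32_t cp)
{
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Bytes to request from the allocator for n code points converted to a
   NUL-terminated UTF-8 c-string. */
static size_t utf8_size(const uint32_t *cps, size_t n)
{
    size_t size = 1;                 /* room for the trailing NUL */
    for (size_t i = 0; i < n; i++)
        size += utf8_len(cps[i]);
    return size;
}
```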
PeterWood 18-Apr-2013 [7110x2] | Thanks. |
Is there any easy way to free the c-string? | |
DocKimbel 18-Apr-2013 [7112x4] | Currently no, the freeing function requires a memory frame pointer in addition to the buffer pointer. It is meant for internal use only for now. |
Anyway, even freeing it won't help much as long as the GC doesn't do the cleanup. | |
Here's what your main loop would look like for retrieving every codepoint from a string! value:
    head: string/rs-head str
    tail: string/rs-tail str
    s: GET_BUFFER(str)
    unit: GET_UNIT(s)
    while [head < tail][
        cp: switch unit [
            Latin1 [as-integer p/value]
            UCS-2  [(as-integer p/2) << 8 + p/1]
            UCS-4  [p4: as int-ptr! p p4/value]
        ]
        ...emit UTF-8 char...
        head: head + unit
    ] | |
Oops, you should replace 'head with 'p in the above code. | |
PeterWood 18-Apr-2013 [7116] | Many thanks. |
DocKimbel 18-Apr-2013 [7117] | cp holds your codepoint as a 32-bit integer. |
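For readers who think in C, here is a sketch of the loop above. It applies the 'head to 'p fix and fills in the "...emit UTF-8 char..." step. The function name to_utf8 is hypothetical. The sketch assumes little-endian storage, unit values of 1, 2 or 4 only, and an output buffer the caller has sized for the worst case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C rendering of the loop: p walks from head to tail in steps
   of unit bytes, decodes cp, and appends its UTF-8 form to out.
   Returns the number of bytes written, not counting the trailing NUL. */
static size_t to_utf8(const uint8_t *head, const uint8_t *tail, int unit,
                      unsigned char *out)
{
    unsigned char *q = out;
    for (const uint8_t *p = head; p < tail; p += unit) {
        uint32_t cp;                       /* code point as a 32-bit integer */
        switch (unit) {
        case 1:  cp = p[0]; break;                                  /* Latin1 */
        case 2:  cp = (uint32_t)p[0] | ((uint32_t)p[1] << 8); break; /* UCS-2 */
        default: cp = (uint32_t)p[0] | ((uint32_t)p[1] << 8)         /* UCS-4 */
                    | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24); break;
        }
        /* "...emit UTF-8 char..." */
        if (cp < 0x80) {
            *q++ = (unsigned char)cp;
        } else if (cp < 0x800) {
            *q++ = (unsigned char)(0xC0 | (cp >> 6));
            *q++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *q++ = (unsigned char)(0xE0 | (cp >> 12));
            *q++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *q++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            *q++ = (unsigned char)(0xF0 | (cp >> 18));
            *q++ = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            *q++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *q++ = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    *q = '\0';                             /* terminate as a c-string */
    return (size_t)(q - out);
}
```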
PeterWood 18-Apr-2013 [7118] | I should be able to turn this into a function for Kaj to include in his routine! where he needs UTF-8 |
DocKimbel 18-Apr-2013 [7119] | I guess that should be enough for his needs. |
PeterWood 18-Apr-2013 [7120] | Fingers crossed :-) |
Oldes 18-Apr-2013 [7121] | Why not use native OS functions? At least on Windows there is: http://msdn.microsoft.com/en-us/library/windows/desktop/dd374085(v=vs.85).aspx |
DocKimbel 18-Apr-2013 [7122] | Kaj is working on Linux and Syllable only. Also that API provides UTF-16 to UTF-8 support, but we need also UCS-4 to UTF-8 (UCS-2 being a subset of UTF-16). |
Oldes 18-Apr-2013 [7123] | Some UCS related code for porting is here: http://public.googlecode.com/svn/trunk/UCSUTF.cpp |
DocKimbel 18-Apr-2013 [7124] | Endo: I have submitted a false positive report to AVIRA; I hope Red binaries will be whitelisted soon. According to the VirusTotal online testing tool, it seems to be the last AV vendor producing false alarms. |
Endo 18-Apr-2013 [7125] | Thank you, I'll check it later. |
PeterWood 19-Apr-2013 [7126x5] | Kaj - You can find a rough and ready red-string! to c-string! function at: https://github.com/PeterWAWood/Red-System-Libs/blob/master/UTF-8/string-c-string.reds (it #includes the UCS-4 character to UTF-8 convertor, which you will need in the same directory as the string-c-string func). |
The ucs4 -> utf8 char convertor: https://github.com/PeterWAWood/Red-System-Libs/blob/master/UTF-8/ucs4-utf8.reds | |
I haven't really tested it as you can see from : https://github.com/PeterWAWood/Red-System-Libs/blob/master/UTF-8/Tests/string-c-string-test.red | |
I'm not sure how it will cope with repeated use, as there is no way to release allocated c-strings under the Red memory manager. | |
Hope it helps. | |
Pekr 19-Apr-2013 [7131] | who needs a GC/memory manager these days, just buy more RAM :-) |
DocKimbel 19-Apr-2013 [7132] | Peter, maybe you could use the ALLOCATE function from Red/System and let Kaj's code call FREE on UTF-8 buffers after usage? |
PeterWood 19-Apr-2013 [7133] | I didn't think that it was possible to mix using the Red Memory Manager and C memory management in the same program. Is it safe to do so? |
DocKimbel 19-Apr-2013 [7134] | Yes, it is. |
PeterWood 19-Apr-2013 [7135] | I have committed the change for the c-string to be allocated with Red/System ALLOCATE function. |
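In C terms, the arrangement suggested above follows the usual ownership convention: the converter allocates and the caller frees. The sketch below uses malloc and free as stand-ins for Red/System's ALLOCATE and FREE. The name latin1_to_utf8_alloc is hypothetical, and the input is limited to Latin-1 to keep the example short.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical converter: returns a NUL-terminated UTF-8 copy of n Latin-1
   bytes, allocated with malloc. The caller owns the result and must free() it. */
static char *latin1_to_utf8_alloc(const unsigned char *src, size_t n)
{
    char *out = malloc(2 * n + 1);   /* worst case: 2 bytes per char, plus NUL */
    if (out == NULL)
        return NULL;
    char *q = out;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {              /* ASCII passes through unchanged */
            *q++ = (char)c;
        } else {                     /* 0x80..0xFF need two bytes */
            *q++ = (char)(0xC0 | (c >> 6));
            *q++ = (char)(0x80 | (c & 0x3F));
        }
    }
    *q = '\0';
    return out;
}
```

Each call hands back a fresh buffer. Freeing it after use is what prevents the repeated-use buildup described above, even without a GC.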
DocKimbel 20-Apr-2013 [7136] | FYI, Bruno is working on a Zlib binding for Red/System: https://github.com/be-red/Red/commits/zlib |
Kaj 20-Apr-2013 [7137] | Much obliged, Peter. I can work that into my I/O routines |
PeterWood 21-Apr-2013 [7138x2] | Really, the thanks should go to Nenad. Without his help, I would still be trying to work out how to do it. |
I've added support for code points above the BMP. | |
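Above the BMP, UTF-8 needs a fourth byte for code points U+10000 to U+10FFFF. The C sketch below shows what that extension involves; it does not reproduce the linked Red/System library. The name utf8_encode is hypothetical. Replacing surrogates and out-of-range values with U+FFFD is this sketch's own design choice, not necessarily what the library does.

```c
#include <stdint.h>

/* Hypothetical full-range encoder: writes cp as UTF-8 into out (4 bytes max)
   and returns the byte count. Surrogates (U+D800..U+DFFF) and values above
   U+10FFFF are not valid scalar values, so they become U+FFFD. */
static int utf8_encode(uint32_t cp, unsigned char out[4])
{
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        cp = 0xFFFD;
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: U+10000..U+10FFFF */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, U+1F600 encodes as 0xF0 0x9F 0x98 0x80.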