World: r4wp
[#Red] Red language group
DocKimbel 17-Apr-2013 [7071] | For Android, Java uses UTF-16, so the conversion from string! is (almost) trivial. |
Kaj 17-Apr-2013 [7072] | But it's not there yet, is it? |
DocKimbel 17-Apr-2013 [7073] | No, I will implement it when I need it, and I have a lot of other stuff to code for Android support before that. |
Kaj 17-Apr-2013 [7074] | Is it wise that Red won't work on other platforms before it works on Android? |
DocKimbel 17-Apr-2013 [7075] | The features currently implemented in Red are working. |
Kaj 17-Apr-2013 [7076] | Sure, but it's pretty useless like this |
DocKimbel 17-Apr-2013 [7077] | That's why it is called an alpha. ;-) |
Kaj 17-Apr-2013 [7078] | I was hoping a little more could be done, but I'll have to postpone a lot of work |
DocKimbel 17-Apr-2013 [7079] | I told you I will have a look at it once the shared libs are done, just wait a few more days. If it's critical to you, you might want to contribute the required conversion routines? |
Kaj 17-Apr-2013 [7080x2] | I could, but I know very little of Unicode, so there would be a lot of overhead in getting up to speed |
I have no idea how long it will take you to finish the shared libraries. It has been a backburner project for a long time | |
DocKimbel 17-Apr-2013 [7082] | Not very long. I have kept postponing it for almost a year, and it has been getting in my way for Android support for a while, so a few weeks ago I scheduled it to be done right after the interpreter is finished (Exit/Return support). |
Kaj 17-Apr-2013 [7083] | OK, that's fine. It sounded like data out support was of undetermined priority |
DocKimbel 17-Apr-2013 [7084x11] | The only "data out" support we need for now for building Red is the stdout support, and we have had it for a while. |
Full Red I/O support is next on my list once the above-mentioned tasks are completed. | |
BTW, if you stick to Latin-1, you shouldn't have the need for any conversion? | |
Also, there might be a cheap way to achieve the conversion in the meantime using wsprintf() or a similar function. | |
Hmm, it might not be enough, so you might want to have a look and maybe wrap libiconv: http://www.gnu.org/software/libiconv/ | |
For once, the API looks good and simple enough (4 functions to wrap). | |
From a routine, if str is a red-string! pointer, this is the dispatch code you would need to use: s: GET_BUFFER(str) switch GET_UNIT(str) [ Latin-1 [...conversion code...] UCS-2 [...conversion code...] UCS-4 [...conversion code...] ] | |
Beginning of internal string buffer is given by: string/rs-head str (returns a byte-ptr!) | |
Should be GET_UNIT(s) above, sorry for the typo. | |
Another typo: should be Latin1. | |
Anyway, you don't need any conversion for Latin1, so you just have to do it for the other two formats. | |
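A minimal C sketch of the libiconv route suggested above, written in C rather than Red/System so it can be compiled directly. It assumes the buffer holds little-endian UCS-2 and that the installed iconv recognizes the encoding name "UCS-2LE". The function name `ucs2le_to_utf8` is hypothetical; `iconv_open`, `iconv`, and `iconv_close` are the standard calls a Red/System binding would need to wrap.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Convert in_len bytes of little-endian UCS-2 to UTF-8.
   Returns the number of bytes written to out, or (size_t)-1 on failure
   (unknown encoding, invalid input, or out too small).
   Hypothetical helper, not part of Red or libiconv. */
size_t ucs2le_to_utf8(const char *in, size_t in_len, char *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-8", "UCS-2LE");   /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;        /* iconv() takes a non-const char ** */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    /* iconv() advances src/dst and decrements the remaining counts. */
    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_cap - dst_left;
}
```

On systems where iconv is not part of the C library, the program may need to be linked with `-liconv`.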
Kaj 17-Apr-2013 [7095x2] | Sticking to Latin1 is not much use these days. Much data, such as web sites, is in Unicode. It would be fine if it worked like R2, as a transparent passthrough, but Red eats your Unicode and won't give it back from its internal format |
How does stdout support deal with that? Is there no conversion to the platform format there? | |
PeterWood 17-Apr-2013 [7097x2] | I'd be happy to look at a UCS-2 to UTF-8 conversion function but I don't have the time to do it at the moment. |
I'm pretty sure that would be enough for Kaj's immediate needs. | |
Kaj 17-Apr-2013 [7099x2] | Yes |
I see there are specialised platform specific print functions only for printing the internal format. They look like a base for the general purpose conversions, though | |
PeterWood 17-Apr-2013 [7101x3] | I've written a quick function that will take a Red char (UCS4) and output the equivalent UTF-8 as bytes stored in a struct!. It can be used as the basis for converting a Red string to UTF-8. What is needed is to extract the Red char! values from the Red string, call the function and then append the UTF-8 to a c-string! |
The function only covers the BMP at the moment. | |
You can find it at: https://github.com/PeterWAWood/Red-System-Libs/blob/master/UTF-8/ucs4-utf8.reds | |
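For readers who want to see the shape of such a conversion, here is a minimal C sketch of encoding one code point as UTF-8. It is not the linked Red/System function: the name `utf8_encode` is hypothetical, and unlike the BMP-only version described above it also handles code points beyond U+FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is not encodable
   (a UTF-16 surrogate or a value above U+10FFFF).
   Hypothetical helper for illustration only. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: rest of the BMP */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: supplementary planes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Returning the byte count lets a caller append the result to an output buffer and advance its write position in one step.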
AdrianS 18-Apr-2013 [7104] | It's so nice to see C written that way, Peter. |
Pekr 18-Apr-2013 [7105] | Yes, finally a C, that makes sense :-) Well, nothing against C, I am glad it is still around and going to stay .... |
PeterWood 18-Apr-2013 [7106x2] | I've just committed a slightly improved version that returns a c-string! instead of a structure. |
For me the big issue in turning the function into the UTF-8 string that Kaj wants is "How to allocate a c-string! using the Red Memory Manager rather than malloc". Any suggestions appreciated. | |
DocKimbel 18-Apr-2013 [7108x2] | It would be best to do the conversions on the fly, which is why I want to wait for I/O to get done before implementing such conversion routines. Anyway, to do it now, you need to allocate a new string; the best way to do it is: str: as red-string! stack/push* str/header: TYPE_STRING str/head: 0 str/node: alloc-bytes size The new string! value will be put on the stack, so any other call to a Red internal native or action might destroy it. Also, keep in mind that the GC is not there yet, so intensive I/O might quickly eat up all your RAM. |
Oh, you meant a c-string!, not a string!, so it's even easier, just use: alloc-bytes size | |
PeterWood 18-Apr-2013 [7110x2] | Thanks. |
Is there any easy way to free the c-string? | |
DocKimbel 18-Apr-2013 [7112x4] | Currently no, the freeing function requires a memory frame pointer in addition to the buffer pointer. It is meant for internal use only for now. |
Anyway, even freeing it won't help much as long as the GC doesn't do the cleanup. | |
Here's what your main loop would look like for retrieving every codepoint from a string! value: head: string/rs-head str tail: string/rs-tail str s: GET_BUFFER(str) unit: GET_UNIT(s) while [head < tail][ cp: switch unit [ Latin1 [as-integer p/value] UCS-2 [(as-integer p/2) << 8 + p/1] UCS-4 [p4: as int-ptr! p p4/value] ] ...emit UTF-8 char... head: head + unit ] | |
Oops, you should replace 'head by 'p in the above code. | |
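The same loop can be sketched in C to make the per-unit decoding explicit. This is an illustration, not Red runtime code: `read_codepoint`, `collect_codepoints`, and the `UNIT_*` constants are hypothetical names. It assumes that the unit value equals the width of one code unit in bytes (as the `+ unit` step in the loop above suggests) and that multi-byte units are stored little-endian (as the `p/2 << 8 + p/1` expression implies).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Latin1 / UCS-2 / UCS-4 unit values:
   assumed here to be the code-unit width in bytes. */
enum { UNIT_LATIN1 = 1, UNIT_UCS2 = 2, UNIT_UCS4 = 4 };

/* Read one code point from p, given the buffer's unit size.
   Multi-byte units are assumed to be little-endian. */
uint32_t read_codepoint(const unsigned char *p, int unit)
{
    switch (unit) {
    case UNIT_LATIN1:
        return p[0];
    case UNIT_UCS2:
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
    case UNIT_UCS4:
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    default:
        return 0;   /* unknown unit size */
    }
}

/* Walk the buffer from head to tail, storing up to cap code points in out.
   Returns the number of code points stored. A real converter would emit
   UTF-8 at the point where this sketch stores the value. */
size_t collect_codepoints(const unsigned char *head, const unsigned char *tail,
                          int unit, uint32_t *out, size_t cap)
{
    assert(unit == UNIT_LATIN1 || unit == UNIT_UCS2 || unit == UNIT_UCS4);

    const unsigned char *p = head;
    size_t n = 0;
    /* Stop when fewer than `unit` bytes remain or the output is full. */
    while (n < cap && (size_t)(tail - p) >= (size_t)unit) {
        out[n++] = read_codepoint(p, unit);
        p += unit;
    }
    return n;
}
```

Reading bytes individually instead of casting to a wider pointer avoids alignment problems and keeps the byte order explicit.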
PeterWood 18-Apr-2013 [7116] | Many thanks. |
DocKimbel 18-Apr-2013 [7117] | cp holds your codepoint as a 32-bit integer. |
PeterWood 18-Apr-2013 [7118] | I should be able to turn this into a function for Kaj to include in his routine! where he needs UTF-8 |
DocKimbel 18-Apr-2013 [7119] | I guess that should be enough for his needs. |
PeterWood 18-Apr-2013 [7120] | Fingers crossed :-) |