World: r4wp

[#Red] Red language group

DocKimbel
17-Apr-2013
[7090x5]
From a routine, if str is a red-string! pointer, this is the dispatch 
code you would need to use:

s: GET_BUFFER(str)
switch GET_UNIT(str) [
	Latin-1 [...conversion code...]
	UCS-2 [...conversion code...]
	UCS-4 [...conversion code...]
]
Beginning of internal string buffer is given by:

	string/rs-head str	(returns a byte-ptr!)
Should be GET_UNIT(s) above, sorry for the typo.
Another typo: should be Latin1.
Anyway, you don't need any conversion for Latin1, so you just have 
to do it for the other two formats.
Kaj
17-Apr-2013
[7095x2]
Sticking to Latin1 is not much use these days. Much data, such as 
web sites, is in Unicode. It would be fine if it worked like R2, as 
a transparent passthrough, but Red eats your Unicode and won't give 
it back from its internal format.
How does stdout support deal with that? Is there no conversion to 
the platform format there?
PeterWood
17-Apr-2013
[7097x2]
I'd be happy to look at a UCS-2 to UTF-8 conversion function but 
I don't have the time to do it at the moment.
I'm pretty sure that would be enough for Kaj's immediate needs.
Kaj
17-Apr-2013
[7099x2]
Yes
I see there are specialised platform-specific print functions, but 
only for printing the internal format. They look like a base for 
general-purpose conversions, though.
PeterWood
17-Apr-2013
[7101x3]
I've written a quick function that will take a Red char (UCS4) and 
output the equivalent UTF-8 as bytes stored in a struct!.


It can be used as the base for converting a Red string to UTF-8. What 
is needed is to extract the char! values from the Red string, call the 
function on each, and then append the resulting UTF-8 to a c-string!.
The function only covers the BMP at the moment.
You can find it at:


https://github.com/PeterWAWood/Red-System-Libs/blob/master/UTF-8/ucs4-utf8.reds
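
For readers who want to see the shape of such a conversion without opening 
the repository, here is a minimal sketch in C rather than Red/System. It is 
not the code in ucs4-utf8.reds; the function name cp_to_utf8 and its 
signature are assumptions made for illustration. Code points in the BMP 
need at most three bytes; the four-byte branch handles code points above 
the BMP, which the function described above did not cover at this point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes up to 4 bytes into out (no terminator) and returns the number of
   bytes written, or 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t cp_to_utf8(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                        /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 3 bytes: rest of the BMP */
        if (cp >= 0xD800 && cp <= 0xDFFF)   /* surrogates are not characters */
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 4 bytes: above the BMP */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Returning a byte count instead of a terminated string lets the caller append 
each encoded character directly to a larger output buffer.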
AdrianS
18-Apr-2013
[7104]
It's so nice to see C written that way, Peter.
Pekr
18-Apr-2013
[7105]
Yes, finally a C that makes sense :-) Well, nothing against C, I 
am glad it is still around and going to stay ....
PeterWood
18-Apr-2013
[7106x2]
I've just committed a slightly improved version that returns a c-string! 
instead of a structure.
For me the big issue in turning the function into the UTF-8 string 
that Kaj wants is "How to allocate a c-string! using the Red Memory 
Manager rather than malloc".

Any suggestions appreciated.
DocKimbel
18-Apr-2013
[7108x2]
It would be best to do the conversions on the fly; that is why I 
want to wait for I/O to get done before implementing such conversion 
routines. 


Anyway, to do it now, you need to allocate a new string. The 
best way to do it is:

    str: as red-string! stack/push*
    str/header: TYPE_STRING
    str/head: 0
    str/node:  alloc-bytes size


The new string! value will be put on stack, so any other call to 
a Red internal native or action might destroy it. Also, keep in mind 
that the GC is not there yet, so intensive I/O might quickly eat 
up all your RAM.
Oh, you meant a c-string!, not a string!, so it's even easier, just 
use: alloc-bytes size
PeterWood
18-Apr-2013
[7110x2]
Thanks.
Is there any easy way to free the c-string?
DocKimbel
18-Apr-2013
[7112x4]
Currently no, the freeing function requires a memory frame pointer 
in addition to the buffer pointer. It is meant for internal use only 
for now.
Anyway, even freeing it won't help much as long as the GC doesn't 
do the cleanup.
Here's what your main loop would look like for retrieving every codepoint 
from a string! value:

	head: string/rs-head str
	tail: string/rs-tail str
		
	s: GET_BUFFER(str)
	unit: GET_UNIT(s)
		
	while [head < tail][
		cp: switch unit [
			Latin1 [as-integer p/value]
			UCS-2  [(as-integer p/2) << 8 + p/1]
			UCS-4  [p4: as int-ptr! p p4/value]
		]
		...emit UTF-8 char...
		head: head + unit
	]
Oops, you should replace 'head by 'p in the above code.
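
The same loop can be sketched in C to make the per-unit decoding explicit. 
This is an illustration only, not Red runtime code: the names 
read_codepoint and collect_codepoints are hypothetical, the unit is passed 
in as a plain byte width (1, 2 or 4) instead of coming from GET_UNIT, and 
little-endian byte order is an assumption that matches the UCS-2 branch 
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read one code point at p, where unit is the width
   in bytes of each character: 1 (Latin1), 2 (UCS-2) or 4 (UCS-4).
   Little-endian byte order is an assumption of this sketch. */
uint32_t read_codepoint(const unsigned char *p, size_t unit)
{
    switch (unit) {
    case 1:  return p[0];
    case 2:  return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
    case 4:  return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                  | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    default: return 0xFFFD;   /* unknown width: U+FFFD replacement character */
    }
}

/* Walk the buffer from its head to its tail one unit at a time, storing
   each code point in out (capacity max). Returns the number stored. */
size_t collect_codepoints(const unsigned char *buf, size_t len, size_t unit,
                          uint32_t *out, size_t max)
{
    assert(buf != NULL && out != NULL);
    assert(unit == 1 || unit == 2 || unit == 4);

    const unsigned char *p = buf;
    const unsigned char *tail = buf + len;
    size_t n = 0;

    /* Stop when fewer than unit bytes remain or the output is full. */
    while ((size_t)(tail - p) >= unit && n < max) {
        out[n++] = read_codepoint(p, unit);
        p += unit;
    }
    return n;
}
```

In a real converter the body of the loop would emit UTF-8 for each code 
point instead of storing it in an array.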
PeterWood
18-Apr-2013
[7116]
Many thanks.
DocKimbel
18-Apr-2013
[7117]
cp holds your codepoint as a 32-bit integer.
PeterWood
18-Apr-2013
[7118]
I should be able to turn this into a function for Kaj to include 
in his routine! where he needs UTF-8
DocKimbel
18-Apr-2013
[7119]
I guess that should be enough for his needs.
PeterWood
18-Apr-2013
[7120]
Fingers crossed :-)
Oldes
18-Apr-2013
[7121]
Why not use native OS functions? At least on Windows there is: http://msdn.microsoft.com/en-us/library/windows/desktop/dd374085(v=vs.85).aspx
DocKimbel
18-Apr-2013
[7122]
Kaj is working on Linux and Syllable only. Also, that API provides 
UTF-16 to UTF-8 support, but we also need UCS-4 to UTF-8 (UCS-2 being 
a subset of UTF-16).
Oldes
18-Apr-2013
[7123]
Some UCS related code for porting is here: http://public.googlecode.com/svn/trunk/UCSUTF.cpp
DocKimbel
18-Apr-2013
[7124]
Endo: I have submitted a false-positive report to AVIRA, I hope 
Red binaries will be whitelisted soon. It seems to be the last AV 
vendor producing false alarms, according to the VirusTotal online 
testing tool.
Endo
18-Apr-2013
[7125]
Thank you, I'll check it later.
PeterWood
19-Apr-2013
[7126x5]
Kaj - You can find a rough and ready red-string! to c-string! function 
at:


https://github.com/PeterWAWood/Red-System-Libs/blob/master/UTF-8/string-c-string.reds


It #includes the UCS4 character to UTF-8 converter, which you will 
need in the same directory as the string-c-string func.
The ucs4 -> utf8 char converter:


https://github.com/PeterWAWood/Red-System-Libs/blob/master/UTF-8/ucs4-utf8.reds
I haven't really tested it, as you can see from:


https://github.com/PeterWAWood/Red-System-Libs/blob/master/UTF-8/Tests/string-c-string-test.red
I'm not sure how it will cope with repeated use, as there is no way 
to release allocated c-strings under the Red memory manager.
Hope it helps.
Pekr
19-Apr-2013
[7131]
who needs a GC/memory manager these days, just buy more RAM :-)
DocKimbel
19-Apr-2013
[7132]
Peter, maybe you could use the ALLOCATE function from Red/System and 
let Kaj's code call FREE on the UTF-8 buffers after usage?
PeterWood
19-Apr-2013
[7133]
I didn't think that it was possible to mix using the Red Memory Manager 
and C memory management in the same program. Is it safe to do so?
DocKimbel
19-Apr-2013
[7134]
Yes, it is.
PeterWood
19-Apr-2013
[7135]
I have committed the change for the c-string to be allocated with 
the Red/System ALLOCATE function.
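
The ownership pattern discussed here (the converter allocates, the caller 
frees) can be sketched in C, on the assumption that Red/System's ALLOCATE 
and FREE behave like the C library's malloc and free. The function name 
latin1_to_utf8 is hypothetical, and the sketch handles Latin1 input only to 
keep the worst-case buffer size easy to see; it is not the code in 
string-c-string.reds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: convert len Latin1 bytes into a newly allocated,
   NUL-terminated UTF-8 string. Returns NULL if allocation fails.
   The caller owns the result and must release it with free(). */
char *latin1_to_utf8(const unsigned char *src, size_t len)
{
    assert(src != NULL);

    /* Worst case: every byte is >= 0x80 and expands to two UTF-8 bytes,
       plus one byte for the terminator. */
    unsigned char *dst = malloc(len * 2 + 1);
    if (dst == NULL)
        return NULL;

    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            dst[n++] = c;                                   /* ASCII as-is */
        } else {
            dst[n++] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte */
            dst[n++] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation */
        }
    }
    dst[n] = '\0';
    return (char *)dst;
}
```

A caller would use the returned string for its I/O and then pass it to 
free(), which is the role Kaj's code would play with FREE.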
DocKimbel
20-Apr-2013
[7136]
FYI, Bruno is working on a Zlib binding for Red/System:
https://github.com/be-red/Red/commits/zlib
Kaj
20-Apr-2013
[7137]
Much obliged, Peter. I can work that into my I/O routines
PeterWood
21-Apr-2013
[7138x2]
Really the thanks should go to Nenad. Without his help, I'd still be 
trying to work out how to do it.
I've added support for code points above the BMP.