World: r3wp

[!REBOL3 Host Kit]

Kaj
2-Jan-2011
[1060]
You can define it when you create a REBSER string, but a character, 
I don't know
BrianH
2-Jan-2011
[1061]
Doesn't REBCHR refer to an internal format?
Kaj
2-Jan-2011
[1062x2]
Yeah, I don't think I've seen it anywhere
The extensions use u32 for a character
Oldes
2-Jan-2011
[1064]
Yes... but AGG requires multibyte. Probably. At least I can display 
a gob with a non-ANSI string like:
	g_text: make gob! [size: 100x20 text: "čřž"]
but not:
	g_text: make gob! [size: 100x20 text: "crz"]
Kaj
2-Jan-2011
[1065x2]
If text is needed in a different format, you have to convert it somehow
Multibyte will always need to be multibyte. The question is what 
sort of multibyte
Oldes
2-Jan-2011
[1067]
AB = "^(41)^(00)^(42)^(00)"
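The byte string in Oldes' example is "AB" laid out as 16-bit little-endian code units, which is the in-memory layout of a Windows wide-char string on x86. A minimal C sketch of that serialization follows; the function name `ucs2_to_le_bytes` is hypothetical and not part of the host kit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: serialize UCS-2 code units as little-endian bytes,
   i.e. low byte first, then high byte, for each 16-bit unit.
   out must have room for 2 * count bytes. Returns bytes written. */
size_t ucs2_to_le_bytes(const uint16_t *units, size_t count, uint8_t *out)
{
    for (size_t i = 0; i < count; i++) {
        out[2 * i]     = (uint8_t)(units[i] & 0xFF);  /* low byte  */
        out[2 * i + 1] = (uint8_t)(units[i] >> 8);    /* high byte */
    }
    return 2 * count;
}
```

Feeding it the units {0x41, 0x42} yields the bytes 41 00 42 00, matching the REBOL literal above.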
BrianH
2-Jan-2011
[1068]
As Kaj says, that just means that there needs to be a conversion 
step in there somewhere. R3's internal character format is supposed 
to switch among UCS-1, UCS-2 and UCS-4 in theory, but in practice 
might just be UCS-2 all the time (haven't checked lately).
Oldes
2-Jan-2011
[1069]
There must be an easy way... for example, checking RMA's source :)
Kaj
2-Jan-2011
[1070x2]
Sounds like it's configured for Windows UCS-2
I'd be surprised if AGG couldn't work with UTF-8, and that wouldn't 
be the default on Unix
BrianH
2-Jan-2011
[1072]
I can only test on Windows at the moment, so I don't know how it 
behaves on other platforms.
Kaj
2-Jan-2011
[1073x2]
It would also be very useful if the Amiga patches were incorporated 
into the host kit
They probably use UTF-8
BrianH
2-Jan-2011
[1075]
That might also require some conversion, but at least then the conversion 
would be there to use. R3 uses UCS for strings internally for speed 
and code simplicity, though strangely enough words are stored in 
UTF-8 internally, since you don't have to access and change words 
on a character basis.
Oldes
2-Jan-2011
[1076]
R3 uses both, single-byte by default and multi-byte if needed.
BrianH
2-Jan-2011
[1077x2]
Windows uses UTF-16 for its APIs, not UCS-2, so by using UCS-2 R3 
is limited to the BMP codepoints.
Good to hear that the UCS-1 to UCS-2 autoexpansion is still there. 
We don't have the UCS-4 expansion supported yet though.
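The UCS-2 vs. UTF-16 distinction BrianH draws comes down to surrogate pairs: UTF-16 encodes code points above U+FFFF as two 16-bit units, while UCS-2 has no way to represent them. A minimal C sketch of the standard UTF-16 encoding step (the name `utf16_encode` is only illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
   Returns the number of 16-bit units written (1 or 2), or 0 if cp is
   not a valid scalar value (surrogate range or above U+10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* reserved for surrogates */
    if (cp < 0x10000) {                           /* BMP: all UCS-2 can hold */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF) return 0;
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));     /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* low surrogate  */
    return 2;
}
```

For example, U+20000 (the first CJK Unified Ideographs Extension B character) becomes the pair D840 DC00; a UCS-2-only string type cannot store it at all.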
Kaj
2-Jan-2011
[1079]
Is UCS-2 sufficient for Chinese?
Oldes
2-Jan-2011
[1080]
I don't know, but I know that I need wchar at this point and don't 
have it:

https://github.com/rebolsource/r3-hostkit/blob/f331c6a46947e6e5afedc90f3d375bcd3f7ad8a1/src/agg/agg_truetype_text.cpp#L696
BrianH
2-Jan-2011
[1081x2]
Yes, I think so.
(to Kaj)
Oldes
2-Jan-2011
[1083]
I must dig deeper... the solution is visible even with Carl's 
version, as I can see the text using draw... it's like solving a puzzle... :)
BrianH
2-Jan-2011
[1084x2]
The BMP should cover most stuff, but there is a whole supplemental 
plane dedicated to more obscure Asian ideographic scripts, and some 
of those might come up in Chinese language eventually.
It's still an ongoing issue, as many Asian nations don't like each 
other very much, so commonalities in their character sets are often 
controversial.
Oldes
2-Jan-2011
[1086x4]
I've found it.. rich-text is doing the conversion here:

https://github.com/rebolsource/r3-hostkit/blob/f331c6a46947e6e5afedc90f3d375bcd3f7ad8a1/src/os/win32/host-graphics.c#L714
The question is, how to use RL_GET_STRING if I already have REBCHR 
instead of REBSER.
Also, isn't it REBOL's weak spot if we need so much text manipulation? 
At least for international languages?
For example the CMD_TEXT_TEXT command is called on each mouse move 
event (redraw).
BrianH
2-Jan-2011
[1090x2]
It is not a REBOL-specific weak spot. All programming languages have 
to deal with Unicode issues in some way or another, and every means 
of dealing with it has its good and bad points. Any cross-platform 
language will run into a little difficulty if it has to interact 
with the platform-specific Unicode APIs, because different platforms 
deal with the problems in different ways.
Oldes
2-Jan-2011
[1092x2]
Mistake: extensive ANSI text will be affected too, since AGG uses widechar; 
each ANSI string we want to display in View must be converted 
to wchar on each redraw!
It could be fixed if we could convert the ANSI string to Unicode once and 
store it for later use. I'm just a C newbie, but I don't think that's 
how it works now.
BrianH
2-Jan-2011
[1094]
Does AGG have separate 8bit and 16bit rendering APIs or is it always 
one or the other?
Oldes
2-Jan-2011
[1095x2]
I think it has only 16-bit.
But I haven't dug far enough to be sure.
BrianH
2-Jan-2011
[1097]
If it is UCS-2 or UTF-16, then all that would need to be done is 
to convert UCS-1 mode R3 strings to UCS-2 mode somewhere before 
rendering. (He says glibly, having not analyzed the AGG sources or 
APIs.)
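A minimal C sketch of the widening step BrianH describes, assuming a UCS-1 mode string is an array of bytes whose values are the Unicode code points 0x00-0xFF (Latin-1). Under that assumption widening is a plain zero-extension per character. The helper name `widen_ucs1` is hypothetical, not a host-kit function, and the 16-bit output matches Windows `WCHAR` but not `wchar_t` on Unix, where it is usually 32 bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: widen a byte-per-char (UCS-1 / Latin-1) string
   to 16-bit units. Each byte value is already the code point, so this
   is a zero-extension. Returns a malloc'd, zero-terminated buffer the
   caller must free(), or NULL on allocation failure. */
uint16_t *widen_ucs1(const uint8_t *src, size_t len)
{
    uint16_t *dst = malloc((len + 1) * sizeof *dst);
    if (dst == NULL) return NULL;
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];      /* 8 -> 16 bits, value unchanged */
    dst[len] = 0;             /* terminator for wchar-style consumers */
    return dst;
}
```

This is the per-redraw cost Oldes is worried about: one allocation and one pass over the text each time it runs.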
Kaj
2-Jan-2011
[1098]
I'm still guessing this only applies to AGG on Windows, using UTF-16. 
On other platforms, AGG uses FreeType, and I guess that would accept 
UTF-8
Oldes
2-Jan-2011
[1099]
Submitted it to CC http://curecode.org/rebol3/ticket.rsp?id=1814 so 
Carl can eventually answer.
Kaj
2-Jan-2011
[1100]
I think the file you're looking at is not really part of AGG, but 
written by Cyphre as a bridge between AGG and Windows
Oldes
2-Jan-2011
[1101]
I know. But it does not solve the issue.
Kaj
2-Jan-2011
[1102]
The issue is we have to wait for the Amiga patches, which should 
constitute the bridge to FreeType systems
Oldes
2-Jan-2011
[1103]
Also, I'm not sure REBOL uses UTF-8 internally; I think it has 
only ANSI or UCS-2
Kaj
2-Jan-2011
[1104]
No, as Brian says, it's using fixed width vectors internally. You 
get UTF-8 only from conversions
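If a backend wants UTF-8 (Kaj's guess for FreeType-based platforms), the fixed-width internal text has to go through an encoder like the following minimal C sketch. It assumes UCS-2 input (BMP only) and substitutes U+FFFD for any surrogate unit rather than pairing them; `ucs2_to_utf8` is a hypothetical name, not a host-kit or FreeType API.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a UCS-2 buffer as UTF-8. out needs room for 3 * len + 1 bytes;
   the result is NUL-terminated. Returns bytes written, excluding the NUL.
   Surrogate units are replaced with U+FFFD; a UTF-16 aware version would
   combine valid pairs into one 4-byte sequence instead. */
size_t ucs2_to_utf8(const uint16_t *src, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t c = src[i];
        if (c >= 0xD800 && c <= 0xDFFF) c = 0xFFFD;   /* not valid in UCS-2 */
        if (c < 0x80) {                               /* 0xxxxxxx */
            out[n++] = (unsigned char)c;
        } else if (c < 0x800) {                       /* 110xxxxx 10xxxxxx */
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {                                      /* 1110xxxx 10xxxxxx 10xxxxxx */
            out[n++] = (unsigned char)(0xE0 | (c >> 12));
            out[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

For instance, "Ač€" (U+0041, U+010D, U+20AC) encodes to 41 C4 8D E2 82 AC, so this conversion also touches every character of the string.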
Oldes
2-Jan-2011
[1105]
which means that FreeType will be affected as well.
Kaj
2-Jan-2011
[1106]
Yes, we knew since the Unicode epic that we would need conversions
Oldes
2-Jan-2011
[1107]
But for each redraw? The conversion is fine if it's done only once.
Kaj
2-Jan-2011
[1108x2]
That can be cached
The text conversion is actually a minor matter. Code points also 
need to be converted to pixel glyphs, and heavy caching is always 
used there to arrive at usable performance
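Kaj's "that can be cached" could look roughly like this minimal C sketch: keep the widened copy next to a snapshot of the source text and convert again only when the text differs. The struct and function names are hypothetical and not taken from the host kit. The snapshot comparison is still O(n) per redraw; a change counter or dirty flag set whenever the text is modified would avoid even that, if the host kit exposes one (not confirmed here).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-gob cache of the 16-bit version of a byte string. */
typedef struct {
    uint8_t  *src_copy;   /* snapshot of the text the cache was built from */
    size_t    len;
    uint16_t *wide;       /* zero-terminated widened copy, or NULL */
} WideTextCache;

/* Return the widened text for src, converting only on a cache miss.
   Returns NULL on allocation failure (the cache is left empty). */
const uint16_t *wide_text_get(WideTextCache *c, const uint8_t *src, size_t len)
{
    if (c->wide != NULL && c->len == len && memcmp(c->src_copy, src, len) == 0)
        return c->wide;                          /* hit: no conversion */

    free(c->src_copy);
    free(c->wide);
    c->src_copy = malloc(len ? len : 1);
    c->wide = malloc((len + 1) * sizeof *c->wide);
    if (c->src_copy == NULL || c->wide == NULL) {
        free(c->src_copy);
        free(c->wide);
        c->src_copy = NULL;
        c->wide = NULL;
        c->len = 0;
        return NULL;
    }
    memcpy(c->src_copy, src, len);
    for (size_t i = 0; i < len; i++)
        c->wide[i] = src[i];                     /* UCS-1 -> UCS-2 zero-extension */
    c->wide[len] = 0;
    c->len = len;
    return c->wide;
}

/* Release the cache's buffers. */
void wide_text_free(WideTextCache *c)
{
    free(c->src_copy);
    free(c->wide);
    c->src_copy = NULL;
    c->wide = NULL;
    c->len = 0;
}
```

A redraw handler would call `wide_text_get` each time and pay for the conversion only after the text actually changes. As Kaj notes, glyph rasterization is the larger cost and is cached separately.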