r3wp [groups: 83 posts: 189283]

World: r3wp

[Red] Red language group

Mchean
4-Oct-2011
[3519]
Has the Red Google group moved somewhere else? I don't see any activity.
Andreas
4-Oct-2011
[3520]
It's still at
http://groups.google.com/group/red-lang
Mchean
4-Oct-2011
[3521x2]
just quiet at the moment
?
Andreas
4-Oct-2011
[3523x2]
Yes.
Also:
https://twitter.com/#!/red_lang/status/118396786737033216
Mchean
4-Oct-2011
[3525]
thanks
Kaj
9-Oct-2011
[3526x4]
Implemented horizontal and vertical box layouts in the GTK binding
Added a widgets overview to the examples
Here's the current one:
gtk-view window [
	gtk-position-center
	"Widgets Overview"
	icon "Red-48x48.png"
	vbox [
		label "Vertical box"
		fixed [
			label "Fixed layout"
			5 25  button [50 25  "Quit" :gtk-quit]
		]
		hbox [
			label "Horizontal box"
			button ["Fill"] yes
			button "Expand"
			button ["Fixed"] no
		]
		vbox [
			label "Vertical box"
			button ["Fill"] yes
			button "Expand"
			button ["Fixed"] no
		] yes
	]
]
Dockimbel
11-Oct-2011
[3530]
Works fine on Win7. What are the yes/no keywords for?
Kaj
11-Oct-2011
[3531x3]
I'm about to define names for them. :-) They were the most practical 
way to construct a dialect that results in proper settings for filling 
or fixating a box cell
Did you resize the window? Then the working becomes clear
Not many floats are used in GTK, but I need them for layout alignment
Dockimbel
11-Oct-2011
[3534]
Ok, I see now what they are used for. :-) Are the extra brackets 
around some button titles a special convention you're using?
Pekr
11-Oct-2011
[3535x2]
Hmm, no floats in Red/System will have to come anyway, no? :-)
eh, minus "no" in above sentence :-)
Kaj
11-Oct-2011
[3537x2]
Normally a button needs more than one parameter, so it would always 
have brackets. But here they're only used as examples, so they only 
have a display text and the brackets can be left out
I left them in for a while to make the separation with the optionally 
following layout parameters clearer, but in the latest version I 
reconsidered
Dockimbel
11-Oct-2011
[3539x2]
Does anyone know where to find exhaustive lists of invalid UTF-8 encoding 
ranges?
I am calculating them by hand, so I might miss some.
Andreas
11-Oct-2011
[3541x3]
C0, C1, F5-FF must never occur in UTF-8.
80-BF are continuation bytes.
Is that what you are after?
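
The byte classes listed above can be sketched as a small lookup. This is an illustrative Python version, not code from the thread; the function name `classify_utf8_byte` and the label strings are assumptions made for the example.

```python
def classify_utf8_byte(b: int) -> str:
    """Classify one byte value (0-255) by the role it can play in UTF-8."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x7F:
        return "ascii"         # 00-7F: complete one-byte code point
    if b <= 0xBF:
        return "continuation"  # 80-BF: only valid after a lead byte
    if b in (0xC0, 0xC1) or b >= 0xF5:
        return "invalid"       # C0, C1, F5-FF: never valid in UTF-8
    if b <= 0xDF:
        return "lead-2"        # C2-DF: starts a two-byte sequence
    if b <= 0xEF:
        return "lead-3"        # E0-EF: starts a three-byte sequence
    return "lead-4"            # F0-F4: starts a four-byte sequence
```

Classifying single bytes is not sufficient on its own: some lead bytes (E0, ED, F0, F4) also restrict the range of the byte that follows them.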
Dockimbel
11-Oct-2011
[3544]
Yes, but I was searching for an exhaustive list of rules.
Andreas
11-Oct-2011
[3545x2]
RFC3629 has a (non-normative) ABNF, if I remember correctly.
http://tools.ietf.org/html/rfc3629#section-4
Dockimbel
11-Oct-2011
[3547x3]
Here are the parse rules I came up with so far: https://gist.github.com/1278718
I think I am missing some overlong combinations.
I am also unsure of the valid range of the 2nd byte in the four-byte 
encoding.
Andreas
11-Oct-2011
[3550]
one-byte-codepoint: charset [#"^(00)" - #"^(7F)"]
Dockimbel
11-Oct-2011
[3551]
Right, fixing that.
Andreas
11-Oct-2011
[3552x4]
tail-bytes: charset [#"^(80)" - #"^(BF)"]

two-byte-codepoint: reduce [charset [#"^(C2)" - #"^(DF)"] tail-bytes]
tail-bytes == cont-byte
three-byte-codepoint: reduce [
  #"^(E0)" charset [#"^(A0)" - #"^(BF)"] cont-byte
| charset [#"^(E1)" - #"^(EC)"] 2 cont-byte
| #"^(ED)" charset [#"^(80)" - #"^(9F)"] cont-byte
| charset [#"^(EE)" - #"^(EF)"] 2 cont-byte 
]
four-byte-codepoint: reduce [
  #"^(F0)" charset [#"^(90)" - #"^(BF)"] 2 cont-byte
| charset [#"^(F1)" - #"^(F3)"] 3 cont-byte
| #"^(F4)" charset [#"^(80)" - #"^(8F)"] 2 cont-byte
]
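
For cross-checking the rules above, here is a minimal Python sketch of a validator that follows the same byte ranges (the ones in the RFC 3629 section 4 grammar). It is not code from the thread or from the gist; the name `is_valid_utf8` is hypothetical. It rejects overlong forms, surrogates (ED A0-BF ...) and anything above U+10FFFF.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 per the RFC 3629 byte ranges."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:                      # one-byte code point
            i += 1
            continue
        # lo/hi bound the SECOND byte; length is the whole sequence length.
        if 0xC2 <= b <= 0xDF:
            lo, hi, length = 0x80, 0xBF, 2
        elif b == 0xE0:                    # A0-BF excludes overlong forms
            lo, hi, length = 0xA0, 0xBF, 3
        elif b == 0xED:                    # 80-9F excludes surrogates
            lo, hi, length = 0x80, 0x9F, 3
        elif 0xE1 <= b <= 0xEF:            # E1-EC and EE-EF
            lo, hi, length = 0x80, 0xBF, 3
        elif b == 0xF0:                    # 90-BF excludes overlong forms
            lo, hi, length = 0x90, 0xBF, 4
        elif b == 0xF4:                    # 80-8F caps at U+10FFFF
            lo, hi, length = 0x80, 0x8F, 4
        elif 0xF1 <= b <= 0xF3:
            lo, hi, length = 0x80, 0xBF, 4
        else:                              # 80-BF, C0, C1, F5-FF as lead
            return False
        if i + length > n:                 # truncated sequence
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(i + 2, i + length): # remaining continuation bytes
            if not 0x80 <= data[k] <= 0xBF:
                return False
        i += length
    return True
```

For example, `is_valid_utf8(b"\xF0\x9F\x98\x80")` accepts a four-byte sequence, while `is_valid_utf8(b"\xF4\x90\x80\x80")` is rejected because the second byte after F4 must stay within 80-8F.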
Dockimbel
11-Oct-2011
[3556x2]
Thanks, I see that everything I need is in http://tools.ietf.org/html/rfc3629#section-4
BrianH: what was the CureCode ticket where you've summed up the word! 
Unicode parsing rules?
BrianH
11-Oct-2011
[3558x3]
http://issue.cc/r3/1302 for the ASCII range in R3. The R3 parser 
tends to be excessively forgiving outside the ASCII range, accepting 
too much, though I haven't done the thorough test.
You might also consider looking at the source of INVALID-UTF? in 
R2, which is MIT licensed from R2/Forward.
It would still be a good idea to review the Unicode standard to determine 
which of the characters should be treated as spaces, but that would 
still be a problem for R3 because all of the delimiters it currently 
supports are one byte in UTF-8 for efficiency. If other delimiters 
are supported, R3's parser will be much slower.
Dockimbel
12-Oct-2011
[3561]
Thanks. For whitespaces, I have already taken higher Unicode codepoints 
into account (from this list: http://en.wikipedia.org/wiki/Whitespace_character).
Andreas
12-Oct-2011
[3562x2]
Completely forgot about INVALID-UTF? :)
After having a quick glance at it, at least for UTF-8 it's quite basic 
and does not take any of the above overlong combinations into account.
BrianH
12-Oct-2011
[3564x4]
The policy on overlong combinations was set by R3, where there isn't 
as much need to flag them. Overlong combinations are a problem in 
UTF-8 for code that works on the binary encoding directly, instead 
of translating to Unicode first. The only function in R3 that operates 
that way is TRANSCODE, so as long as it doesn't choke on overlong 
combinations there is no problem with them being allowed. It might 
be good to add a /strict option to INVALID-UTF? though to make it 
check for them.
Speaking of which, I don't think anyone has tried overlong combinations 
with TRANSCODE yet. We should look into that.
(I mean, aside from Carl possibly doing so internally.)
As long as they are interpreted exactly the same as the short encoding 
of the value, no problems.
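
The difference between byte-level code and decode-first code can be illustrated with a small Python sketch. It is an assumption-laden example, not R3 or TRANSCODE behavior: `naive_decode_two_byte` and `strict_accepts` are hypothetical helpers, and the overlong sequence C0 AF for "/" is the usual textbook case.

```python
def naive_decode_two_byte(lead: int, cont: int) -> int:
    """Decode a two-byte sequence WITHOUT rejecting overlong forms."""
    return ((lead & 0x1F) << 6) | (cont & 0x3F)

def strict_accepts(data: bytes) -> bool:
    """True if Python's own strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

overlong_slash = bytes([0xC0, 0xAF])  # overlong encoding of U+002F "/"

# Code that scans the raw bytes for "/" does not see the overlong form ...
found_by_byte_scan = b"/" in overlong_slash

# ... while a lenient decoder maps it to the same code point as the short form.
decoded = naive_decode_two_byte(overlong_slash[0], overlong_slash[1])
```

Here `decoded` equals `0x2F` while `found_by_byte_scan` is false, which is the mismatch that matters for code working on the binary encoding directly. A strict decoder avoids the question by rejecting the sequence outright.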
Andreas
12-Oct-2011
[3568]
(Let's switch to !REBOL3.)