World: r3wp
[Red] Red language group
Kaj 11-Oct-2011 [3538] | I left them in for a while to make the separation with the optionally following layout parameters clearer, but in the latest version I reconsidered |
Dockimbel 11-Oct-2011 [3539x2] | Does anyone know where to find an exhaustive list of invalid UTF-8 encoding ranges? |
I am calculating them by hand, so I might miss some. | |
Andreas 11-Oct-2011 [3541x3] | C0, C1, F5-FF must never occur in UTF-8. |
80-BF are continuation bytes. | |
Is that what you are after? | |
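The byte ranges Andreas lists can be turned into a small lookup. This is only an illustrative sketch in Python (not REBOL or Red); the function name `classify_utf8_byte` and the label strings are made up for this example.

```python
def classify_utf8_byte(b: int) -> str:
    """Classify one byte value (0-255) by the role it can play in UTF-8."""
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x7F:
        return "ascii"         # 00-7F: a complete one-byte sequence
    if b <= 0xBF:
        return "continuation"  # 80-BF: only valid after a lead byte
    if b in (0xC0, 0xC1) or b >= 0xF5:
        return "invalid"       # C0, C1, F5-FF: never occur in UTF-8
    if b <= 0xDF:
        return "lead-2"        # C2-DF: starts a two-byte sequence
    if b <= 0xEF:
        return "lead-3"        # E0-EF: starts a three-byte sequence
    return "lead-4"            # F0-F4: starts a four-byte sequence


# The byte values that can never appear anywhere in a UTF-8 stream.
never_valid = [b for b in range(256) if classify_utf8_byte(b) == "invalid"]
print([f"{b:02X}" for b in never_valid])
```

Classifying single bytes is not enough to validate a stream, since the allowed range of the second byte depends on the lead byte.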
Dockimbel 11-Oct-2011 [3544] | Yes, but I was searching for an exhaustive list of rules. |
Andreas 11-Oct-2011 [3545x2] | RFC3629 has a (non-normative) ABNF, if I remember correctly. |
http://tools.ietf.org/html/rfc3629#section-4 | |
Dockimbel 11-Oct-2011 [3547x3] | Here are the parse rules I came up with so far: https://gist.github.com/1278718 |
I think I am missing some overlong combinations. | |
I am also unsure of the valid range of the 2nd byte in the four-bytes encoding. | |
Andreas 11-Oct-2011 [3550] | one-byte-codepoint: charset [#"^(00)" - #"^(7F)"] |
Dockimbel 11-Oct-2011 [3551] | Right, fixing that. |
Andreas 11-Oct-2011 [3552x4] | tail-bytes: charset [#"^(80)" - #"^(BF)"] two-byte-codepoint: reduce [charset [#"^(C2)" - #"^(DF)"] tail-bytes] |
tail-bytes == cont-byte | |
three-byte-codepoint: reduce [ #"^(E0)" charset [#"^(A0)" - #"^(BF)"] cont-byte | charset [#"^(E1)" - #"^(EC)"] 2 cont-byte | #"^(ED)" charset [#"^(80)" - #"^(9F)"] cont-byte | charset [#"^(EE)" - #"^(EF)"] 2 cont-byte ] | |
four-byte-codepoint: reduce [ #"^(F0)" charset [#"^(90)" - #"^(BF)"] 2 cont-byte | charset [#"^(F1)" - #"^(F3)"] 3 cont-byte | #"^(F4)" charset [#"^(80)" - #"^(8F)"] 2 cont-byte ] | |
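The same ranges, which follow the ABNF in RFC 3629 section 4, can be written as a table-driven validator. The sketch below is in Python rather than as parse rules, and the names `RULES` and `is_valid_utf8` are invented for this illustration; it is not the code from the gist.

```python
# Each rule: (lead-byte range, second-byte range, count of further tail bytes).
# The restricted second-byte ranges are what exclude overlong forms (E0, F0),
# UTF-16 surrogates (ED) and code points above U+10FFFF (F4).
RULES = [
    ((0xC2, 0xDF), (0x80, 0xBF), 0),
    ((0xE0, 0xE0), (0xA0, 0xBF), 1),
    ((0xE1, 0xEC), (0x80, 0xBF), 1),
    ((0xED, 0xED), (0x80, 0x9F), 1),
    ((0xEE, 0xEF), (0x80, 0xBF), 1),
    ((0xF0, 0xF0), (0x90, 0xBF), 2),
    ((0xF1, 0xF3), (0x80, 0xBF), 2),
    ((0xF4, 0xF4), (0x80, 0x8F), 2),
]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:  # one-byte code point
            i += 1
            continue
        for (lo, hi), (lo2, hi2), extra in RULES:
            if lo <= b <= hi:
                break
        else:
            return False  # C0, C1, F5-FF, or a stray continuation byte
        end = i + 2 + extra
        if end > n:
            return False  # sequence is truncated
        if not lo2 <= data[i + 1] <= hi2:
            return False  # second byte outside the range for this lead byte
        if any(not 0x80 <= t <= 0xBF for t in data[i + 2:end]):
            return False  # remaining bytes must be plain tail bytes
        i = end
    return True


print(is_valid_utf8("h\u00e9llo \u20ac".encode("utf-8")))  # well-formed input
print(is_valid_utf8(b"\xe0\x80\xaf"))  # overlong three-byte form of "/"
```

Keeping the ranges in one table makes it easy to compare them line by line against the RFC.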
Dockimbel 11-Oct-2011 [3556x2] | Thanks, I see that everything I need is in http://tools.ietf.org/html/rfc3629#section-4 |
BrianH: what was the CureCode ticket where you've summed up the word! Unicode parsing rules? | |
BrianH 11-Oct-2011 [3558x3] | http://issue.cc/r3/1302 for the ASCII range in R3. The R3 parser tends to be excessively forgiving outside the ASCII range, accepting too much, though I haven't done a thorough test. |
You might also consider looking at the source of INVALID-UTF? in R2, which is MIT licensed from R2/Forward. | |
It would still be a good idea to review the Unicode standard to determine which of the characters should be treated as spaces, but that would still be a problem for R3 because all of the delimiters it currently supports are one byte in UTF-8 for efficiency. If other delimiters are supported, R3's parser will be much slower. | |
Dockimbel 12-Oct-2011 [3561] | Thanks. For whitespaces, I have already taken higher Unicode codepoints into account (from this list: http://en.wikipedia.org/wiki/Whitespace_character). |
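To see why whitespace beyond ASCII affects a byte-oriented scanner, it helps to look at how many bytes such characters occupy in UTF-8. The selection of code points below is a small, partial sample chosen for illustration, and `utf8_len` is a helper invented here.

```python
# A partial sample of Unicode whitespace code points (not an exhaustive list).
WHITESPACE = {
    0x0009: "TAB",
    0x0020: "SPACE",
    0x00A0: "NO-BREAK SPACE",
    0x1680: "OGHAM SPACE MARK",
    0x2003: "EM SPACE",
    0x2028: "LINE SEPARATOR",
    0x3000: "IDEOGRAPHIC SPACE",
}


def utf8_len(cp: int) -> int:
    """Number of bytes the code point occupies when encoded as UTF-8."""
    return len(chr(cp).encode("utf-8"))


for cp, name in WHITESPACE.items():
    encoded = chr(cp).encode("utf-8").hex()
    print(f"U+{cp:04X} {name:<18} {utf8_len(cp)} byte(s): {encoded}")
```

Only the ASCII entries fit in one byte, so a scanner that tests a single byte per delimiter cannot recognise the others without looking ahead.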
Andreas 12-Oct-2011 [3562x2] | Completely forgot about INVALID-UTF? :) |
After having a quick glance at it, at least for utf8 it's quite basic and does not take any of the above overlong combinations into account. | |
BrianH 12-Oct-2011 [3564x4] | The policy on overlong combinations was set by R3, where there isn't as much need to flag them. Overlong combinations are a problem in UTF-8 for code that works on the binary encoding directly, instead of translating to Unicode first. The only function in R3 that operates that way is TRANSCODE, so as long as it doesn't choke on overlong combinations there is no problem with them being allowed. It might be good to add a /strict option to INVALID-UTF? though to make it check for them. |
Speaking of which, I don't think anyone has tried overlong combinations with TRANSCODE yet. We should look into that. | |
(I mean, aside from Carl possibly doing so internally.) | |
As long as they are interpreted exactly the same as the short encoding of the value, no problems. | |
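An overlong combination is a longer byte sequence that decodes, by plain bit arithmetic, to a value that has a shorter encoding. The Python sketch below uses a deliberately lenient decoder to show this; `naive_decode`, `MIN_CP` and `is_overlong` are names made up for the example and do not reflect how TRANSCODE or INVALID-UTF? work.

```python
def naive_decode(seq: bytes) -> int:
    """Decode one UTF-8-shaped sequence by bit arithmetic, with no overlong check."""
    lead = seq[0]
    if lead < 0x80:
        cp, n = lead, 0
    elif lead >> 5 == 0b110:
        cp, n = lead & 0x1F, 1
    elif lead >> 4 == 0b1110:
        cp, n = lead & 0x0F, 2
    elif lead >> 3 == 0b11110:
        cp, n = lead & 0x07, 3
    else:
        raise ValueError("not a lead byte")
    if len(seq) != n + 1:
        raise ValueError("wrong sequence length")
    for b in seq[1:]:
        if b >> 6 != 0b10:
            raise ValueError("bad continuation byte")
        cp = (cp << 6) | (b & 0x3F)  # append six payload bits
    return cp


# Smallest code point that genuinely needs a sequence of each length.
MIN_CP = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000}


def is_overlong(seq: bytes) -> bool:
    """True if the decoded value would have fit in a shorter sequence."""
    return naive_decode(seq) < MIN_CP[len(seq)]


# Four spellings that a lenient decoder maps to the same code point, "/".
for seq in (b"/", b"\xc0\xaf", b"\xe0\x80\xaf", b"\xf0\x80\x80\xaf"):
    print(seq.hex(), hex(naive_decode(seq)), is_overlong(seq))
```

Code that searches the raw bytes for `2F` would miss the last three spellings, which is the risk for anything that works on the binary encoding directly.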
Andreas 12-Oct-2011 [3568] | (Let's switch to !REBOL3.) |
Kaj 13-Oct-2011 [3569x3] | Implemented GTK table layouts |
For example: | |
table [2 2 5 5 button "X" button "O" button "O" button "X" ] | |
amacleod 18-Oct-2011 [3572] | Kaj, I love what you are doing. Just curious if you looked at QT, it seems to be available on more platforms, phone-wise, which is a major plus... Is it more difficult to implement? |
Kaj 18-Oct-2011 [3573x12] | Thanks. As it happens, I looked into binding Qt last week |
I never liked either GTK or Qt. The reason I'm binding one anyway is that we want native platform user interfaces for Red. Linux and BSD don't have a native interface, but if you have to appoint one, you have to appoint two: GTK and Qt | |
The reason I chose GTK is that it's written in C, which makes it natural to bind to Red/System. Almost all other open source GUI toolkits, including Qt, are written in C++, which is much more problematic to bind | |
Basically, to bind a C++ library, you have to write two bindings: one from C++ to C, and then one from C to your target language. This is because only C++ knows what C++ objects mean, and C++ claims that its object classes are a program's interface | |
So you can write a binding from Red/System to a C library purely in Red/System, while a C++ binding would also require writing an extra bridge in C++. Even after this initial hurdle, apart from the maintenance, a remaining problem would be that the C++ bridge needs a traditional development environment, so the wonderful ability of Red to cross-compile to anything would be negated for a large part. Basically the same problem that REBOL 3 extensions have | |
Intrepid readers will note that one of the libraries I bind, 0MQ, is written in C++. However, the 0MQ designers wisely decided to define the interface in C, so that all languages can bind to it | |
For generic libraries, binding tools exist, such as SWIG and SIP. Unfortunately, they don't solve the problem but only assist a little, and the result is very bloated | |
For a few years now, Qt and KDE have been using a new tool: Smoke. It's more automated, so it looks like it can generate a C interface without writing C++ yourself. However, the cross-compilation problem still exists. Because the tool is so generic, the bindings it generates are also quite bloated and probably otherwise inefficient. In any case, it's just the first step for a Red binding, because I put abstraction layers over my bindings that are much more REBOL-like | |
Another consideration for me is that GTK is more fragmented, but that also makes it more modular than Qt. From the viewpoint of Syllable, it makes it harder to integrate completely, but easier to integrate just some selected pieces, which is what I am after | |
While it's true that Qt is more portable than GTK, I'm not sure it's significant. The only phone platform I know that uses it is Meego, but Nokia has sidetracked that. Samsung's own phone platform in Bada, for example, uses GTK. There's also recent DirectFB support in GTK, while the Qt port to DirectFB is obsolete | |
So I chose GTK to support as the "native" GUI for Linux and BSD. It can also run on several other platforms until we have native support for those | |
I'm not planning to fragment the effort by doing a Qt binding as well, but I did evaluate it, and the decision could change if I were funded for it | |
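Kaj's point that a C library can be bound purely from the target language can be illustrated by analogy. The sketch below uses Python's `ctypes` instead of Red/System, and assumes a POSIX-like system where the C library can be located; the wrapper name `c_strlen` is invented here. A C function is an unmangled symbol with a plain signature, so declaring its argument and return types is all the binding needs. A C++ class method has no such stable plain-C entry point, which is why an extra bridge exposing C functions is required first.

```python
import ctypes
import ctypes.util

# Assumption: POSIX-like system. If the library name cannot be resolved,
# CDLL(None) falls back to the symbols already loaded into this process.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C prototype: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(s: bytes) -> int:
    """Call the C library's strlen directly, with no compiled glue code."""
    return libc.strlen(s)


print(c_strlen(b"Red/System"))
```

Nothing here had to be compiled, which mirrors the argument that a pure C interface keeps the binding inside the host language's own toolchain.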
amacleod 18-Oct-2011 [3585] | Interesting stuff...thanks |
Endo 19-Oct-2011 [3586] | Thank you for the good explanations Kaj. |
Gabriele 19-Oct-2011 [3587] | TL;DR: the creators of C++ and C++ compilers decided that the world was not complicated enough, so they worked hard to make it more complicated. |