World: r4wp

[#Red] Red language group

DocKimbel
28-Aug-2012
[1513]
The callback was never replaced, just inferred.
Jerry
1-Sep-2012
[1514]
Doc, will you extract all the information strings of the Red source 
code into a file, so I can translate them into Chinese without 
modifying the source code?
DocKimbel
1-Sep-2012
[1515]
You mean the docstring for functions (which we don't have yet), or 
all the code comments?
Jerry
1-Sep-2012
[1516]
Doc, I mean the docstring, error messages ...
Pekr
4-Sep-2012
[1517x2]
New Twitter message from Doc: "Making good progress on Red language 
layer, got a stable core compiler now and minimal datatypes set"
Go Doc, Go! :-)
Jerry
4-Sep-2012
[1519]
Great.
Henrik
4-Sep-2012
[1520]
sounds great
Jerry
4-Sep-2012
[1521]
V5!!! ... Which means "Great! Wonderful!" in the Modern Chinese Cyber 
World.
DocKimbel
4-Sep-2012
[1522x3]
There are still a lot of details to work on, but the core part is 
there. The bootstrapping does had several complications (like literal 
series handling) that will vanish once we get Red compiler rewritten 
in Red.
Thanks Jerry! :-)
*had => add
Henrik
4-Sep-2012
[1525]
Doc, this is where we need that screenshot, like the one Linus Torvalds 
took of the Linux kernel, when he showed it was able to switch tasks 
back in 1991. :-)
DocKimbel
4-Sep-2012
[1526]
:-) Never seen that one, got a URL?
Henrik
4-Sep-2012
[1527x2]
I was looking for it... but now I can't find it.
Maybe there is no original screenshot and I remember it wrong, but 
the kernel exists and you can test it. It's Linux 0.00 and the major 
feature is switching between two tasks that print A and B in the 
console:

http://gunkies.org/wiki/Linux_0.00
DocKimbel
4-Sep-2012
[1529]
I should be able to make a "hello world" script in Red in a few days. 
I still have to make some design decisions wrt Unicode internal handling, 
that's really a complex part.
Henrik
4-Sep-2012
[1530]
I'm wondering if this is "easy"? Is the development path laid out 
or do you really carefully need to think about each step?
DocKimbel
4-Sep-2012
[1531]
No easy way AFAICT. Even if the big picture is there, you need to 
think and make decisions about a lot of details every day. Even if 
you try to isolate parts, you always end up with some conflicts to 
solve at both the design and implementation levels.
Henrik
4-Sep-2012
[1532]
ok
DocKimbel
4-Sep-2012
[1533]
Thanks for the link...if I take Linus' code and add it to Red/System, 
I should be able to output a VM image directly from a Red/System 
program, no? ;-)
Pekr
4-Sep-2012
[1534]
Doc - what I noticed (and please don't take it personally) is that 
sometimes you miss how R3 was designed and how it solved some areas. 
Maybe you could talk to BrianH, who knows lots of things about what 
was/is good about R3, so that you can take a similar path? E.g. Unicode 
support took Carl 2-3 months ...
DocKimbel
4-Sep-2012
[1535x2]
For Red, the bootstrap stage is really costly, I'm really impatient 
to get rid of the REBOL part and only have Red code.
Pekr: thanks for the advice. :-) I haven't followed the development 
of R3 very closely, nor have I ever written R3 code, so I'm not 
aware of all the reasons for some design decisions. That's why I 
ask when I need to. AFAIU, R3 was designed to solve R2 issues. I'm 
building Red from scratch, so I don't have legacy issues (so far) 
to deal with, I have more freedom than Carl with R3 and I intend 
to use it. There are some parts of R2/R3 design that fit my plan well, 
so I use them as inspiration, but there are other parts (especially 
in R3) that I am not a fan of. Also, do I need to remind you that 
Red is compiled while R3 is interpreted? These are two different 
models which require different trade-offs.


The difficulties I have to deal with in Red (both design and construction 
process) are an inherent part of any non-trivial work to build something 
new, and it's my role to solve and overcome them. The best way others 
can help me is by pointing out errors or inconsistencies both in 
the design and implementation.


Wrt Unicode support, I should be able to say in a few days how long 
it will take to support it. I doubt I need as much as 2-3 months, 
but anyway, nobody but Carl knows what he put in, and exactly 
how long it took him. ;-)
Pekr
4-Sep-2012
[1537]
Thanks for clarification :-D
Jerry
4-Sep-2012
[1538]
I am glad that you are doing the Unicode part now. Better to support 
it sooner rather than later. Back in 2008, I was one of the three Unicode 
testers for Carl, and I found many bugs and reported them back to 
Carl before he released it to the public.
BrianH
4-Sep-2012
[1539x4]
There is a bit that is worth learning from R3's Unicode transition 
that would help Red.


First, make sure that strings are logically series of codepoints. 
Don't expose the internal structure of strings to code that uses 
them. Different underlying platforms do their Unicode APIs using 
different formats, so on different platforms you might need to implement 
strings differently. You don't want these differences affecting the 
Red code that uses these strings.


Don't have direct equivalence between binary! and string! - require 
conversion between them. No AS-STRING and AS-BINARY functions. Don't 
export the underlying binary data. If you do, the code that uses 
strings would come to depend on a particular underlying format, and 
would then break on platforms where the underlying format is different. 
Also, if you provide access to the underlying binary data to Red 
code, you have to assume that the format of that data can be corrupted 
at any moment, so you'll have to add a lot of verification code, 
and your compiler won't be able to get rid of it.


Work in codepoints, not characters. Unicode characters are complicated 
and can involve multiple codepoints, or not, but until you display 
it none of that matters.


R3 uses fixed-length encodings of strings internally in order to 
speed things up, but that can cause problems when running on underlying 
platforms that use variable-length encodings in their APIs, like 
Linux (UTF-8) and Windows/Java/.NET/OSX? (UTF-16). This makes sense 
for R3 because the underlying code is compiled, but the outer code 
is not, and there's no way to break that barrier. With Red the string 
API could be logical, with the optimizer making the distinction go 
away, so you might be able to get away with using variable-length 
encodings internally if that makes sense to you. Length and index 
would be slower, but there'd be less overhead when calling external 
API functions, so make the tradeoff that works best for you.
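
Editor's sketch (not part of the original conversation): a small Python 
illustration of the points above, assuming nothing about Red's actual 
implementation. The helper name describe is hypothetical. It shows that 
one logical codepoint series has different sizes in UTF-8 and UTF-16, 
and that a single displayed character may span several codepoints.

```python
import unicodedata

# "e" followed by a combining acute accent: one displayed character,
# but two code points. NFC normalization folds it into one code point.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

def describe(s: str) -> dict:
    """Report the logical length and the size in two common encodings."""
    return {
        "codepoints": len(s),                            # logical series length
        "utf8_bytes": len(s.encode("utf-8")),            # 1 to 4 bytes per code point
        "utf16_units": len(s.encode("utf-16-le")) // 2,  # 1 or 2 units per code point
    }

print(describe(decomposed))
print(describe(composed))
print(describe("hello \U0001F600"))  # one code point outside the BMP
```

Code working against the "codepoints" figure behaves the same whichever 
encoding sits underneath; code working against bytes or code units does not.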
If there are parts of R2 or R3 that you like or don't like, don't 
assume that they are part of the design. There's a lot of stuff in 
there that doesn't match the design, is buggy or unfinished. Also, 
for R3, don't assume that only Carl knows the design. He worked with 
others, discussed his design with the other contributors. There's 
some stuff which only he can answer though, and some design decisions 
that weren't resolved, let alone implemented.
The concurrency model was not fully designed, for instance, and almost 
completely not implemented.
However, the part of the concurrency model that was designed so far 
affected the design and implementation of the system model and module 
system. You'd be surprised how much the module system was affected 
by the system, binding and interpretation model of R3; very little 
of its design and implementation was arbitrary. You might be able 
to get the syntax the same for Red's module system, but given the 
different system/binding/execution model there wouldn't be much of 
the implementation in common.
sqlab
4-Sep-2012
[1543]
I am for sure no expert regarding Unicode, but as Red is a compiler 
and open source, why not add flags so that the user has to choose 
which Unicode/string support he wants: either flexibility, but at the 
cost of speed, or no Unicode support, and then he has to do the hard 
work by himself
BrianH
4-Sep-2012
[1544x2]
One hypothetical advantage you have with Red is that you can make 
the logical behavior fairly high-level and have the compiler/optimizer 
get rid of that at runtime. REBOL, being interpreted, is effectively 
a lower-level language requiring hand optimization, the kind of hand 
optimization that you'd want to prohibit in Red because it would 
interfere with the machine optimization. This means that, for strings 
at least, it would make sense to have the logical model have a lot 
of the same constraints as that of R3 (because those constraints 
were inherent in the design of Unicode), but make the compiler aware 
of the model so it can translate things to a much lower level. If 
you break the logical model though, you remove the power the compiler 
has to optimize things.
sqlab, it would make sense to have the user choose the underlying 
model if you are doing Red on bare metal and implementing everything 
yourself, or running on a system with no Unicode support at all. 
If you are running a Red program on an existing system with Unicode 
support, the choice of which model is best has already been made 
for you. In those cases choosing the best underlying model would 
best be made by the Red porter, not the end developer.
sqlab
4-Sep-2012
[1546]
but that means that Red has to support all Unicode models on all 
the systems it can be compiled for.
BrianH
4-Sep-2012
[1547x2]
That's not as hard as it sounds. There are only 3 API models in wide 
use: UTF-16, UTF-8, and no Unicode support at all. A given port of 
Red would only have to support one of those on a given platform.
Red user code would only need to support the codepoint-series model; 
Red would translate that into the system's preferred underlying model. 
More encodings would need to be supported for conversion during I/O, 
of course, but not for API or internal use.
DocKimbel
4-Sep-2012
[1549]
So far, my short-list of encodings to support is UTF-8 and UTF-16LE. 
UTF-32 might be needed at some point in the future, but for now, 
I'm not aware of any system that uses it.


The Unicode standard by itself is not the problem (having just one 
encoding would have helped, though). The issue lies in different 
OSes supporting different encodings, which makes the choice of an 
internal cross-platform encoding hard. It's a matter of Red internal 
trade-offs, so I need to study the likely internal resource usage 
for each one and decide which is the most appropriate. So far, 
I was inclined to support both UTF-8 and UTF-16LE fully, but I'm 
not sure yet that's the best choice. To avoid surprising users with 
inconsistent string operation performance, I thought of giving users 
explicit control over the string format, if they need such control (by 
default, Red would handle everything automatically internally). For 
example, on Windows:

    s: "hello"                      ;-- UTF-8 literal string

    print s                         ;-- string converted to UCS2 for printing through win32 API
    write %file s                   ;-- string converted back to UTF-8

    set-modes s 'encoding 'UTF-16   ;-- user deciding on format
or
    s/encoding: 'UTF-16

    print length? s                 ;-- Length? then runs in O(1), no surprise.



Supporting ANSI as internal encoding seems useless, being able to 
just export/import it should suffice.

BTW, Brian, IIRC, OS X relies on UTF-8 internally not UTF-16.
BrianH
4-Sep-2012
[1550]
Thanks, I don't know much about OSX's Unicode support.
DocKimbel
4-Sep-2012
[1551]
set-modes s 'encoding 'UTF-16
should rather be:
    set-modes s [encoding: UTF-16]
BrianH
4-Sep-2012
[1552x4]
Be sure to not forget the difference between UTF-16 (variable-length 
encoding of all of Unicode) and UCS2 (fixed-length encoding of a 
subset of Unicode). Windows, Java and .NET support UTF-16 (barring 
the occasional buggy code that assumes fixed-length encoding). R3's 
current underlying implementation is UCS2, with its character set 
limitations, but its logical model is codepoint-series.
IIRC Python 3 uses UCS4 internally for its Unicode strings, with 
all of the overhead that implies. UCS4 and UTF-32 are the same thing, 
both fixed-length.
If you support different internal string encodings on a given platform, 
be sure to not give logical access to the underlying binary data 
to Red code. The get/set-modes model is good for that kind of thing. 
If the end developer knows that the string will be grabbed from something 
that provides UTF-8 and passed along to something that takes UTF-8, 
they might be better off choosing UTF-8 as an underlying encoding. 
However, that should just be a mode - their interaction with the 
string should follow the codepoint model. If the end developer will 
be working directly with encoded data, they should be working with 
binary! values.
Btw, in this code above:
    s/encoding: 'UTF-16
    print length? s	;-- Length? then runs in O(1), no surprize.


Length is not O(1) for UTF-16, it's O(n). Length is only O(1) for 
the fixed-length encodings.
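
Editor's sketch (not part of the original conversation): a minimal Python 
illustration of why length is O(n) for UTF-16 but O(1) for a fixed-length 
encoding. The function names are hypothetical, and the input is assumed 
to be well-formed UTF-16LE with an even number of bytes.

```python
import struct

def utf16_codepoint_length(data: bytes) -> int:
    """Count code points in UTF-16LE data by scanning every code unit."""
    units = struct.unpack("<%dH" % (len(data) // 2), data)
    # A low (trailing) surrogate never starts a code point, so it is
    # skipped; every other unit starts exactly one code point.
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)

def ucs2_length(data: bytes) -> int:
    """Fixed-width UCS2: the length follows from the byte count alone."""
    return len(data) // 2

data = "a\U0001F600b".encode("utf-16-le")  # 3 code points, 4 code units
print(utf16_codepoint_length(data))
print(ucs2_length(data))  # counts code units, so it over-reports here
```

Treating UTF-16 data as UCS2 gives the wrong answer as soon as a surrogate 
pair appears, which is the kind of bug the UTF-16/UCS2 distinction guards 
against.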
DocKimbel
4-Sep-2012
[1556x2]
Since Python 3.3, things have changed: http://www.python.org/dev/peps/pep-0393/
Brian: right, my claim is valid for BMP characters only.
BrianH
4-Sep-2012
[1558]
Ah, but length is even O(n) for BMP characters in a UTF-16 string, 
because figuring out that there are only BMP characters in there 
is an O(n) operation. To be O(1) you'd have to mark some flag in 
the string when you add the characters in there in the first place.
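
Editor's sketch (not part of the original conversation): one way the 
flag idea could look, written in Python purely for illustration. The 
class Utf16Buffer and its attribute names are hypothetical and say 
nothing about how Red or R3 store strings. Input is assumed to contain 
no lone surrogates.

```python
class Utf16Buffer:
    """UTF-16 code units plus a flag maintained on every append."""

    def __init__(self) -> None:
        self.units = []        # 16-bit code units
        self.bmp_only = True   # stays True until a surrogate pair is stored

    def append(self, text: str) -> None:
        for ch in text:
            cp = ord(ch)
            if cp > 0xFFFF:
                # Outside the BMP: store a surrogate pair, clear the flag.
                cp -= 0x10000
                self.units.append(0xD800 | (cp >> 10))
                self.units.append(0xDC00 | (cp & 0x3FF))
                self.bmp_only = False
            else:
                self.units.append(cp)

    def __len__(self) -> int:
        if self.bmp_only:
            return len(self.units)  # O(1): one unit per code point
        # O(n): skip low surrogates so each pair counts once.
        return sum(1 for u in self.units if not 0xDC00 <= u <= 0xDFFF)

buf = Utf16Buffer()
buf.append("hello")
print(len(buf), buf.bmp_only)
buf.append("\U0001F600")
print(len(buf), buf.bmp_only)
```

The cost of knowing the content is BMP-only is paid once per append 
rather than on every length query.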
DocKimbel
4-Sep-2012
[1559]
Ok, if you really want to nitpick, replace UTF-16 with UCS-2. 
;-)
BrianH
4-Sep-2012
[1560x3]
If you are ensuring that only BMP characters are in there then you 
have UCS2, not UTF-16 :)
Python 3.3 seems to finally be following the R3 model, good for them. 
Even better for them because it's actually implemented.
Don't worry, I'm only nitpicking to make things better. There's a 
lot of buggy code out there that assumes UTF-16 is UCS2, so we're 
better off making that distinction right away :)