World: r4wp

[#Red] Red language group

Jerry
10-Nov-2012
[3489x3]
I write a book for R3 instead of R2 because R3 supports Unicode. 
Without Unicode, R2 is useless in China.
Yes, I've noticed that. By supporting UCS-4, Red is better than REBOL. 
But characters with code points > 0xFFFF are rarely used, I think.
I was reading your Unicode.reds earlier. It's really cool to see 
that Red supports UCS-4.
DocKimbel
10-Nov-2012
[3492]
Well, if you look at the astral planes, you'll see things like music 
notes or playing cards. I bet they will quickly become as popular 
as they are useless. ;-)
Jerry
10-Nov-2012
[3493x3]
Many years ago, I found REBOL 2 and liked it a lot, but back then 
REBOL didn't support Unicode, so it was useless in China/Taiwan. 
I wrote e-mail to Carl, but I got no feedback. So I decided to start 
a magazine column in China and Taiwan to introduce REBOL. My idea 
was to make readers love REBOL and feel the same pain (of no Unicode 
support). I also kind of encouraged them to write e-mail to RT about 
the Unicode issue.
After a while, Carl said (somewhere, a blog maybe) that he didn't 
know why REBOL had many Chinese users, and that they needed Unicode. 
So he decided to support Unicode.
Doc, I am glad that Red supports Unicode from the start, so I 
don't have to play the same trick on you. :-)
DocKimbel
10-Nov-2012
[3496]
I remember that Unicode in R3 is mostly thanks to you. :-) No modern 
programming language can do without full Unicode support now, so it's 
a mandatory feature to have, anyway.
Ashley
10-Nov-2012
[3497]
For those of us still dealing mostly in pure ASCII, will there be 
a 1-byte-per-character string alternative?
DocKimbel
10-Nov-2012
[3498x2]
Ashley: Red string! uses 1 byte per character by default, so as long 
as you stick to ASCII, it takes the same storage space as C strings. 
As soon as you insert a non-ASCII character, the string will automatically 
upgrade to the appropriate format. It's transparent for the Red user. 
Moreover, you'll be able to force string encodings back down to 2 bytes 
or 1 byte per character when possible.
So, upgrade is automatic, downgrade should be manual.
BrianH
10-Nov-2012
[3500]
0-127 ASCII, or UCS-1 (aka Latin1)?
DocKimbel
10-Nov-2012
[3501]
Latin1
BrianH
10-Nov-2012
[3502]
Cool :)
DocKimbel
10-Nov-2012
[3503x2]
So, I should have said "as soon as you insert a non-Latin1 character".
In any case, you can always bypass the whole Unicode layer by reading 
(or converting) strings as binary! values, and then processing them 
the way you want (this is not recommended though, but some users 
might need it).
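
To illustrate the upgrade/downgrade rule described above, here is a minimal sketch in Python (not Red code). It assumes three fixed widths of 1, 2 and 4 bytes per character, with Latin1 (code points up to 0xFF) as the 1-byte limit; the names `unit_width` and `WidthTrackingString` are hypothetical and do not come from Red's runtime.

```python
def unit_width(text):
    """Smallest fixed width (bytes per character) that holds every code point."""
    highest = max(map(ord, text), default=0)
    if highest <= 0xFF:        # fits in Latin1
        return 1
    if highest <= 0xFFFF:      # fits in the Basic Multilingual Plane (UCS-2)
        return 2
    return 4                   # astral planes need UCS-4


class WidthTrackingString:
    """Hypothetical model: the width grows automatically, shrinks only on request."""

    def __init__(self, text=""):
        self.text = text
        self.width = unit_width(text)

    def append(self, more):
        self.text += more
        # Automatic upgrade: never narrower than what the new characters need.
        self.width = max(self.width, unit_width(more))

    def remove(self, ch):
        # Removing characters deliberately leaves the width untouched.
        self.text = self.text.replace(ch, "")

    def compact(self):
        # Manual downgrade: recompute the narrowest width that still fits.
        self.width = unit_width(self.text)
        return self.width
```

Appending "\u4e2d" to an ASCII string moves the model from width 1 to width 2; removing it again leaves the width at 2 until `compact()` is called.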
BrianH
10-Nov-2012
[3505]
For instance, users converting character encodings to Unicode, encodings 
like UTF8 or national encodings.
DocKimbel
10-Nov-2012
[3506]
Red should provide a UTF-8 codec. For national encodings, we would 
probably proceed by offering on-demand online codecs for the most 
used ones. That could be a shared resource with R3.
BrianH
10-Nov-2012
[3507x2]
I was talking about the codecs. They need to be written too, right? 
:)
Sorry if I missed the answer to this, but are you going to be doing 
a UTF8 binary parser for Red's source the way that R3 does for its 
source? Rather than a Unicode string parser, which processes the 
source after it's been through a codec?
DocKimbel
10-Nov-2012
[3509x2]
Yes, that's the "runtime lexer" in Red's roadmap. It is required 
for implementing LOAD (or TRANSCODE if you prefer).
BTW, we already have a UTF-8 binary parser in the Red compiler.
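
A parser that works on UTF-8 binary input, rather than on an already decoded string, has to turn byte sequences into code points itself. The following is a minimal Python sketch of that decoding step, not the code used in the Red compiler. The function name `decode_utf8` is hypothetical, and the sketch skips some checks a production decoder needs (overlong forms, surrogates, values above 0x10FFFF).

```python
def decode_utf8(data):
    """Decode UTF-8 bytes into a list of code points (minimal validation only)."""
    points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:                 # 0xxxxxxx: single byte
            size, cp = 1, lead
        elif lead >> 5 == 0b110:        # 110xxxxx: two-byte sequence
            size, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:       # 1110xxxx: three-byte sequence
            size, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:      # 11110xxx: four-byte sequence
            size, cp = 4, lead & 0x07
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        tail = data[i + 1:i + size]
        # Every continuation byte must look like 10xxxxxx.
        if len(tail) != size - 1 or any(b >> 6 != 0b10 for b in tail):
            raise ValueError("truncated or malformed sequence at offset %d" % i)
        for b in tail:
            cp = (cp << 6) | (b & 0x3F)
        points.append(cp)
        i += size
    return points
```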
BrianH
10-Nov-2012
[3511]
I do prefer, actually. LOAD being mezzanine and calling a separate 
parser lets you do a lot of nice tricks. The "mezzanine" might be 
native in Red, but the separation of concerns is still a value. YMMV 
of course.
Andreas
10-Nov-2012
[3512]
Agreed. TRANSCODE is a rather inelegant name, though :)
BrianH
10-Nov-2012
[3513]
Agreed. The weird set of options turned out to be essential though. 
Every combination of options is used in LOAD at different points. 
We even need a /part option like that of DECOMPRESS/part, for the 
same reason.
DocKimbel
10-Nov-2012
[3514]
Agreed too, both on the separation and on the bad-sounding name. ;-)
BrianH
10-Nov-2012
[3515]
Worse than being bad-sounding, it's a really general term being applied 
to a really specific operation. That name could have been used for 
a more general codec-based transformation process.
Jerry
15-Nov-2012
[3516]
Should Red/System series be one-based? There is some discussion 
about it. Why not add a #pragma for it, so programmers can set it themselves?
Pekr
15-Nov-2012
[3517]
I would say one-based, but then other people will show up and claim 
that R/S is not for me but for C coders, so IMO they will align it 
to zero :-)
Kaj
15-Nov-2012
[3518x2]
I've become a low level programmer again, and I still want it to 
be one based
A pragma would be the worst of both worlds: not making a decision, 
like most other software out there
DocKimbel
15-Nov-2012
[3520]
I agree that we should have only _one_ convention, else it will quickly 
become a nightmare when having to integrate 3rd-party code. We need 
to find some objective reasons for choosing it. 


For Red, I'm inclined to continue with the one-based convention that 
worked pretty well in R2 for many years (at least for me). I'm not 
very fond of the change in R3, which introduces a 0-based convention 
implicitly. It solves one problem (iterating over index 0... I don't 
remember ever doing that), but introduces new ones (negative indexes 
now point to an IMHO counter-intuitive position, which will most 
probably lead to programming errors). For now, I prefer to stick to 
the R2 way until we find a better solution (feel free to propose some 
on the related github tickets or here). For example, we could decide 
to ban indexes <= 0 (not my personal favorite option, but it would 
simply solve the problem).


For Red/System, a 0-based convention might make more sense, but it 
would push us into the R3 issue I've mentioned above wrt indexes 
<= 0. Also, as a dialect of Red, it can use whatever convention best 
fits its purpose, but OTOH, having the same convention as Red would 
help. So, I'm really undecided for Red/System.


I think the whole issue boils down to deciding on PICK's behavior 
with indexes <= 0; everything else should fit in easily once that 
preliminary question is settled. It would be helpful if someone 
could collect everything related to this topic on a wiki page with 
all the arguments sorted (there are a lot of them in the R3 group, 
posted a few weeks ago).
Kaj
15-Nov-2012
[3521x4]
Agreed
Off-by-one errors are everywhere in programming. Choosing between 
one-based and zero-based indexing shifts them to slightly different 
places, but they will still be there. As you said, I seldom encounter 
a situation where there would be a strong preference for indexes 
to be zero-based.
One-based is human friendly, while zero-based is usually more machine 
friendly, so I think REBOL made the right choice
By extension, I would like Red/System to be as close to Red as possible, 
so issues can be explained firmly and just once, and it's easy to 
morph Red code into Red/System code when you decide you need the 
performance
Pekr
15-Nov-2012
[3525]
Agreed with Kaj's last remark ....
Endo
15-Nov-2012
[3526]
+1
Andreas
15-Nov-2012
[3527x3]
Being a human myself, I don't find indices-as-ordinals ("one-based") 
particularly human friendly.
"For Red/System, a 0-based convention might make more sense, but it 
would push us into the R3 issue I've mentioned above wrt indexes 
<= 0."


With indices-as-offsets ("0-based"), there really is no issue with 
indices <= 0.
As for R3, it did not really introduce "0-based convention implicitly"; 
it is still firmly "1-based" insofar as the first element in a 
series can be accessed using index 1.


When you want indices-as-ordinals, you really need to decide: (a) 
is the ordinal "zeroth" meaningful, and if so, what it means; (b) 
are negative indices meaningful, and if so, what they mean.


R3 went with the choices of (a) having meaningful zeroth, defined 
as "the item in a series before the first item", and (b) allowing 
negative indices, having index -1 as the immediate predecessor of 
index 0.


R2 went with the choice of (a) not having a meaningful zeroth, but 
instead of erroring out, functions (pick) & syntax (paths) accepting 
indices are lenient: passing an index of 0 always returns NONE. For 
(b), R2 allows negative indices and defines -1 as the immediate predecessor 
of 1.
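
The two choices described above can be modelled in a few lines. This is a Python sketch of the semantics as stated in this message, not REBOL code. It assumes a series is a Python list plus a 0-based current position `pos`, and that out-of-range lookups return `None` (standing in for NONE); the names `pick_r2` and `pick_r3` are hypothetical.

```python
def pick_r2(data, pos, index):
    """R2-style: no meaningful zeroth; -1 is the immediate predecessor of 1."""
    if index == 0:
        return None                      # lenient: index 0 always yields NONE
    offset = index - 1 if index > 0 else index
    target = pos + offset
    return data[target] if 0 <= target < len(data) else None


def pick_r3(data, pos, index):
    """R3-style: 0 is the item before the first; -1 is the predecessor of 0."""
    target = pos + index - 1
    return data[target] if 0 <= target < len(data) else None


series = ["a", "b", "c", "d"]
at_c = 2                                 # series positioned at "c"
r2_view = [pick_r2(series, at_c, i) for i in (1, 0, -1, -2)]
r3_view = [pick_r3(series, at_c, i) for i in (1, 0, -1, -2)]
```

With the series positioned at "c", index 1 is "c" in both models. Index -1 reaches "b" in the R2 model and "a" in the R3 model, and only the R3 model returns an item ("b") for index 0.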
DocKimbel
15-Nov-2012
[3530]
Andreas: thanks for the good summary.


R3: agreed that index 1 is still the first element in a series, but 
index 0 is allowed and there is this ticket #613 that clearly aims 
at introducing 0-based indexing in R3... so my guess was that these 
different changes or wishes were inter-related. http://curecode.org/rebol3/ticket.rsp?id=613


R2: I would really have preferred that index 0 raise an error rather 
than return none.
Andreas
15-Nov-2012
[3531]
If you wish to allow index computation for series not positioned 
at the head, allowing index 0 is actually quite sensible, unless 
you want to make index computation particularly error prone.
DocKimbel
15-Nov-2012
[3532]
There is also the option proposed by Gabriele to consider: an ordinal! 
datatype (...-2th, -1th, 1st, 2nd, 3rd, 4th,...).


It could solve the whole thing, but I see two cons about this option: 

1) negative ordinals look odd, I don't even know if they can be read 
in English?

2) code would be more verbose as it will need conversions (to/from 
ordinals) in many places.


On the pro side, making a distinction between an integer 
and an ordinal might help improve code readability.
Andreas
15-Nov-2012
[3533]
The problem with no meaningful index 0 is that potentially meaningful 
index values are no longer isomorphic to integers. And as REBOL has 
no actual datatype for indices, all we can compute with are integers 
while relying on a correspondence of those integers to indices.


If you only ever compute indices for series positioned at the head, 
you get a nice correspondence of integers to indices, because meaningful 
indices for this series correspond to the positive integers.


But if you also want to compute indices for series positioned elsewhere, 
this nice integer-to-index correspondence breaks down as you suddenly 
have an undefined "gap" for the integer 0, whereas negative integers 
and positive integers are fine.
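
The "gap" can be made concrete by mapping index values to offsets from the current position. This is a Python sketch under the same reading of R2 and R3 as discussed in this thread; the helper names `r2_offset` and `r3_offset` are hypothetical.

```python
def r2_offset(index):
    """Offset from the current position for an R2-style index; None marks the gap."""
    if index == 0:
        return None
    return index - 1 if index > 0 else index


def r3_offset(index):
    """Offset from the current position for an R3-style index (no gap)."""
    return index - 1


# Walk backwards from index 2 in equal integer steps.
indices = [2, 1, 0, -1, -2]
r2_steps = [r2_offset(i) for i in indices]   # [1, 0, None, -1, -2]
r3_steps = [r3_offset(i) for i in indices]   # [1, 0, -1, -2, -3]
```

In the R3-style mapping, subtracting 1 from an index always moves one item back. In the R2-style mapping, subtracting 1 from index 1 lands on the undefined 0, so code that computes indices has to special-case the crossing.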
Kaj
15-Nov-2012
[3534]
Yes, ordinal! would fix that, or the index! I proposed earlier
Andreas
15-Nov-2012
[3535]
I also think that an ordinal! (or index!) datatype may be an intriguing 
possibility to get the best of both worlds.
DocKimbel
15-Nov-2012
[3536]
Andreas: do you have a short code example involving index 0 in a 
computation? I don't remember ever having issues with index 0, and 
I use series with offsets a lot! Though Ladislav claims he and Carl 
did encounter such an issue at least once... the use cases for this 
issue remain a mystery well kept by Ladislav. ;-)
Andreas
15-Nov-2012
[3537x2]
I personally avoid computing with non-head positioned series wherever 
possible.
So sorry, I don't have a particular example at hand, but I can easily 
imagine it coming up with e.g. forall or forskip and trying to access 
previous values in an iteration.