World: r4wp
[#Red] Red language group
Jerry 10-Nov-2012 [3489x3] | I write a book for R3 instead of R2 because R3 supports Unicode. Without Unicode, R2 is useless in China. |
Yes, I've noticed that. By supporting UCS-4, Red is better than REBOL. But characters with code points > 0xFFFF are rarely used, I think. | |
I was reading your Unicode.reds earlier. It's really cool to see that Red supports UCS-4. | |
DocKimbel 10-Nov-2012 [3492] | Well, if you look at the astral planes, you'll see things like music notes or playing cards. I bet they will quickly become as popular as they are useless. ;-) |
Jerry 10-Nov-2012 [3493x3] | Many years ago, I found REBOL 2 and liked it a lot, but back then REBOL didn't support Unicode, so it was useless in China/Taiwan. I wrote e-mail to Carl, but I got no feedback. So I decided to start a magazine column in China and Taiwan to introduce REBOL. My idea was to make readers love REBOL and feel the same pain (of no Unicode support). I also kind of encouraged them to write e-mail to RT on the Unicode issue. |
After a while, Carl said (somewhere, maybe on his blog) that he didn't know why REBOL had so many Chinese users or why they needed Unicode. So he decided to support Unicode. | |
Doc, I am glad that Red supports Unicode from the start, so I don't have to play the same trick on you. :-) | |
DocKimbel 10-Nov-2012 [3496] | I remember that Unicode in R3 is mostly thanks to you. :-) No modern programming language can miss full Unicode support now, so it's a mandatory feature to have, anyway. |
Ashley 10-Nov-2012 [3497] | For those of us still dealing mostly in pure ASCII will there be a 1-byte per character string alternative? |
DocKimbel 10-Nov-2012 [3498x2] | Ashley: Red string! uses 1 byte per character by default, so as long as you stick to ASCII, it takes the same storage space as C strings. As soon as you insert a non-ASCII character, the string will automatically upgrade to the appropriate format. It's transparent for the Red user. Moreover, you'll be able to force string encodings back down to 2 bytes or 1 byte when possible. |
So, upgrade is automatic, downgrade should be manual. | |
BrianH 10-Nov-2012 [3500] | 0-127 ASCII, or UCS-1 (aka Latin1)? |
DocKimbel 10-Nov-2012 [3501] | Latin1 |
BrianH 10-Nov-2012 [3502] | Cool :) |
DocKimbel 10-Nov-2012 [3503x2] | So, I should have said "as soon as you insert a non Latin1 character". |
In any case, you can always bypass the whole Unicode layer by reading (or converting) strings as binary! values, and then processing them the way you want (this is not recommended though, but some users might need it). | |
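The automatic upgrade described above can be pictured with a small Python sketch. It is only an illustration under assumptions: the names `unit_width` and `storage_bytes` are hypothetical, and the three widths (1 byte for Latin1, 2 bytes up to 0xFFFF, 4 bytes beyond) are inferred from this conversation, not taken from Red's source.

```python
def unit_width(text: str) -> int:
    """Smallest fixed width (bytes per character) able to hold every character.

    Assumed model: 1 byte covers Latin1, 2 bytes cover code points up to
    0xFFFF, and 4 bytes (UCS-4) cover the astral planes.
    """
    highest = max((ord(ch) for ch in text), default=0)
    if highest <= 0xFF:
        return 1
    if highest <= 0xFFFF:
        return 2
    return 4


def storage_bytes(text: str) -> int:
    """Payload size if every character is stored at the same width."""
    return unit_width(text) * len(text)


if __name__ == "__main__":
    # Pure ASCII, Latin1, CJK, and an astral-plane music symbol (U+1D11E).
    for sample in ("hello", "h\u00e9llo", "\u4e2d\u6587", "\U0001D11E"):
        print(repr(sample), unit_width(sample), storage_bytes(sample))
```

Under this model, inserting a single wide character raises the width of the whole string, which is why a manual downgrade only makes sense once the wide characters have been removed again.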
BrianH 10-Nov-2012 [3505] | For instance, users converting character encodings to Unicode, encodings like UTF8 or national encodings. |
DocKimbel 10-Nov-2012 [3506] | Red should provide a UTF-8 codec. For national encodings, we would probably proceed by offering on-demand online codecs for the most used ones. That could be a shared resource with R3. |
BrianH 10-Nov-2012 [3507x2] | I was talking about the codecs. They need to be written too, right? :) |
Sorry if I missed the answer to this, but are you going to be doing a UTF8 binary parser for Red's source the way that R3 does for its source? Rather than a Unicode string parser, which processes the source after it's been through a codec? | |
DocKimbel 10-Nov-2012 [3509x2] | Yes, that's the "runtime lexer" in Red's roadmap. It is required for implementing LOAD (or TRANSCODE if you prefer). |
BTW, we already have a UTF-8 binary parser in the Red compiler. | |
BrianH 10-Nov-2012 [3511] | I do prefer, actually. LOAD being mezzanine and calling a separate parser lets you do a lot of nice tricks. The "mezzanine" might be native in Red, but the separation of concerns is still a value. YMMV of course. |
Andreas 10-Nov-2012 [3512] | Agreed. TRANSCODE is a rather inelegant name, though :) |
BrianH 10-Nov-2012 [3513] | Agreed. The weird set of options turned out to be essential though. Every combination of options is used in LOAD at different points. We even need a /part option like that of DECOMPRESS/part, for the same reason. |
DocKimbel 10-Nov-2012 [3514] | Agreed too for the separation and for the bad sounding name. ;-) |
BrianH 10-Nov-2012 [3515] | Worse than being bad-sounding, it's a really general term being applied to a really specific operation. That name could have been used for a more general codec-based transformation process. |
Jerry 15-Nov-2012 [3516] | Should Red/System series be one-based? There is some discussion about it. Why not set a #pragma for it, so programmers can set it themselves? |
Pekr 15-Nov-2012 [3517] | I would say one based, but then other ppl reappear, which will claim R/S is not for me, but for C coders, so they will imo align it to zero :-) |
Kaj 15-Nov-2012 [3518x2] | I've become a low level programmer again, and I still want it to be one based |
A pragma would be the worst of both worlds: not making a decision, like most other software out there | |
DocKimbel 15-Nov-2012 [3520] | I agree that we should have only _one_ convention, else it will quickly become a nightmare when having to integrate 3rd-party code. We need to find some objective reasons for choosing it. For Red, I'm inclined to continue with the one-based convention that worked pretty well in R2 for many years (at least for me). I'm not very fond of the change in R3, which implicitly introduces a 0-based convention: it solves one problem (iterating over index 0...I don't remember ever doing that), but introduces new ones (negative indexes now point to an, IMHO, counter-intuitive position, which will most probably lead to programming errors). For now, I prefer to stick to the R2 way, until we find a better solution (feel free to propose some on the related github tickets or here). For example, we could decide to ban indexes <= 0 (not my favorite option, but it would simply solve the problem). For Red/System, a 0-based convention might make more sense, but it would push us into the R3 issue I've mentioned above wrt indexes <= 0. Also, as a dialect of Red, it can use whatever convention best fits its purpose, but OTOH, having the same convention as Red would help. So, I'm really undecided for Red/System. I think the whole issue boils down to deciding PICK's behavior with indexes <= 0; everything else should fit in easily once that preliminary question is settled. It would be helpful if someone could put up everything related to this topic on a wiki page with all arguments sorted (there are a lot of them in the R3 group, posted a few weeks ago). |
Kaj 15-Nov-2012 [3521x4] | Agreed |
Off-by-one errors are everywhere in programming. Choosing between one-based and zero-based indexing shifts them to slightly different places, but they will still be there. As you said, I seldom encounter a situation where there would be a strong preference for indexes to be zero-based | |
One-based is human friendly, while zero-based is usually more machine friendly, so I think REBOL made the right choice | |
By extension, I would like Red/System to be as close to Red as possible, so issues can be explained firmly and just once, and it's easy to morph Red code into Red/System code when you decide you need the performance | |
Pekr 15-Nov-2012 [3525] | Agreed with Kaj's last remark .... |
Endo 15-Nov-2012 [3526] | +1 |
Andreas 15-Nov-2012 [3527x3] | Being a human myself, I don't find indices-as-ordinals ("one-based") particularly human friendly. |
"For Red/System, a 0-based convention might make more sense, but it would push us into the R3 issue I've mentioned above wrt indexes <= 0." With indices-as-offsets ("0-based"), there really is no issue with indices <= 0. | |
As for R3, it did not really introduce "0-based convention implicitly", it still is firmly "1-based" in as far as the first element in a series can be accessed using index 1. When you want indices-as-ordinals, you really need to decide: (a) is the ordinal "zeroth" meaningful, and if so, what it means; (b) are negative indices meaningful, and if so, what they mean. R3 went with the choices of (a) having meaningful zeroth, defined as "the item in a series before the first item", and (b) allowing negative indices, having index -1 as the immediate predecessor of index 0. R2 went with the choice of (a) not having a meaningful zeroth, but instead of erroring out, functions (pick) & syntax (paths) accepting indices are lenient: passing an index of 0 always returns NONE. For (b), R2 allows negative indices and defines -1 as the immediate predecessor of 1. | |
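The two conventions summarized above can be modelled in a few lines of Python. This is a sketch of the rules as stated in this message, not REBOL's actual implementation: `pick_r2`, `pick_r3` and the helper `_at` are hypothetical names, and `pos` (an assumption of this model) is the number of items between the head of the series and its current position.

```python
def _at(data, offset):
    """Return data[offset], or None when the offset is outside the series."""
    return data[offset] if 0 <= offset < len(data) else None


def pick_r2(data, pos, index):
    """R2-style: no meaningful zeroth; -1 is the immediate predecessor of 1."""
    if index == 0:
        return None  # lenient: returns none instead of raising an error
    # Positive indexes count forward from the current item, negative ones
    # count backward, and the integer 0 is skipped.
    offset = pos + (index - 1 if index > 0 else index)
    return _at(data, offset)


def pick_r3(data, pos, index):
    """R3-style: 0 is the item just before the first one, -1 precedes 0."""
    return _at(data, pos + index - 1)


if __name__ == "__main__":
    data = list("abcde")
    pos = 2  # series positioned at "c"
    for i in (-2, -1, 0, 1, 2):
        print(i, pick_r2(data, pos, i), pick_r3(data, pos, i))
```

With the series positioned at "c", both models agree for indexes 1 and above. They differ below that: the R2 model yields "b" for -1 and nothing for 0, while the R3 model yields "b" for 0 and "a" for -1.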
DocKimbel 15-Nov-2012 [3530] | Andreas: thanks for the good summary. R3: agreed that index 1 is still the first element in a series, but index 0 is allowed and there is this ticket #613 that clearly aims at introducing 0-based indexing in R3...so my guess was that these different changes or wishes were inter-related. http://curecode.org/rebol3/ticket.rsp?id=613 R2: I would really have preferred that index 0 raised an error rather than returning none. |
Andreas 15-Nov-2012 [3531] | If you wish to allow index computation for series not positioned at the head, allowing index 0 is actually quite sensible, unless you want to make index computation particularly error prone. |
DocKimbel 15-Nov-2012 [3532] | There is also the option proposed by Gabriele to consider: an ordinal! datatype (...-2th, -1th, 1st, 2nd, 3rd, 4th,...). It could solve the whole thing, but I see two cons with this option: 1) negative ordinals look odd, and I don't even know if they can be read in English; 2) code would be more verbose, as it would need conversions (to/from ordinals) in many places. On the pro side, distinguishing an integer from an ordinal might also help improve code readability. |
Andreas 15-Nov-2012 [3533] | The problem with no meaningful index 0 is that potentially meaningful index values are no longer isomorphic to integers. And as REBOL has no actual datatype for indices, all we can compute with are integers while relying on a correspondence of those integers to indices. If you only ever compute indices for series positioned at the head, you get a nice correspondence of integers to indices, because meaningful indices for this series correspond to the positive integers. But if you also want to compute indices for series positioned elsewhere, this nice integer-to-index correspondence breaks down as you suddenly have an undefined "gap" for the integer 0, whereas negative integers and positive integers are fine. |
Kaj 15-Nov-2012 [3534] | Yes, ordinal! would fix that, or the index! I proposed earlier |
Andreas 15-Nov-2012 [3535] | I also think that an ordinal! (or index!) datatype may be an intriguing possibility to get the best of both worlds. |
DocKimbel 15-Nov-2012 [3536] | Andreas: do you have a short code example involving index 0 in computation? I don't remember ever having issues with index 0 and I use series with offsets a lot! Though, Ladislav claims he and Carl did encounter such an issue at least once...the use cases for these issues remain a mystery well kept by Ladislav. ;-) |
Andreas 15-Nov-2012 [3537x2] | I personally avoid computing with non-head positioned series wherever possible. |
So sorry, I don't have a particular example at hand, but I can easily imagine it coming up with e.g. forall or forskip and trying to access previous values in an iteration. | |
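A hypothetical version of that situation, sketched in Python and not taken from any real REBOL code: a loop visits every position of a series and looks back `k` items using the computed index `1 - k`. The names `pick` and `look_back` and the flag `zero_is_gap` are assumptions made for this illustration. With `zero_is_gap=True` the sketch follows the R2-style rule (index 0 yields none, -1 precedes 1); with `False` it follows the R3-style rule (indexes map linearly onto integers).

```python
def pick(data, pos, index, zero_is_gap):
    """Pick relative to position `pos` (items skipped from the head)."""
    if zero_is_gap:
        if index == 0:
            return None  # the undefined "gap" at integer 0
        offset = pos + (index - 1 if index > 0 else index)
    else:
        offset = pos + index - 1
    return data[offset] if 0 <= offset < len(data) else None


def look_back(data, k, zero_is_gap):
    """At every position, fetch the value k items earlier via index 1 - k."""
    return [pick(data, pos, 1 - k, zero_is_gap) for pos in range(len(data))]


if __name__ == "__main__":
    readings = [10, 20, 30, 40]
    for k in (1, 2):
        print(k, look_back(readings, k, False), look_back(readings, k, True))
```

In this sketch the linear rule returns the expected earlier values for both `k = 1` and `k = 2`. The rule with a gap returns nothing at all for `k = 1` and silently returns the value one item back, not two, for `k = 2`, so the arithmetic `1 - k` would need a special case for results <= 0.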