World: r4wp
[#Red] Red language group
Gregg 7-Dec-2012 [4636] | I like having the numbers in binary! values, but not as much for this. My brain says "this is a binary in base 16 notation", but for hex or binary literals, I want to think of the words 'hex and 'binary, rather than "this is a base-16 number, which means it's in hex format". I think I looked for alternate notations a long time ago. Have to see if I can find my notes. |
DocKimbel 7-Dec-2012 [4637] | I have found an issue with word! value casing in Red. The Red/System code generated for: print 'a = 'A is: stack/mark-native ~print stack/mark-native ~strict-equal? word/push ~a word/push ~A natives/strict-equal?* stack/unwind natives/print* stack/unwind The problem is that Red/System is case-insensitive, so ~a and ~A are the same variable. So, no way to make it work like that. I see two options for solving it: 1) Make Red/System case-sensitive. 2) Deep encode each Red generated symbol to distinguish lower and uppercases. Solution 2) works, but it makes symbol decoration operation very costly (each symbol letter is prefixed with a sigil for lowercases and another one for uppercases). The example above becomes: stack/mark-native ~_p_r_i_n_t stack/mark-native ~_s_t_r_i_c_t_-_e_q_u_a_l_? word/push ~_a word/push ~-A natives/strict-equal?* stack/unwind natives/print* stack/unwind So, it is not nice, it doubles every Red symbol size that is handled by Red/System and slows down Red compilation by 25%. So, my questions are: a) Does anyone see another cheaper solution to this problem? b) In case of option 1), do you have anything against making Red/System identifiers case-sensitive? |
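The per-letter decoration described in option 2 can be modeled roughly as follows. This is only a sketch in Python, not the Red compiler's code: the function names `decorate` and `same_identifier_case_insensitive` are hypothetical, and the rule "`-` before uppercase letters, `_` before everything else" is inferred from the example output in the message above.

```python
def decorate(name: str) -> str:
    """Prefix each uppercase letter with '-' and every other character with '_'.

    Hypothetical helper: the sigil rule is inferred from the sample output
    (~_p_r_i_n_t, ~_a, ~-A), not taken from the Red sources.
    """
    body = "".join(("-" if ch.isupper() else "_") + ch for ch in name)
    return "~" + body


def same_identifier_case_insensitive(a: str, b: str) -> bool:
    """Model how a case-insensitive compiler would compare two identifiers."""
    return a.lower() == b.lower()


# Undecorated names collide in a case-insensitive language...
collide = same_identifier_case_insensitive("~a", "~A")

# ...while decorated names stay distinct, at the cost of doubling their length.
distinct = not same_identifier_case_insensitive(decorate("a"), decorate("A"))
```

Each decorated name is `1 + 2 * len(name)` characters long, which is the size doubling mentioned above.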
Kaj 7-Dec-2012 [4638] | Hm, I like that Red/System is case-insensitive like REBOL, so I would consider it a sacrifice to have to let go of that |
DocKimbel 7-Dec-2012 [4639x3] | Hmm, actually, another option should be possible, generating a unique new symbol for same words that have different casing. I will test it tomorrow. Anyway, if you have ideas/remarks about this, let me know. |
Anyway, I don't think we use different casing for identifiers in Red/System. Even in REBOL, I don't remember ever using same words with different casing in the same app. | |
I would like to fix this issue and make words comparison operators work for the new release, so I'll postpone the release for tomorrow. | |
Gregg 7-Dec-2012 [4642x3] | Do you know how REBOL handles it? I prefer case-insensitive in general, but doubling the size of identifiers seems bad, even if hidden from us for the most part. |
Case-sensitivity could trip up a lot of REBOLers. I know this is Red/System, but still. You may also find that people treat it as a feature and start giving things names that differ only in case, as happens in C. | |
What are the biggest downsides to having Red/System remain case-insensitive? That is, what does case sensitivity buy us? | |
Kaj 7-Dec-2012 [4645x4] | In REBOL, 'a and 'A are aliases of the same symbol. Red/System converts them to their integer identifier, right? I'd say you need different identifiers for aliases somehow to implement the REBOL semantics of distinguishing equal? and strict-equal? |
That is, identifiers need two levels: the first level for identifying the symbol, and the second level for distinguishing aliases | |
The most space efficient encoding I can come up with would be something like ~a-1 for 'a and ~A-2 for 'A. That would be cheap to evaluate for strict-equal? but expensive for equal? | |
A faster encoding would be to reserve a part of the integer identifier for the alias number, for example one byte. That would reduce the number of different symbols to 2^24 and the maximum number of aliases for one symbol to 256. That would only allow a word up to 8 characters to have all its aliases, but it would be cheap to evaluate for both strict-equal? and equal? | |
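The packed-identifier idea above (24 bits for the symbol, one byte for the alias number) might look like this. It is a sketch of the proposal only, in Python for illustration; `pack`, `equal` and `strict_equal` are hypothetical names and nothing here reflects what Red actually does.

```python
ALIAS_BITS = 8                        # one byte reserved for the alias number
ALIAS_MASK = (1 << ALIAS_BITS) - 1    # 255, so up to 256 aliases per symbol
MAX_SYMBOL = (1 << 24) - 1            # remaining 24 bits identify the symbol


def pack(symbol: int, alias: int) -> int:
    """Combine a symbol number and an alias number into one 32-bit identifier."""
    if not 0 <= symbol <= MAX_SYMBOL:
        raise ValueError("symbol number does not fit in 24 bits")
    if not 0 <= alias <= ALIAS_MASK:
        raise ValueError("alias number does not fit in 8 bits")
    return (symbol << ALIAS_BITS) | alias


def equal(x: int, y: int) -> bool:
    """Case-insensitive comparison: ignore the alias byte."""
    return (x >> ALIAS_BITS) == (y >> ALIAS_BITS)


def strict_equal(x: int, y: int) -> bool:
    """Case-sensitive comparison: the whole identifier must match."""
    return x == y


word_a = pack(42, 0)   # 'a
word_A = pack(42, 1)   # 'A, same symbol, different alias
word_b = pack(43, 0)   # 'b
```

Both comparisons are a single integer operation. The 8-character limit follows from 2^8 = 256: a word of 8 letters has exactly 256 upper/lower case spellings.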
DocKimbel 8-Dec-2012 [4649x5] | "In REBOL, 'a and 'A are aliases of the same symbol. Red/System converts them to their integer identifier, right?" Symbols have two representations in the Red compiler: one at runtime (like in REBOL), the other at compile-time, in the form of Red/System variables. In a very early version of the compiler, I was using integers (indexes in the symbol table) instead of variables, but I quickly realized that it was obfuscating the generated Red/System code a lot, making it difficult to debug. Also, the integer approach had an additional runtime cost, as it required an array access in order to retrieve the symbol value. Currently, the Red/System ~<name> variables directly point to a word! value version, instead of a symbol!, for simplicity and efficiency. |
I have implemented a compile-time aliasing system for same words but different casing. It works fine so far and is cheap compared to other options (it requires a conversion table (symbol->alias) to be maintained during the compilation). | |
Aliases are already implemented in the symbol! type. Basically a word! relies on a symbol ID, which is an entry in the symbol table. Each entry in this table is a symbol! value that references the internal Red string! value and a possible alias ID (which is just another symbol ID). Now, I just need to add alias handling in the equal? and strict-equal? natives when applied on words to make it work correctly. | |
"What are the biggest downsides to having Red/System remain case-insensitive? That is, what does case sensitivity buy us?" Good question. I think it doesn't buy us anything, nor does it take any useful feature away from us. Actually, I think that as long as you are consistent in the way you name your identifiers (variables, functions, contexts,...), you are case-neutral. So, having Red/System case-sensitive wouldn't change anything for me and I guess it would be the same for others. Anyway, I prefer to keep it case-insensitive for now, for the sake of consistency with Red, unless I really need to change it. | |
Ok, now equality comparison operators work on all word datatypes. | |
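The symbol table with alias IDs described above can be modeled as follows. This is a rough Python sketch under stated assumptions, not Red's runtime: the `Symbol` and `SymbolTable` names, the `intern` method, and the choice to make the first-seen spelling the alias target are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Symbol:
    text: str                     # the spelling as written in the source
    alias: Optional[int] = None   # ID of the symbol this is a case variant of


class SymbolTable:
    def __init__(self) -> None:
        self.entries: List[Symbol] = []
        self._exact: Dict[str, int] = {}    # exact spelling -> symbol ID
        self._folded: Dict[str, int] = {}   # lowercased spelling -> first ID seen

    def intern(self, text: str) -> int:
        """Return the symbol ID for this exact spelling, creating it if needed."""
        if text in self._exact:
            return self._exact[text]
        sym_id = len(self.entries)
        first = self._folded.setdefault(text.lower(), sym_id)
        alias = None if first == sym_id else first
        self.entries.append(Symbol(text, alias))
        self._exact[text] = sym_id
        return sym_id

    def canonical(self, sym_id: int) -> int:
        """Follow the alias link, if any, to the first-seen spelling."""
        alias = self.entries[sym_id].alias
        return sym_id if alias is None else alias

    def equal(self, a: int, b: int) -> bool:
        """Model of equal? on words: same symbol up to casing."""
        return self.canonical(a) == self.canonical(b)

    def strict_equal(self, a: int, b: int) -> bool:
        """Model of strict-equal? on words: same exact spelling."""
        return a == b
```

With this layout, `'a` and `'A` get two different IDs, the second entry carries the first one's ID as its alias, and the two comparison natives differ only in whether they follow that link.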
Gregg 8-Dec-2012 [4654] | Thanks Doc. This is good information to put in a doc somewhere, even if just as a reminder to formally doc it later. |
BrianH 8-Dec-2012 [4655x2] | Why would = translate to strict-equal? - shouldn't that be == instead? |
This is one area where copying R3 as it is now would be a bad idea though. See http://issue.cc/r3/1834 for details. | |
DocKimbel 8-Dec-2012 [4657x2] | Brian: wrt '=, it's a typo, it should be ==. |
I haven't implemented EQUIV? yet, I'll look at it once we have complete IEEE-754 support (we are missing INFs and NaN handling in Red/System). | |
Marco 8-Dec-2012 [4659] | About hex notation etc (I like case insensitiveness for numbers): 0&a1B 0%10110 or 0b10110 0ha1B |
DocKimbel 8-Dec-2012 [4660x3] | 0%... prefix will clash with percent! datatype literal form. |
The last two (0b... and 0h) do not read easily IMHO, especially if lowercase is allowed. | |
Anyway, having a prefix rather than a suffix is a possible option. | |
Steeve 9-Dec-2012 [4663] | How does one know whether a REBOL function is supported or not? I tried a simple FOR loop, but got no result and the compiler stayed quiet. |
DocKimbel 9-Dec-2012 [4664x4] | "How does one know whether a REBOL function is supported or not?" Currently, only by looking in the source code. The compiler is lack a lot of checks, so you need to get your Red code right for now. |
lacking | |
The source code should be easily parse-able, so the list of functions, native, actions, ops could be extracted and pretty-printed as a web page. IIRC, someone tried to make such script but I didn't see any result yet. | |
New features added today worth mentioning: - comparison operators (=, ==, <>, <, <=, >=, >) support extended to all datatypes. - FIND action added (supports block! only for now, /match not implemented, /only always on) | |
Gregg 9-Dec-2012 [4668] | Excellent news Doc. |
Kaj 9-Dec-2012 [4669] | Jerry wanted to publish ongoing feature stats |
Arnold 9-Dec-2012 [4670] | Yes, I wanted to give it a try for the doc scripts. But parse is not my area of expertise, and at the moment I am short on time as I can work extra hours at my job. So everybody, please step in and publish your baby-doc-scripts so we can all contribute little bits. |
Kaj 9-Dec-2012 [4671] | Working to fix that COBOL code for 21-12-2012 to prevent the end of the world, eh? ;-) |
Endo 10-Dec-2012 [4672] | About case-sensitivity: what about converting all words to lowercase at compile time? Would that lead to Unicode problems? What if a word is in Chinese, are there lower/upper cases in Chinese? |
GrahamC 10-Dec-2012 [4673] | No case sensitivity in chinese as there is no case |
Kaj 10-Dec-2012 [4674] | The issue is to keep them separate, instead of merging them into lowercase; but Doc has fixed it so far |
BrianH 10-Dec-2012 [4675x2] | For compiled code does it really matter? I thought it would only matter for words-as-data, and that compilation of case-insensitive code would make most words go away. For words-as-data, having some duplicate data when appropriate should be OK. |
Are you going to have case-sensitive objects, or just case-preserving? | |
DocKimbel 10-Dec-2012 [4677x2] | "What about converting all words to lowercase at compile time?" Word values are not "compilable", they are data (words used as variables can be "compiled" to some extent). Converting all words into lowercase during compilation (including JIT-compilation for words constructed at runtime) would make you lose the ability to distinguish lower/upper-cased letters, leading to big issues and pitfalls in the language. For example: (form 'A) = "a" (because 'A would get converted to 'a). Not an option. |
"Are you going to have case-sensitive objects, or just case-preserving?" Are you referring to words defined in the object's context? Probably just case-preserving. | |
BrianH 10-Dec-2012 [4679x3] | Yes, that's what I meant. I phrased it that way because there was a big discussion where people were requesting that an option be added to objects to have them be case-sensitive, to distinguish based on case when mapping words to value slots, rather than the case-preserving default. We had to reject that proposal because there was no way to specify that option in the make object! syntax. The only way to do that in Rebol is to have a separate object-like datatype that has case-sensitive word mapping. The same proposal was made for maps, with the same results: a case-sensitive alternate type would be required. For both of those types, SELECT vs. SELECT/case could have some meaningful distinction, though we didn't get far enough for that to be an issue yet. |
These are all old Rebol discussions, but Red would have similar issues with proposing such options because it was a matter of syntax. | |
Back to an older topic, hex syntax. If you had 16#abcdabcd translate to an integer!, it wouldn't have to be considered to be a conflict with #abcdabcd being an issue! value. It's just like {abcdabcd}, #{abcdabcd} and #abcdabcd are different now. There would be no reason to keep the hex syntax once the value is loaded, it could just be a regular integer. You could even keep the issue! type as a word type with some extra series-like operations supported, the way tuple! supports series-like operations without being a series. | |
DocKimbel 10-Dec-2012 [4682x2] | I don't see any advantages in having case-sensitive objects (but I see potential issues allowing that). Have I missed something? |
Hex: your proposal is acceptable, but it still makes writing hex literals a bit more verbose than needed. We should be able to come up with a better solution that needs just one additional character in order to write and identify hex literals (hence my # suffix proposal, with a base-16 default value). | |
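To make the `16#abcdabcd` idea concrete, here is a small Python sketch of loading a `base#digits` literal straight into a plain integer. The function name `parse_radix_literal` is hypothetical, the accepted base range is an assumption, and this says nothing about how Red's lexer would actually do it; the `#` suffix variant is not modeled because its details are not spelled out above.

```python
def parse_radix_literal(text: str) -> int:
    """Load a literal such as '16#abcdabcd' or '2#10110' as a plain integer.

    A bare '#abcdabcd' (no base before the '#') is rejected, so it stays
    free to mean something else, such as an issue! value.
    """
    base_part, sep, digits = text.partition("#")
    if not sep or not base_part or not digits:
        raise ValueError("expected <base>#<digits>: %r" % text)
    if not base_part.isdigit():
        raise ValueError("base must be a decimal number: %r" % text)
    base = int(base_part)
    if not 2 <= base <= 36:   # assumed range; it is what int() supports
        raise ValueError("unsupported base: %d" % base)
    if not digits.isalnum():  # reject signs, underscores and whitespace
        raise ValueError("invalid digits: %r" % digits)
    return int(digits, base)  # digit case is ignored; raises ValueError on bad digits


value = parse_radix_literal("16#abcdabcd")
```

Once loaded, the value carries no trace of the hex notation, which matches the point that there is no reason to keep the syntax after loading.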
BrianH 10-Dec-2012 [4684x2] | (For comparison again, sorry) In R3, objects are in many ways like the tables in Lua, used for data purposes as well as for contexts, underlying several other datatypes or operations as well. Most contexts are declared using these other datatypes or functions that wrap objects; raw objects are more often used as data structures than as contexts. It might make sense to support case-sensitive objects as data structures. Nonetheless, I wasn't the one making the suggestion, and I'd have to do a bit of research to dig up who was requesting this. |
Most people would prefer case-preserving behavior though, despite how difficult that is for multi-language Unicode words. | |
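The difference between case-preserving and case-sensitive word mapping discussed above can be shown with a small model. This is a Python sketch for illustration only: `CasePreservingMap` is a hypothetical class, not a REBOL or Red datatype, and `str.casefold` is used as a stand-in for whatever Unicode case folding a real implementation would choose.

```python
class CasePreservingMap:
    """Maps words to values ignoring case, but remembers the first spelling."""

    def __init__(self) -> None:
        self._slots = {}  # folded word -> (original spelling, value)

    def set(self, word: str, value) -> None:
        folded = word.casefold()
        # Keep the spelling used when the slot was first created.
        original = self._slots[folded][0] if folded in self._slots else word
        self._slots[folded] = (original, value)

    def get(self, word: str):
        return self._slots[word.casefold()][1]

    def words(self):
        return [original for original, _ in self._slots.values()]


preserving = CasePreservingMap()
preserving.set("Name", 1)
preserving.set("NAME", 2)       # same slot, value replaced, spelling kept

sensitive = {}                   # a plain dict is case-sensitive
sensitive["Name"] = 1
sensitive["NAME"] = 2            # a second, separate slot
```

In the case-preserving map, `Name` and `NAME` share one value slot; in the case-sensitive one they are two unrelated keys, which is the behavior that would need a separate object-like or map-like type.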