
World: r4wp

[#Red] Red language group

Gregg
7-Dec-2012
[4634]
For hex notation in REBOL, I've used (albeit dynamically) a simple 
HEX function with issues. 

  hex #20000001


I'm OK with the suffix approach, but if a prefix approach works I 
like that the prefix clues you in to what you're reading, rather 
than reading the number and then seeing the suffix. The question 
is what sigil to use, if lexical space becomes very tight, as in 
REBOL. Do you have any plans for &?

  &HFFFF000F
  &O77770007  ; though I don't think we need octal
  &B11110001
Maxim
7-Dec-2012
[4635]
the following are currently invalid REBOL notations (the first three 
load in R2 but get scrambled)

I prefer the first three, since they are pretty obvious without any 
knowledge of the language.

16#FFFF000F
8#7124554764
2#0110110101

H#FFFF000F
O#7124554764
B#0110110101
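
A minimal Python sketch of how a loader might turn the prefixed forms above into integers. The name `parse_radix_literal` and the letter-to-base table are assumptions made for illustration; nothing here is existing REBOL or Red behaviour.

```python
# Hypothetical table: maps the letter prefixes proposed above to a base.
LETTER_BASES = {"H": 16, "O": 8, "B": 2}
DIGITS = "0123456789ABCDEF"

def parse_radix_literal(token: str) -> int:
    """Parse forms like '16#FFFF000F', '2#0110', 'H#FF' or 'B#0110'."""
    prefix, sep, digits = token.partition("#")
    if not sep or not prefix or not digits:
        raise ValueError(f"not a radix literal: {token!r}")
    if prefix.isdigit():
        base = int(prefix)                       # numeric prefix: 16#, 8#, 2#
    else:
        base = LETTER_BASES.get(prefix.upper())  # letter prefix: H#, O#, B#
    if base not in (2, 8, 16):
        raise ValueError(f"unsupported base in {token!r}")
    allowed = DIGITS[:base]
    # Reject any digit that is not valid for the chosen base.
    if any(ch.upper() not in allowed for ch in digits):
        raise ValueError(f"invalid digit in {token!r}")
    return int(digits, base)
```

Both spellings of the same value give the same integer, so the choice between `16#` and `H#` is purely about readability.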
Gregg
7-Dec-2012
[4636]
I like having the numbers in binary! values, but not as much for 
this. My brain says "this is a binary in base 16 notation", but for 
hex or binary literals, I want to think of the words 'hex and 'binary, 
rather than "this is a base-16 number, which means it's in hex format". 
I think I looked for alternate notations a long time ago. Have to 
see if I can find my notes.
DocKimbel
7-Dec-2012
[4637]
I have found an issue with word! value casing in Red. The Red/System 
code generated for:
	print 'a = 'A
is:
          stack/mark-native ~print
          stack/mark-native ~strict-equal?
          word/push ~a
          word/push ~A
          natives/strict-equal?*
          stack/unwind
          natives/print*
          stack/unwind


The problem is that Red/System is case-insensitive, so ~a and ~A 
are the same variable. So, no way to make it work like that. I see 
two options for solving it:

1) Make Red/System case-sensitive.

2) Deep encode each Red generated symbol to distinguish lower and 
uppercases.


Solution 2) works, but it makes the symbol decoration operation very 
costly (each symbol character is prefixed with one sigil for lowercase 
and another one for uppercase). The example above becomes:

          stack/mark-native ~_p_r_i_n_t
          stack/mark-native ~_s_t_r_i_c_t_-_e_q_u_a_l_?
          word/push ~_a
          word/push ~-A
          natives/strict-equal?*
          stack/unwind
          natives/print*
          stack/unwind


So, it is not nice, it doubles every Red symbol size that is handled 
by Red/System and slows down Red compilation by 25%.
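
The decoration scheme of option 2 can be sketched in a few lines of Python. The sigils are inferred from the example above ('-' before uppercase letters, '_' before everything else); the function name `decorate` is hypothetical and this is not the actual Red compiler code.

```python
def decorate(name: str) -> str:
    """Prefix each uppercase letter with '-' and any other character with '_'."""
    return "~" + "".join(("-" if ch.isupper() else "_") + ch for ch in name)

# Names differing only in case stay distinct even after case folding,
# which is what a case-insensitive Red/System needs.
lower = decorate("a")   # "~_a"
upper = decorate("A")   # "~-A"
```

Every character gains one sigil, which is where the doubling of symbol size comes from.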

So, my questions are:
a) Does anyone see another cheaper solution to this problem?

b) In case of option 1), do you have anything against making Red/System 
identifiers case-sensitive?
Kaj
7-Dec-2012
[4638]
Hm, I like that Red/System is case-insensitive like REBOL, so I would 
consider it a sacrifice to have to let go of that
DocKimbel
7-Dec-2012
[4639x3]
Hmm, actually, another option should be possible: generating a unique 
new symbol for words that are the same but differ in casing. I will 
test it tomorrow. Anyway, if you have ideas/remarks about this, let me 
know.
Anyway, I don't think we use different casing for identifiers in 
Red/System. Even in REBOL, I don't remember ever using same words 
with different casing in the same app.
I would like to fix this issue and make word comparison operators 
work for the new release, so I'll postpone the release until tomorrow.
Gregg
7-Dec-2012
[4642x3]
Do you know how REBOL handles it? I prefer case-insensitive in general, 
but doubling the size of identifiers seems bad, even if hidden from 
us for the most part.
Case-sensitivity could trip up a lot of REBOLers. I know this is 
Red/System, but still. You may also find that people treat it as 
a feature and start giving things names that differ only in case, 
as happens in C.
What are the biggest downsides to having Red/System remain case-insensitive? 
That is, what does case sensitivity buy us?
Kaj
7-Dec-2012
[4645x4]
In REBOL, 'a and 'A are aliases of the same symbol. Red/System converts 
them to their integer identifier, right? I'd say you need different 
identifiers for aliases somehow to implement the REBOL semantics 
of distinguishing equal? and strict-equal?
That is, identifiers need two levels: the first level for identifying 
the symbol, and the second level for distinguishing aliases
The most space efficient encoding I can come up with would be something 
like ~a-1 for 'a and ~A-2 for 'A. That would be cheap to evaluate 
for strict-equal? but expensive for equal?
A faster encoding would be to reserve a part of the integer identifier 
for the alias number, for example one byte. That would reduce the 
number of different symbols to 2^24 and the maximum number of aliases 
for one symbol to 256. That would only allow a word up to 8 characters 
to have all its aliases, but it would be cheap to evaluate for both 
strict-equal? and equal?
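
A rough Python sketch of the byte-reserved encoding, assuming a 32-bit identifier split into 24 bits for the symbol and 8 bits for the alias number. All names (`make_id`, `equal`, `strict_equal`) are hypothetical.

```python
ALIAS_BITS = 8                       # one byte reserved for the alias number
ALIAS_MASK = (1 << ALIAS_BITS) - 1   # 255, so up to 256 aliases per symbol
MAX_SYMBOLS = 1 << 24                # what is left of an assumed 32-bit ID

def make_id(symbol: int, alias: int) -> int:
    """Pack a symbol number and an alias number into one integer."""
    if not 0 <= symbol < MAX_SYMBOLS:
        raise ValueError("symbol number out of range")
    if not 0 <= alias <= ALIAS_MASK:
        raise ValueError("alias number out of range")
    return (symbol << ALIAS_BITS) | alias

def equal(a: int, b: int) -> bool:
    """Same symbol, casing ignored: compare only the upper 24 bits."""
    return (a >> ALIAS_BITS) == (b >> ALIAS_BITS)

def strict_equal(a: int, b: int) -> bool:
    """Same symbol and same casing: compare the whole identifier."""
    return a == b
```

Both comparisons are a shift or a plain integer compare. The 8-character limit follows from 2^8 = 256: a word of 8 letters has exactly 256 possible casings.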
DocKimbel
8-Dec-2012
[4649x5]
In REBOL, 'a and 'A are aliases of the same symbol. Red/System converts 
them to their integer identifier, right?


Symbols have two representations in the Red compiler: one is at runtime 
(like in REBOL), the other is at compile time, in the form of Red/System 
variables. In a very early version of the compiler, I was using integers 
(indexes in the symbol table) instead of variables, but quickly realized 
that it was obfuscating the generated Red/System code a lot, making 
it difficult to debug. Also, the integer approach had an additional 
runtime cost, as it required an array access in order to retrieve 
the symbol value.


Currently, the Red/System ~<name> variables directly point to a word! 
value version, instead of a symbol! for simplicity and efficiency.
I have implemented a compile-time aliasing system for same words 
but different casing. It works fine so far and is cheap compared 
to other options (it requires a conversion table (symbol->alias) 
to be maintained during the compilation).
Aliases are already implemented in the symbol! type. Basically a 
word! relies on a symbol ID, which is an entry in the symbol table. 
Each entry in this table is a symbol! value that references the 
internal Red string! value and a possible alias ID (which is just 
another symbol ID).


Now, I just need to add alias handling in the equal? and strict-equal? 
natives when applied on words to make it work correctly.
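
A minimal Python sketch of a symbol table with alias IDs, under the assumption that the first spelling seen becomes the canonical entry and later casings point back to it. The class and method names are hypothetical and do not mirror the Red runtime.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Symbol:
    text: str              # spelling exactly as written
    alias: Optional[int]   # ID of the canonical symbol, or None if canonical

class SymbolTable:
    def __init__(self) -> None:
        self.entries = []   # a symbol ID is an index into this list
        self._exact = {}    # exact spelling -> symbol ID
        self._folded = {}   # lowercased spelling -> canonical symbol ID

    def intern(self, text: str) -> int:
        """Return the ID for this exact spelling, creating it if needed."""
        if text in self._exact:
            return self._exact[text]
        sym_id = len(self.entries)
        canonical = self._folded.setdefault(text.lower(), sym_id)
        alias = None if canonical == sym_id else canonical
        self.entries.append(Symbol(text, alias))
        self._exact[text] = sym_id
        return sym_id

    def resolve(self, sym_id: int) -> int:
        """Follow the alias link, if any, to the canonical symbol ID."""
        alias = self.entries[sym_id].alias
        return sym_id if alias is None else alias

    def equal(self, a: int, b: int) -> bool:
        return self.resolve(a) == self.resolve(b)   # casing ignored

    def strict_equal(self, a: int, b: int) -> bool:
        return a == b                               # casing must match
```

With this layout, 'a and 'A get different IDs, so the original spelling survives, while equal? only needs one extra alias lookup per operand.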
What are the biggest downsides to having Red/System remain case-insensitive? 
That is, what does case sensitivity buy us?


Good question. I think it doesn't buy us anything, nor does it take 
away any useful feature. Actually, I think that as long as you are 
consistent in the way you name your identifiers (variables, functions, 
contexts,...), you are case-neutral. So, having Red/System case-sensitive 
wouldn't change anything for me and I guess it would be the same 
for others.


Anyway, I prefer to keep it case-insensitive for now, for the sake 
of consistency with Red, unless I really need to change it.
Ok, now equality comparison operators work on all word datatypes.
Gregg
8-Dec-2012
[4654]
Thanks Doc. This is good information to put in a doc somewhere, even 
if just as a reminder to formally doc it later.
BrianH
8-Dec-2012
[4655x2]
Why would = translate to strict-equal? - shouldn't that be == instead?
This is one area where copying R3 as it is now would be a bad idea 
though. See http://issue.cc/r3/1834 for details.
DocKimbel
8-Dec-2012
[4657x2]
Brian: wrt '=, it's a typo, it should be ==.
I haven't implemented EQUIV? yet, I'll look at it when we have 
complete IEEE-754 support (we are missing INFs and NaN handling 
in Red/System).
Marco
8-Dec-2012
[4659]
About hex notation etc (I like case insensitiveness for numbers):
0&a1B
0%10110
or
0b10110
0ha1B
DocKimbel
8-Dec-2012
[4660x3]
0%... prefix will clash with percent! datatype literal form.
The last two (0b... and 0h...) do not read easily IMHO, especially if 
lowercase is allowed.
Anyway, having a prefix rather than a suffix is a possible option.
Steeve
9-Dec-2012
[4663]
How does one know whether a REBOL function is supported or not?

I tried a simple FOR loop, but no result and the compiler is quiet.
DocKimbel
9-Dec-2012
[4664x4]
How does one know whether a REBOL function is supported or not?

 Currently, only by looking in the source code. The compiler is lacking 
 a lot of checks, so you need to get your Red code right for now.
The source code should be easily parse-able, so the list of functions, 
natives, actions, ops could be extracted and pretty-printed as a web 
page. IIRC, someone tried to make such a script but I haven't seen any 
result yet.
New features added today worth mentioning:


- comparison operators (=, ==, <>, <, <=, >=, >) support extended 
to all datatypes.

- FIND action added (supports block! only for now, /match not implemented, 
/only always on)
Gregg
9-Dec-2012
[4668]
Excellent news Doc.
Kaj
9-Dec-2012
[4669]
Jerry wanted to publish ongoing feature stats
Arnold
9-Dec-2012
[4670]
Yes, I wanted to give it a try for the doc scripts. But parse is not 
my expertise, and at the moment I am short on time as I can work 
extra hours at work. So everybody step in please and publish your 
baby-doc-scripts so we can all contribute little bits.
Kaj
9-Dec-2012
[4671]
Working to fix that COBOL code for 21-12-2012 to prevent the end 
of the world, eh? ;-)
Endo
10-Dec-2012
[4672]
About the case-sensitivity,

What about converting all the words into lowercase at compile time? 
Would it lead to some Unicode problems? What if a word is in Chinese, 
are there lower/upper cases in Chinese?
GrahamC
10-Dec-2012
[4673]
No case sensitivity in Chinese as there is no case
Kaj
10-Dec-2012
[4674]
The issue is to keep them separate, instead of merging them into 
lowercase; but Doc has fixed it so far
BrianH
10-Dec-2012
[4675x2]
For compiled code does it really matter? I thought it would only 
matter for words-as-data, and that compilation of case-insensitive 
code would make most words go away. For words-as-data, having some 
duplicate data when appropriate should be OK.
Are you going to have case-sensitive objects, or just case-preserving?
DocKimbel
10-Dec-2012
[4677x2]
What about converting all the words into lowercase at compile time?


Word values are not "compilable", they are data (words used as variables 
can be "compiled" to some extent). Converting all words into lowercase 
during compilation (including JIT-compilation for words constructed 
at runtime) would make you lose the ability to distinguish lower/upper-cased 
letters, leading to big issues and pitfalls in the language. For 
example: (form 'A) = "a" (because 'A would get converted to 'a). Not 
an option.
Are you going to have case-sensitive objects, or just case-preserving?


Are you referring to words defined in the object's context? Probably 
just case-preserving.
BrianH
10-Dec-2012
[4679x3]
Yes, that's what I meant. I phrased it that way because there was 
a big discussion where people were requesting that an option be added 
to objects to have them be case-sensitive, to distinguish based on 
case when mapping words to value slots, rather than the case-preserving 
default. We had to reject that proposal because there was no way 
to specify that option in the make object! syntax. The only way to 
do that in Rebol is to have a separate object-like datatype that 
has case-sensitive word mapping. The same proposal was made for maps, 
with the same results: a case-sensitive alternate type would be required. 
For both of those types, SELECT vs. SELECT/case could have some meaningful 
distinction, though we didn't get far enough for that to be an issue 
yet.
These are all old Rebol discussions, but Red would have similar issues 
with proposing such options because it was a matter of syntax.
Back to an older topic, hex syntax. If you had 16#abcdabcd translate 
to an integer!, it wouldn't have to be considered to be a conflict 
with #abcdabcd being an issue! value. It's just like {abcdabcd}, 
#{abcdabcd} and #abcdabcd are different now. There would be no reason 
to keep the hex syntax once the value is loaded, it could just be 
a regular integer. You could even keep the issue! type as a word 
type with some extra series-like operations supported, the way tuple! 
supports series-like operations without being a series.
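
A small Python sketch of the point that these literal forms need not conflict: a lexer can tell them apart from the first characters of the token. The `classify` function and the returned type names are illustrative assumptions, not the REBOL or Red lexer.

```python
import re

def classify(token: str):
    """Return (type name, loaded value) for a few REBOL-style literals."""
    m = re.fullmatch(r"16#([0-9A-Fa-f]+)", token)
    if m:
        # Hex syntax is dropped at load time: the result is a plain integer.
        return ("integer!", int(m.group(1), 16))
    m = re.fullmatch(r"#\{([0-9A-Fa-f]*)\}", token)
    if m:
        # bytes.fromhex raises ValueError on an odd number of hex digits.
        return ("binary!", bytes.fromhex(m.group(1)))
    m = re.fullmatch(r"\{(.*)\}", token)
    if m:
        return ("string!", m.group(1))
    m = re.fullmatch(r"#([^\s{}]+)", token)
    if m:
        return ("issue!", m.group(1))
    raise ValueError(f"unrecognized token: {token!r}")
```

The same eight characters load as four different types depending only on the surrounding delimiters, and the integer keeps no trace of having been written in hex.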
DocKimbel
10-Dec-2012
[4682x2]
I don't see any advantages in having case-sensitive objects (but 
I see potential issues allowing that). Have I missed something?
Hex: your proposal is acceptable, but it still makes writing hex literals 
a bit more verbose than needed. We should be able to come up 
with a better solution that needs just one additional character 
in order to write and identify hex literals (hence my # suffix proposal, 
with a base-16 default value).