World: r3wp

[Parse] Discussion of PARSE dialect

Pekr
17-Oct-2009
[4521]
So - do we not need complementing to be enhanced? We talked about 
it, but it is not defined in the proposal, it is not part of Carl's 
feature table, and I also got no reaction on R3 Chat ....
BrianH
17-Oct-2009
[4522x2]
Gabriele, these changes can be backported to R2 in the form of a 
rule compiler that generates (unreadable) R2 parse rules.
Pekr, we still need complementing to be enhanced. Even Carl has said 
so.
Maxim
17-Oct-2009
[4524x3]
but wouldn't work with Remark  ;-)
one situation which complement can't handle very well (RAM-wise):

	union charset "a" complement charset "b"

you end up with a full codepoint bitset minus one byte, whether it 
is complemented or not
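A minimal sketch of the case Maxim describes (R3-style, with illustrative variable names):

	not-b: complement charset "b"           ; everything except #"b"
	a-or-not-b: union charset "a" not-b     ; matches every codepoint except #"b"
	; stored without a complement flag, A-OR-NOT-B needs a bitmap covering
	; nearly the whole codepoint range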
BrianH
17-Oct-2009
[4527x2]
Maxim, Remark could be adjusted to use the rule compiler. For that 
matter, Remark could use R2/Forward (which needs some work, but is 
already better than R2 on its own).
Maxim, that is what Pekr was talking about. That is planned to be 
fixed.
Maxim
17-Oct-2009
[4529x4]
a rule compiler doesn't adapt very well to self-modifying rules
laden with many paren expressions and a stack on top of it.
the rule I am writing now actually does JIT rule compilation... hairy 
to debug :-)
since I use binding to map inner rules, which are also constructed 
on the fly but have to be pushed and popped from the stack as I traverse 
the data... it's a lot of fun  :-D
BrianH
17-Oct-2009
[4533x2]
If the self-modifying rules are strung-together basic blocks, you 
can use the rule compiler to generate the blocks. And the R3 changes 
make self-modifying rules less necessary, so you can have even larger 
basic blocks.
Of course the *result* of the compilation would be self-modifying 
rules :)
Maxim
17-Oct-2009
[4535x2]
really, the problem is not the parsing itself... it's getting the 
darn rules to generate the proper rules hehehe.
and it's not simple parsing, since I use parsing index manipulation, 
which is also dictated by the source data it encounters.  it's like 
swatting flies using a fly swatter at the end of a rope, while riding 
a roller coaster which changes layout every time you ride it  ;-)
BrianH
17-Oct-2009
[4537]
Which is what a rule compiler does :)  Actually, it sounds like you 
could adapt the tricks of the rule compiler to *your* rule compiler, 
which would let you use the new operations in your rule source and 
have the workarounds generated in the output.
Maxim
17-Oct-2009
[4538x2]
well, build it and I will try it  ;-)
I promise.
BrianH
17-Oct-2009
[4540]
It's on my list...
Maxim
17-Oct-2009
[4541x3]
my deadline is to have a site working by this week... unless this 
darned bug I am trying to kill kills me first.
doh... when you're too close to the tree... you can't see the forest...
I was using the TO parse command on a rule ... this obviously won't work....
(it only accepts a string... dummy :-)
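A small illustration of that point (the DASH rule name is just an example):

	dash: [#"-"]
	; parse "a-b" [to dash to end]     ; TO does not accept an arbitrary rule here
	parse "a-b" [to "-" to end]        ; TO with a literal string works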
Pekr
18-Oct-2009
[4544]
ah, got a reply on Chat from Carl regarding complementing:

"Re #5718: Pekr, that's a good question, and I think the answer must 
be YES. We need to be able to complement bitmaps in a nice way. 
Otherwise, Unicode bitmaps, even if simply used on ASCII chars, would 
take a lot of memory.

This change should be listed on the project sheet, and if not, I'll 
add it there."
Chris
22-Oct-2009
[4545x3]
Is there any advantage in breaking up charsets that represent a large, 
varied range of the 16-bit character space? For example, XML names 
are defined as below (excluding > 2 ** 16), but are most commonly 
limited to the ASCII-friendly subset:

	w1: charset [
		#"A" - #"Z" #"_" #"a" - #"z" #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)"
		#"^(F8)" - #"^(02FF)" #"^(0370)" - #"^(037D)" #"^(037F)" - #"^(1FFF)"
		#"^(200C)" - #"^(200D)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)"
		#"^(3001)" - #"^(D7FF)" #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
	]
	w+: charset [
		#"-" #"." #"0" - #"9" #"A" - #"Z" #"_" #"a" - #"z" #"^(B7)"
		#"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(037D)"
		#"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(203F)" - #"^(2040)"
		#"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)"
		#"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
	]
	word: [w1 any w+]
(sorry if that looks messy)
Both w1 and w+ appear to be very large values.  Would it be smart 
to perhaps do:

	[[aw1 | w1] any [aw+ | w+]]

Where 'aw1 and 'aw+ are limited to ASCII values?
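For illustration, the ASCII-only subsets might look something like this (the names 'aw1 and 'aw+ follow Chris's suggestion; the ranges are just the ASCII portion of w1 and w+ above):

	aw1: charset [#"A" - #"Z" #"_" #"a" - #"z"]
	aw+: charset [#"-" #"." #"0" - #"9" #"A" - #"Z" #"_" #"a" - #"z"]
	word: [[aw1 | w1] any [aw+ | w+]]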
Steeve
22-Oct-2009
[4548x4]
Use R3 (and its optimized complemented bitsets)
Anyway, a bitset with a length of 2 ** 16 is not so huge in memory 
(only 16 KB)
64 KB, sorry
So W1 + W+ = 128 KB

Is this a problem?
Chris
22-Oct-2009
[4552]
That's what I'm asking.  Complemented bitsets wouldn't make a difference 
here, though, as the excluded range is of similar scope, right?
Steeve
22-Oct-2009
[4553x2]
It seems so.
If the size is a problem, you can build a function to test each range.
But it will be slow.
Chris
22-Oct-2009
[4555x3]
Not size, efficiency.
Allowing 'into to look inside strings can break current usage of 
'into, requiring [and any-block! into ...]
An example: a nested d: [k v] structure where 'k is a word and 'v 
is 'd or any other type:

	data: [k [k "s"]]

In R2, you can validate with:

	d: [word! [into d | skip]]

Now you have to specify:

	d: [word! [and any-block! into d | skip]]

otherwise you get an error if 'v is a string!
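Putting the example together (the explicit PARSE calls are an addition for illustration, not part of the original post):

	data: [k [k "s"]]
	d: [word! [into d | skip]]                   ; R2 form
	parse data d
	d: [word! [and any-block! into d | skip]]    ; R3 form, guarding INTO
	parse data d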
Sunanda
25-Oct-2009
[4558]
I guess parse can do this too?

   http://stackoverflow.com/questions/1621906/is-there-a-way-to-split-a-string-by-every-nth-seperator-in-python
Will
25-Oct-2009
[4559]
is R2/Forward available for download? thx
Geomol
25-Oct-2009
[4560x2]
Sunanda, one way:

>> out: clear []

>> parse "this-is-a-string" [mark1: any [thru "-" [to "-" | to end] 
mark2: (append out copy/part mark1 mark2) skip mark1:]]
>> out
== ["this-is" "a-string"]
Another:

>> out: parse "this-is-a-string" "-"
>> forall out [change/part out rejoin [out/1 "-" out/2] 2]
>> out
== ["this-is" "a-string"]
Steeve
25-Oct-2009
[4562]
R3 one liner ;-)

>> map-each [a b] parse "this-is-a-string" "-" [ajoin [a #"-" b]]
Graham
26-Oct-2009
[4563]
Rebol doesn't have lines :)
BrianH
26-Oct-2009
[4564x2]
Chris, there can be an advantage in R3 to breaking up a bitset into 
more than one bitset on occasion, mostly memory savings. However, 
it might not work as well as you might like since offset and/or sparse 
bitsets aren't supported. Bitsets that involve high codepoints will 
take a lot of RAM no matter what you do.
Will, R2/Forward is already available for download in DevBase (R3 
chat). It is a little outdated though, since I had to take a break 
to rewrite R3's module system. I'll catch up when I get the chance. 
The percentage of R3 that I can emulate has gone down drastically 
since the last update, since R3 has made a lot of changes to basic 
datatype behavior since then. We'll see what we can do.
Steeve
26-Oct-2009
[4566x2]
Something funny.

I spent an hour debugging a parsing rule, only to finally understand 
this: never name a rule LIMIT.
The LIMIT keyword is apparently reserved for future use in parse.
(in R3)
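A sketch of the pitfall (the rule content here is arbitrary):

	limit: [some "a"]    ; naming a rule LIMIT can trip over the reserved word in R3
	a-run: [some "a"]    ; renaming the rule avoids the problem
	parse "aaa" a-run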
Pekr
26-Oct-2009
[4568x2]
:-)
I thought it was not implemented yet, hence no reservation?
Steeve
26-Oct-2009
[4570]
If you just try to use it, your parsing may crash. So it's doing 
nothing, but it's there.