World: r3wp

[Parse] Discussion of PARSE dialect

Maxim
17-Oct-2009
[4538x2]
well, build it and I will try it  ;-)
I promise.
BrianH
17-Oct-2009
[4540]
It's on my list...
Maxim
17-Oct-2009
[4541x3]
my deadline is to have a site working by this week... unless this 
darned bug I am trying to kill kills me first.
doh... when you're too close to the tree... you can't see the forest... 
I was using the TO parse command on a rule... this obviously won't work...
(it only accepts a string... dummy :-)
Pekr
18-Oct-2009
[4544]
Ah, got a reply on Chat from Carl regarding complementing:

"Re #5718: Pekr, that's a good question, and I think the answer must 
be YES. We need to be able to complement bitmaps in a 'nice way'. 
Otherwise, Unicode bitmaps, even if simply used on ASCII chars, would 
take a lot of memory.

This change should be listed on the project sheet, and if not, I'll 
add it there."
Chris
22-Oct-2009
[4545x3]
Is there any advantage in breaking up charsets that represent a large 
varied range of the 16-bit character space? For example, XML names 
are defined as below (excluding > 2 ** 16), but are most commonly 
limited to the ascii-friendly subset:

	w1: charset [
		#"A" - #"Z" #"_" #"a" - #"z" #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)"
		#"^(F8)" - #"^(02FF)" #"^(0370)" - #"^(037D)" #"^(037F)" - #"^(1FFF)"
		#"^(200C)" - #"^(200D)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)"
		#"^(3001)" - #"^(D7FF)" #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
	]
	w+: charset [
		#"-" #"." #"0" - #"9" #"A" - #"Z" #"_" #"a" - #"z" #"^(B7)"
		#"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(037D)"
		#"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(203F)" - #"^(2040)"
		#"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)"
		#"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)"
	]
	word: [w1 any w+]
(sorry if that looks messy)
Both w1 and w+ appear to be very large values.  Would it be smart 
to perhaps do:

	[[aw1 | w1] any [aw+ | w+]]

Where 'aw1 and 'aw+ are limited to ascii values?
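[Editor's sketch] The idea above, trying a small ASCII set first and falling back to the full set, might look roughly like this in Python. The ranges are copied from w1; the names ASCII_W1, W1_RANGES and is_name_start are made up for illustration, and whether such a split is actually faster in REBOL is exactly the open question in this thread.

```python
import string

# Hypothetical ASCII fast path: the ascii-friendly subset of w1.
ASCII_W1 = frozenset(string.ascii_letters + "_")

# Full w1 ranges, copied from the charset above (inclusive bounds).
W1_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x02FF),
    (0x0370, 0x037D), (0x037F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
]


def is_name_start(ch: str) -> bool:
    """Return True if ch may start a name, checking the ASCII subset first."""
    if ch in ASCII_W1:  # cheap check, covers the common case
        return True
    cp = ord(ch)
    # Fallback: linear scan over every range of the full set.
    return any(lo <= cp <= hi for lo, hi in W1_RANGES)


print(is_name_start("a"), is_name_start("1"), is_name_start("\u00e9"))
```

The fallback is a per-range test, so it trades memory for time on non-ASCII input.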
Steeve
22-Oct-2009
[4548x4]
Use R3 (and its optimized complemented bitsets).
Anyway, a bitset with a length of 2 ** 16 is not so huge in memory 
(only 16kb)
64 Kb, sorry
So W1 + W+ = 128Kb

Is this a problem?
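[Editor's sketch] The figures above can be checked with a little arithmetic, assuming a dense bitset that stores one bit per code point with no header overhead (an assumption; the thread does not describe REBOL's internal layout) and reading "Kb" as kilobits. The function name bitset_bytes is made up for illustration.

```python
def bitset_bytes(code_points: int) -> int:
    """Bytes needed for a dense bitset with one bit per code point."""
    return (code_points + 7) // 8


one_set = bitset_bytes(2 ** 16)   # covers the whole 16-bit range
two_sets = 2 * one_set            # w1 plus w+
ascii_set = bitset_bytes(128)     # an ASCII-only set, for comparison

print(one_set, "bytes =", one_set * 8 // 1024, "Kbit")
print(two_sets, "bytes =", two_sets * 8 // 1024, "Kbit")
print(ascii_set, "bytes")
```

Under those assumptions each full-range set is 64 Kbit (8192 bytes), and the pair is 128 Kbit (16384 bytes).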
Chris
22-Oct-2009
[4552]
That's what I'm asking.  Complemented bitsets wouldn't make a difference 
here though as the excluded range is of similar scope, right?
Steeve
22-Oct-2009
[4553x2]
It seems so.
If the size is a problem, you can build a function to test each range.
But it will be slow.
Chris
22-Oct-2009
[4555x3]
Not size, efficiency.
Allowing 'into to look inside strings can break current usage of 
'into, requiring [and any-block! into ...]
An example: a nested d: [k v] structure where 'k is a word and 'v 
is 'd or any other type:

	data: [k [k "s"]]

In R2, you can validate with d: [word! [into d | skip]]

Now you have to specify: d: [word! [and any-block! into d | skip]], 
otherwise you get an error if 'v is a string!
Sunanda
25-Oct-2009
[4558]
I guess parse can do this too?

   http://stackoverflow.com/questions/1621906/is-there-a-way-to-split-a-string-by-every-nth-seperator-in-python
Will
25-Oct-2009
[4559]
is R2/Forward available for download? thx
Geomol
25-Oct-2009
[4560x2]
Sunanda, one way:

>> out: clear []
>> parse "this-is-a-string" [mark1: any [thru "-" [to "-" | to end] mark2: (append out copy/part mark1 mark2) skip mark1:]]
>> out
== ["this-is" "a-string"]
Another:

>> out: parse "this-is-a-string" "-"
>> forall out [change/part out rejoin [out/1 "-" out/2] 2]
>> out
== ["this-is" "a-string"]
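[Editor's sketch] For comparison with the linked question, which asks about Python, the same "split on the separator, then rejoin in groups" approach could be written as below. The function name split_every_nth and its parameters are made up for illustration.

```python
def split_every_nth(text: str, sep: str, n: int) -> list:
    """Split text at every n-th occurrence of sep."""
    parts = text.split(sep)
    # Rejoin the pieces in groups of n, restoring the separators inside each group.
    return [sep.join(parts[i:i + n]) for i in range(0, len(parts), n)]


print(split_every_nth("this-is-a-string", "-", 2))
```

With n = 2 this gives the same two pieces as the REBOL versions above.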
Steeve
25-Oct-2009
[4562]
R3 one liner ;-)

>> map-each [a b] parse "this-is-a-string" "-" [ajoin [a #"-" b]]
Graham
26-Oct-2009
[4563]
Rebol doesn't have lines :)
BrianH
26-Oct-2009
[4564x2]
Chris, there can be an advantage in R3 to breaking up a bitset into 
more than one bitset on occasion, mostly memory savings. However, 
it might not work as well as you might like since offset and/or sparse 
bitsets aren't supported. Bitsets that involve high codepoints will 
take a lot of RAM no matter what you do.
Will, R2/Forward is already available for download in DevBase (R3 
chat). It is a little outdated though, since I had to take a break 
to rewrite R3's module system. I'll catch up when I get the chance. 
The percentage of R3 that I can emulate has gone down drastically 
since the last update, since R3 has made a lot of changes to basic 
datatype behavior since then. We'll see what we can do.
Steeve
26-Oct-2009
[4566x2]
Something funny.

I spent an hour debugging a parsing rule, 
only to finally understand this:  
never name a rule LIMIT. 
The LIMIT keyword is apparently reserved for future use in parse.
(in R3)
Pekr
26-Oct-2009
[4568x2]
:-)
I thought it was not implemented yet, hence no reservation?
Steeve
26-Oct-2009
[4570]
If you just try to use it, your parsing may crash. So it does 
nothing, but it's there.
Pekr
26-Oct-2009
[4571x2]
Hmm, you are right... But we might need a better error message, no?

>> test: ["123"] parse "123" [test]
== true

>> limit: ["123"] parse "123" [limit]
** Script error: PARSE - invalid rule or usage of rule: end!
** Where: parse
** Near: parse "123" [limit]
posted to Chat/R3/Parse group ...
BrianH
26-Oct-2009
[4573x2]
Keywords that are *planned* to be added should definitely be reserved.
Otherwise adding them would be difficult.
Steeve
26-Oct-2009
[4575]
But it should return a proper error message, as Pekr noted.
BrianH
26-Oct-2009
[4576]
Agreed :)
Robert
8-Nov-2009
[4577x2]
I used the www.antlr.org tools several years ago with a C/C++ target. 
It's a very cool parser generator toolkit. I just took a look again: 
it has emitters for different languages. Maybe one of the parse gurus 
here can take a look at whether we can do a REBOL emitter.
IMO that would be really nice.
JoshF
17-Nov-2009
[4579x4]
Hi! I'm trying to use REBOL's parse to make a simple calculator dialect. 
However, I'm having trouble with escaping entities (I think)...  
Here's my first try (that worked):
>> parse [3 + 2] [some [integer! (print "number") | ['+ | '- ] (print "op")]]
number
op
number
== true
>> parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | '/ ] (print "op")]]
** Syntax Error: Invalid word-lit -- '

** Near: (line 1) parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | '/ ] (print "op")]]
The second one failed when I tried to extend the dialect with multiply 
(*) and divide (/). After further experimentation, it seems that 
you can't escape the "/". Google has not been helpful here... Does 
anybody have any ideas? I could parse for just a word! instead of 
the +, -, etc., but I wanted parse to do the work of deciding what 
was a valid operation or not. Sorry for the multiple messages, I'm 
still trying to figure this client out... Thanks for any advice!
Ladislav
17-Nov-2009
[4583]
JoshF: REBOL's LOAD does not parse '/, but you can do:

as-lit-word: func ['word [any-word!]] [to lit-word! word]
lit-div: as-lit-word /

parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | lit-div] (print "op")]]
JoshF
17-Nov-2009
[4584x2]
Ha! Black magic! That works like a champ, Ladislav, thanks very much!  
I had tried 
>> tdiv: to-word "/"
== /

>> parse [3 / 2] [some [integer! (print "number") | ['+ | '- | '* | tdiv ] (print "op")]]
But I had gotten the same error. What makes yours work?
Both tdiv and lit-div type? to a word!...
Ladislav
17-Nov-2009
[4586x2]
My example works because the LIT-DIV variable refers to a lit-word, 
while your tdiv refers to a word.
Check as follows:

type? :lit-div
type? :tdiv