World: r3wp


[Parse] Discussion of PARSE dialect

Maxim 17-Oct-2009 [4535x2]	really, the problem is not the parsing itself... it's getting the darn rules to generate the proper rules hehehe.
Maxim 17-Oct-2009 [4535x2]	and it's not simple parsing since I use parsing index manipulation, which is also dictated by the source data it encounters. it's like swatting flies using a fly swatter at the end of a rope, while riding a roller coaster which changes layout every time you ride it ;-)
BrianH 17-Oct-2009 [4537]	Which is what a rule compiler does :) Actually, it sounds like you could adapt the tricks of the rule compiler to your rule compiler, which would let you use the new operations in your rule source and have the workarounds generated in the output.
Maxim 17-Oct-2009 [4538x2]	well, build it and I will try it ;-)
Maxim 17-Oct-2009 [4538x2]	I promise.
BrianH 17-Oct-2009 [4540]	It's on my list...
Maxim 17-Oct-2009 [4541x3]	my deadline is to have a site working by this week... unless this darned bug I am trying to kill kills me first.
	doh... when you're too close to the tree... you can't see the forest... I was using TO parse command on a rule ... this obviously won't work....
	(it only accepts a string... dummy :-)
Pekr 18-Oct-2009 [4544]	ah, got a reply on Chat from Carl regarding complementing: "Re #5718: Pekr, that's a good question, and I think the answer must be YES. We need to be able to complement bitmaps in a nice way. Otherwise, Unicode bitmaps, even if simply used on ASCII chars, would take a lot of memory. This change should be listed on the project sheet, and if not, I'll add it there."
Chris 22-Oct-2009 [4545x3]	Is there any advantage in breaking up charsets that represent a large varied range of the 16-bit character space? For example, XML names are defined as below (excluding > 2 ** 16), but are most commonly limited to the ascii-friendly subset: w1: charset [ #"A" - #"Z" #"_" #"a" - #"z" #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(02FF)" #"^(0370)" - #"^(037D)" #"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)" #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)" ] w+: charset [ #"-" #"." #"0" - #"9" #"A" - #"Z" #"_" #"a" - #"z" #"^(B7)" #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(037D)" #"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(203F)" - #"^(2040)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)" #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)" ] word: [w1 any w+]
	(sorry if that looks messy)
	Both w1 and w+ appear to be very large values. Would it be smart to perhaps do: [[aw1 | w1] any [aw+ | w+]] where 'aw1 and 'aw+ are limited to ascii values?
Steeve 22-Oct-2009 [4548x4]	Use R3 (and its optimized complemented bitsets)
	Anyway, a bitset with a length of 2 ** 16 is not so huge in memory (only 16kb)
	64 Kb, sorry
	So W1 + W+ = 128 Kb. Is this a problem?
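The arithmetic behind these figures can be checked with a small sketch. It assumes a flat bitset that stores one bit per code point from 0 up to the highest member; that layout is an assumption made for illustration, not a statement about R3's internals, and `bitset_bytes` is a hypothetical helper.

```python
def bitset_bytes(highest_codepoint: int) -> int:
    """Bytes needed by a flat bitset covering 0..highest_codepoint.

    Assumption: one bit per code point, rounded up to whole bytes.
    """
    bits = highest_codepoint + 1
    return (bits + 7) // 8


# Highest member of w1 and w+ in the charsets above is #"^(FFFD)".
full_range = bitset_bytes(0xFFFD)

# An ASCII-only subset would end at #"z" (0x7A).
ascii_only = bitset_bytes(0x7A)

print(full_range, ascii_only)
```

Under that assumption a bitset spanning the whole 16-bit range holds 65,536 bits, which is 64 kilobits or 8,192 bytes, so the "64 Kb" and "128 Kb" figures are consistent if "Kb" is read as kilobits.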
Chris 22-Oct-2009 [4552]	That's what I'm asking. Complemented bitsets wouldn't make a difference here though as the excluded range is of similar scope, right?
Steeve 22-Oct-2009 [4553x2]	It seems
Steeve 22-Oct-2009 [4553x2]	if the size is a problem you can build a function to test each range. But it will be slow
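A rough sketch of the "function to test each range" idea, written in Python rather than REBOL. The ranges are copied from the w1 charset quoted earlier in the thread; the names `W1_RANGES` and `in_w1` are hypothetical, and the linear scan is only one possible way to do the test.

```python
# Inclusive (low, high) code point ranges taken from the w1 charset above.
W1_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
]


def in_w1(ch: str) -> bool:
    """Return True if the single character ch falls in any w1 range.

    Each lookup walks the range list, so it costs more per character
    than a single bit test, in exchange for a very small table.
    """
    cp = ord(ch)
    return any(low <= cp <= high for low, high in W1_RANGES)


print(in_w1("A"), in_w1("-"))
```

The table is a few dozen integers instead of a 65,536-bit map, which is the size-versus-speed trade described in the message above.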
Chris 22-Oct-2009 [4555x3]	Not size, efficiency.
	Allowing 'into to look inside strings can break current usage of 'into, requiring [and any-block! into ...]
	An example: a nested d: [k v] structure where 'k is a word and 'v is 'd or any other type: data: [k [k "s"]] In R2, you can validate with d: [word! [into d | skip]] Now you have to specify: d: [word! [and any-block! into d | skip]] otherwise you get an error if 'v is a string!
Sunanda 25-Oct-2009 [4558]	I guess parse can do this too? http://stackoverflow.com/questions/1621906/is-there-a-way-to-split-a-string-by-every-nth-seperator-in-python
Will 25-Oct-2009 [4559]	is R2/Forward available for download? thx
Geomol 25-Oct-2009 [4560x2]	Sunanda, one way: >> out: clear [] >> parse "this-is-a-string" [mark1: any [thru "-" [to "-" | to end] mark2: (append out copy/part mark1 mark2) skip mark1:]] >> out == ["this-is" "a-string"]
Geomol 25-Oct-2009 [4560x2]	Another: >> out: parse "this-is-a-string" "-" >> forall out [change/part out rejoin [out/1 "-" out/2] 2] >> out == ["this-is" "a-string"]
Steeve 25-Oct-2009 [4562]	R3 one liner ;-) >> map-each [a b] parse "this-is-a-string" "-" [ajoin [a #"-" b]]
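For comparison with the linked Stack Overflow question, which asks for this in Python, here is a minimal sketch of the same split-then-rejoin approach used in the second and third REBOL answers. The function name `split_every_nth` is an assumption chosen for illustration.

```python
def split_every_nth(text: str, sep: str, n: int) -> list:
    """Split text on sep, then rejoin the pieces in groups of n.

    A trailing group shorter than n is kept as-is.
    """
    parts = text.split(sep)
    return [sep.join(parts[i:i + n]) for i in range(0, len(parts), n)]


print(split_every_nth("this-is-a-string", "-", 2))
# → ['this-is', 'a-string']
```

Like the REBOL versions, this splits on every separator first and then glues pairs back together, instead of scanning for every second separator directly.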
Graham 26-Oct-2009 [4563]	Rebol doesn't have lines :)
BrianH 26-Oct-2009 [4564x2]	Chris, there can be an advantage in R3 to breaking up a bitset into more than one bitset on occasion, mostly memory savings. However, it might not work as well as you might like since offset and/or sparse bitsets aren't supported. Bitsets that involve high codepoints will take a lot of RAM no matter what you do.
BrianH 26-Oct-2009 [4564x2]	Will, R2/Forward is already available for download in DevBase (R3 chat). It is a little outdated though, since I had to take a break to rewrite R3's module system. I'll catch up when I get the chance. The percentage of R3 that I can emulate has gone down drastically since the last update, since R3 has made a lot of changes to basic datatype behavior since then. We'll see what we can do.
Steeve 26-Oct-2009 [4566x2]	Something funny. I spent an hour debugging a parsing rule before finally understanding this: never name a rule LIMIT. The LIMIT keyword is apparently reserved for future use in parse.
Steeve 26-Oct-2009 [4566x2]	(in R3)
Pekr 26-Oct-2009 [4568x2]	:-)
Pekr 26-Oct-2009 [4568x2]	I thought it is not implemented yet, hence no reservation?
Steeve 26-Oct-2009 [4570]	if you just try to use it, your parsing may crash. So, it's doing nothing but it's there.
Pekr 26-Oct-2009 [4571x2]	Hmm, you are right .... But we might need a better error message, no? >> test: ["123"] parse "123" [test] == true >> limit: ["123"] parse "123" [limit] ** Script error: PARSE - invalid rule or usage of rule: end! ** Where: parse ** Near: parse "123" [limit]
Pekr 26-Oct-2009 [4571x2]	posted to Chat/R3/Parse group ...
BrianH 26-Oct-2009 [4573x2]	Keywords that are planned to be added should definitely be reserved.
BrianH 26-Oct-2009 [4573x2]	Otherwise adding them would be difficult.
Steeve 26-Oct-2009 [4575]	But it should return a proper error message, as Pekr noticed.
BrianH 26-Oct-2009 [4576]	Agreed :)
Robert 8-Nov-2009 [4577x2]	I have used www.antlr.org stuff several years ago with C/C++ target. It's a very cool parser generator toolkit. Just took a look again. It has emitters for different languages. Maybe one of the parse gurus here can take a look if we can do a REBOL emitter.
Robert 8-Nov-2009 [4577x2]	IMO that would be really nice.
JoshF 17-Nov-2009 [4579x4]	Hi! I'm trying to use REBOL's parse to make a simple calculator dialect. However, I'm having trouble with escaping entities (I think)... Here's my first try (that worked):
	>> parse [3 + 2] [some [integer! (print "number") | ['+ | '- ] (print "op")]] number op number == true
	>> parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | '/ ] (print "op")]] ** Syntax Error: Invalid word-lit -- ' ** Near: (line 1) parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | '/ ] (print "op")]]
	The second one failed when I tried to extend the dialect with multiply (*) and divide (/). After further experimentation, it seems that you can't escape the "/". Google has not been helpful here... Does anybody have any ideas? I could parse for just a word! instead of the +, -, etc., but I wanted parse to do the work of deciding what was a valid operation or not. Sorry for the multiple messages, I'm still trying to figure this client out... Thanks for any advice!
Ladislav 17-Nov-2009 [4583]	JoshF: Rebol load does not parse the '/, but you can do: as-lit-word: func ['word [any-word!]] [to lit-word! word] lit-div: as-lit-word / parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | lit-div] (print "op")]]
JoshF 17-Nov-2009 [4584]	Ha! Black magic! That works like a champ, Ladislav, thanks very much! I had tried >> tdiv: to-word "/" == / >> parse [3 / 2] [some [integer! (print "number") | ['+ | '- | '* | tdiv ] (print "op")]] But had gotten the same error. What makes yours work?
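For readers following the calculator-dialect idea outside REBOL, the goal of the rule (label each item as a number or an allowed operator, and reject anything else) can be sketched in Python. This is not how PARSE works internally; `OPS` and `classify` are hypothetical names, and operators are modelled as plain strings.

```python
# Operators the dialect accepts, modelled here as one-character strings.
OPS = {"+", "-", "*", "/"}


def classify(tokens):
    """Label each token as "number" or "op"; raise on anything else."""
    labels = []
    for tok in tokens:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(tok, int) and not isinstance(tok, bool):
            labels.append("number")
        elif isinstance(tok, str) and tok in OPS:
            labels.append("op")
        else:
            raise ValueError(f"unexpected token: {tok!r}")
    return labels


print(classify([3, "/", 2]))
# → ['number', 'op', 'number']
```

Keeping the set of valid operators in one place mirrors the intent of listing the literal words inside the rule, so that validation is done by the matcher and not by later code.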