World: r3wp


[Parse] Discussion of PARSE dialect

Maxim 17-Oct-2009 [4538x2]	well, build it and I will try it ;-)
	I promise.
BrianH 17-Oct-2009 [4540]	It's on my list...
Maxim 17-Oct-2009 [4541x3]	my deadline is to have a site working by this week... unless this darned bug I am trying to kill doesn't kill me first.
	doh... when you're too close to the tree... you can't see the forest... I was using TO parse command on a rule ... this obviously won't work....
	(it only accepts a string... dummy :-)
Pekr 18-Oct-2009 [4544]	ah, got a reply on Chat from Carl about complementing bitsets: Re #5718: "Pekr, that's a good question, and I think the answer must be YES. We need to be able to complement bitmaps in a nice way. Otherwise, Unicode bitmaps, even if simply used on ASCII chars, would take a lot of memory. This change should be listed on the project sheet, and if not, I'll add it there."
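A minimal sketch of using COMPLEMENT on a charset, as discussed above (the name `non-quote` is hypothetical). It builds "any character except a double quote" by complementing a one-character charset. Whether the complemented bitset is stored compactly rather than expanded depends on the R3 build; this sketch only shows the usage, not the memory behavior.

```rebol
; Hypothetical example: a charset matching every char except a double quote.
; COMPLEMENT negates the bitset instead of listing every allowed codepoint.
non-quote: complement charset {"}

print find non-quote #"a"              ; true: #"a" is not a double quote
print parse {abc} [some non-quote]     ; every char is allowed
print parse {ab"c} [some non-quote]    ; stops at the double quote, so the parse fails
```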
Chris 22-Oct-2009 [4545x3]	Is there any advantage in breaking up charsets that represent a large varied range of the 16-bit character space? For example, XML names are defined as below (excluding > 2 ** 16), but are most commonly limited to the ascii-friendly subset: w1: charset [ #"A" - #"Z" #"_" #"a" - #"z" #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(02FF)" #"^(0370)" - #"^(037D)" #"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)" #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)" ] w+: charset [ #"-" #"." #"0" - #"9" #"A" - #"Z" #"_" #"a" - #"z" #"^(B7)" #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(037D)" #"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(203F)" - #"^(2040)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)" #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)" ] word: [w1 any w+]
	(sorry if that looks messy)
	Both w1 and w+ appear to be very large values. Would it be smart to do: [[aw1 | w1] any [aw+ | w+]] where 'aw1 and 'aw+ are limited to ASCII values?
Steeve 22-Oct-2009 [4548x4]	Use R3 (and its optimized complemented bitsets)
	Anyway, a bitset with a length of 2 ** 16 is not so huge in memory (only 16kb)
	64 Kb, sorry
	So w1 + w+ = 128 Kb. Is this a problem?
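Assuming a plain bitset stores one bit per code point with no header overhead (an assumption, not a measured R3 figure), the size of a full 16-bit bitset works out as:

$$\frac{2^{16}\ \text{bits}}{8\ \text{bits per byte}} = 8192\ \text{bytes} = 8\ \text{KiB}$$

So "64 Kb" holds if read as kilobits, and w1 plus w+ together come to about 128 Kbit, i.e. roughly 16 KiB.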
Chris 22-Oct-2009 [4552]	That's what I'm asking. Complemented bitsets wouldn't make a difference here though as the excluded range is of similar scope, right?
Steeve 22-Oct-2009 [4553x2]	It seems
	If the size is a problem, you can build a function to test each range, but it will be slow.
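A rough sketch of the range-testing alternative, assuming ranges are kept as a flat block of low/high char pairs. The names `in-ranges?`, `name-start-ranges`, `name-start`, `c` and `ok` are hypothetical. Each test walks the ranges linearly, which is where the slowness comes from. The rule uses the old R2-compatible trick of switching a rule word between an empty block (always matches) and `[end skip]` (never matches).

```rebol
; Hypothetical helper: TRUE if char C falls inside any [low high] pair in RANGES.
; Linear scan over the ranges, so it is slower than a single bitset lookup.
in-ranges?: func [c [char!] ranges [block!]] [
    foreach [low high] ranges [
        if all [c >= low c <= high] [return true]
    ]
    false
]

; Illustrative subset only (A-Z, _, a-z), not the full XML name ranges.
name-start-ranges: [#"A" #"Z" #"_" #"_" #"a" #"z"]

; The paren sets OK to [] (always matches) or [end skip] (never matches),
; and the following OK word then runs whichever rule was chosen.
name-start: [
    set c skip
    (ok: either in-ranges? c name-start-ranges [[]] [[end skip]])
    ok
]

probe parse "a" [name-start]
probe parse "1" [name-start]
```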
Chris 22-Oct-2009 [4555x3]	Not size, efficiency.
	Allowing 'into to look inside strings can break current usage of 'into, requiring [and any-block! into ...]
	An example: a nested d: [k v] structure where 'k is a word and 'v is 'd or any other type: data: [k [k "s"]]  In R2, you can validate it with d: [word! [into d | skip]]  Now you have to specify d: [word! [and any-block! into d | skip]], otherwise you get an error if 'v is a string!
Sunanda 25-Oct-2009 [4558]	I guess parse can do this too? http://stackoverflow.com/questions/1621906/is-there-a-way-to-split-a-string-by-every-nth-seperator-in-python
Will 25-Oct-2009 [4559]	is R2/Forward available for download? thx
Geomol 25-Oct-2009 [4560x2]	Sunanda, one way: >> out: clear [] >> parse "this-is-a-string" [mark1: any [thru "-" [to "-" | to end] mark2: (append out copy/part mark1 mark2) skip mark1:]] >> out == ["this-is" "a-string"]
	Another: >> out: parse "this-is-a-string" "-" >> forall out [change/part out rejoin [out/1 "-" out/2] 2] >> out == ["this-is" "a-string"]
Steeve 25-Oct-2009 [4562]	R3 one liner ;-) >> map-each [a b] parse "this-is-a-string" "-" [ajoin [a #"-" b]]
Graham 26-Oct-2009 [4563]	Rebol doesn't have lines :)
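Generalizing the approaches above to any N: a minimal sketch (the function name `split-every-nth` is hypothetical) that walks the string with FIND and cuts at every Nth separator, so it does not rely on PARSE's simple string-splitting mode.

```rebol
; Hypothetical helper: split STR at every Nth occurrence of SEP.
split-every-nth: func [
    str [string!] sep [string!] n [integer!]
    /local out start pos count
][
    out: copy []
    start: pos: str
    count: 0
    while [pos: find pos sep] [
        count: count + 1
        if count = n [
            ; cut the piece from the last cut point up to this separator
            append out copy/part start pos
            start: skip pos length? sep
            count: 0
        ]
        pos: skip pos length? sep   ; keep searching after this separator
    ]
    append out copy start           ; whatever follows the last cut
    out
]

probe split-every-nth "this-is-a-string" "-" 2   ; ["this-is" "a-string"]
```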
BrianH 26-Oct-2009 [4564x2]	Chris, there can be an advantage in R3 to breaking up a bitset into more than one bitset on occasion, mostly memory savings. However, it might not work as well as you might like since offset and/or sparse bitsets aren't supported. Bitsets that involve high codepoints will take a lot of RAM no matter what you do.
	Will, R2/Forward is already available for download in DevBase (R3 chat). It is a little outdated though, since I had to take a break to rewrite R3's module system. I'll catch up when I get the chance. The percentage of R3 that I can emulate has gone down drastically since the last update, since R3 has made a lot of changes to basic datatype behavior since then. We'll see what we can do.
Steeve 26-Oct-2009 [4566x2]	Something funny. I spent an hour debugging a parsing rule, only to finally understand this: never name a rule LIMIT. The LIMIT keyword is apparently reserved for future use in parse.
	(in R3)
Pekr 26-Oct-2009 [4568x2]	:-)
	I thought it wasn't implemented yet, hence not reserved?
Steeve 26-Oct-2009 [4570]	If you just try to use it, your parse may crash. So it does nothing, but it's there.
Pekr 26-Oct-2009 [4571x2]	Hmm, you are right... But we might need a better error message, no? >> test: ["123"] parse "123" [test] == true >> limit: ["123"] parse "123" [limit] Script error: PARSE - invalid rule or usage of rule: end! Where: parse ** Near: parse "123" [limit]
	posted to Chat/R3/Parse group ...
BrianH 26-Oct-2009 [4573x2]	Keywords that are planned to be added should definitely be reserved.
	Otherwise adding them would be difficult.
Steeve 26-Oct-2009 [4575]	But it should return a proper error message as Pekr noticed it.
BrianH 26-Oct-2009 [4576]	Agreed :)
Robert 8-Nov-2009 [4577x2]	I used www.antlr.org stuff with a C/C++ target several years ago. It's a very cool parser generator toolkit. I just took another look: it has emitters for different languages. Maybe one of the parse gurus here can check whether we could do a REBOL emitter.
	IMO that would be really nice.
JoshF 17-Nov-2009 [4579x4]	Hi! I'm trying to use REBOL's parse to make a simple calculator dialect. However, I'm having trouble with escaping entities (I think)... Here's my first try (that worked):
	>> parse [3 + 2] [some [integer! (print "number") | ['+ | '- ] (print "op")]] number op number == true
	>> parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | '/ ] (print "op")]] Syntax Error: Invalid word-lit -- ' Near: (line 1) parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | '/ ] (print "op")]]
	The second one failed when I tried to extend the dialect with multiply (*) and divide (/). After further experimentation, it seems that you can't escape the "/". Google has not been helpful here... Does anybody have any ideas? I could parse for just a word! instead of the +, -, etc., but I wanted parse to do the work of deciding what was a valid operation or not. Sorry for the multiple messages, I'm still trying to figure this client out... Thanks for any advice!
Ladislav 17-Nov-2009 [4583]	JoshF: Rebol load does not parse the '/, but you can do: as-lit-word: func ['word [any-word!]] [to lit-word! word] lit-div: as-lit-word / parse [3 - 2] [some [integer! (print "number") | ['+ | '- | '* | lit-div] (print "op")]]
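Building on the LIT-DIV trick, a minimal sketch of the calculator dialect JoshF describes (the function `calc` is hypothetical). PARSE validates the operators, and the paren evaluates each step strictly left to right, which matches how REBOL itself evaluates infix operators. Invalid input makes PARSE fail, and `calc` then returns NONE.

```rebol
; Hypothetical sketch: evaluate a flat [int op int op int ...] block left to right.
as-lit-word: func ['word [any-word!]] [to lit-word! word]
lit-div: as-lit-word /      ; lit-word for /, which cannot be written as '/ in source

calc: func [
    "Evaluate [int op int ...] strictly left to right; NONE if invalid"
    expr [block!]
    /local result operator n
][
    unless parse expr [
        set result integer!
        any [
            set operator ['+ | '- | '* | lit-div]
            set n integer!
            ; apply one operator to the running result
            (result: do reduce [result operator n])
        ]
    ][return none]
    result
]

probe calc [3 + 2 * 4]   ; 20, since (3 + 2) * 4
```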
JoshF 17-Nov-2009 [4584x2]	Ha! Black magic! That works like a champ, Ladislav, thanks very much! I had tried >> tdiv: to-word "/" == / >> parse [3 / 2] [some [integer! (print "number") | ['+ | '- | '* | tdiv ] (print "op")]] but had gotten the same error. What makes yours work?
	Both tdiv and lit-div give word! with type?...
Ladislav 17-Nov-2009 [4586x2]	My example works, since the LIT-DIV variable refers to a lit-word, while your tdiv refers to a word
	check as follows: type? :lit-div type? :tdiv
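Ladislav's check, spelled out as a small self-contained script that repeats the definitions from above. The get-word form (`:lit-div`) fetches the stored value without evaluating it, which is what reveals the difference; evaluating `lit-div` as a plain word can report word! instead, which is likely what JoshF saw.

```rebol
as-lit-word: func ['word [any-word!]] [to lit-word! word]
lit-div: as-lit-word /
tdiv: to word! "/"

print type? :lit-div   ; the lit-word! datatype
print type? :tdiv      ; the word! datatype

; The lit-word value in LIT-DIV matches the word / inside the parsed block:
print parse [3 / 2] [integer! lit-div integer!]
```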