World: r3wp

[Parse] Discussion of PARSE dialect

Maxim 17-Oct-2009 [4513]	gave my two cents on the blog... hehehe
Pekr 17-Oct-2009 [4514]	Max - we are close to finishing parse. It is clear Carl will move to other priorities. Sorry to hunt you here and there, but any chance you'll get your short Extensions doc done? No doc = no callback probably :-)
Maxim 17-Oct-2009 [4515]	I really want to do it... but I'm so deep into parsing right now I don't want to lose the few GB of information in my brain's cache. I'm writing self-modifying parse rules and it's pretty nightmarish, although it works.
Gabriele 17-Oct-2009 [4516]	Ah, 6 years of wait finally over. Now, if we could get those changes ported to R2...
Maxim 17-Oct-2009 [4517]	hehehe you finally got your DO ;-)
Henrik 17-Oct-2009 [4518]	Do we have a list of PARSE changes that are no longer compatible with R2? I think that would be important in porting parse scripts from R2 to R3.
Pekr 17-Oct-2009 [4519x3]	Gabriele - wrong perception :-) The correct claim should be - "An now nothing prevents me from fully switching to R3 ..." :-)
	An=And
	So - we don't need complementing to be enhanced? Because we talked about it, but it is not defined in the proposal, it is not part of Carl's feature table, and I also got no reaction on R3 Chat ....
BrianH 17-Oct-2009 [4522x2]	Gabriele, these changes can be backported to R2 in the form of a rule compiler that generates (unreadable) R2 parse rules.
	Pekr, we still need complementing to be enhanced. Even Carl has said so.
Maxim 17-Oct-2009 [4524x3]	but wouldn't work with remark ;-)
	one situation which complement can't handle very well (RAM-wise): union charset "a" complement charset "b"
	you end up with a full codepoint bitset minus one byte, whether it is complemented or not
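Maxim's example can be modelled outside REBOL. The following is a minimal Python sketch, not a description of how R3 implements bitsets: the `CharSet` class, its `negated` flag and the `charset` helper are hypothetical names. It stores a complemented set as a small set of excluded characters plus a flag, so the equivalent of `union charset "a" complement charset "b"` never materializes the full code point range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharSet:
    """Hypothetical model: `chars` are members, or non-members when negated."""
    chars: frozenset
    negated: bool = False

    def complement(self) -> "CharSet":
        # Flip the flag instead of enumerating every other code point.
        return CharSet(self.chars, not self.negated)

    def union(self, other: "CharSet") -> "CharSet":
        if not self.negated and not other.negated:
            return CharSet(self.chars | other.chars)
        if self.negated and other.negated:
            # (not A) or (not B) == not (A and B)
            return CharSet(self.chars & other.chars, True)
        plain, neg = (self, other) if other.negated else (other, self)
        # A or (not B) == not (B minus A)
        return CharSet(neg.chars - plain.chars, True)

    def __contains__(self, ch: str) -> bool:
        return (ch in self.chars) != self.negated


def charset(s: str) -> CharSet:
    return CharSet(frozenset(s))


# Equivalent of: union charset "a" complement charset "b"
result = charset("a").union(charset("b").complement())
```

In this model `result` stays complemented and holds a single excluded character, "b", so its storage does not grow with the size of the character space.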
BrianH 17-Oct-2009 [4527x2]	Maxim, Remark could be adjusted to use the rule compiler. For that matter, Remark could use R2/Forward (which needs some work, but is already better than R2 on its own).
	Maxim, that is what Pekr was talking about. That is planned to be fixed.
Maxim 17-Oct-2009 [4529x4]	a rule compiler doesn't adapt very well to self-modifying rules
	laden with many paren expressions and a stack on top of it.
	the rule I am writing now actually does JIT rule compilation... hairy to debug :-)
	since I use binding to map inner rules which are also constructed on the fly but have to be pushed and popped from the stack as I traverse data... it's a lot of fun :-D
BrianH 17-Oct-2009 [4533x2]	If the self-modifying rules are strung-together basic blocks, you can use the rule compiler to generate the blocks. And the R3 changes make self-modifying rules less necessary, so you can have even larger basic blocks.
	Of course the result of the compilation would be self-modifying rules :)
Maxim 17-Oct-2009 [4535x2]	really, the problem is not the parsing itself... it's getting the darn rules to generate the proper rules hehehe.
	and it's not simple parsing since I use parsing index manipulation, which is also dictated by the source data it encounters. it's like swatting flies using a fly swatter at the end of a rope, while riding a roller coaster which changes layout every time you ride it ;-)
BrianH 17-Oct-2009 [4537]	Which is what a rule compiler does :) Actually, it sounds like you could adapt the tricks of the rule compiler to your rule compiler, which would let you use the new operations in your rule source and have the workarounds generated in the output.
Maxim 17-Oct-2009 [4538x2]	well, build it and I will try it ;-)
	I promise.
BrianH 17-Oct-2009 [4540]	It's on my list...
Maxim 17-Oct-2009 [4541x3]	my deadline is to have a site working by this week... unless this darned bug I am trying to kill kills me first.
	doh... when you're too close to the tree... you can't see the forest... I was using the TO parse command on a rule ... this obviously won't work....
	(it only accepts a string... dummy :-)
Pekr 18-Oct-2009 [4544]	ah, got a reply on Chat from Carl regarding complementing: "Re #5718: Pekr, that's a good question, and I think the answer must be YES. We need to be able to complement bitmaps in a nice way. Otherwise, Unicode bitmaps, even if simply used on ASCII chars, would take a lot of memory. This change should be listed on the project sheet, and if not, I'll add it there."
Chris 22-Oct-2009 [4545x3]	Is there any advantage in breaking up charsets that represent a large varied range of the 16-bit character space? For example, XML names are defined as below (excluding > 2 ** 16), but are most commonly limited to the ascii-friendly subset: w1: charset [ #"A" - #"Z" #"_" #"a" - #"z" #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(02FF)" #"^(0370)" - #"^(037D)" #"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)" #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)" ] w+: charset [ #"-" #"." #"0" - #"9" #"A" - #"Z" #"_" #"a" - #"z" #"^(B7)" #"^(C0)" - #"^(D6)" #"^(D8)" - #"^(F6)" #"^(F8)" - #"^(037D)" #"^(037F)" - #"^(1FFF)" #"^(200C)" - #"^(200D)" #"^(203F)" - #"^(2040)" #"^(2070)" - #"^(218F)" #"^(2C00)" - #"^(2FEF)" #"^(3001)" - #"^(D7FF)" #"^(f900)" - #"^(FDCF)" #"^(FDF0)" - #"^(FFFD)" ] word: [w1 any w+]
	(sorry if that looks messy)
	Both w1 and w+ appear to be very large values. Would it be smart to perhaps do: [[aw1 | w1] any [aw+ | w+]] Where 'aw1 and 'aw+ are limited to ascii values?
Steeve 22-Oct-2009 [4548x4]	Use R3 (and its optimized complemented bitsets)
	Anyway, a bitset with a length of 2 ** 16 is not so huge in memory (only 16kb)
	64 Kb, sorry
	So W1 + W+ = 128 Kb. Is this a problem?
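The figures above depend on whether "Kb" is read as kilobits or kilobytes. A quick check in Python, assuming a flat bitset with one bit per code point and no header overhead (an assumption about the representation, not a statement about R3 internals):

```python
CODE_POINTS = 2 ** 16                 # the 16-bit character space discussed above

bits = CODE_POINTS                    # one bit per code point (assumed flat layout)
bytes_per_set = bits // 8             # 8 bits per byte
kilobits_per_set = bits // 1024
kibibytes_per_set = bytes_per_set // 1024

two_sets_kib = 2 * kibibytes_per_set  # w1 and w+ together
```

Under that assumption one set is 64 kilobits, or 8 KiB, and w1 plus w+ together come to 128 kilobits, or 16 KiB.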
Chris 22-Oct-2009 [4552]	That's what I'm asking. Complemented bitsets wouldn't make a difference here though as the excluded range is of similar scope, right?
Steeve 22-Oct-2009 [4553x2]	It seems
	if the size is a problem you can build a function to test each range, but it will be slow
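A range-testing function of the kind Steeve mentions could look like the following Python sketch. The ranges are transcribed from Chris's `w1` charset; the names `W1_RANGES`, `STARTS` and `in_w1` are hypothetical. It uses binary search over the sorted range starts, which is one possible approach and not something proposed in the thread.

```python
from bisect import bisect_right

# Inclusive (low, high) code point ranges for `w1`, in ascending order.
W1_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
]
STARTS = [low for low, _ in W1_RANGES]


def in_w1(ch: str) -> bool:
    """Return True if the single character `ch` falls in one of the ranges."""
    cp = ord(ch)
    # Index of the last range whose start is <= cp, or -1 if there is none.
    i = bisect_right(STARTS, cp) - 1
    return i >= 0 and cp <= W1_RANGES[i][1]
```

Each test costs a few comparisons rather than one bit lookup, so it trades speed for memory, which matches Steeve's caveat.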
Chris 22-Oct-2009 [4555x3]	Not size, efficiency.
	Allowing 'into to look inside strings can break current usage of 'into, requiring [and any-block! into ...]
	An example: a nested d: [k v] structure where 'k is a word and 'v is 'd or any other type: data: [k [k "s"]] In R2, you can validate with d: [word! [into d | skip]] Now you have to specify: d: [word! [and any-block! into d | skip]] otherwise you get an error if 'v is a string!
Sunanda 25-Oct-2009 [4558]	I guess parse can do this too? http://stackoverflow.com/questions/1621906/is-there-a-way-to-split-a-string-by-every-nth-seperator-in-python
Will 25-Oct-2009 [4559]	is R2/Forward available for download? thx
Geomol 25-Oct-2009 [4560x2]	Sunanda, one way: >> out: clear [] >> parse "this-is-a-string" [mark1: any [thru "-" [to "-" | to end] mark2: (append out copy/part mark1 mark2) skip mark1:]] >> out == ["this-is" "a-string"]
	Another: >> out: parse "this-is-a-string" "-" >> forall out [change/part out rejoin [out/1 "-" out/2] 2] >> out == ["this-is" "a-string"]
Steeve 25-Oct-2009 [4562]	R3 one liner ;-) >> map-each [a b] parse "this-is-a-string" "-" [ajoin [a #"-" b]]
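For comparison with the Python question Sunanda linked, the split-then-rejoin idea used in Geomol's second answer and Steeve's one-liner can be sketched in Python and generalised from pairs to every n-th separator. The function name `split_every_nth` is hypothetical.

```python
def split_every_nth(text: str, sep: str, n: int) -> list[str]:
    """Split `text` on every n-th occurrence of `sep`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    parts = text.split(sep)
    # Rejoin the pieces in groups of n, restoring the separators inside a group.
    return [sep.join(parts[i:i + n]) for i in range(0, len(parts), n)]


pairs = split_every_nth("this-is-a-string", "-", 2)
```

With n = 2 and the string from the thread, `pairs` is `["this-is", "a-string"]`, the same result as the REBOL versions above.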