World: r3wp

[Parse] Discussion of PARSE dialect

Ladislav 4-Oct-2009 [4422]	Some may prefer Steeve's explanation, which looks very good to me.
Maxim 5-Oct-2009 [4423]	pekr, I had the same initial reaction, then realized that it would not be consistent wrt fail or no fail... when NOT would succeed a match (and fail the rule), the input would be beyond what the NOT is useful for. when I started thinking about it, if you really want you can simply use a set-word/get-word pair to advance when the NOT finds a match to ignore a rule, but then it's like not using NOT in the first place, so it's pointless :-)
Pekr 5-Oct-2009 [4424]	not advancing NOT is not that useful imo. I know that it can't be technically done, but anyway ...
Steeve 6-Oct-2009 [4425]	Guys, I think your opinion about NOT is a little harsh. In the case of complementing a charset, you just have to SKIP after the NOT rule. In the other cases, not advancing is of better use. At least that's what I see while rewriting some scripts as I do now.
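The NOT-then-SKIP idiom Steeve describes can be illustrated outside REBOL. The sketch below is Python, not PARSE, and only an analogy: the negative lookahead `(?!...)` from the `re` module stands in for NOT (it tests a rule without consuming input), and the `.` that follows plays the role of SKIP. The names `VOWEL`, `NON_VOWEL` and `match_non_vowels` are made up for illustration.

```python
import re

# Hypothetical stand-in for a PARSE charset: the characters we want to exclude.
VOWEL = "[aeiou]"

# (?!...) is a lookahead: it succeeds when VOWEL does NOT match here and
# consumes nothing, like NOT. The "." then advances one character, like SKIP.
NON_VOWEL = re.compile(rf"(?:(?!{VOWEL}).)*")


def match_non_vowels(text: str) -> str:
    """Return the longest prefix of text that contains no vowel."""
    return NON_VOWEL.match(text).group(0)


print(match_non_vowels("rhythm and blues"))  # prints "rhythm "
```

Because the lookahead never moves the position, the pair "not a vowel, then one character" behaves like a complemented charset without having to build one.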
Pekr 6-Oct-2009 [4426]	The case is that advancing can't be done in fact. It is just some psychological aspect, which leads some of us to think that the output should be advanced, because we are looking for the complementing feature ...
Steeve 6-Oct-2009 [4427x2]	Ok but could you give a case (other than complemented charsets, which can be easily skipped) where you found that advancing is more convenient with a complemented rule. I mean, I can't conceive that such a complemented rule would actually be easier to read.
Steeve 6-Oct-2009 [4427x2]	Or easier to write...
PeterWood 6-Oct-2009 [4429x2]	Skip is very slow compared with complemented charsets.
PeterWood 6-Oct-2009 [4429x2]	If you are parsing a string, skip only advances one character at a time. When I wrote make-word-list for rebol.org, using a complemented charset gave a big performance improvement. Though you may be able to suggest better ways to optimise it - http://www.rebol.org/view-script.r?script=make-word-list.r
Steeve 6-Oct-2009 [4431]	I can have a look, but the purpose of NOT is not to have better perfs than complemented charsets, but to allow some simplification when writing rules. Actually, it's the case with most other improvements: easier to write, not necessarily faster. And don't forget that safe complemented charsets in R3 are a pain in the ass to construct, because of UTF-8
PeterWood 6-Oct-2009 [4432x2]	Which is why I was disappointed that I apparently misunderstood from Carl's blog: "Changes that are critical, but not highly complicated. For example, providing a NOT command seems easy enough, and it is now critical because using complemented charsets is problematic (due to the Unicode enhancements)."
PeterWood 6-Oct-2009 [4432x2]	This is the link to the blog - http://www.rebol.net/r3blogs/0155.html
Steeve 6-Oct-2009 [4434x2]	Well, I saw your script, I don't know if it can be faster, I can only say I would have written it differently. Probably using parse and load/next for all normal rebol values. I can see that your rule for matching binaries is wrong. Cause [#{" thru #"}"] is wrong (what if the binary contains the #"^}" char ?)
Steeve 6-Oct-2009 [4434x2]	Forget my last remark, a binary can't contain #"^}" but a multi string yes
BrianH 6-Oct-2009 [4436x2]	Peter, Steeve, the original problem that started the parse proposals was the problem of complimenting charsets. However, it quickly changed to improving PARSE in general. Then, while we were waiting for the parse proposals to come up on the todo list, we came up with a better solution to complimenting charsets, which is not yet implemented and which is not limited to PARSE.
BrianH 6-Oct-2009 [4436x2]	Or maybe it's complementing? How would you compliment a charset? Say it has good coverage?
Pekr 6-Oct-2009 [4438]	BrianH: what is the method to get a complementing-like feature back? I haven't seen anything discussed in that manner yet ...
BrianH 6-Oct-2009 [4439x2]	Using a bit in the charset that would mark it as "complemented", and then all of its matching algorithms would do an internal not.
BrianH 6-Oct-2009 [4439x2]	The discussion was primarily in the comments of the relevant tickets in CureCode, though there was some in R3 chat. Not recently.
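The "complemented" bit BrianH describes can be sketched as follows. This is Python and purely illustrative: it is not how R3 bitsets are implemented, and the `Charset` class with its `chars` and `complemented` fields is a hypothetical model. It only shows the idea that complementing flips a flag, and that every membership test applies an internal not when the flag is set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Charset:
    """Hypothetical charset: a small explicit set plus a 'complemented' flag."""
    chars: frozenset
    complemented: bool = False

    def complement(self) -> "Charset":
        # Flip the flag instead of enumerating every other code point.
        return Charset(self.chars, not self.complemented)

    def __contains__(self, ch: str) -> bool:
        # The "internal not": invert the plain membership test when flagged.
        return (ch in self.chars) != self.complemented


digits = Charset(frozenset("0123456789"))
non_digits = digits.complement()

print("7" in digits, "7" in non_digits)  # prints "True False"
print("é" in non_digits)                 # prints "True"
```

The complemented set stays as small as the original one, which is the point when the full encoding is Unicode-sized.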
Pekr 6-Oct-2009 [4441]	If parse was already rewritten from the ground-up, couldn't it be directly adapted to allow streamed parsing? :-) You said it could be added later, when rewrite happens, so ...
BrianH 6-Oct-2009 [4442]	Yeah, but by "later" I meant after 3.0 comes out. I haven't even discussed with Carl what would be involved yet.
Pekr 6-Oct-2009 [4443]	ok
BrianH 6-Oct-2009 [4444x2]	I want to write more port code first and refine the model based on what I learn.
BrianH 6-Oct-2009 [4444x2]	Plus, Carl's rewrite didn't change the basic algorithm of PARSE, just some details. I don't yet know whether my port PARSE is that easy.
Maxim 7-Oct-2009 [4446x2]	even with a complement bit, if you use a few chained unions & complements, etc, you'll eventually need to bake it... in the end, all the complement bit is useful for is to keep the charset to below half the maximum size of the complete encoding.
Maxim 7-Oct-2009 [4446x2]	(which is still pretty big AFAICT)
BrianH 12-Oct-2009 [4448]	Behavior of BREAK, ANY and SOME decided, finally: http://www.rebol.net/r3blogs/0270.html
Steeve 12-Oct-2009 [4449x2]	Yesssss, the return of BREAK !!!!
Steeve 12-Oct-2009 [4449x2]	I definitely didn't love the previous way to exit from a loop
BrianH 12-Oct-2009 [4451]	And it's finally break from a loop, rather than break from a block (supposedly).
Maxim 12-Oct-2009 [4452x3]	is it just me or is Carl sidestepping the n BREAK proposal?
	or BREAK/depth n in case of a call within a loop.
	(loop in a function, not parse)
BrianH 12-Oct-2009 [4455]	Yes, that's unlikely to be implemented. He says it doesn't fit in with the rest. Same with n FAIL.
Maxim 12-Oct-2009 [4456x2]	but it's a hell of a powerful addition to parse and to general code control. I don't see why Carl can't see any use for it.
Maxim 12-Oct-2009 [4456x2]	being able to drop from foreach [foreach [foreach [....]]] in one call is sooo useful soo often.
BrianH 12-Oct-2009 [4458]	That proposal was only for PARSE, not for function loops.
Maxim 12-Oct-2009 [4459x2]	I know, but it's as useful within code.
Maxim 12-Oct-2009 [4459x2]	within parse it's a great optimisation for in-line parsing.
BrianH 12-Oct-2009 [4461]	And you can do that with CATCH.
Steeve 12-Oct-2009 [4462]	yep, and for functions, you still got THROW/CATCH and RETURN, which are enough to my mind.
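The THROW/CATCH way of leaving several nested loops at once, as BrianH and Steeve suggest for function code, has a close analogue in other languages. The sketch below is Python, not REBOL: raising an exception plays the role of THROW and the `try`/`except` around the loops plays the role of CATCH. The `Found` class and the `first_triple` function are hypothetical names chosen for the example.

```python
class Found(Exception):
    """Hypothetical signal used to unwind several loops at once."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def first_triple(limit: int):
    """Return the first (a, b, c) with a*a + b*b == c*c, or None."""
    try:
        for a in range(1, limit):
            for b in range(a, limit):
                for c in range(b, limit):
                    if a * a + b * b == c * c:
                        raise Found((a, b, c))  # plays the role of THROW
    except Found as signal:  # plays the role of CATCH
        return signal.value
    return None


print(first_triple(20))  # prints "(3, 4, 5)"
```

All three loops are abandoned in one step, which is the effect Maxim wants from BREAK/depth n, at the cost of wrapping the loops in an explicit catch point.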
Maxim 12-Oct-2009 [4463]	n break allows you to tell the parser that you DON'T want any backtracking. it's a way to optimize rules for speed, if nothing else.
Steeve 12-Oct-2009 [4464]	that's an optimization I agree, but all the proposals are the same, optimizations.
BrianH 12-Oct-2009 [4465]	The BREAK, THROW, RETURN, EXIT, HALT and QUIT functions are implemented the same way, just with different error codes.
Maxim 12-Oct-2009 [4466x2]	obviously n BREAK can be simulated using longer, non-recursive rules.
Maxim 12-Oct-2009 [4466x2]	but n BREAK allows us to leverage reuse of smaller rules, as if they were large complex rules, and still benefit from the same speed as a root rule backtrack.
BrianH 12-Oct-2009 [4468]	I think that Carl is trying to balance speed, ease of use, and debuggability. In practice n BREAK would be tricky to debug, and doesn't actually reflect what PARSE does internally. Apparently PARSE isn't actually recursive descent - it just fakes it with a state machine.
Maxim 12-Oct-2009 [4469]	yeah, maybe it's just really hard to implement based on the parse algorithm... so not worth the time to implement.
BrianH 12-Oct-2009 [4470]	Plus, it makes the code flow really tricky to understand. You aren't doing your later maintainer of your code any favors (even if it's you).
Pekr 12-Oct-2009 [4471]	I am not sure why "to end" attempted several times does not fail. Simply put - any rule you put consumes the input, so I would expect that once we are at the end of the input (the rule was matched), a second call of "to end" should fail, no? It does not correspond to "to "abc"", which, called consecutively, would try to find ANOTHER match for "abc", not just the same one. I don't see a reason why "to end" should be an exception. It should imo definitely cause termination.