r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

PeterWood
6-Oct-2009
[4430]
If you are parsing a string, skip only advances one character at a 
time. When I wrote make-word-list for rebol.org, using a complemented 
charset gave a big performance improvement.


Though you may be able to suggest better ways to optimise it - http://www.rebol.org/view-script.r?script=make-word-list.r
Steeve
6-Oct-2009
[4431]
I can have a look, but the purpose of NOT is not to perform better 
than a complemented charset, but to allow some simplification when 
writing rules.

Actually, that's the case for most of the other improvements: easier 
to write, not necessarily faster.

And don't forget that safe complemented charsets in R3 are a pain 
in the ass to construct, because of UTF-8
PeterWood
6-Oct-2009
[4432x2]
Which is why I was disappointed that I apparently misunderstood from 
Carl's blog:


Changes that are critical, but not highly complicated. For example, 
providing a NOT command seems easy enough, and it is now critical 
because using complemented charsets is problematic (due to the Unicode 
enhancements). 
This is the link to the blog - http://www.rebol.net/r3blogs/0155.html
Steeve
6-Oct-2009
[4434x2]
Well, I saw your script. I don't know if it can be faster; I can 
only say I would have written it differently.
Probably using parse and load/next for all normal rebol values.

I can see that your rule for matching binaries is incorrect, because 
[#{" thru #"}"] is wrong (what if the binary contains the #"^}" 
char?)
Forget my last remark: a binary can't contain #"^}", but a multi-line 
string can.
BrianH
6-Oct-2009
[4436x2]
Peter, Steeve, the original problem that started the parse proposals 
was the problem of complimenting charsets. However, it quickly changed 
to improving PARSE in general. Then, while we were waiting for the 
parse proposals to come up on the todo list, we came up with a better 
solution to complimenting charsets, which is not yet implemented 
and which is not limited to PARSE.
Or maybe it's complementing? How would you compliment a charset? 
Say it has good coverage?
Pekr
6-Oct-2009
[4438]
BrianH: what is the method to get a complementing-like feature back? 
I haven't seen anything discussed along those lines yet ...
BrianH
6-Oct-2009
[4439x2]
Using a bit in the charset that would mark it as "complemented", 
and then all of its matching algorithms would do an internal not.
The discussion was primarily in the comments of the relevant tickets 
in CureCode, though there was some in R3 chat. Not recently.
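
The complement-bit idea described above can be sketched outside REBOL. This is a minimal Python illustration, not the R3 implementation: the `Charset` class, its `complemented` flag, and the method names are all hypothetical, chosen only to show how a membership test can apply an "internal not" instead of enumerating every other code point.

```python
class Charset:
    """A set of characters plus a 'complemented' flag (hypothetical design)."""

    def __init__(self, chars, complemented=False):
        self.chars = frozenset(chars)
        self.complemented = complemented

    def complement(self):
        # Flip the flag; the stored characters are left untouched.
        return Charset(self.chars, not self.complemented)

    def __contains__(self, ch):
        # The "internal not": invert the plain lookup when the flag is set.
        return (ch in self.chars) != self.complemented


spaces = Charset(" \t\n")
non_space = spaces.complement()

print("a" in non_space)       # True
print(" " in non_space)       # False
print("\u00e9" in non_space)  # True, without storing any non-ASCII entries
```

The stored set stays at three characters no matter how large the encoding is, which is the point of marking the set rather than materializing its complement.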
Pekr
6-Oct-2009
[4441]
If parse was already rewritten from the ground up, couldn't it be 
directly adapted to allow streamed parsing? :-) You said it could 
be added later, when the rewrite happens, so ...
BrianH
6-Oct-2009
[4442]
Yeah, but by "later" I meant after 3.0 comes out. I haven't even 
discussed with Carl what would be involved yet.
Pekr
6-Oct-2009
[4443]
ok
BrianH
6-Oct-2009
[4444x2]
I want to write more port code first and refine the model based on 
what I learn.
Plus, Carl's rewrite didn't change the basic algorithm of PARSE, 
just some details. I don't yet know whether my port PARSE is that 
easy.
Maxim
7-Oct-2009
[4446x2]
even with a complement bit, if you use a few chained unions & complements, 
etc, you'll eventually need to bake it... in the end, all the complement 
bit is useful for is to keep the charset to below half the maximum 
size of the complete encoding.
(which is still pretty big AFAICT)
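
Whether chained unions and complements force the set to be "baked" depends on the representation. The Python sketch below assumes a sparse set of characters paired with a complement flag, which is an assumption for illustration only (REBOL bitsets may well be flat bitmaps, where the trade-off differs). The `union` and `contains` helpers are hypothetical names; they apply De Morgan's laws so the flag survives a union.

```python
def union(a, b):
    """Union of two (chars, complemented) pairs without enumerating code points."""
    (sa, ca), (sb, cb) = a, b
    if not ca and not cb:
        return (sa | sb, False)   # A | B
    if ca and cb:
        return (sa & sb, True)    # not A | not B == not (A & B)
    if ca:
        return (sa - sb, True)    # not A | B == not (A - B)
    return (sb - sa, True)        # A | not B == not (B - A)


def contains(cs, ch):
    chars, complemented = cs
    return (ch in chars) != complemented


digits = (frozenset("0123456789"), False)
not_vowels = (frozenset("aeiou"), True)
combined = union(digits, not_vowels)

print(contains(combined, "5"))  # True: a digit
print(contains(combined, "z"))  # True: not a vowel
print(contains(combined, "a"))  # False: a vowel and not a digit
```

Under that assumption the stored set never grows past the characters actually named in the rules.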
BrianH
12-Oct-2009
[4448]
Behavior of BREAK, ANY and SOME decided, finally: http://www.rebol.net/r3blogs/0270.html
Steeve
12-Oct-2009
[4449x2]
Yesssss, the return of BREAK !!!!
I definitely didn't love the previous way to exit from a loop
BrianH
12-Oct-2009
[4451]
And it's finally break from a loop, rather than break from a block 
(supposedly).
Maxim
12-Oct-2009
[4452x3]
is it just me, or is Carl sidestepping the n BREAK proposal?
or BREAK/depth n  in case of a call within a loop.
(loop in a function, not parse)
BrianH
12-Oct-2009
[4455]
Yes, that's unlikely to be implemented. He says it doesn't fit in 
with the rest. Same with n FAIL.
Maxim
12-Oct-2009
[4456x2]
but it's a hell of a powerful addition to parse and to general code 
control.  I don't see why Carl can't see any use for it.
being able to drop from foreach [foreach [foreach [....]]] in one 
call is sooo useful soo often.
BrianH
12-Oct-2009
[4458]
That proposal was only for PARSE, not for function loops.
Maxim
12-Oct-2009
[4459x2]
I know, but it's just as useful within code.
within parse it's a great optimisation for in-line parsing.
BrianH
12-Oct-2009
[4461]
And you can do that with CATCH.
Steeve
12-Oct-2009
[4462]
yep, and for functions, you still got THROW/CATCH and RETURN, which 
are enough to my mind.
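
The CATCH/THROW approach to leaving several nested loops at once has a close analogue in most languages. The sketch below uses Python exceptions to illustrate the control flow only; `Throw`, `catch`, and `find_item` are hypothetical names, not REBOL natives, and nothing here reflects how REBOL implements them.

```python
class Throw(Exception):
    """Carries a value out of any depth of nesting (hypothetical helper)."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def catch(body):
    # Run body; if it throws, the thrown value becomes the result.
    try:
        return body()
    except Throw as thrown:
        return thrown.value


def find_item(grid, target):
    for i, row in enumerate(grid):
        for j, cell in enumerate(row):
            for k, item in enumerate(cell):
                if item == target:
                    # Leaves all three loops in one step.
                    raise Throw((i, j, k))
    return None


grid = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(catch(lambda: find_item(grid, 6)))  # (1, 0, 1)
```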
Maxim
12-Oct-2009
[4463]
n break allows you to tell the parser that you DON'T want any backtracking. 
 it's a way to optimize rules for speed, if nothing else.
Steeve
12-Oct-2009
[4464]
that's an optimization I agree, but all the proposals are the same, 
optimizations.
BrianH
12-Oct-2009
[4465]
The BREAK, THROW, RETURN, EXIT, HALT and QUIT functions are implemented 
the same way, just with different error codes.
Maxim
12-Oct-2009
[4466x2]
obviously n BREAK can be simulated using longer, non-recursive rules.
but n BREAK allows us to leverage reuse of smaller rules, as if they 
were large complex rules, and still benefit from the same speed of 
a root rule backtrack.
BrianH
12-Oct-2009
[4468]
I think that Carl is trying to balance speed, ease of use, and debuggability. 
In practice n BREAK would be tricky to debug, and doesn't actually 
reflect what PARSE does internally. Apparently PARSE isn't actually 
recursive descent - it just fakes it with a state machine.
Maxim
12-Oct-2009
[4469]
yeah, maybe it's just really hard to implement based on the parse 
algorithm... so not worth the time to implement.
BrianH
12-Oct-2009
[4470]
Plus, it makes the code flow really tricky to understand. You aren't 
doing the later maintainer of your code any favors (even if it's 
you).
Pekr
12-Oct-2009
[4471]
I am not sure why "to end" attempted several times does not fail. 
Simply put - if you put any rule, it consumes the input, so I would 
expect that once at the end of the input (i.e. the rule was matched), 
a second call of "to end" should fail, no? It does not correspond to 
"to "abc"", which called consecutively would try to find ANOTHER 
match for "abc", not just the same one. I don't see a reason why "to 
end" should be an exception. It should imo definitely cause termination.
BrianH
12-Oct-2009
[4472x3]
Because you can't through the end, not even with THRU END. And once 
you reach the end, END always succeeds.
And TO "abc" will also continue to succeed, matching the same "abc" 
every time. THRU "abc" skips past the "abc" like you say.
you can't -> you can't go
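
The TO / THRU / END behaviour described above can be modelled with a position in a string. This is a Python sketch of the semantics as stated in the messages, not of PARSE itself; the helper names `to`, `thru`, and `to_end` are hypothetical.

```python
def to(text, pos, pattern):
    """Move to the START of the next match at or after pos (None on failure)."""
    i = text.find(pattern, pos)
    return None if i < 0 else i


def thru(text, pos, pattern):
    """Move just PAST the next match at or after pos (None on failure)."""
    i = text.find(pattern, pos)
    return None if i < 0 else i + len(pattern)


def to_end(text, pos):
    """The end is always reachable, and you cannot move beyond it."""
    return len(text)


text = "xxabcyyabc"

p = to(text, 0, "abc")
print(p, to(text, p, "abc"))    # 2 2   (TO keeps matching the same "abc")

q = thru(text, 0, "abc")
print(q, thru(text, q, "abc"))  # 5 10  (THRU skips past, so it finds the next one)

e = to_end(text, 0)
print(e, to_end(text, e))       # 10 10 (repeating TO END still succeeds)
```

Because TO leaves the position in front of the match, a second TO with the same pattern finds the same match; TO END behaves the same way, which is why repeating it does not fail.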
Pekr
13-Oct-2009
[4475]
Are parse enhancements over, or do we get some other?
BrianH
13-Oct-2009
[4476]
Well, a89 isn't out yet (when last I checked). Beyond that, it depends 
on how Carl reacts to the recent blog on the parse plans.
Pekr
13-Oct-2009
[4477]
So according to his doc, we should get BREAK/RETURN and DO?
BrianH
13-Oct-2009
[4478]
We don't need BREAK/return anymore, but he can reuse the code he 
used to implement the RETURN paren operation. I hope we get DO, but 
Carl was iffy since few were pushing for it. Gabriele's been silent, 
which is weird since it was his idea.
Pekr
13-Oct-2009
[4479]
Carl should not judge based on his own usage of some features ...