r3wp [groups: 83 posts: 189283]
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r3wp

[Parse] Discussion of PARSE dialect

BrianH
30-Sep-2009
[4195]
Carl *suggested* the find/not option :)
Maxim
30-Sep-2009
[4196]
'NOT  and the 'TO/'THRU multi.  its just soooo much simpler to slice 
and dice text using it, which is a large percentage of what parse 
is used for.


I remember Carl's historical reason that it made us lazy... hehehe... 
his CR.LF stripping example shows that he is quite lazy himself  
;-)
Steeve
30-Sep-2009
[4197]
TO/THRU multi is not so important in text parsing because we can 
use charsets
Ladislav
30-Sep-2009
[4198]
I am just curious, whether Carl intends to implement the full TO/THRU, 
acting on any subrule
Steeve
30-Sep-2009
[4199]
it's working now in a84
Ladislav
30-Sep-2009
[4200]
It is badluck, that recursive rules are not useful in fact, since 
the stack is too small :-(
BrianH
30-Sep-2009
[4201]
There are likely to be limits. I'm a bit shocked that he was able 
to push it as far as he did :)
Maxim
30-Sep-2009
[4202x2]
steeve, it is... you don't have to build a grammar, just find a set 
of words...
I also remember that Carl feared it would lead to people building 
RE-like slow parsers.
Ladislav
30-Sep-2009
[4204]
full TO/THRU is not implemented yet, trust me
BrianH
30-Sep-2009
[4205]
If you say so, Ladislav, then I look forward to what is to come :)
Steeve
30-Sep-2009
[4206x2]
working here

>> parse "abcd" [any [to ["c" | "b"] ?? skip]]
skip: "bcd"
skip: "cd"
== false
what do you mean by "full" implemented ?
Ladislav
30-Sep-2009
[4208x2]
well, but I do not know if Carl intends to implement the full TO/THRU, 
as I said...
what do I mean? the "full"

    a: [thru b]

can be defined as a "shorcut" for

    a: [b | skip a]


you can try, that this works for any rule B as far as you don't need 
to use too deep a recursion. As opposed to that, the THRU keyword 
does not accept any rule B, as documented in Carl''s blog
BrianH
30-Sep-2009
[4210]
Right. I would just be happy if he adds the not modifier, but the 
current capabilities are moree than I was expecting.
Maxim
30-Sep-2009
[4211]
actually, to/thru multi, is like an ANY with an embedded skip... 
I don't think there is any slowness related to it.
coudn't thru be implemented this way?

skip-rule: [skip]

thru:   [any [ ["c" (skip-rule: [fail]) |  "b" (skip-rule: [fail])] 
skip-rule]]
Steeve
30-Sep-2009
[4212]
but the A: [b | skip A] is weird, i never do that to avoid the stack 
limit error.
this instead:
a: [
    any [
       b break
       | skip
    ]
]
Maxim
30-Sep-2009
[4213]
the above has the advantage of not requiring stack.
Steeve
30-Sep-2009
[4214]
you don't know break Maxim ;-)
Ladislav
30-Sep-2009
[4215x3]
Yes, Steeve, I used it just the recursive expression just for demonstration 
purposes (recursion can be even used to demonstrate how ANY works). 
OTOH, it is a problem to e.g. define a non-recursive rule to detect 
correctly parenthesized string containing just #"(" and #")"
So, another problem of Parse is, that recursion does not work as 
well as it should :-(
(or, as well as may be needed)
BrianH
30-Sep-2009
[4218]
Do you mean the USE proposal, or does it suck in other ways?
Ladislav
30-Sep-2009
[4219x2]
just this: correct-paren: [any [#"(" correct-paren #")"]] - you can 
use it to parse strings, but just "short ones"
or, rather, just the "shallow ones"
BrianH
30-Sep-2009
[4221x2]
Ah yes. For long ones you need a counter in a production.
Which can be checked with IF (condition) now :)
Ladislav
30-Sep-2009
[4223]
hmm, counter does not help you in case like:


correct-bracket: [any [#"(" correct-bracket #")" | #"[" correct-bracket 
#"]"]]
Maxim
30-Sep-2009
[4224]
what is the problem with correct-paren:  ?  you mean it can't go 
beyond  n recursions?  or is it something else?
BrianH
30-Sep-2009
[4225x2]
Not by itself it doesn't. It does kinda make things tricky that recursion 
is implemented with recursion.
Rather than with CPS or tables.
Ladislav
30-Sep-2009
[4227]
yes, Max, the problem is, that the rule does not work when applied 
to deep recursive data
Maxim
30-Sep-2009
[4228]
ok, well that is a general problem of parse, (and of REBOL in general, 
I might add).
BrianH
30-Sep-2009
[4229]
Yup, recursion is bounded and not tail-call optimized :(
Maxim
30-Sep-2009
[4230]
it would be nice to be able to have control over the interpreter 
stack...
Steeve
30-Sep-2009
[4231]
well, i must say i never have such problems, i use rules adapted 
to incremental parsing to avoid too deep recursions.
BrianH
30-Sep-2009
[4232]
For those who think PARSE isn't tricky enough to use now, that would 
be great, Maxim.
Steeve
30-Sep-2009
[4233]
i agree
Maxim
30-Sep-2009
[4234]
still, I've never had problems in real-life data so far..., but I 
tend to build very wide & flat rules, instead of using recursive 
rules... I guess that's why.
Steeve
30-Sep-2009
[4235]
It would avoid that i manage my own stack most of the time
Maxim
30-Sep-2009
[4236]
any one done any metrics on actually parse depth limits?
Ladislav
30-Sep-2009
[4237]
re the tail-call optimization: some recursive rules, like e.g. the 
above correct-bracket rule are not tail-optimizable; OTOH, I already 
programmed a couple of algorithms, where I used my own stack to make 
sure it is as big as needed
BrianH
30-Sep-2009
[4238x3]
Ladislav, counters would have to be combined with an explicit stack 
in the mixed bracket case.
As you already know, of course :)
It would be like run-length encoding your explicit stack.
Steeve
30-Sep-2009
[4241]
I remember having requested 3 commands to speed up the management 
of  our own stack in the past [push, pop and rollback]
Maxim
30-Sep-2009
[4242x2]
remark currently uses a stack and stores inner parsing explicitely, 
with offsets& stuff while flattening the inner-most tags first, this 
allows the whole engine to use a single flat 'ANY [ dozens | of | 
rules ].  it loads rebol dialect data within nested < > pairs in 
html ...  works well, but its pretty complex to setup.  would be 
nice to have something standard to code against.
something user types would be well suited for in R3  :-)
BrianH
30-Sep-2009
[4244]
Yeah, I caught the "read-only access to the stack" request too. Useless 
for incremental parsing without being able to restore, basically 
parsing continuations. Which would require a ground-up rewrite of 
PARSE.