World: r3wp

[Parse] Discussion of PARSE dialect

Oldes
1-Dec-2010
[5330x2]
I understand now, thanks.
it's very useful, I wonder why I've not found it earlier :)
Ladislav
1-Dec-2010
[5332]
The substring property is just a recent addition
Oldes
1-Dec-2010
[5333]
And is there any nice solution for my string parsing above? I can 
live with the temps, just was thinking if it could be done better.. 
anyway, at least I know how to use INTO:)
Ladislav
1-Dec-2010
[5334x2]
That is normally a "job" for a subrule
it looks like you could use e.g. the REJECT keyword
Oldes
1-Dec-2010
[5336x2]
I know, but that would require complex rules, I'm a lazy parser :) Btw.. 
my real example looks like:
    some [
        thru {<h2><a} thru ">" copy name to {<}
        copy doc to {^/ </div>} (
            parse doc [
                thru {<pre class="code">} copy code to {</pre} (
                    probe name
                    probe code
                )
                any [
                    thru {<h5>} copy arg to {<}
                    thru {<ol><p>} copy arg-desc to {</p></ol>}

                    ( printf ["  * " 10 " - "] reduce [arg arg-desc] )
                ]
            ]
        )
    ]
Never mind, I can live with the current way anyway.. I was just wondering 
whether INTO is intended for such cases. Now I know it isn't.
Ladislav
1-Dec-2010
[5338x3]
For comparison, a similar rule can be written as follows:

some [
    thru {<h2><a} thru ">" copy name to {<}
    copy doc any [
        and {^/ </div>} break
        | thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
        | skip
    ]
]
Aha, sorry, that is not similar enough :-( To be similar, it should 
look as follows, I guess:

some [
    thru {<h2><a} thru ">" copy name to {<}
    copy doc any [
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
        to {^/ </div>}
    ]
]
Still no cigar, third time:

some [
    thru {<h2><a} thru ">" copy name to {<}
    copy doc [
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
        to {^/ </div>}
    ]
]
Oldes
1-Dec-2010
[5341x2]
That's not correct.. there is a reason for the temp parse: without it, 
thru "<h5" would skip out of the div.
the DOC is just the temp var for the second parse.
Ladislav
1-Dec-2010
[5343]
But, in that case your "inner parse" fails, without you noticing 
it?
Oldes
1-Dec-2010
[5344x2]
why? it does not fail.. or maybe it fails, but I have the data from 
the doc div, that's all.. it's lazy parsing :)
btw.. I need to parse the source only once so I really don't have 
to care about some exceptions.
Ladislav
1-Dec-2010
[5346]
"I have the data" - I doubt you get the data if the "inner parse" fails
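Whether collected values survive a failing PARSE can be checked in isolation. The following is only a minimal sketch with made-up input; the names `collected`, `key`, `val` and `ok?` are illustrative and do not come from the thread. It relies on the fact that actions in parentheses run as soon as matching reaches them and are not undone when a later part of the rule fails, so the overall result can be false while the data gathered so far remains.

```rebol
REBOL []

collected: copy []

; The trailing "junk" contains no "=", so the loop stops there and
; PARSE cannot reach the end of the input.
ok?: parse/all "a=1;b=2;junk" [
    any [
        copy key to "=" skip
        copy val to ";" skip
        (append collected reduce [key val])
    ]
]

probe ok?        ; false: the input was not consumed to the end
probe collected  ; ["a" "1" "b" "2"]: both pairs were still collected
```

This seems to be the situation with the inner rule above: it ends in an ANY loop without consuming the rest of DOC, so the inner PARSE may well return false even though the PROBE and PRINTF actions have already run.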
Oldes
1-Dec-2010
[5347x2]
believe me I have.. :) the script is already done.. I was just thinking 
if there is some special parse keyword, like INTO, so I could do 
it without the second parse next time, that's all. I use such lazy 
parsing very often.
in your case I would need to jump at least over each tag start, not 
using thru "<h5". But then there would be a problem: I need to 
stop the doc div only if it's exactly "^/ </div" (to avoid the case that 
there would be another inner div). I know it's not safe, but I can 
see what I do by examining the source I want to parse first. (240kB 
html in my case)
Ladislav
1-Dec-2010
[5349]
Aha, that "I can see what I do by examining..." looks substantial. 
Nevertheless, there is still a way to do a similar thing without 
calling Parse again
Oldes
1-Dec-2010
[5350]
I believe you, but what matters is whether it would be easy enough to satisfy 
my laziness... something like INTO for block parsing.
Ladislav
1-Dec-2010
[5351x4]
what about this, is it the rule you wanted?

some [
    thru {<h2><a} thru ">" copy name to {<}
    to {^/ </div>} doc: [
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            here: if (lesser? index? here index? doc)
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ] 
    ]
    :doc
]
aha, I missed there should be doc-start and doc-end
some [
    thru {<h2><a} thru ">" copy name to {<}
    doc-start: to {^/ </div>} doc-end: :doc-start
    [
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>}
            here: if (lesser? index? here index? doc-end)
            copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
    ]
    :doc-end
]
Nevertheless, both the variant you posted and the variant I posted 
parse a part of the text more than once. A variant parsing 
the text only once can be written as well.
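The bounded-loop idea (mark the start, find the boundary, rewind, then stop the loop once a match lands past the boundary) can be reduced to a small sketch. It assumes REBOL 3, where IF is a PARSE keyword; the input and the names `section-start`, `section-end`, `here`, `item` and `items` are made up for illustration.

```rebol
REBOL []

text: "[a][b] tail [c]"
items: copy []

ok?: parse text [
    ; remember the start, find the boundary, then rewind to the start
    section-start: to " tail" section-end: :section-start
    any [
        thru "["
        ; fail this iteration once a match lands past the boundary
        here: if (lesser? index? here index? section-end)
        copy item to "]" skip
        (append items item)
    ]
    ; resume at the boundary, whatever the loop consumed
    :section-end
    to end
]

probe items  ; ["a" "b"]: "[c]" lies past the boundary and is not collected
```

As in the thread's rules, the section is scanned twice: once by TO to find the boundary, and once by the loop.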
Steeve
1-Dec-2010
[5355]
this should work with R3:

    some [
        thru {<h2><a} thru ">" copy name to {<}
        copy doc to {^/ </div>} :doc
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            ( printf ["  * " 10 " - "] reduce [arg arg-desc] )
        ]
    ]

notice the :doc, which switches the input currently being parsed
Oldes
1-Dec-2010
[5356]
Ladislav's last version works, but it's far from easy to 
use for lazy parsing. I think I will stay with my version ;-)
Ladislav
1-Dec-2010
[5357]
Use what suits your needs best. Nevertheless, as far as code size 
etc. are concerned, they are about the same (even sharing the property 
that a part of the input is parsed twice).
BrianH
1-Dec-2010
[5358]
Was Carl's proposed LIMIT keyword implemented yet?
Ladislav
1-Dec-2010
[5359]
Not yet, I guess.
BrianH
1-Dec-2010
[5360x2]
That is what he proposed to deal with this issue. I look forward 
to it.
I use the new INTO string feature a lot with file management code.
Steeve
1-Dec-2010
[5362]
Hey, you don't like my solution? It's simple enough, no need for 
LIMIT
BrianH
1-Dec-2010
[5363x2]
I like your solution, and would use solutions like that. I just would 
prefer to have LIMIT :)
It has a lot of overhead though (copy overhead).
Steeve
1-Dec-2010
[5365]
I knew, you would say that... :-)
BrianH
1-Dec-2010
[5366]
You have to be careful with INTO string though because there is a 
lot of PARSE code out there that depended on INTO failing with non-blocks, 
and triggering an alternation. Learn to like AND type INTO if your 
code depends on that.
Ladislav
1-Dec-2010
[5367]
"Hey you don't like my solution ?" - I guess that Oldes does not like it, 
since it does not "stay in the limit" of DOC
Steeve
1-Dec-2010
[5368]
He just has to switch back again. I don't think he even understood 
or read what I proposed
BrianH
1-Dec-2010
[5369]
Ah, I must have misunderstood the solution then. I thought it was 
a two-pass thing with subparsing of a copy.
Ladislav
1-Dec-2010
[5370]
aha, you would like to use the get-word to switch the input
Steeve
1-Dec-2010
[5371]
Brian, it is
Ladislav
1-Dec-2010
[5372]
I have seen that proposed, but it is not available currently (I would 
support such a proposal, though)
Steeve
1-Dec-2010
[5373x2]
Ladislav, it's working in R3 currently
and has been for a while
Ladislav
1-Dec-2010
[5375]
Checking
BrianH
1-Dec-2010
[5376]
That's what I thought. I originally proposed that kind of input switching 
in 2000. It can cause problems with backtracking though, so a sub-parse 
in an IF operation can be safer.
Steeve
1-Dec-2010
[5377]
Lots of backtracking problems may arise in lots of ways when you 
do parsing.
I'm not sure it's an argument :-)
BrianH
1-Dec-2010
[5378]
There are no backtracking problems with COPY x TO thing IF (parse 
x rule). But that was the original reason we didn't have input switching. 
The reason I requested input switching in the first place was to 
make it easier to implement the continuous parsing that Pekr was 
requesting at the time :)
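The COPY x TO thing IF (parse x rule) pattern can be sketched as follows. This assumes REBOL 3 (IF is not a PARSE keyword in REBOL 2); the input and the names `chunk`, `pair-rule` and `ok?` are hypothetical. The outer input is never switched: the sub-parse runs on a copy and only contributes a logic value, so the outer rule backtracks as usual if it fails. The price is the copy itself.

```rebol
REBOL []

; hypothetical sub-rule: the copied chunk must contain a "="
pair-rule: [thru "=" to end]

chunk: none

ok?: parse "name=doc;rest" [
    copy chunk to ";"              ; cut out the bounded section
    if (parse chunk pair-rule)     ; sub-parse the copy; false means fail here
    skip to end                    ; continue in the original input
]

probe ok?    ; true
probe chunk  ; "name=doc"
```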
Steeve
1-Dec-2010
[5379]
I don't see where the problem is, you just have to switch back to the 
original series if the sub-rule fails, no need for the if (parse ...) thing