r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

Ladislav
18-Oct-2010
[5306]
R3 can let you define that typeset and use it any time you like
Henrik
18-Oct-2010
[5307]
ok, that is possibly good enough for generating specs.
Gregg
18-Oct-2010
[5308]
I don't remember what all we did Henrik, but some of our test generation 
stuff on another world had some support for typesets IIRC.
Henrik
18-Oct-2010
[5309]
Gregg, ok
Steeve
18-Oct-2010
[5310]
Henrik, with a parse rule ?
Henrik
18-Oct-2010
[5311]
Steeve, yes.
Steeve
18-Oct-2010
[5312]
R3 does it
AdrianS
18-Oct-2010
[5313]
Graham, try http://gskinner.com/RegExr for working out regexes. It 
has a really nice UI where you can hover over the components of the 
regex and see exactly what they do.
GrahamC
18-Oct-2010
[5314]
Thanks
Sunanda
4-Nov-2010
[5315]
Question on StackOverflow.....there must be a better answer than 
mine, and I'd suspect it involves PARSE (better answers usually do:)

    http://stackoverflow.com/questions/4093714/is-there-finer-granularity-than-load-next-for-reading-structured-data
GrahamC
4-Nov-2010
[5316x3]
Use fixed length records
Anyone got a parse rule that strips out everything between tags in 
an "xml" document
    whitespace: charset [ "^/^- " ]
    swsp: [ any whitespace ]
    result: copy ""

    parse/all pqri-xml [ some [ copy t thru ">" (append result t) swsp to "<" ]]
Ladislav
4-Nov-2010
[5319]
Posted an answer mentioning the test framework, which does almost 
exactly what Fork asked
Gabriele
5-Nov-2010
[5320x3]
also, Carl's clean-script and script colorizer use parse + load/next 
to do the same thing. my Wetan uses the same method.
http://www.colellachiara.com/soft/MD3/emitters/wetan.html#section-4.2
basically, as long as you skip over [, (, ), and ] you can just use 
load/next. I'm also skipping over #[ because I want to preserve literal 
values while formatting (that is, preserve what the user typed)
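A minimal sketch of that load/next loop, assuming REBOL 2, where load/next returns a block holding the loaded value and the rest of the string. The names src and value are made up for illustration, and the bracket skipping described above is left out:

```rebol
; Assumes REBOL 2: load/next returns [value rest-of-string].
; src and value are hypothetical names, not taken from the scripts mentioned.
src: {print "hi" 42}
while [not empty? src] [
    ; load one value, then move src to the unread remainder
    set [value src] load/next src
    ; :value fetches the loaded value without evaluating it
    probe :value
]
```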
Oldes
1-Dec-2010
[5323]
How to use the new INTO parse keyword? Could it be used to avoid 
the temp parse like in this (very simplified example)?

  parse "<a>123</a>" [thru "<a>" copy tmp to "</a>" (probe tmp probe parse tmp ["123"]) to end]

Note that I know that in this example it's easy to use just one parse 
and avoid the temp.
Ladislav
1-Dec-2010
[5324x3]
INTO is neither new, nor is it meant for string parsing
You can take advantage of using it when parsing a block and needing 
to parse a subblock (of any-block! type) or a substring
(of the said block)
Oldes
1-Dec-2010
[5327]
can you give me a simple example, please?
Ladislav
1-Dec-2010
[5328x2]
>> parse [a b "123" c] [2 word! into [3 skip] word!]
== true
>> parse [a b c/d/e] [2 word! into [3 word!]]
== true
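The more usual case, a nested block! instead of a string or a path, works the same way. A small made-up example (the data is an assumption, not from the discussion):

```rebol
>> parse [a [1 2 3] b] [word! into [3 integer!] word!]
== true
>> parse [a [1 2] b] [word! into [3 integer!] word!]
== false
```

The second call fails because the sub-rule must match the whole nested block.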
Oldes
1-Dec-2010
[5330x2]
I understand now, thanks.
it's very useful, I wonder why I've not found it earlier :)
Ladislav
1-Dec-2010
[5332]
The substring property is just a recent addition
Oldes
1-Dec-2010
[5333]
And is there any nice solution for my string parsing above? I can 
live with the temps, just was thinking if it could be done better.. 
anyway, at least I know how to use INTO:)
Ladislav
1-Dec-2010
[5334x2]
That is normally a "job" for a subrule
it looks like you could use e.g. the REJECT keyword
Oldes
1-Dec-2010
[5336x2]
I know, but that would require complex rules, I'm lazy parser:) Btw.. 
my real example looks like:
    some [
        thru {<h2><a} thru ">" copy name to {<}
        copy doc to {^/ </div>} (
            parse doc [
                thru {<pre class="code">} copy code to {</pre} (
                    probe name
                    probe code
                )
                any [
                    thru {<h5>} copy arg to {<}
                    thru {<ol><p>} copy arg-desc to {</p></ol>}

                    ( printf ["  * " 10 " - "] reduce [arg arg-desc] )
                ]
            ]
        )
    ]
Never mind, I can live with current way anyway.. I was just wondering 
if the INTO is not intended for such a cases. Now I know it isn't.
Ladislav
1-Dec-2010
[5338x3]
For comparison, a similar rule can be written as follows:

some [
    thru {<h2><a} thru ">" copy name to {<}
    copy doc any [
        and {^/ </div>} break
        | thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
        | skip
    ]
]
Aha, sorry, that is not similar enough :-( To be similar, it should 
look as follows, I guess:

some [
    thru {<h2><a} thru ">" copy name to {<}
    copy doc any [
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
        to {^/ </div>}
    ]
]
Still no cigar, third time:

some [
    thru {<h2><a} thru ">" copy name to {<}
    copy doc [
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
        to {^/ </div>}
    ]
]
Oldes
1-Dec-2010
[5341x2]
That's not correct.. there is a reason for the temp parse: without 
it, thru "<h5" would skip out of the div.
the DOC is just the temp var for the second parse.
Ladislav
1-Dec-2010
[5343]
But, in that case your "inner parse" fails, without you noticing 
it?
Oldes
1-Dec-2010
[5344x2]
why? it does not fail.. or maybe it fails, but I have the data from 
the doc div, that's all.. it's lazy parsing :)
btw.. I need to parse the source only once so I really don't have 
to care about some exceptions.
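A cut-down sketch of that lazy pattern, on made-up data. Parenthesized actions run as soon as the rule reaches them, so the values are collected even though the inner parse never reaches the end of its input and returns false. The names source, section and found and the simplified HTML are assumptions for illustration:

```rebol
; Hypothetical data: two sections, each closed by </div>.
source: {<h2>First</h2><h5>a</h5><h5>b</h5></div><h2>Second</h2><h5>c</h5></div>}
found: copy []

parse/all source [
    some [
        thru "<h2>" copy name to "<"
        ; the temp copy confines the inner thru "<h5>" to one section
        copy section to "</div>" (
            ; this inner parse returns false (it stops before the end
            ; of section), but the paren below has already run
            parse/all section [
                any [thru "<h5>" copy arg to "<" (repend found [name arg])]
            ]
        )
    ]
]

probe found
; found holds ["First" "a" "First" "b" "Second" "c"]
```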
Ladislav
1-Dec-2010
[5346]
"I have the data" - I doubt you get the data if the "inner parse" fails
Oldes
1-Dec-2010
[5347x2]
believe me I have.. :) the script is already ready.. I was just thinking 
if there is some special parse keyword, like INTO, so I could do 
it without the second parse next time, that's all. I use such a lazy 
parsing very often.
in your case I would need to jump at least over each tag start, not 
using thru "<h5". But then there would be problem, that I need to 
stop the doc div only if it's exactly "^/ </div" (to avoid case that 
there would be another inner div). I know it's not safe, but I can 
see what I do by examining the source I want to parse first. (240kB 
html in my case)
Ladislav
1-Dec-2010
[5349]
Aha, that "I can see what I do by examining..." looks substantial. 
Nevertheless, there is still a way to do a similar thing without 
calling Parse again
Oldes
1-Dec-2010
[5350]
I believe you, but what matters is whether it would be easy enough to 
satisfy my laziness... something like the INTO for block parsing.
Ladislav
1-Dec-2010
[5351x4]
what about this, is it the rule you wanted?

some [
    thru {<h2><a} thru ">" copy name to {<}
    to {^/ </div>} doc: [
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            here: if (lesser? index? here index? doc)
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ] 
    ]
    :doc
]
aha, I missed there should be doc-start and doc-end
some [
    thru {<h2><a} thru ">" copy name to {<}
    doc-start: to {^/ </div>} doc-end: :doc-start
    [
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>}
            here: if (lesser? index? here index? doc-end)
            copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ] 
    ]
    :doc-end
]
Nevertheless, both the variant you posted, as well as the variant 
I posted parse a part of the text more than once. A variant parsing 
the text only once can be written as well.
Steeve
1-Dec-2010
[5355]
this should work with R3:

    some [
        thru {<h2><a} thru ">" copy name to {<}
        copy doc to {^/ </div>} :doc
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            ( printf ["  * " 10 " - "] reduce [arg arg-desc] )
        ]
    ]

notice the :doc, which allows switching the input currently being parsed