World: r3wp

Join the discussions in the REBOL3 world...

[Parse] Discussion of PARSE dialect

Ladislav 18-Oct-2010 [5306]	R3 can let you define that typeset and use it any time you like
Henrik 18-Oct-2010 [5307]	ok, that is possibly good enough for generating specs.
Gregg 18-Oct-2010 [5308]	I don't remember what all we did Henrik, but some of our test generation stuff on another world had some support for typesets IIRC.
Henrik 18-Oct-2010 [5309]	Gregg, ok
Steeve 18-Oct-2010 [5310]	Henrik, with a parse rule ?
Henrik 18-Oct-2010 [5311]	Steeve, yes.
Steeve 18-Oct-2010 [5312]	R3 does it
AdrianS 18-Oct-2010 [5313]	Graham, try http://gskinner.com/RegExr for working out regexes. It has a really nice UI where you can hover over the components of the regex and see exactly what they do.
GrahamC 18-Oct-2010 [5314]	Thanks
Sunanda 4-Nov-2010 [5315]	Question on StackOverflow.....there must be a better answer than mine, and I'd suspect it involves PARSE (better answers usually do:) http://stackoverflow.com/questions/4093714/is-there-finer-granularity-than-load-next-for-reading-structured-data
GrahamC 4-Nov-2010 [5316x3]	Use fixed length records
	Anyone got a parse rule that strips out everything between tags in an "xml" document
	whitespace: charset [ "^/^- " ] swsp: [ any whitespace ] result: copy "" parse/all pqri-xml [ some [ copy t thru ">" (append result t) swsp to "<" ]]
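A cleaned-up sketch of the rule above. It assumes REBOL 2 (`parse/all`) and uses a made-up sample string in place of the real `pqri-xml` document. The rule keeps each tag and drops the text between tags. With the original rule, the final `to "<"` fails after the last tag, so PARSE returns false even though `result` is already filled. The `[to "<" | to end]` alternative lets the last iteration succeed. The leading `swsp` is dropped because `to "<"` already skips whitespace.

```rebol
REBOL [Title: "Keep the tags, drop the text between them (sketch)"]

; Hypothetical sample input standing in for the real pqri-xml document.
pqri-xml: {<a>123</a> <b>x</b>}
result: copy ""

; REBOL 2 syntax: /all keeps whitespace significant while parsing.
ok: parse/all pqri-xml [
    some [
        copy t thru ">" (append result t)  ; copy one tag, including its ">"
        [to "<" | to end]                  ; skip text up to the next tag, or to the end
    ]
]
probe ok      ; true
probe result  ; "<a></a><b></b>"
```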
Ladislav 4-Nov-2010 [5319]	Posted an answer mentioning the test framework, which does almost exactly what Fork asked
Gabriele 5-Nov-2010 [5320x3]	also, Carl's clean-script and script colorizer use parse + load/next to do the same thing. my Wetan uses the same method.
	http://www.colellachiara.com/soft/MD3/emitters/wetan.html#section-4.2
	basically, as long as you skip over [, (, ), and ] you can just use load/next. I'm also skipping over #[ because I want to preserve literal values while formatting (that is, preserve what the user typed)
Oldes 1-Dec-2010 [5323]	How to use the new INTO parse keyword? Could it be used to avoid the temp parse like in this (very simplified example)? parse "<a>123</a>" [thru "<a>" copy tmp to "</a>" (probe tmp probe parse tmp ["123"]) to end] Note that I know that in this example it's easy to use just one parse and avoid the temp.
Ladislav 1-Dec-2010 [5324x3]	INTO is neither new, nor is it meant for string parsing
	You can take advantage of using it when parsing a block and needing to parse a subblock (of any-block! type) or a substring
	(of the said block)
Oldes 1-Dec-2010 [5327]	can you give me a simple example, please?
Ladislav 1-Dec-2010 [5328x2]	>> parse [a b "123" c] [2 word! into [3 skip] word!] == true
	>> parse [a b c/d/e] [2 word! into [3 word!]] == true
Oldes 1-Dec-2010 [5330x2]	I understand now, thanks.
	it's very useful, I wonder why I haven't found it earlier :)
Ladislav 1-Dec-2010 [5332]	The substring property is just a recent addition
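A minimal sketch contrasting the classic use of INTO (descending into a sub-block) with the newer one (descending into a string held in the block). It assumes R3, since the message above says the substring form is recent. The sample data is made up.

```rebol
REBOL [Title: "INTO on a sub-block and on a substring (sketch, R3)"]

; Classic use: match a nested block with its own sub-rule.
probe parse [x [1 2] y] [word! into [2 integer!] word!]   ; true

; Newer use: match a string inside the block with a string sub-rule.
probe parse [x "ab" y] [word! into ["a" "b"] word!]      ; true
```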
Oldes 1-Dec-2010 [5333]	And is there any nice solution for my string parsing above? I can live with the temps, just was thinking if it could be done better.. anyway, at least I know how to use INTO:)
Ladislav 1-Dec-2010 [5334x2]	That is normally a "job" for a subrule
	it looks like you could use, e.g., the REJECT keyword
Oldes 1-Dec-2010 [5336x2]	I know, but that would require complex rules, I'm a lazy parser :) Btw.. my real example looks like: some [ thru {<h2><a} thru ">" copy name to {<} copy doc to {^/ </div>} ( parse doc [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} ( printf [" * " 10 " - "] reduce [arg arg-desc] ) ] ] ) ]
	Never mind, I can live with the current way anyway.. I was just wondering whether INTO is intended for such cases. Now I know it isn't.
Ladislav 1-Dec-2010 [5338x3]	For comparison, a similar rule can be written as follows: some [ thru {<h2><a} thru ">" copy name to {<} copy doc any [ and {^/ </div>} break | thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] | skip ] ]
	Aha, sorry, that is not similar enough :-( To be similar, it should look as follows, I guess: some [ thru {<h2><a} thru ">" copy name to {<} copy doc any [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] to {^/ </div>} ] ]
	Still no cigar, third time: some [ thru {<h2><a} thru ">" copy name to {<} copy doc [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] to {^/ </div>} ] ]
Oldes 1-Dec-2010 [5341x2]	That's not correct.. there is a reason for the temp parse, and it's there because thru "<h5" would skip out of the div.
	the DOC is just the temp var for the second parse.
Ladislav 1-Dec-2010 [5343]	But, in that case your "inner parse" fails, without you noticing it?
Oldes 1-Dec-2010 [5344x2]	why? it does not fail.. or maybe it fails, but I have the data from the doc div, that's all.. it's lazy parsing :)
	btw.. I need to parse the source only once so I really don't have to care about some exceptions.
Ladislav 1-Dec-2010 [5346]	I have the data - I doubt you get the data if the "inner parse" fails
Oldes 1-Dec-2010 [5347x2]	believe me I have.. :) the script is already ready.. I was just thinking if there is some special parse keyword, like INTO, so I could do it without the second parse next time, that's all. I use such a lazy parsing very often.
	in your case I would need to jump at least over each tag start, not using thru "<h5". But then there would be a problem: I need to stop the doc div only if it's exactly "^/ </div" (to avoid the case where there is another inner div). I know it's not safe, but I can see what I do by examining the source I want to parse first. (240kB html in my case)
Ladislav 1-Dec-2010 [5349]	Aha, that "I can see what I do by examining..." looks substantial. Nevertheless, there is still a way to do a similar thing without calling Parse again
Oldes 1-Dec-2010 [5350]	I believe it, but what matters is whether it would be easy enough to satisfy my laziness... something like INTO for block parsing.
Ladislav 1-Dec-2010 [5351x4]	what about this, is it the rule you wanted? some [ thru {<h2><a} thru ">" copy name to {<} to {^/ </div>} doc: [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ here: if (lesser? index? here index? doc) thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] ] :doc ]
	aha, I missed there should be doc-start and doc-end
	some [ thru {<h2><a} thru ">" copy name to {<} doc-start: to {^/ </div>} doc-end: :doc-start [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} here: if (lesser? index? here index? doc-end) copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] ] :doc-end ]
	Nevertheless, both the variant you posted, as well as the variant I posted parse a part of the text more than once. A variant parsing the text only once can be written as well.
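A reduced, self-contained version of the doc-start/doc-end technique from the rule above. It assumes R3 PARSE, because the IF keyword is R3 only. The sample HTML and the `args` block are hypothetical. The rule marks the end of the section and rewinds to its start. Each `thru "<h5>"` step then counts only while it stays before `doc-end`, so no second PARSE call is needed.

```rebol
REBOL [Title: "Bounding a loop by a saved end position (sketch, R3)"]

; Hypothetical input: two <h5> items inside the section, one after it.
html: {<div>a<h5>x</h5><h5>y</h5>^/ </div><h5>z</h5>}
args: copy []

ok: parse html [
    thru "<div>"
    doc-start: to "^/ </div>" doc-end:   ; remember where the section starts and ends
    :doc-start                           ; rewind to the start of the section
    any [
        thru "<h5>" here:
        if (lesser? index? here index? doc-end)  ; stop once THRU jumps past the section
        copy arg to "<" (append args arg)
    ]
    :doc-end                             ; continue after the section
    to end
]
probe ok    ; true
probe args  ; ["x" "y"]
```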
Steeve 1-Dec-2010 [5355]	this should work with R3: some [ thru {<h2><a} thru ">" copy name to {<} copy doc to {^/ </div>} :doc thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} ( printf [" * " 10 " - "] reduce [arg arg-desc] ) ] ] notice the :doc, which switches the input being parsed