World: r3wp

[Parse] Discussion of PARSE dialect
Ladislav 22-Sep-2010 [5295x2]	, since I guess, that this way, he will not have to just go into an "unknown territory"
Ladislav 22-Sep-2010 [5295x2]	I must say that I was actually surprised at how people (including me) have struggled to circumvent this problem while having such an elegant way available to solve it.
GrahamC 18-Oct-2010 [5297]	a regex question ... ([0-9]{4})(-([0-9]{2})(-([0-9]{2})(T([0-9]{2}):([0-9]{2})(:([0-9]{2})(\.([0-9]+))?)?(Z|(([-+])([0-9]{2}):([0-9]{2})))))) is apparently failing this string : 2010-10-18T07:06:25.00Z What tool can I use to check this string against this regex ?
Sunanda 18-Oct-2010 [5298]	Regexlib has a different ISO-8601 date matching regex: http://regexlib.com/REDetails.aspx?regexp_id=2092 And the ability to enter any regex and target strings to test what happens: http://regexlib.com/RETester.aspx?
GrahamC 18-Oct-2010 [5299x2]	found this one too http://www.fileformat.info/tool/regex.htm
GrahamC 18-Oct-2010 [5299x2]	and it seems my string is passing ... hmm
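Editor's note: the check GrahamC describes can also be reproduced locally. The following is a minimal sketch in Python, assuming the pattern is exactly the one quoted in the message above; the names `PATTERN` and `check` are illustrative and do not come from the discussion.

```python
import re

# Pattern copied from the message above. It is split across lines only
# for readability; the pieces concatenate to the original regex.
PATTERN = (
    r"([0-9]{4})(-([0-9]{2})(-([0-9]{2})"
    r"(T([0-9]{2}):([0-9]{2})(:([0-9]{2})(\.([0-9]+))?)?"
    r"(Z|(([-+])([0-9]{2}):([0-9]{2}))))))"
)


def check(sample: str) -> bool:
    """Return True when the whole sample string matches PATTERN."""
    return re.fullmatch(PATTERN, sample) is not None


print(check("2010-10-18T07:06:25.00Z"))  # True
```

As quoted, the pattern makes every component up to the time zone mandatory, so a date-only string such as 2010-10-18 does not match it, while the string from the message does. This agrees with the online tester reporting a pass.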
Sunanda 18-Oct-2010 [5301]	The problem with regexes is they are impossible to debug.....Best just to rewrite continually until they work :)
GrahamC 18-Oct-2010 [5302]	I'm trying to validate some XML against an online validator and it's rejecting my dates :(
Henrik 18-Oct-2010 [5303]	how do you specify an element to be of the type any-type! except none! ?
Ladislav 18-Oct-2010 [5304]	I am afraid that you need to list all types excluding none
Henrik 18-Oct-2010 [5305]	does R3 solve this? if not, maybe that would be a good problem to solve.
Ladislav 18-Oct-2010 [5306]	R3 can let you define that typeset and use it any time you like
Henrik 18-Oct-2010 [5307]	ok, that is possibly good enough for generating specs.
Gregg 18-Oct-2010 [5308]	I don't remember what all we did Henrik, but some of our test generation stuff on another world had some support for typesets IIRC.
Henrik 18-Oct-2010 [5309]	Gregg, ok
Steeve 18-Oct-2010 [5310]	Henrik, with a parse rule ?
Henrik 18-Oct-2010 [5311]	Steeve, yes.
Steeve 18-Oct-2010 [5312]	R3 does it
AdrianS 18-Oct-2010 [5313]	Graham, try http://gskinner.com/RegExr for working out regexes. It has a really nice UI where you can hover over the components of the regex and see exactly what they do.
GrahamC 18-Oct-2010 [5314]	Thanks
Sunanda 4-Nov-2010 [5315]	Question on StackOverflow.....there must be a better answer than mine, and I'd suspect it involves PARSE (better answers usually do:) http://stackoverflow.com/questions/4093714/is-there-finer-granularity-than-load-next-for-reading-structured-data
GrahamC 4-Nov-2010 [5316x3]	Use fixed length records
	Anyone got a parse rule that strips out everything between tags in an "xml" document
	whitespace: charset [ "^/^- " ] swsp: [ any whitespace ] result: copy "" parse/all pqri-xml [ some [ copy t thru ">" (append result t) swsp to "<" ]]
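Editor's note: for comparison, the same idea (keep the tags, drop everything between them) can be sketched outside PARSE. This is a Python sketch of the intent, not a translation of the rule above; `keep_tags` is a hypothetical helper name. Like the `thru ">"` approach, it assumes no `>` occurs inside an attribute value, comment, or CDATA section.

```python
import re


def keep_tags(xml: str) -> str:
    """Return only the <...> tags of xml, dropping the text between them."""
    # Each match runs from "<" up to and including the next ">".
    return "".join(re.findall(r"<[^>]*>", xml))


sample = "<a>\n  <b>text</b>\n</a>"
print(keep_tags(sample))  # <a><b></b></a>
```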
Ladislav 4-Nov-2010 [5319]	Posted an answer mentioning the test framework, which does almost exactly what Fork asked
Gabriele 5-Nov-2010 [5320x3]	also, Carl's clean-script and script colorizer use parse + load/next to do the same thing. my Wetan uses the same method.
	http://www.colellachiara.com/soft/MD3/emitters/wetan.html#section-4.2
	basically, as long as you skip over [, (, ), and ] you can just use load/next. I'm also skipping over #[ because I want to preserve literal values while formatting (that is, preserve what the user typed)
Oldes 1-Dec-2010 [5323]	How to use the new INTO parse keyword? Could it be used to avoid the temp parse like in this (very simplified example)? parse "<a>123</a>" [thru "<a>" copy tmp to "</a>" (probe tmp probe parse tmp ["123"]) to end] Note that I know that in this example it's easy to use just one parse and avoid the temp.
Ladislav 1-Dec-2010 [5324x3]	INTO is neither new, nor is it meant for string parsing
	You can take advantage of using it when parsing a block and needing to parse a subblock (of any-block! type) or a substring
	(of the said block)
Oldes 1-Dec-2010 [5327]	can you give me a simple example, please?
Ladislav 1-Dec-2010 [5328x2]	>> parse [a b "123" c] [2 word! into [3 skip] word!] == true
Ladislav 1-Dec-2010 [5328x2]	>> parse [a b c/d/e] [2 word! into [3 word!]] == true
Oldes 1-Dec-2010 [5330x2]	I understand now, thanks.
Oldes 1-Dec-2010 [5330x2]	it's very useful, I wonder why I've not found it earlier :)
Ladislav 1-Dec-2010 [5332]	The substring property is just a recent addition
Oldes 1-Dec-2010 [5333]	And is there any nice solution for my string parsing above? I can live with the temps, just was thinking if it could be done better.. anyway, at least I know how to use INTO:)
Ladislav 1-Dec-2010 [5334x2]	That is normally a "job" for a subrule
Ladislav 1-Dec-2010 [5334x2]	it looks like you could use e.g. the REJECT keyword
Oldes 1-Dec-2010 [5336x2]	I know, but that would require complex rules, I'm lazy parser:) Btw.. my real example looks like: some [ thru {<h2><a} thru ">" copy name to {<} copy doc to {^/ </div>} ( parse doc [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} ( printf [" * " 10 " - "] reduce [arg arg-desc] ) ] ] ) ]
Oldes 1-Dec-2010 [5336x2]	Never mind, I can live with the current way anyway.. I was just wondering whether INTO is intended for such cases. Now I know it isn't.
Ladislav 1-Dec-2010 [5338x3]	For comparison, a similar rule can be written as follows: some [ thru {<h2><a} thru ">" copy name to {<} copy doc any [ and {^/ </div>} break | thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] | skip ] ]
	Aha, sorry, that is not similar enough :-( To be similar, it should look as follows, I guess: some [ thru {<h2><a} thru ">" copy name to {<} copy doc any [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] to {^/ </div>} ] ]
	Still no cigar, third time: some [ thru {<h2><a} thru ">" copy name to {<} copy doc [ thru {<pre class="code">} copy code to {</pre} ( probe name probe code ) any [ thru {<h5>} copy arg to {<} thru {<ol><p>} copy arg-desc to {</p></ol>} (printf [" * " 10 " - "] reduce [arg arg-desc]) ] to {^/ </div>} ] ]
Oldes 1-Dec-2010 [5341x2]	That's not correct.. there is a reason for the temp parse and that's here because thru "<h5" would skip out of the div.
Oldes 1-Dec-2010 [5341x2]	the DOC is just the temp var for the second parse.
Ladislav 1-Dec-2010 [5343]	But, in that case your "inner parse" fails, without you noticing it?
Oldes 1-Dec-2010 [5344]	why? it does not fail.. or maybe it fails, but I have the data from the doc div, that's all.. it's lazy parsing :)