r3wp [groups: 83 posts: 189283]
World: r3wp

[Parse] Discussion of PARSE dialect

Ladislav
22-Sep-2010
[5295x2]
, since I guess, that this way, he will not have to just go into 
an "unknown territory"
I must say that I was actually surprised how people (including 
me) have struggled to circumvent this problem while having such 
an elegant way available to solve it.
GrahamC
18-Oct-2010
[5297]
a regex question ...  ([0-9]{4})(-([0-9]{2})(-([0-9]{2})(T([0-9]{2}):([0-9]{2})(:([0-9]{2})(\.([0-9]+))?)?(Z|(([-+])([0-9]{2}):([0-9]{2}))))))

is apparently failing this string : 2010-10-18T07:06:25.00Z

What tool can I use to check this string against this regex ?
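
A minimal sketch of one way to check the string locally, assuming Python's standard re module as the tester (not a tool named in this thread). The pattern is copied verbatim from the message above; the names ISO_PATTERN and check are illustrative only.

```python
import re

# Pattern copied verbatim from the message above, split across lines only for width.
ISO_PATTERN = re.compile(
    r"([0-9]{4})(-([0-9]{2})(-([0-9]{2})"
    r"(T([0-9]{2}):([0-9]{2})(:([0-9]{2})(\.([0-9]+))?)?"
    r"(Z|(([-+])([0-9]{2}):([0-9]{2}))))))"
)


def check(text):
    """Return the match object if the whole string fits the pattern, else None."""
    return ISO_PATTERN.fullmatch(text)


if __name__ == "__main__":
    for candidate in ("2010-10-18T07:06:25.00Z", "2010-10-18"):
        print(candidate, "->", "match" if check(candidate) else "no match")
```

Under Python's regex semantics the full timestamp matches, while a date-only string does not, because none of the date, time, or zone groups in this pattern is marked optional. A validator that still rejects the timestamp may be applying a different rule than this regex.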
Sunanda
18-Oct-2010
[5298]
Regexlib has a different ISO-8601 date matching regex:
    http://regexlib.com/REDetails.aspx?regexp_id=2092

And the ability to enter any regex and target strings to test what 
happens:
    http://regexlib.com/RETester.aspx?
GrahamC
18-Oct-2010
[5299x2]
found this one too http://www.fileformat.info/tool/regex.htm
and it seems my string is passing ... hmm
Sunanda
18-Oct-2010
[5301]
The problem with regexes is they are impossible to debug.....Best 
just to rewrite continually until they work :)
GrahamC
18-Oct-2010
[5302]
I'm trying to validate some XML against an online validator and it's 
rejecting my dates :(
Henrik
18-Oct-2010
[5303]
how do you specify an element to be of the type any-type! except 
none! ?
Ladislav
18-Oct-2010
[5304]
I am afraid that you need to list all types excluding none!
Henrik
18-Oct-2010
[5305]
does R3 solve this? if not, maybe that would be a good problem to 
solve.
Ladislav
18-Oct-2010
[5306]
R3 can let you define that typeset and use it any time you like
Henrik
18-Oct-2010
[5307]
ok, that is possibly good enough for generating specs.
Gregg
18-Oct-2010
[5308]
I don't remember what all we did Henrik, but some of our test generation 
stuff on another world had some support for typesets IIRC.
Henrik
18-Oct-2010
[5309]
Gregg, ok
Steeve
18-Oct-2010
[5310]
Henrik, with a parse rule ?
Henrik
18-Oct-2010
[5311]
Steeve, yes.
Steeve
18-Oct-2010
[5312]
R3 does it
AdrianS
18-Oct-2010
[5313]
Graham, try http://gskinner.com/RegExr for working out regexes. It 
has a really nice UI where you can hover over the components of the 
regex and see exactly what they do.
GrahamC
18-Oct-2010
[5314]
Thanks
Sunanda
4-Nov-2010
[5315]
Question on StackOverflow.....there must be a better answer than 
mine, and I'd suspect it involves PARSE (better answers usually do:)

    http://stackoverflow.com/questions/4093714/is-there-finer-granularity-than-load-next-for-reading-structured-data
GrahamC
4-Nov-2010
[5316x3]
Use fixed length records
Anyone got a parse rule that strips out everything between tags in 
an "xml" document
whitespace: charset [ "^/^- " ]
    swsp: [ any whitespace ]
    result: copy ""

    parse/all pqri-xml [ some [ copy t thru ">" (append result t) swsp to "<" ]]
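
For comparison, a rough Python sketch of the same idea (keep the tags, drop whatever sits between them). It is not a translation of the rule above: it assumes well-formed tags with no ">" inside attribute values, comments, or CDATA, and the names TAG and keep_tags_only are hypothetical.

```python
import re

# A tag is taken to be "<", anything that is not ">", then ">".
# Assumption: no ">" appears inside attribute values, comments or CDATA.
TAG = re.compile(r"<[^>]*>")


def keep_tags_only(xml_text):
    """Concatenate every tag found in xml_text, discarding text between tags."""
    return "".join(TAG.findall(xml_text))


if __name__ == "__main__":
    print(keep_tags_only("<a> hello <b>x</b>\n</a>"))
```

Unlike the parse rule, which copies from the current position thru ">", this version also discards any text that precedes the first tag.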
Ladislav
4-Nov-2010
[5319]
Posted an answer mentioning the test framework, which does almost 
exactly what Fork asked
Gabriele
5-Nov-2010
[5320x3]
also, Carl's clean-script and script colorizer use parse + load/next 
to do the same thing. my Wetan uses the same method.
http://www.colellachiara.com/soft/MD3/emitters/wetan.html#section-4.2
basically, as long as you skip over [, (, ), and ] you can just use 
load/next. I'm also skipping over #[ because I want to preserve literal 
values while formatting (that is, preserve what the user typed)
Oldes
1-Dec-2010
[5323]
How to use the new INTO parse keyword? Could it be used to avoid 
the temp parse like in this (very simplified example)?

  parse "<a>123</a>" [thru "<a>" copy tmp to "</a>" (probe tmp probe parse tmp ["123"]) to end]

Note that I know that in this example it's easy to use just one parse 
and avoid the temp.
Ladislav
1-Dec-2010
[5324x3]
INTO is neither new, nor is it meant for string parsing
You can take advantage of using it when parsing a block and needing 
to parse a subblock (of any-block! type) or a substring
(of the said block)
Oldes
1-Dec-2010
[5327]
can you give me a simple example, please?
Ladislav
1-Dec-2010
[5328x2]
>> parse [a b "123" c] [2 word! into [3 skip] word!]
== true
>> parse [a b c/d/e] [2 word! into [3 word!]]
== true
Oldes
1-Dec-2010
[5330x2]
I understand now, thanks.
it's very useful, I wonder why I've not found it earlier :)
Ladislav
1-Dec-2010
[5332]
The substring property is just a recent addition
Oldes
1-Dec-2010
[5333]
And is there any nice solution for my string parsing above? I can 
live with the temps, just was thinking if it could be done better.. 
anyway, at least I know how to use INTO:)
Ladislav
1-Dec-2010
[5334x2]
That is normally a "job" for a subrule
it looks like you could use e.g. the REJECT keyword
Oldes
1-Dec-2010
[5336x2]
I know, but that would require complex rules, I'm a lazy parser:) Btw.. 
my real example looks like:
    some [
        thru {<h2><a} thru ">" copy name to {<}
        copy doc to {^/ </div>} (
            parse doc [
                thru {<pre class="code">} copy code to {</pre} (
                    probe name
                    probe code
                )
                any [
                    thru {<h5>} copy arg to {<}
                    thru {<ol><p>} copy arg-desc to {</p></ol>}

                    ( printf ["  * " 10 " - "] reduce [arg arg-desc] )
                ]
            ]
        )
    ]
Never mind, I can live with the current way anyway.. I was just wondering 
whether INTO is intended for such cases. Now I know it isn't.
Ladislav
1-Dec-2010
[5338x3]
For comparison, a similar rule can be written as follows:

some [
    thru {<h2><a} thru ">" copy name to {<}
    copy doc any [
        and {^/ </div>} break
        | thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
        | skip
    ]
]
Aha, sorry, that is not similar enough :-( To be similar, it should 
look as follows, I guess:

some [
    thru {<h2><a} thru ">" copy name to {<}
    copy doc any [
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
        to {^/ </div>}
    ]
]
Still no cigar, third time:

some [
    thru {<h2><a} thru ">" copy name to {<}
    copy doc [
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
        to {^/ </div>}
    ]
]
Oldes
1-Dec-2010
[5341x2]
That's not correct.. there is a reason for the temp parse: without 
it, thru "<h5" would skip out of the div.
the DOC is just the temp var for the second parse.
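
The point about bounding the search can be sketched in Python (an illustration of the two-pass idea only, not Oldes's actual code). Each section is first cut out up to its closing div, and the inner patterns are then applied to that slice alone, so a search for <h5> cannot run on into the next section. SECTION, CODE, ARG, SAMPLE and extract are hypothetical names, and the sample markup is invented for the demonstration.

```python
import re

# Outer pass: section name, then everything up to newline + space + </div>.
SECTION = re.compile(r'<h2><a[^>]*>([^<]*)(.*?)\n </div>', re.DOTALL)
# Inner passes: applied only to the captured slice of one section.
CODE = re.compile(r'<pre class="code">(.*?)</pre', re.DOTALL)
ARG = re.compile(r'<h5>([^<]*).*?<ol><p>(.*?)</p></ol>', re.DOTALL)

# Invented sample: the first section has no <h5>, the second has one.
SAMPLE = (
    '<h2><a name="f">foo</a></h2>\n'
    '<pre class="code">foo 1</pre>\n'
    ' </div>\n'
    '<h2><a name="g">bar</a></h2>\n'
    '<pre class="code">bar x</pre>\n'
    '<h5>x</h5><ol><p>the arg</p></ol>\n'
    ' </div>\n'
)


def extract(html):
    """Return (name, code, args) per section, searching args inside the section only."""
    results = []
    for name, doc in SECTION.findall(html):
        code = CODE.search(doc)
        args = ARG.findall(doc)  # bounded: cannot see the next section
        results.append((name, code.group(1) if code else None, args))
    return results


if __name__ == "__main__":
    for row in extract(SAMPLE):
        print(row)
```

In the sample, the first section gets an empty argument list instead of picking up the <h5> that belongs to the second section, which is the effect the temporary DOC variable provides.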
Ladislav
1-Dec-2010
[5343]
But, in that case your "inner parse" fails, without you noticing 
it?
Oldes
1-Dec-2010
[5344]
why? it does not fail.. or maybe it fails, but I have the data from 
the doc div, that's all.. it's lazy parsing :)