r3wp [groups: 83 posts: 189283]
World: r3wp

[Parse] Discussion of PARSE dialect

Ladislav
22-Sep-2010
[5292x5]
Actually, both variants are implemented, even the one without the 
overhead (which I implemented first).
(Or, to be more precise, it may be possible to make a variant that 
does not bind the rule at all, which would deserve to be called 
"without the overhead" more than either of my variants.)
But, as you said, one of my motivations was to write it as a mezzanine 
to gather some "inspiration"/experience with it for Carl,
since I guess that this way he will not have to go into 
"unknown territory".
I must say that I was actually surprised how people (including 
me) have struggled to circumvent this problem while having such 
an elegant way available to solve it.
GrahamC
18-Oct-2010
[5297]
a regex question ...  ([0-9]{4})(-([0-9]{2})(-([0-9]{2})(T([0-9]{2}):([0-9]{2})(:([0-9]{2})(\.([0-9]+))?)?(Z|(([-+])([0-9]{2}):([0-9]{2}))))))

is apparently failing on this string: 2010-10-18T07:06:25.00Z

What tool can I use to check this string against this regex?
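
Besides online testers, the same check can be run from the REBOL console. What follows is a minimal sketch, assuming REBOL 2, in which the regex above is translated by hand into a PARSE rule. The names digit and iso-date are made up for illustration. Unlike the unanchored regex, PARSE has to match the whole string, and string matching is case-insensitive unless /case is added.

```rebol
digit: charset "0123456789"

; Hand translation of the regex: date, "T", hh:mm, optional :ss with an
; optional fraction, then either "Z" or a +hh:mm / -hh:mm offset.
iso-date: [
    4 digit "-" 2 digit "-" 2 digit
    "T" 2 digit ":" 2 digit
    opt [":" 2 digit opt ["." some digit]]
    ["Z" | ["+" | "-"] 2 digit ":" 2 digit]
]

print parse/all "2010-10-18T07:06:25.00Z" iso-date
```

If the translation is faithful, this prints true for the string in question, which would suggest that the rejection comes from somewhere other than the date pattern itself.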
Sunanda
18-Oct-2010
[5298]
Regexlib has a different ISO-8601 date matching regex:
    http://regexlib.com/REDetails.aspx?regexp_id=2092

And the ability to enter any regex and target strings to test what 
happens:
    http://regexlib.com/RETester.aspx?
GrahamC
18-Oct-2010
[5299x2]
found this one too http://www.fileformat.info/tool/regex.htm
and it seems my string is passing ... hmm
Sunanda
18-Oct-2010
[5301]
The problem with regexes is they are impossible to debug... Best 
just to rewrite continually until they work :)
GrahamC
18-Oct-2010
[5302]
I'm trying to validate some XML against an online validator and it's 
rejecting my dates :(
Henrik
18-Oct-2010
[5303]
How do you specify that an element may be of any type (any-type!) 
except none!?
Ladislav
18-Oct-2010
[5304]
I am afraid that you need to list all types except none!
Henrik
18-Oct-2010
[5305]
does R3 solve this? if not, maybe that would be a good problem to 
solve.
Ladislav
18-Oct-2010
[5306]
R3 lets you define that typeset and use it any time you like
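
A sketch of what defining such a typeset might look like in R3. It assumes that DIFFERENCE accepts typesets and that block PARSE accepts a typeset as a rule element in the R3 build at hand; neither is confirmed in this discussion. The name not-none! is made up for illustration.

```rebol
; R3 only (assumption: DIFFERENCE works on typesets).
; any-type! minus none!, built once and reused wherever needed.
not-none!: difference any-type! make typeset! [none!]

; Every element here is something other than none!
parse reduce [1 "a" 'b] [some not-none!]

; Here the second element is a real none! value, so the rule should stop there
parse reduce [1 none 'b] [some not-none!]
```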
Henrik
18-Oct-2010
[5307]
ok, that is possibly good enough for generating specs.
Gregg
18-Oct-2010
[5308]
I don't remember everything we did, Henrik, but some of our test generation 
stuff on another world had some support for typesets, IIRC.
Henrik
18-Oct-2010
[5309]
Gregg, ok
Steeve
18-Oct-2010
[5310]
Henrik, with a parse rule ?
Henrik
18-Oct-2010
[5311]
Steeve, yes.
Steeve
18-Oct-2010
[5312]
R3 does it
AdrianS
18-Oct-2010
[5313]
Graham, try http://gskinner.com/RegExr for working out regexes. It 
has a really nice UI where you can hover over the components of the 
regex and see exactly what they do.
GrahamC
18-Oct-2010
[5314]
Thanks
Sunanda
4-Nov-2010
[5315]
Question on StackOverflow... there must be a better answer than 
mine, and I'd suspect it involves PARSE (better answers usually do :)

    http://stackoverflow.com/questions/4093714/is-there-finer-granularity-than-load-next-for-reading-structured-data
GrahamC
4-Nov-2010
[5316x3]
Use fixed length records
Anyone got a parse rule that strips out everything between tags in 
an "xml" document?
    whitespace: charset [ "^/^- " ]
    swsp: [ any whitespace ]
    result: copy ""

    parse/all pqri-xml [ some [ copy t thru ">" (append result t) swsp to "<" ]]
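
A small worked example of the rule above, assuming REBOL 2. The sample document is made up, and the `| to end` alternative is an addition (not part of the posted rule) that lets PARSE succeed after the last tag instead of failing on the final `to "<"`.

```rebol
whitespace: charset [ "^/^- " ]
swsp: [ any whitespace ]
result: copy ""

pqri-xml: {<a> text <b>more</b>^/ </a>}   ; hypothetical sample input

parse/all pqri-xml [
    some [
        copy t thru ">" (append result t)   ; keep the tag itself
        swsp                                ; skip whitespace after the tag
        [to "<" | to end]                   ; drop text up to the next tag
    ]
]

print result
```

For this input, result should end up as `<a><b></b></a>`: the tags survive and the text between them is dropped.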
Ladislav
4-Nov-2010
[5319]
Posted an answer mentioning the test framework, which does almost 
exactly what Fork asked
Gabriele
5-Nov-2010
[5320x3]
also, Carl's clean-script and script colorizer use parse + load/next 
to do the same thing. my Wetan uses the same method.
http://www.colellachiara.com/soft/MD3/emitters/wetan.html#section-4.2
basically, as long as you skip over [, (, ), and ] you can just use 
load/next. I'm also skipping over #[ because I want to preserve literal 
values while formatting (that is, preserve what the user typed)
Oldes
1-Dec-2010
[5323]
How do I use the new INTO parse keyword? Could it be used to avoid 
the temp parse, as in this (very simplified) example?

  parse "<a>123</a>" [thru "<a>" copy tmp to "</a>" (probe tmp probe parse tmp ["123"]) to end]

Note that I know that in this example it's easy to use just one parse 
and avoid the temp.
Ladislav
1-Dec-2010
[5324x3]
INTO is neither new, nor is it meant for string parsing
You can take advantage of it when parsing a block and needing 
to parse a subblock (of any-block! type) or a substring
(contained in the said block)
Oldes
1-Dec-2010
[5327]
can you give me a simple example, please?
Ladislav
1-Dec-2010
[5328x2]
>> parse [a b "123" c] [2 word! into [3 skip] word!]
== true
>> parse [a b c/d/e] [2 word! into [3 word!]]
== true
Oldes
1-Dec-2010
[5330x2]
I understand now, thanks.
it's very useful, I wonder why I've not found it earlier :)
Ladislav
1-Dec-2010
[5332]
The substring property is just a recent addition
Oldes
1-Dec-2010
[5333]
And is there any nice solution for my string parsing above? I can 
live with the temps, I was just wondering whether it could be done 
better... anyway, at least I know how to use INTO :)
Ladislav
1-Dec-2010
[5334x2]
That is normally a "job" for a subrule
It looks like you could use, e.g., the REJECT keyword
Oldes
1-Dec-2010
[5336x2]
I know, but that would require complex rules, and I'm a lazy parser :) Btw, 
my real example looks like:
    some [
        thru {<h2><a} thru ">" copy name to {<}
        copy doc to {^/ </div>} (
            parse doc [
                thru {<pre class="code">} copy code to {</pre} (
                    probe name
                    probe code
                )
                any [
                    thru {<h5>} copy arg to {<}
                    thru {<ol><p>} copy arg-desc to {</p></ol>}

                    ( printf ["  * " 10 " - "] reduce [arg arg-desc] )
                ]
            ]
        )
    ]
Never mind, I can live with the current way anyway. I was just wondering 
whether INTO was intended for such cases. Now I know it isn't.
Ladislav
1-Dec-2010
[5338x3]
For comparison, a similar rule can be written as follows:

some [
    thru {<h2><a} thru ">" copy name to {<}
    copy doc any [
        and {^/ </div>} break
        | thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
        | skip
    ]
]
Aha, sorry, that is not similar enough :-( To be similar, it should 
look as follows, I guess:

some [
    thru {<h2><a} thru ">" copy name to {<}
    copy doc any [
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
        to {^/ </div>}
    ]
]
Still no cigar; third time:

some [
    thru {<h2><a} thru ">" copy name to {<}
    copy doc [
        thru {<pre class="code">} copy code to {</pre} (
            probe name
            probe code
        )
        any [
            thru {<h5>} copy arg to {<}
            thru {<ol><p>} copy arg-desc to {</p></ol>}
            (printf ["  * " 10 " - "] reduce [arg arg-desc])
        ]
        to {^/ </div>}
    ]
]
Oldes
1-Dec-2010
[5341]
That's not correct. There is a reason for the temp parse: without 
it, thru "<h5" would skip out of the div.
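
The difference can be seen on a tiny made-up input. This is a sketch assuming REBOL 2; the words page, one-pass and with-temp are invented for illustration and are not taken from the rules above.

```rebol
page: {<div><h5>a</h5></div><div><h5>b</h5></div>}   ; hypothetical input

; One pass: the inner ANY is not confined to the first div, so THRU
; happily runs past </div> and picks up the h5 of the second div too.
one-pass: copy []
parse/all page [
    thru "<div>"
    any [thru "<h5>" copy arg to "<" (append one-pass arg)]
    to end
]

; Temp parse: the inner rule only ever sees the copied body of the
; first div, so it cannot escape into the next one.
with-temp: copy []
parse/all page [
    thru "<div>" copy doc to "</div>" (
        parse/all doc [
            any [thru "<h5>" copy arg to "<" (append with-temp arg)]
            to end
        ]
    )
    to end
]

probe one-pass
probe with-temp
```

Here one-pass should collect both "a" and "b", while with-temp should collect only "a", which is the behaviour the temp parse is there to guarantee.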