World: r3wp


[Parse] Discussion of PARSE dialect

Carl 31-Dec-2009 [4783]	I'm still running into some problems with PARSE... mainly from the expectation of what ANY and SOME should do. For example: >> parse "" [any [copy tmp to end]] >> tmp == ""
Steeve 31-Dec-2009 [4784]	what do you expect in this case ?
Carl 31-Dec-2009 [4785x3]	In the rewrite of DECODE-CGI, that behavior of ANY forces me to write: parse "" [any [end break | copy tmp to end]] This seems wrong to me if we define ANY as a MATCHing function, not as a LOOP function. This topic has been debated a bit between a few of us, but I think it deserves more attention.
	In other words, is ANY smart about the input? If there is no input, why should it even try? Of course, in the past we've used ANY a bit like WHILE -- as a LOOPing method, not really as a MATCHing method.
	It's a small thing, and maybe too late to change. I wanted to point it out.
Steeve 31-Dec-2009 [4788x2]	We have so many alternatives that I don't see this as a burden
Steeve 31-Dec-2009 [4788x2]	any [and skip copy tmp to end] any [copy tmp [skip to end]] etc...
Carl 31-Dec-2009 [4790]	There are a few ways to do it, but that is not my point.
Steeve 31-Dec-2009 [4791]	I see your point, but what if the ANY block contains production rules ? parse "" [any [and skip copy tmp to end break | insert "1" and insert "2"]] (i know, stupid example)
Gregg 31-Dec-2009 [4792x2]	We have some cool new parse enhancements; really, really nice some of them. What I think will add the most value to PARSE--and maybe this is just me--are practical examples, idioms, and best practices.
Gregg 31-Dec-2009 [4792x2]	For example - Parsing an input that has nested structures, and how to collect the values you want. - Showing the user where the parse failed. - How to avoid infinite parse loops. - How to safely modify the input stream. More advanced examples would be great too of course.
Pekr 1-Jan-2010 [4794]	Carl - first "error" in parse rewrite with some/any is the auto protection for non advancing input. It is like writing in BASIC 10 Print "Hello" 20 goto 10 ... and not expecting it to run forever, because some magical internal mechanism kicks in. If I write the code which could cause infinite loop, then be it. For me it causes the opposite reaction - some/any are not safe to use, let us use while instead .... something like: parse str [some [to "abc"]] is so obvious and self explanatory, that actually not looping forever almost feels like parse error. But - even if I don't like it, maybe most such infinite loop hits are more difficult to notice, so that actually the prevention might be ok, I don't know. As for me though, I would probably prefer some internal capability to detect such case, and some debug option to show last rule/position, where it happens ... I am not fluent enough with parse theory, but maybe it also relates to your loop vs matching note above ...
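One way to keep a SOME/ANY loop from stalling, whatever the interpreter does about non-advancing input, is to make every pass consume something. The following is only a sketch under assumptions: the input string and the counter `n` are hypothetical, and `parse/all` is R2 syntax (per the thread, R3 treats /all as the default). It counts occurrences of "abc" with THRU, which moves past each match, instead of TO, which stops in front of it.

```rebol
REBOL []

; Hypothetical input, not taken from the thread.
str: "xxabcyyabc"
n: 0

; THRU "abc" moves the position past each match, so every pass consumes input.
; When no further "abc" exists, THRU fails and ANY stops; TO END takes the rest.
ok: parse/all str [any [thru "abc" (n: n + 1)] to end]

probe ok   ; true
probe n    ; 2
```

With TO in place of THRU the position would stay in front of the first "abc" after the first pass, which is the non-advancing case being discussed.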
BrianH 6-Jan-2010 [4795x5]	BenBran: Not sure where to put this so asking here: I downloaded a web script and it has a snippet I don't understand: buffer: make string! 1024 ;; contains the browser request file: "index.html" parse buffer ["get" ["http" | "/ " | copy file to " " ]] what does: copy file to " " mean or do? tia
	The copy and to are parse operations. COPY copies the data covered by the next operation, the TO. TO covers the data from the current parse position until the first instance it can find of its argument.
	So, copy file to " " is the equivalent of this regular REBOL code: file: if find data " " [copy/part data find data " "]
	Sort of. The actual code is a little more complex, more like this: either tmp: find data " " [file: if 0 < offset? data tmp [copy/part data tmp]] [break]
	The break being a parse match fail, and file being set to none for a zero-length match.
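The description above can be sketched side by side, once with PARSE and once with plain series functions. This is a minimal illustration, not the original web script: the request line is shortened, `file2`, `data` and `tmp` are hypothetical names, and `parse/all` assumes R2 (so that the space is treated as an ordinary character).

```rebol
REBOL []

; Hypothetical request line, shortened from the one quoted in this thread.
buffer: "GET /a.html HTTP/1.1"

; PARSE version: match the method, then copy everything up to the next space.
file: "index.html"
parse/all buffer ["GET " copy file to " " to end]

; Plain series version of the same step.
data: find/tail buffer "GET "        ; position just past "GET "
file2: none
if tmp: find data " " [file2: copy/part data tmp]

probe file    ; "/a.html"
probe file2   ; "/a.html"
```

If no space follows the path, FIND returns none and `file2` stays none, which roughly corresponds to the TO failing to match.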
BenBran 6-Jan-2010 [4800]	I get what's happening now. If I compare buffer and file I see the clipped text: >> probe file == "index.html" >> probe buffer {GET /a.html HTTP/1.1 Host: localhost User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10 Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Accept-Language: en-US Accept-Encoding: gzip, deflate Connection: keep-alive Address: 127.0.0.1} >> probe parse buffer ["get" ["http" | "/ " | copy file to " "]] == false >> probe file == "/a.html" Should I have been able to see the results instead of == false?
Graham 6-Jan-2010 [4801x4]	false is the value returned by the parse function
	if you want the value you have to change the parse rule
	umm.. parse returns either true or false ...
	true if the rule completes to the end, false otherwise
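A small sketch of that return-value rule, assuming R2 and a hypothetical shortened request line; `r1` and `r2` are illustrative names. The captured value is the same in both calls; only the return value changes, depending on whether the rule reaches the end of the input.

```rebol
REBOL []

line: "GET /a.html HTTP/1.1"   ; hypothetical input

; The rule matches but stops at the space before HTTP/1.1, so input is left over.
r1: parse/all line ["GET " copy file to " "]
probe r1     ; false
probe file   ; "/a.html"

; Same rule, but TO END consumes the remainder, so the rule reaches the end.
r2: parse/all line ["GET " copy file to " " to end]
probe r2     ; true
```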
BenBran 6-Jan-2010 [4805]	ok I see. Thanks.
BrianH 6-Jan-2010 [4806]	Was going to reply but Graham types faster :)
Graham 6-Jan-2010 [4807]	parse buffer [ "get" [ "http" | "/" | copy file to #" " ( print file) ] to end ] will return true
BrianH 6-Jan-2010 [4808x3]	PARSE returns true if the rule matches and covers the entire input, or false otherwise. Your rule matched but there was input left over. PARSE's return value doesn't matter in this case, just whether file is set or not. If you are using R3 you can do this too: parse buffer [ "get" [ "http" | "/" | return to " "]]
	That would return the file instead of setting a variable and not return false because of leftover input.
	>> parse "GET /a.html HTTP/1.1" ["get " return to " "] == "/a.html" Note that /all is the default in R3 so you need to specify space after GET.
BenBran 6-Jan-2010 [4811]	for completeness in R3 - I tried the lines above: >> parse "GET /a.html HTTP/1.1" ["get " return to " "] ** Script Error: Invalid argument: ?native? ** Where: halt-view ** Near: parse "GET /a.html HTTP/1.1" ["get " return to " "] I must be missing something simple
BrianH 6-Jan-2010 [4812]	What version of REBOL are you using? system/version ...
BenBran 6-Jan-2010 [4813]	>> help system SYSTEM is an object of value: version tuple! 2.7.7.3.1 build date! 1-Jan-2010/12:15:27-8:00 product word! View core tuple! 2.7.7 components block! length: 60
BrianH 6-Jan-2010 [4814]	That is R2, not R3.
BenBran 6-Jan-2010 [4815]	doh!
BrianH 6-Jan-2010 [4816]	You were right, it was something simple :)
BenBran 6-Jan-2010 [4817x2]	lol :-)
BenBran 6-Jan-2010 [4817x2]	yes it works perfect in R3. Thanks again.
Graham 14-Jan-2010 [4819]	>> parse [ <tag> ] [ copy t tag! ] == true >> t == [<tag>] never noticed it made a block! before
ChristianE 14-Jan-2010 [4820x5]	>> parse [ <tag> ] [ set t tag! ] == true >> t == <tag>
	There's a difference between COPY and SET in block parsing mode.
	From the docs: SET - set the next value to a variable COPY - copy the next match sequence to a variable
	Good the remember when dealing with "sequences": >> parse [ <tag> </tag> ] [ copy t [ tag! tag!] ] == true >> t == [<tag> </tag>] >> parse [ <tag> </tag> ] [ set t [ tag! tag!] ] == true >> t == <tag>
	the = to.
Graham 14-Jan-2010 [4825]	I've always used 'set ... not sure why I used 'copy this time!
Graham 29-Jan-2010 [4826x3]	<?xml version="1.0"?> <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Body><SelectResponse xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><SelectResult><Item><Name>2010-01-29T09:54:48.000ZI3s3NjIxRjZERDI1MUY0QzQyMDk4M0JDMzkwMERGOEQxQTVDRDY5MzEwfQ==</Name><Attribute><Name>Subject</Name><Value>hello?</Value></Attribute><Attribute><Name>Userid</Name><Value>Guest</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T09:54:48.000Z</Value></Attribute></Item><Item><Name>2010-01-29T09:58:36.000ZI3swMTZBODg3QjAxNDQ2NEU5OENCNTA3OTc5OTg0Mjc1MTJGQzkxQTc0fQ==</Name><Attribute><Name>Subject</Name><Value>First Message</Value></Attribute><Attribute><Name>Userid</Name><Value>Graham</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T09:58:36.000Z</Value></Attribute></Item><Item><Name>2010-01-29T11:06:18.000ZI3tFREFCRUYwNTY4OTdBMzcwODM2NzJGQUE5MzAwRUE3NjYwMTMwMTY5fQ==</Name><Attribute><Name>Subject</Name><Value>Index working</Value></Attribute><Attribute><Name>Userid</Name><Value>Graham</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T11:06:18.000Z</Value></Attribute></Item></SelectResult><ResponseMetadata><RequestId>14873461-626a-44bf-2d7d-c1b23694b2e0</RequestId><BoxUsage>0.0000411449</BoxUsage></ResponseMetadata></SelectResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>
	results: copy [] parse result [ thru <SelectResult> some [ thru <Item> copy item to </Item> ( ?? item if parse item [ thru <Name> copy itemid to </Name> thru {<Name>Subject</Name>} thru <Value> copy subject to </Value> thru {<Name>Userid</Name>} thru <Value> copy userid to </Value> thru {<Name>UTCDate</Name>} thru <Value> copy utcdate to </Value> to end ][ repend results [ utcdate itemid userid subject ] ] ) ] ]
	This parse works fine in R2, but doesn't work in R3 ... I couldn't see why last night ... still can't ...
Steeve 29-Jan-2010 [4829x3]	Is that result a block or string ?
	because in a string you can't find tag! values
	i'm wrong T_T
Graham 29-Jan-2010 [4832]	It's a string ...