r3wp [groups: 83 posts: 189283]
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r3wp

[Parse] Discussion of PARSE dialect

ChristianE
14-Jan-2010
[4822x3]
From the docs:

SET - set the next value to a variable
COPY - copy the next match sequence to a variable
Good the remember when dealing with "sequences":

>> parse [ <tag> </tag> ] [ copy t [ tag! tag!] ]
== true
>> t
== [<tag> </tag>]
>> parse [ <tag> </tag> ] [ set t [ tag! tag!] ]
== true
>> t
== <tag>
the = to.
Graham
14-Jan-2010
[4825]
I've always used 'set ... not sure why I used 'copy this time!
Graham
29-Jan-2010
[4826x3]
<?xml version="1.0"?>

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Body><SelectResponse 
xmlns="http://sdb.amazonaws.com/doc/2009-04-15/"><SelectResult><Item><Name>2010-01-29T09:54:48.000ZI3s3NjIxRjZERDI1MUY0QzQyMDk4M0JDMzkwMERGOEQxQTVDRDY5MzEwfQ==</Name><Attribute><Name>Subject</Name><Value>hello?</Value></Attribute><Attribute><Name>Userid</Name><Value>Guest</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T09:54:48.000Z</Value></Attribute></Item><Item><Name>2010-01-29T09:58:36.000ZI3swMTZBODg3QjAxNDQ2NEU5OENCNTA3OTc5OTg0Mjc1MTJGQzkxQTc0fQ==</Name><Attribute><Name>Subject</Name><Value>First 
Message</Value></Attribute><Attribute><Name>Userid</Name><Value>Graham</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T09:58:36.000Z</Value></Attribute></Item><Item><Name>2010-01-29T11:06:18.000ZI3tFREFCRUYwNTY4OTdBMzcwODM2NzJGQUE5MzAwRUE3NjYwMTMwMTY5fQ==</Name><Attribute><Name>Subject</Name><Value>Index 
working</Value></Attribute><Attribute><Name>Userid</Name><Value>Graham</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T11:06:18.000Z</Value></Attribute></Item></SelectResult><ResponseMetadata><RequestId>14873461-626a-44bf-2d7d-c1b23694b2e0</RequestId><BoxUsage>0.0000411449</BoxUsage></ResponseMetadata></SelectResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>
results: copy []
	parse result [
		thru <SelectResult> 
		some [
			thru <Item> copy item to </Item> (
				?? item

    if parse item [ thru <Name> copy itemid to </Name> thru {<Name>Subject</Name>} 
    thru <Value> copy subject to </Value> thru {<Name>Userid</Name>} 
    thru <Value> copy userid to </Value> thru {<Name>UTCDate</Name>} 
    thru <Value> copy utcdate to </Value> to end ][
					repend results [ utcdate itemid userid subject ]
				]
			)
		]
	]
This parse works fine in R2, but doesn't work in R3 ... I coudn't 
see why last night ... still can't ...
Steeve
29-Jan-2010
[4829x3]
Is that result a block or string ?
because in a string you can't find tag! values
i'm wrong 
T_T
Graham
29-Jan-2010
[4832]
It's a string ...
Steeve
29-Jan-2010
[4833]
but i'm still wrong ;-)
Graham
29-Jan-2010
[4834x2]
Yes, tags are a type of string ...
this is what I get for item 


e>2010-01-29T11:06:18.000ZI3tFREFCRUYwNTY4OTdBMzcwODM2NzJGQUE5MzAwRUE3NjYwMTMwMTY5fQ==</Name><Attribute><Name>Subject</Name><Value>Index 
working</Value></Attribute><Attribute><Name>Userid</Name><Value>Graham</Value></Attribute><Attribute><Name>UTCDate</Name><Value>2010-01-29T11:06:18.000Z</Value></Attribute>
Steeve
29-Jan-2010
[4836x2]
>> parse "<a><item>" [thru <a> ??]
end!: "item>"
== false
a bug
Graham
29-Jan-2010
[4838]
I'm not familiar with that ... what should it say?
Steeve
29-Jan-2010
[4839x2]
It should say:
>> parse "<a><item>" [thru <a> ??]
end!: "<item>"
== false
parsing thru a tag eat one more char
Graham
29-Jan-2010
[4841]
Ah .. ?? is a new debugging function
Steeve
29-Jan-2010
[4842]
yep
Graham
29-Jan-2010
[4843x2]
Should have known about it last night!  Would have saved me sometime 
:(
Well, this looks like an unreported bug ...
Steeve
29-Jan-2010
[4845]
exactly
Graham
29-Jan-2010
[4846]
Shall you or I curecode it?
Steeve
29-Jan-2010
[4847x2]
you
;-)
Graham
29-Jan-2010
[4849x2]
okey dokey
Now I know I can't use r3 for parsing xml .... :(

http://www.curecode.org/rebol3/ticket.rsp?id=1449
Steeve
29-Jan-2010
[4851]
you can, just replace <tag> by a real string "<tag>"
Graham
29-Jan-2010
[4852x4]
ugly !  :)
Point taken ...
Is there any likelihood of the parse enhancements making it to r2? 
 Anyone know?
( without the bugs of course )
Steeve
29-Jan-2010
[4856]
0%
BrianH
29-Jan-2010
[4857x2]
And there is a great likelihood of the bugs being fixed in R3. And 
there aren't many in PARSE, just that tag bug afaik.
Graham, I deleted bug #1449 since it was already reported as #682. 
See also #854 and #1160 (and #10, which was incorrectly "fixed").
Graham
29-Jan-2010
[4859]
your response says it was fixed ...
BrianH
29-Jan-2010
[4860]
Partially - it used to be worse. That's why it's marked a "problem".
Graham
29-Jan-2010
[4861]
only eats one char instead of two ... so that's a 50% improvement
BrianH
29-Jan-2010
[4862x2]
The worst was when someone "fixed" #10 to make it compatible with 
R2's buggy behavior. Bad fixes get marked as a problem.
Check out #666 for R3's official policy on bug-for-bug compatibility 
:)
Graham
29-Jan-2010
[4864]
at least it should not introduce new bugs
BrianH
29-Jan-2010
[4865]
Agreed (and the policy agrees too).
Graham
29-Jan-2010
[4866]
I looked for a previous report on this bug but couldn't find it .. 
4 pages of bugs with parse in them.  I wonder if they can be filtered 
to only show active bugs
BrianH
29-Jan-2010
[4867]
Bring it up in the !CureCode group.
Graham
7-Feb-2010
[4868x2]
I want to extract all the dates ( dd-mmm-yy, dd mmm yyyy d mmmmmmm 
yy )


extract-dates: func [ txt 
	/local months dates days month year
][
	dates: copy []
	months: copy []
	digit: charset [ #"0" - #"9" ]
	digits: [ some digit ]
	foreach mon system/locale/months [
		repend months [ mon '|  copy/part mon 3 '| ]
	]
	remove back tail months
	parse txt [
		some [
			to 1 2 digits copy days 1 2 digit [ #" " | #"-" ]
			copy month months
			[ #" " | #"-" ]
			copy year [ 4 digits | 2 digits ]
			( repend dates rejoin [ days "-" month "-" year ] ) |
			thru 1 2 digits ??
		]
	]
	dates
]


extract-dates "asdf sdfsf  11 Jan 2008 12-January-10 fasdfsaf asdf 
as 11 2 3 3  13-Feb-08 asdfasf "
not working ...
Steeve
7-Feb-2010
[4870]
R2 or R3 ?
In any case, the first rule may fail.
you can't do "TO 1 2 digits"
BrianH
7-Feb-2010
[4871]
TO and THRU have limited argument syntax, and don't support full 
rules. Both R2 and R3 support literal value arguments (that don't 
count as rules). R3 also supports a block of literal values delimited 
by |, and those values are less limted.