World: r3wp

[Parse] Discussion of PARSE dialect

Brock
30-Jan-2005
[3x8]
ID	"TerritoryName"	"Province"
3	"South Western Ontario Generalist"	"ON"
45	"Wallaceburg"	"ON"
47	"Surrey, Langley, White Rock, Port Moody"	"BC"
I read the file in using...
terrs: read %territory.txt	;terrid, territory, prov
;(clearer)
terrs: read %territory.txt
When I include the Header row the first line is not parsed properly, 
notice "ID" is not included in the first element of terr2....
>> terr2: parse/all terrs "^/"

== ["ID" {^-"TerritoryName"^-"Province"} {3^-"South Western Ontario 
Generalist"^-"ON"} {45^-"Wa
llaceburg"^-"ON"} {47^-"Surrey, Lang...
however, not including the header row the parse returns as expected...
>> terr2: parse/all terrs "^/"

== [{3^-"South Western Ontario Generalist"^-"ON"} {45^-"Wallaceburg"^-"ON"} 
{47^-"Surrey, Langl
ey, White Rock, Port Moody"^-"BC"} {...
What is it about the first row of data, in particular its starting 
with a string, that prevents the split on the newline ("^/") from 
working?
Chris
30-Jan-2005
[11]
It may be a recurrence of a problem parsing " at the beginning of 
a line -- doesn't seem to be a problem on the beta that I have open 
(1.2.54).  As an alternative, you could always read/lines, or simply 
load the file...
Geomol
30-Jan-2005
[12]
It's a bug, as I see it.
eFishAnt
30-Jan-2005
[13]
what version is Brock using?
Tomc
30-Jan-2005
[14x2]
note:  block: parse/all read file newline    could as well be written 
  block: read/lines file    to sidestep the issue
that said, parse/all could do better at truly ignoring all, as it 
is being asked to
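A minimal sketch of the read/lines route mentioned above, assuming %territory.txt is the tab-delimited file from the first post and REBOL 2 behaviour. The word rows is made up for illustration.

```rebol
; read/lines returns a block with one string per line of the file,
; so the delimiter form of parse (and its quote handling) is not involved.
rows: read/lines %territory.txt

print length? rows      ; number of lines, header row included
probe first rows        ; the header row as a single string
```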
Brock
30-Jan-2005
[16x3]
eFish... 1.2.40, also tried latest beta 1.2.57, and went back to 
1.2.8 to see if it handled it any differently.
Tom: Yes, I was aware of read/lines and how it is similar to (apparently 
not the same as) parse/all series "^/".  read/lines worked just fine.

I don't know why last night I wasn't happy with read/lines - must 
have been tired!
Chris: I tried 1.2.56.3.1, so if it was fixed in .54, it missed the 
cut for the following releases.
Graham
30-Jan-2005
[19]
and what do you do if the text is not a file ?  Write your own parse 
rule.
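For text that is already in memory, a hand-written rule along these lines might do. This is only a sketch assuming REBOL 2 parse behaviour; split-lines is a made-up helper name, not a built-in, and the sample string is taken from the data in the first post.

```rebol
split-lines: func [
    "Splits text on newline with an explicit rule (illustrative helper)."
    text [string!]
    /local lines line
][
    lines: copy []
    parse/all text [
        any [
            ; copy everything up to the next newline, then step over it
            copy line to newline skip
            ; an empty capture may come back as none, so store "" instead
            (append lines any [line copy ""])
        ]
        ; whatever follows the last newline, if anything
        copy line to end
        (if all [line not empty? line] [append lines line])
    ]
    lines
]

probe split-lines {ID^-"TerritoryName"^-"Province"^/45^-"Wallaceburg"^-"ON"^/}
```

Because the rule only looks for newline, double quotes get no special treatment. Text with CRLF line endings would keep a trailing "^M" on each line.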
Chris
30-Jan-2005
[20]
I think I messed up -- not fixed on my 1.2.54...
Romano
30-Jan-2005
[21]
1.2.57
>> parse/all {"a""b""c"de} "e"
== ["a" "b" "c" "d"]
Please, add the bug to RAMBO.
Graham
8-Mar-2005
[22]
How do you break a parse rule and still have it be true ?
Tomc
8-Mar-2005
[23]
to end
Graham
8-Mar-2005
[24]
So, what is break used for then?
Tomc
8-Mar-2005
[25]
break out of sub rules
Graham
8-Mar-2005
[26]
oh .. ok.
Graham
10-Mar-2005
[27]
Actually, what I was wondering was how to break out of an action 
in a rule and still let it return true
sqlab
10-Mar-2005
[28]
In the new alphas you can do
>> catch [parse/all "   aa" [" " (throw true)]]
== true
Graham
10-Mar-2005
[29]
ahh.. that's what I need.
Anton
10-Mar-2005
[30x2]
In a paren, you can change a 'break-rule from an empty block to [to 
end], then include that break-rule in your main parse-rule somewhere.
(for a more backwards compatible solution)
JaimeVargas
10-Mar-2005
[32]
parse/all "first second" [["first" break] to end]
Romano
10-Mar-2005
[33]
Graham: break does not invalidate the rule by itself
Graham
10-Mar-2005
[34]
Anton, can you give an example of what you mean?
Anton
10-Mar-2005
[35x3]
Oh dear, looks like my memory was not so great. I was using 'break 
in the early-exit rule here:

extract-link-by-name: func ["Returns the first A link with the specified text in the source string."
	name [any-string!] "link text"
	str [any-string!] "html source string"
	/local start end terms non-terms link text early-exit
][
	early-exit: []
	if parse/all str [
		some [
			thru "<a " any " " 
			["name" | [
				"href" any " " "=" any " "

				[{"} (terms: {"}) | (terms: " >")] ; set terms depending on how url starts, with double-quote?
				(non-terms: complement charset terms)
				start: some non-terms end: (link: copy/part start end)
				opt {"} any " "
				thru ">" copy text to "</a" ; <- was before just "<"
				(if text = name [early-exit: [to end break]])
			]]
			early-exit
		]
	][reduce ['link link 'name text]]
]
Ah, here is an example of what I was thinking:

>> rule: [] parse "aaabbbb" [any [rule ["a" (print 'a) | "b" (print 'b rule: [to end " "])]] to end]
a
a
a
b
== true
When the first b is encountered, rule is set to a rule which cannot 
succeed, thus breaking out of the outer any.
Graham
10-Mar-2005
[38]
so, rule ends up by being redefined?
Graham
12-Mar-2005
[39x5]
I'm using these rules in my server side implementation of the TOP command


one-line-rule: [copy line thru {^/} (if line = ".^/" [line: join "." line] write-client line)]

header-rule: [copy header thru {^/^/} (write-client header write-client )]

msg is the email message including header and body
lines is the number of lines requested by the TOP command

parse msg compose [ header-rule (lines) one-line-rule ]


Now, I can't check the parse syntax as rebol.com is down, but I seem 
to always get the whole email with my header-rule and not just the 
header.
correction: header-rule: [copy header thru {^/^/} (write-client header)]
the rules work when tested at the console.  So, I'm thinking something 
else is wrong.
The email is obtained by reading from a port in binary mode, so could 
this be due to the lack of line ending conversion ?
From a unix system to windows32?
Brett
12-Mar-2005
[44x2]
Hi Graham. Line ending in Internet protocols = Apples; Line/Paragraph 
representation in text files = Oranges. :-)

Best not to compare them for this task.  As far as I've seen, Internet 
protocols have an on-the-wire line ending of CRLF.  So yes this is 
very likely a problem with your parse rules.
With the port in binary mode - you must use CRLF to identify/transmit 
line endings.
Graham
12-Mar-2005
[46]
Hi Brett.  Do you then see a problem with my parse rules?
Tomc
12-Mar-2005
[47]
seems like to get all possible raw line endings something like  
copy line thru [ 1 2 ["^J" | "^M"]]   might work
Graham
12-Mar-2005
[48x2]
my one-line-rule seems to work already ..
copy header thru [ 2 [ "^J" | "^M"] ] ... ?
Tomc
12-Mar-2005
[50]
unix would be a single "^J"
Graham
12-Mar-2005
[51x2]
header is separated from body by one blank line.
So, I need to look for two consecutive line endings to find the end 
of the header
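One way the "two consecutive line endings" idea could be sketched, assuming REBOL 2 and that msg is a string! holding the raw message. The words eol, mark, header and body are made up for illustration, and the sample message is invented test data. The rule walks the input one character at a time instead of relying on thru with a block of alternatives.

```rebol
; invented sample message with CRLF line endings, as seen on the wire
msg: join "Subject: test" [crlf "From: a@example.com" crlf crlf "line one" crlf "line two" crlf]

eol: [crlf | newline]     ; accept CRLF or a bare LF as one line ending
mark: none                ; will point at the first character of the body

; at each position try two line endings in a row; otherwise move on one character
parse/all msg [any [eol eol mark: to end | skip]]

either mark [
    header: copy/part msg mark    ; header lines plus the blank separator line
    body: copy mark
][
    header: copy msg              ; no blank line found: treat it all as header
    body: copy ""
]

probe header
probe body
```

A lone "^M" is not treated as a line ending here. That would have to be added to eol if old Mac style text had to be handled.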