World: r3wp

[Parse] Discussion of PARSE dialect

Brock 30-Jan-2005 [3x8]	ID "TerritoryName" "Province"
	3 "South Western Ontario Generalist" "ON"
	45 "Wallaceburg" "ON"
	47 "Surrey, Langley, White Rock, Port Moody" "BC"
	I read the file in using...
	terrs: read %territory.txt ;terrid, territory, prov
	;(clearer)
	terrs: read %territory.txt
	When I include the header row the first line is not parsed properly; notice "ID" is not included in the first element of terr2.... >> terr2: parse/all terrs "^/" == ["ID" {^-"TerritoryName"^-"Province"} {3^-"South Western Ontario Generalist"^-"ON"} {45^-"Wallaceburg"^-"ON"} {47^-"Surrey, Lang...
	However, without the header row the parse returns as expected... >> terr2: parse/all terrs "^/" == [{3^-"South Western Ontario Generalist"^-"ON"} {45^-"Wallaceburg"^-"ON"} {47^-"Surrey, Langley, White Rock, Port Moody"^-"BC"} {...
	What is it about the first row of data, in particular its starting with a string, that prevents the split on the newline ("^/") from working?
Chris 30-Jan-2005 [11]	It may be a recurrence of a problem parsing " at the beginning of a line -- doesn't seem to be a problem on the beta that I have open (1.2.54). As an alternative, you could always read/lines, or simply load the file...
Geomol 30-Jan-2005 [12]	It's a bug, as I see it.
eFishAnt 30-Jan-2005 [13]	what version is Brock using?
Tomc 30-Jan-2005 [14x2]	note: block: parse/all read file newline could as well be written block: read/lines file to sidestep the issue
Tomc 30-Jan-2005 [14x2]	that said, parse/all could do better at truly ignoring all, as it is being asked to
Brock 30-Jan-2005 [16x3]	eFish... 1.2.40, also tried latest beta 1.2.57, and went back to 1.2.8 to see if it handled it any differently.
	Tom: Yes, I was aware of read/lines and how it is similar to (apparently not the same as) parse/all series "^/". read/lines worked just fine. I don't know why last night I wasn't happy with read/lines - must have been tired!
	Chris: I tried 1.2.56.3.1, so if it was fixed in .54, it missed the cut for the following releases.
Graham 30-Jan-2005 [19]	and what do you do if the text is not a file ? Write your own parse rule.
Chris 30-Jan-2005 [20]	I think I messed up -- not fixed on my 1.2.54...
Romano 30-Jan-2005 [21]	1.2.57 >> parse/all {"a""b""c"de} "e" == ["a" "b" "c" "d"] Please, add the bug to RAMBO.
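Graham's suggestion above (write your own parse rule when the text is not a file) might look roughly like the following. This is a minimal sketch, assuming REBOL 2 parse semantics; split-lines is a hypothetical helper name and not something posted in the thread. It splits on "^/" with an explicit rule, so it does not go through the string-delimiter mode of parse/all that mishandles quoted fields.

```rebol
; Hypothetical helper: split a string into lines with an explicit parse rule.
split-lines: func [text [string!] /local out line] [
    out: copy []
    parse/all text [
        any [
            ; copy everything up to the next newline, then step over it
            copy line to "^/" skip
            ; an empty match leaves LINE as none in REBOL 2, so store "" instead
            (append out any [line copy ""])
        ]
        ; whatever follows the last newline, if anything
        copy line to end
        (if all [line not empty? line] [append out line])
    ]
    out
]

; Usage, with the file name taken from Brock's post:
; terr2: split-lines read %territory.txt
```

Each element can then be split on "^-" separately, which keeps the quote characters in the data instead of having them interpreted.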
Graham 8-Mar-2005 [22]	How do you break a parse rule and still have it be true ?
Tomc 8-Mar-2005 [23]	to end
Graham 8-Mar-2005 [24]	So, what is break used for then?
Tomc 8-Mar-2005 [25]	break out of sub rules
Graham 8-Mar-2005 [26]	oh .. ok.
Graham 10-Mar-2005 [27]	Actually, what I was wondering was how to break out of an action in a rule and still let it return true
sqlab 10-Mar-2005 [28]	In the new alphas you can do >> catch [parse/all " aa" [" " (throw true)]] == true
Graham 10-Mar-2005 [29]	ahh.. that's what I need.
Anton 10-Mar-2005 [30x2]	In a paren, you can change a 'break-rule from an empty block to [to end], then include that break-rule in your main parse-rule somewhere.
Anton 10-Mar-2005 [30x2]	(for a more backwards compatible solution)
JaimeVargas 10-Mar-2005 [32]	parse/all "first second" [["first" break] to end]
Romano 10-Mar-2005 [33]	Graham: break does not invalidate the rule by itself
Graham 10-Mar-2005 [34]	Anton, can you give an example of what you mean?
Anton 10-Mar-2005 [35x3]	Oh dear, looks like my memory was not so great. I was using 'break in the early-exit rule here:
	extract-link-by-name: func [
	    "Returns the first A link with the specified text in the source string."
	    name [any-string!] "link text"
	    str [any-string!] "html source string"
	    /local start end terms non-terms link text early-exit
	][
	    early-exit: []
	    if parse/all str [
	        some [
	            thru "<a " any " " [
	                "name" | [
	                    "href" any " " "=" any " "
	                    [{"} (terms: {"}) | (terms: " >")] ; set terms depending on how url starts, with double-quote?
	                    (non-terms: complement charset terms)
	                    start: some non-terms end: (link: copy/part start end)
	                    opt {"} any " " thru ">"
	                    copy text to "</a" ; <- was before just "<"
	                    (if text = name [early-exit: [to end break]])
	                ]
	            ]
	            early-exit
	        ]
	    ][reduce ['link link 'name text]]
	]
	Ah, here is an example of what I was thinking: >> rule: [] parse "aaabbbb" [any [rule ["a" (print 'a) | "b" (print 'b rule: [to end " "])]] to end] a a a b == true
	When the first b is encountered, rule is set to a rule which cannot succeed, thus breaking out of the outer any.
Graham 10-Mar-2005 [38]	so, rule ends up by being redefined?
Graham 12-Mar-2005 [39x5]	I'm using these rules in my server-side implementation of the TOP command:
	one-line-rule: [copy line thru {^/} (if line = ".^/" [line: join "." line] write-client line)]
	header-rule: [copy header thru {^/^/} (write-client header write-client )]
	msg is the email message including header and body; lines is the number of lines requested by the TOP command.
	parse msg compose [header-rule (lines) one-line-rule]
	Now, I can't check the parse syntax as rebol.com is down, but I seem to always get the whole email with my header-rule and not just the header.
	correction: header-rule: [copy header thru {^/^/} (write-client header)]
	the rules work when tested at the console. So, I'm thinking something else is wrong.
	The email is obtained by reading from a port in binary mode, so could this be due to the lack of line ending conversion ?
	From a unix system to windows32?
Brett 12-Mar-2005 [44x2]	Hi Graham. Line ending in Internet protocols = Apples; Line/Paragraph representation in text files = Oranges. :-) Best not to compare them for this task. As far as I've seen, Internet protocols have an on-the-wire line ending of CRLF. So yes this is very likely a problem with your parse rules.
Brett 12-Mar-2005 [44x2]	With the port in binary mode - you must use CRLF to identify/transmit line endings.
Graham 12-Mar-2005 [46]	Hi Brett. Do you then see a problem with my parse rules?
Tomc 12-Mar-2005 [47]	seems like to get all possible raw line endings something like copy line thru [ 1 2 ["^J" | "^M"]] might work
Graham 12-Mar-2005 [48x2]	my one-line-rule seems to work already ..
Graham 12-Mar-2005 [48x2]	copy header thru [ 2 [ "^J" | "^M"] ] ... ?
Tomc 12-Mar-2005 [50]	unix would be a single "^J"
Graham 12-Mar-2005 [51x2]	header is separated from body by one blank line.
Graham 12-Mar-2005 [51x2]	So, I need to look for two consecutive line endings to find the end of the header
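The idea of looking for two consecutive line endings, whether the message arrived with CRLF or bare LF, could be sketched like this. It is only a sketch under stated assumptions: REBOL 2 parse semantics, msg already converted to a string! (for example with to string! if it was read in binary mode), and the names eol, non-eol and header-line are hypothetical, not taken from Graham's server code. It avoids thru with a block argument and instead describes the header as a run of non-empty lines followed by one empty line.

```rebol
; Assumption: msg is a string! holding the raw message, header first.
eol: [crlf | "^/"]                   ; crlf is REBOL's predefined "^M^/" string
non-eol: complement charset "^M^/"   ; any character that is not CR or LF
header-line: [some non-eol eol]      ; one non-empty line including its ending

header: body: none
parse/all msg [
    copy header [some header-line eol]   ; header lines, then the blank line
    copy body to end                     ; the rest is the body (none if empty)
]
```

If the message does not begin with at least one header line followed by a blank line, the parse fails and header stays none, which gives the caller something simple to check before writing anything to the client.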