World: r3wp


[Parse] Discussion of PARSE dialect

Anton 10-Mar-2005 [35x3]	Oh dear, looks like my memory was not so great. I was using 'break in the early-exit rule here:
	extract-link-by-name: func [
	    "Returns the first A link with the specified text in the source string."
	    name [any-string!] "link text"
	    str [any-string!] "html source string"
	    /local start end terms non-terms link text early-exit
	][
	    early-exit: []
	    if parse/all str [
	        some [
	            thru "<a " any " " ["name" | [
	                "href" any " " "=" any " "
	                [{"} (terms: {"}) | (terms: " >")] ; set terms depending on how url starts, with double-quote?
	                (non-terms: complement charset terms)
	                start: some non-terms end: (link: copy/part start end)
	                opt {"} any " " thru ">"
	                copy text to "</a" ; <- was before just "<"
	                (if text = name [early-exit: [to end break]])
	            ]]
	            early-exit
	        ]
	    ][reduce ['link link 'name text]]
	]
	Ah, here is an example of what I was thinking:
	>> rule: [] parse "aaabbbb" [any [rule ["a" (print 'a) | "b" (print 'b rule: [to end " "])]] to end]
	a
	a
	a
	b
	== true
	When the first b is encountered, rule is set to a rule which cannot succeed, thus breaking out of the outer any.
Graham 10-Mar-2005 [38]	so, rule ends up by being redefined?
Graham 12-Mar-2005 [39x5]	I'm using these rules in my server-side implementation of the TOP command:
	one-line-rule: [copy line thru {^/} ( if line = ".^/" [ line: join "." line ] write-client line)]
	header-rule: [copy header thru {^/^/} (write-client header write-client )]
	msg is the email message including header and body; lines is the number of lines requested by the TOP command.
	parse msg compose [ header-rule (lines) one-line-rule ]
	Now, I can't check the parse syntax as rebol.com is down, but I seem to always get the whole email with my header-rule and not just the header.
	correction: header-rule: [copy header thru {^/^/} (write-client header)]
	the rules work when tested at the console. So, I'm thinking something else is wrong.
	The email is obtained by reading from a port in binary mode, so could this be due to the lack of line ending conversion ?
	From a unix system to windows32?
Brett 12-Mar-2005 [44x2]	Hi Graham. Line ending in Internet protocols = Apples; Line/Paragraph representation in text files = Oranges. :-) Best not to compare them for this task. As far as I've seen, Internet protocols have an on-the-wire line ending of CRLF. So yes this is very likely a problem with your parse rules.
Brett 12-Mar-2005 [44x2]	With the port in binary mode - you must use CRLF to identify/transmit line endings.
Graham 12-Mar-2005 [46]	Hi Brett. Do you then see a problem with my parse rules?
Tomc 12-Mar-2005 [47]	seems like to get all possible raw line endings, something like copy line thru [ 1 2 ["^J" | "^M"]] might work
Graham 12-Mar-2005 [48x2]	my one-line-rule seems to work already ..
Graham 12-Mar-2005 [48x2]	copy header thru [ 2 [ "^J" | "^M"] ] ... ?
Tomc 12-Mar-2005 [50]	unix would be a single "^J"
Graham 12-Mar-2005 [51x2]	header is separated from body by one blank line.
Graham 12-Mar-2005 [51x2]	So, I need to look for two consecutive line endings to find the end of the header
Tomc 12-Mar-2005 [53]	yes, my rule was for a single line as in your first code sample
Graham 12-Mar-2005 [54]	is ^J the line feed ?
Tomc 12-Mar-2005 [55x2]	copy header to ["^J^J" | "^M^M" | "^M^J^M^J"]
Tomc 12-Mar-2005 [55x2]	^J same as ^/
Graham 12-Mar-2005 [57]	>> header-rule: [copy header thru ["^J^J" | "^M^M" | "^M^J^M^J"] (write-client header)]
	== [copy header thru ["^/^/" | "^M^M" | {^M ^M }] (write-client header)]
	>> parse m header-rule
	Script Error: Invalid argument: |
	Near: parse m header-rule
Tomc 12-Mar-2005 [58x2]	try {^M^/^M^/}
Tomc 12-Mar-2005 [58x2]	not that it should matter
Graham 12-Mar-2005 [60]	same problem
Tomc 12-Mar-2005 [61x3]	it is interpreting the third newline ... odd buggy odd
	parse all?
	or use | 2 CRLF
Graham 12-Mar-2005 [64]	>> unix: [ copy header thru "^J^J" ]
	== [copy header thru "^/^/"]
	>> msdos: [ copy header thru "^/^/" ]
	== [copy header thru "^/^/"]
	>> parse m [ [ unix | msdos ] ( write-client header ) ]
Tomc 12-Mar-2005 [65]	ms dos not correct
Graham 12-Mar-2005 [66]	is that correct rule for unix ?
Tomc 12-Mar-2005 [67]	yes
Graham 12-Mar-2005 [68]	what's msdos ?
Tomc 12-Mar-2005 [69x3]	mac ^M^M dos ^M^J unix ^J
	mac is a single ^M for a single line
	[2 "^/" | 2 "^M" | 2 CRLF]
Graham 12-Mar-2005 [72x2]	>> parse m [ [2 "^/" | 2 "^M" | 2 crlf ] ( write-client header ) ]
	== false
Graham 12-Mar-2005 [72x2]	dos is : join crlf crlf .. isn't it ?
Tomc 12-Mar-2005 [74]	... copy header thru ...
Graham 12-Mar-2005 [75]	parse m [ copy header thru [2 "^/" | 2 "^M" | 2 crlf ] ( write-client header ) ]
	** Script Error: Invalid argument: 2 | 2 crlf
Tomc 12-Mar-2005 [76x2]	copy header [thru 2 "^/" | thru 1 "^M" | thru 2 crlf]
Tomc 12-Mar-2005 [76x2]	it annoys me that to/thru does not distribute over the block of ORs
Graham 12-Mar-2005 [78]	me too ...
Tomc 12-Mar-2005 [79]	off to the coast for the weekend, got to pack
Graham 12-Mar-2005 [80x3]	ok.
	This rule seems to now work for me ...
	header-rule: [ [ copy header thru {^M^J^M^J} | copy header thru {^/^/} ] (write-client header )]
	not sure if the second rule is needed ...
Chris 12-Mar-2005 [83x2]	;I tend to use charsets as a way of skipping 'thru while looking for multiple possibilities:
	line-end: charset [#"^/" #"^M"] ; etc
	line: complement line-end
	parse m [copy header [some [some line line-end]]]
Chris 12-Mar-2005 [83x2]	a way of skipping -> "an alternative to"
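The thread's sticking point is that to/thru will not accept a block of alternatives. One possible workaround, sketched here for REBOL 2 (the interpreter current at the time of the thread), is to walk the input one character at a time and test the alternatives at each position. The function name split-header and its local words are hypothetical and not taken from the discussion. The sketch assumes msg is a string!; data read from a binary-mode port would need converting first, for example with to-string.

```rebol
REBOL [Title: "Header split sketch (assumption: REBOL 2 parse)"]

; split-header is a hypothetical helper, not part of the original thread.
split-header: func [
    "Returns msg up to and including the first blank line, or none."
    msg [string!]
    /local start stop header
][
    header: none
    parse/all msg [
        start:                              ; remember where the message begins
        some [
            [crlf crlf | "^/^/" | "^M^M"]   ; CRLF, LF or CR double line ending
            stop: (header: copy/part start stop)
            to end                          ; separator found: consume the rest
            | skip                          ; no separator here: advance one char
        ]
    ]
    header
]

probe split-header "Subject: hi^M^/To: me^M^/^M^/body^M^/"
probe split-header "Subject: hi^/To: me^/^/body^/"
```

Because the alternatives are tried at every position before skip advances, the earliest separator in the message wins regardless of which line-ending convention it uses. A message with no blank line yields none rather than the whole text.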