
World: r3wp

[Parse] Discussion of PARSE dialect

Anton
10-Mar-2005
[35x3]
Oh dear, looks like my memory was not so great. I was using 'break 
in the early-exit rule here:

extract-link-by-name: func ["Returns the first A link with the specified text in the source string."
	name [any-string!] "link text"
	str [any-string!] "html source string"
	/local start end terms non-terms link text early-exit
][
	early-exit: []
	if parse/all str [
		some [
			thru "<a " any " " 
			["name" | [
				"href" any " " "=" any " "

				[{"} (terms: {"}) | (terms: " >")] ; set terms depending on how the url starts (with a double-quote?)
				(non-terms: complement charset terms)
				start: some non-terms end: (link: copy/part start end)
				opt {"} any " "
				thru ">" copy text to "</a" ; <- was before just "<"
				(if text = name [early-exit: [to end break]])
			]]
			early-exit
		]
	][reduce ['link link 'name text]]
]
Ah, here is an example of what I was thinking:

>> rule: [] parse "aaabbbb" [any [rule ["a" (print 'a) | "b" (print 
'b rule: [to end " "])]] to end]
a
a
a
b
== true
When the first b is encountered, rule is set to a rule which cannot 
succeed, thus breaking out of the outer any.
Graham
10-Mar-2005
[38]
so, rule ends up by being redefined?
Graham
12-Mar-2005
[39x5]
I'm using these rules in my server-side implementation of the TOP command


one-line-rule: [copy line thru {^/} ( if line = ".^/" [ line: join 
"." line ] write-client line)]

header-rule: [copy header thru {^/^/} (write-client header write-client 
)]

msg is the email message including header and body
lines is the number of lines requested by the TOP command

parse msg compose [ header-rule (lines) one-line-rule ]


Now, I can't check the parse syntax as rebol.com is down, but I seem 
to always get the whole email with my header-rule and not just the 
header.
correction: header-rule: [copy header thru {^/^/} (write-client header)]
the rules work when tested at the console.  So, I'm thinking something 
else is wrong.
The email is obtained by reading from a port in binary mode, so could 
this be due to the lack of line ending conversion ?
From a unix system to windows32?
Brett
12-Mar-2005
[44x2]
Hi Graham. Line ending in Internet protocols = Apples; Line/Paragraph 
representation in text files = Oranges. :-)

Best not to compare them for this task.  As far as I've seen, Internet 
protocols have an on-the-wire line ending of CRLF.  So yes this is 
very likely a problem with your parse rules.
With the port in binary mode - you must use CRLF to identify/transmit 
line endings.
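
A minimal sketch of the point above, assuming the message arrives with on-the-wire CRLF line endings. The sample msg is made up for illustration, and parse/all is used so that whitespace (including CR and LF) is matched literally rather than skipped:

```rebol
REBOL []

; Hypothetical sample message using CRLF line endings, as on the wire.
msg: rejoin [
    "Subject: test" crlf
    "To: someone" crlf
    crlf
    "body line 1" crlf
]

header: none

; The header ends at the first blank line, i.e. two consecutive CRLFs.
; parse/all keeps parse from treating whitespace specially.
parse/all msg [copy header thru "^M^/^M^/" to end]

print mold header
```

After this runs, header should hold both header lines plus the blank line that terminates them. Searching for {^/^/} instead would not match this input, because every LF is preceded by a CR.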
Graham
12-Mar-2005
[46]
Hi Brett.  Do you then see a problem with my parse rules?
Tomc
12-Mar-2005
[47]
Seems like, to get all possible raw line endings, something like
copy line thru [ 1 2 ["^J" | "^M"]]   might work
Graham
12-Mar-2005
[48x2]
my one-line-rule seems to work already ..
copy header thru [ 2 [ "^J" | "^M"] ] ... ?
Tomc
12-Mar-2005
[50]
unix would be a single "^J"
Graham
12-Mar-2005
[51x2]
header is separated from body by one blank line.
So, I need to look for two consecutive line endings to find the end 
of the header
Tomc
12-Mar-2005
[53]
yes, my rule was for a single line, as in your first code sample
Graham
12-Mar-2005
[54]
is ^J the line feed ?
Tomc
12-Mar-2005
[55x2]
copy header to ["^J^J" | "^M^M" | "^M^J^M^J"]
^J same as ^/
Graham
12-Mar-2005
[57]
>> header-rule: [copy header thru ["^J^J" | "^M^M" | "^M^J^M^J"] 
 (write-client header)]
== [copy header thru ["^/^/" | "^M^M" | {^M
^M
}] (write-client header)]
>> parse m header-rule
** Script Error: Invalid argument:

 |


** Near: parse m header-rule
Tomc
12-Mar-2005
[58x2]
try {^M^/^M^/}
not that it should matter
Graham
12-Mar-2005
[60]
same problem
Tomc
12-Mar-2005
[61x3]
it is interpreting the third newline ... odd  buggy odd
parse all?
or use |  2 CRLF
Graham
12-Mar-2005
[64]
>> unix: [ copy header thru "^J^J" ]
== [copy header thru "^/^/"]
>> msdos: [ copy header thru "^/^/" ]
== [copy header thru "^/^/"]
>> parse m [ [ unix | msdos ] ( write-client header ) ]
Tomc
12-Mar-2005
[65]
msdos is not correct
Graham
12-Mar-2005
[66]
is that correct rule for unix ?
Tomc
12-Mar-2005
[67]
yes
Graham
12-Mar-2005
[68]
what's msdos ?
Tomc
12-Mar-2005
[69x3]
mac ^M^M  dos ^M^J unix ^J
mac is a single ^M for a single line
[2 "^/" | 2 "^M" | 2 CRLF]
Graham
12-Mar-2005
[72x2]
>> parse m [ [2 "^/" | 2 "^M" | 2 crlf ] ( write-client header ) 
]
== false
dos is : join crlf crlf .. isn't it ?
Tomc
12-Mar-2005
[74]
... copy header thru ...
Graham
12-Mar-2005
[75]
parse m [ copy header thru [2 "^/" | 2 "^M" | 2 crlf ] ( write-client 
header ) ]
** Script Error: Invalid argument: 2
 | 2 crlf
Tomc
12-Mar-2005
[76x2]
copy header [thru 2 "^/" | thru 1 "^M" | thru 2 crlf]
it annoys me that to/thru does not distribute over a block of ORs
Graham
12-Mar-2005
[78]
me too ...
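
One possible workaround, sketched under assumptions: since thru rejects a block of alternatives here, step through the input one character at a time and test the alternatives at each position. The loop is stopped with the rule-redefinition trick shown earlier in this thread. The names header-end, stop, start and finish, and the sample msg, are hypothetical:

```rebol
REBOL []

; Hypothetical sample message; any of the three conventions would do.
msg: rejoin ["Subject: test" crlf "To: someone" crlf crlf "body" crlf]

; The blank line that ends the header, in DOS, Unix and old Mac form.
header-end: [crlf crlf | "^/^/" | "^M^M"]

stop: []        ; an empty rule always succeeds
header: none

parse/all msg [
    start:
    any [
        stop [
            ; once a terminator matches, make STOP a rule that can
            ; never succeed, which ends the ANY loop on its next pass
            header-end (stop: [end skip])
            | skip
        ]
    ]
    finish:
    (header: copy/part start finish)
    to end
]

print mold header
```

Unlike a chain of copy header thru ... alternatives, which takes the first alternative that matches anywhere in the message, this stops at the earliest terminator of any kind. If no terminator is found, header ends up holding the whole message.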
Tomc
12-Mar-2005
[79]
off to the coast for the weekend, got to pack
Graham
12-Mar-2005
[80x3]
ok.
This rule seems to now work for me ...


  header-rule:  [ [ copy header thru {^M^J^M^J} | copy header thru 
  {^/^/} ] (write-client header )]
not sure if the second rule is needed ...
Chris
12-Mar-2005
[83x2]
;I tend to use charsets as a way of skipping 'thru while looking for multiple possibilities:
line-end: charset [#"^/" #"^M"] ; etc
line: complement line-end

parse m [copy header [some [some line line-end]]]
a way of skipping
 -> "an alternative to"
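
A hedged variation on the charset idea, assuming the input may use CRLF. A single-character charset consumes only the CR of a CRLF pair, so this sketch tries the two-character crlf first. The names line-char, eol and body, and the sample msg, are illustrative only:

```rebol
REBOL []

; Hypothetical sample message with CRLF line endings.
msg: rejoin ["Subject: test" crlf "To: someone" crlf crlf "body line 1" crlf]

line-end: charset "^/^M"
line-char: complement line-end

; One line ending: CRLF if present, otherwise a lone LF or CR.
eol: [crlf | line-end]

header: none
body: none

parse/all msg [
    ; header = one or more non-empty lines, each with its line ending
    copy header [some [some line-char eol]]
    ; the blank line separating header from body
    eol
    copy body to end
]

print mold header
print mold body
```

Here header should contain the two header lines without the separating blank line, and body the rest of the message. A message that starts with a blank line (no header at all) would fail the first some and need separate handling.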