r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

BrianW
20-Mar-2005
[102]
It wouldn't have to be industrial-strength, but it would be like a security
blanket for developers experimenting with the new language. PCRE
is found all over the place in languages on Linux machines, and the 
absence makes some developers uncomfortable - despite the fact that 
Parse is better.
Tomc
20-Mar-2005
[103]
yea but then rebol programs would start getting contaminated with
unfriendly gobbledygook and rebol developers would have to learn
pcre
BrianW
20-Mar-2005
[104]
Good point. One article I was thinking would be along the lines of 
a "phrasebook", translating PCRE concepts to Parse equivalents.
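A few entries such a phrasebook might start with, sketched in REBOL 2 with `parse/all`. The pairings are approximate, and the sample strings and the `text`, `digits` and `subj` words are hypothetical. One difference is worth stating up front: PARSE returns true only when the rules consume the whole input, so a rule behaves like a regex anchored with `^` and `$`. String matching is also case-insensitive unless `/case` is used.

```rebol
REBOL [Title: "PCRE to PARSE phrasebook sketch"]

; Hypothetical sample input and charset (names chosen for illustration)
text: "Subject: hello world^/"
digits: charset "0123456789"

; /^Subject:/  ->  a literal string must match at the current position
print parse/all text ["Subject:" to end]                  ; prints: true

; /Subject: ([^\n]*)/  ->  copy everything up to (not including) the newline
parse/all text [thru "Subject: " copy subj to newline to end]
print mold subj                                           ; prints: "hello world"

; /^\d+$/  ->  one or more characters from a charset, then end of input
print parse/all "2005" [some digits]                      ; prints: true
print parse/all "20x5" [some digits]                      ; prints: false

; /^colou?r$/  ->  an optional element
print parse/all "color" ["colo" opt "u" "r"]              ; prints: true
print parse/all "colour" ["colo" opt "u" "r"]             ; prints: true
```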
Graham
20-Mar-2005
[105]
sometimes it is just too hard to get parse working ... an alternative
would be nice
BrianW
20-Mar-2005
[106]
What about a parse rule that takes pcre strings as input and produces 
a parse rule as output?
Graham
20-Mar-2005
[107]
I've got this rule to parse email headers which only works some of 
the time.

header-rule: [
    thru "^/Date:" copy m-date to newline |
    thru "^/From:" copy m-from to newline |
    thru "^/Subject:" copy m-subject to newline |
    thru "^/To:" copy m-to to newline |
    thru "^/Return-path: "
]
m-subject: m-date: m-from: m-to: none

parse header [some header-rule]
Tomc
20-Mar-2005
[108x2]
I am not totally against REs; I use them all the time in shells, and
having them built in would make writing "work alike" programs easier,
but overall it seems to me like a step down
(can you guarantee the order in which the header lines come?)
BrianW
20-Mar-2005
[110]
No, order may vary.
Graham
20-Mar-2005
[111]
no, that's why I use "some"
Tomc
20-Mar-2005
[112]
but it will only work when order is the same
Graham
20-Mar-2005
[113]
I was under the impression that it would keep applying the rule ...
Tomc
20-Mar-2005
[114]
thru "^/To:" goes all the way to the To: line, even if it bypasses other
valid lines to get there
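A small REBOL 2 sketch of this point, using a hypothetical header in which From: comes before Date:. Because `thru` searches forward, the first alternative jumps straight past the From: line, and later iterations of `some` never look back, so `m-from` is never set.

```rebol
REBOL [Title: "thru-based header rule is order-sensitive"]

; Hypothetical sample header: From: arrives BEFORE Date:
header: "Return-path: <a@example.com>^/From: alice^/Date: today^/Subject: hi^/"

m-date: m-from: m-subject: none

header-rule: [
    thru "^/Date:" copy m-date to newline |
    thru "^/From:" copy m-from to newline |
    thru "^/Subject:" copy m-subject to newline
]

; First pass: thru "^/Date:" skips over the From: line entirely
parse/all header [some header-rule]

print ["date:" mold m-date]        ; prints: date: " today"
print ["from:" mold m-from]        ; prints: from: none
print ["subject:" mold m-subject]  ; prints: subject: " hi"
```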
BrianW
20-Mar-2005
[115]
So how would he say "Any of these in any order?"
Vincent
20-Mar-2005
[116]
you should only go 'thru the common line start. try something like:
header-rule: [
    "Date:" copy ... to newline |
    "From:" copy ... to newline |
...
]
parse header [some [thru "^/" header-rule]]
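Vincent's idea spelled out as a runnable REBOL 2 sketch; the sample header is hypothetical and only three fields are shown. Each alternative now has to match right at the start of a line, so the headers can arrive in any order. Note that the very first line is only tested after a `^/`, which is the issue raised next in the thread.

```rebol
REBOL [Title: "Line-anchored header rule"]

; Hypothetical sample header: From: arrives before Date:
header: "Return-path: <a@example.com>^/From: alice^/Date: today^/Subject: hi^/"

m-date: m-from: m-subject: none

; Each alternative must match at the current position (a line start)
header-rule: [
    "Date:" copy m-date to newline |
    "From:" copy m-from to newline |
    "Subject:" copy m-subject to newline
]

; Skip to the next line start, then try the alternatives there
parse/all header [some [thru "^/" header-rule]]

print ["from:" mold m-from]   ; prints: from: " alice"
print ["date:" mold m-date]   ; prints: date: " today"
```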
Tomc
20-Mar-2005
[117]
It will take me a few minutes to make it concrete, but basically you work with
what is common to all lines, in this case colons and newlines
Graham
20-Mar-2005
[118]
Hmm.  Works so far :)
Vincent
20-Mar-2005
[119]
sorry, I missed a problem in this expression: the header must start 
with a newline, so
parse header [header-rule some [thru "^/" header-rule]]
is better
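A REBOL 2 sketch of why the leading `header-rule` matters, using a hypothetical header whose first line is one of the wanted fields. With only `some [thru "^/" header-rule]`, the first line is skipped before any alternative is tried.

```rebol
REBOL [Title: "Checking the first header line"]

; Hypothetical sample whose FIRST line is a wanted field
header: "Date: today^/From: alice^/"

header-rule: [
    "Date:" copy m-date to newline |
    "From:" copy m-from to newline
]

; Without a leading header-rule the first line is never tested
m-date: m-from: none
parse/all header [some [thru "^/" header-rule]]
print mold m-date    ; prints: none

; Trying header-rule once at the start picks it up
m-date: m-from: none
parse/all header [header-rule some [thru "^/" header-rule]]
print mold m-date    ; prints: " today"
```

One caveat with the anchored alternatives: if the first line matches none of them (a Return-path: line, for instance), `header-rule` fails at once and the whole parse stops without capturing anything. A catch-all `| to newline` alternative, as Tomc suggests later in the thread, avoids that.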
Tomc
20-Mar-2005
[120x3]
might have to be careful if you want the first line
ahh
got it, you caught it
Graham
20-Mar-2005
[123x3]
Ahh... well, invariably the first line of the header is "Return-path:"
so it's not a problem.
invariably, because if it's not there, I add it!
Thanks, I should have asked much sooner rather than struggling with it.
Tomc
20-Mar-2005
[126]
what about when you get novel headers? do you care?
Graham
20-Mar-2005
[127x3]
no, these are the ones I display when reading an email ...
If the user requests full header display, I just show them raw.
I need the "^/..." prefixes because nowadays there's email coming thru with
authentication signatures that contain the headers in a block
Tomc
20-Mar-2005
[130]
so once you have done some header-lines  and got the ones you are 
interested in you skip the rest with thru "^/^/"
Graham
20-Mar-2005
[131x3]
actually, I copy the header and body out first and process them
separately.
parse msg [copy header thru {^/^/} copy body to end]
actually I use this: parse msg [copy header thru {^M^/^M^/} copy 
body to end]
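A REBOL 2 sketch of the header/body split on a hypothetical CRLF message, where a blank line (`^M^/^M^/`) ends the header. The second alternative, which falls back to a bare `^/^/` for text whose line endings were already converted, is an addition for illustration and not part of Graham's rule.

```rebol
REBOL [Title: "Split a message into header and body"]

; Hypothetical CRLF-terminated message; a blank line ends the header
msg: "From: alice^M^/Subject: hi^M^/^M^/Hello Graham^M^/"

header: body: none
; Try CRLF CRLF first; if it is absent, retry from the start with LF LF
parse/all msg [
    [copy header thru "^M^/^M^/" | copy header thru "^/^/"]
    copy body to end
]

print equal? header "From: alice^M^/Subject: hi^M^/^M^/"   ; prints: true
print equal? body "Hello Graham^M^/"                       ; prints: true
```

With CRLF input like this, header values later copied with `to newline` keep a trailing `^M`.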
Tomc
20-Mar-2005
[134]
the last line-matching rule in header-rule should be | to newline
Graham
20-Mar-2005
[135]
sorry?
Tomc
20-Mar-2005
[136x2]
so you don't break out of the rules before you reach the end of the header
if you came across a novel header line before you came across the
To: line, you would not get to the To: line
Graham
20-Mar-2005
[138x2]
header-rule: [
    thru "Date:" copy m-date to newline |
    thru "From:" copy m-from to newline |
    thru "Subject:" copy m-subject to newline |
    thru "To:" copy m-to to newline
]
m-subject: m-date: m-from: m-to: none

parse header [header-rule some [thru "^/" header-rule]]
that's what I have at present ...
Tomc
20-Mar-2005
[140]
just making it explicit
Graham
20-Mar-2005
[141x2]
I should remove those "thru"s I've got there.
this header-rule should now be applied each time I get a "^/" ...
Tomc
20-Mar-2005
[143]
and if you had a header with a line that did not begin with 'Date,
From, Subject or To, then you could prematurely break out of header-rule
before you got all your bits
Graham
20-Mar-2005
[144]
How ?
Vincent
20-Mar-2005
[145]
header-rule: [
    "Date:" copy m-date to newline |
    "From:" copy m-from to newline |
    "Subject:" copy m-subject to newline |
    "To:" copy m-to to newline |
    to newline
]
m-subject: m-date: m-from: m-to: none

parse header [header-rule some [ thru "^/" header-rule]]
Graham
20-Mar-2005
[146]
oh, I see ...
Vincent
20-Mar-2005
[147]
else
Date: ...
X-Something: ...   ; break the rule
To: ...
From: ...
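A REBOL 2 sketch of Vincent's example, comparing the rule with and without the catch-all `to newline` alternative. The sample header and the `strict-rule` and `tolerant-rule` names are hypothetical. The strict version stops at the X-Something: line and never reaches To:.

```rebol
REBOL [Title: "Why the fallback alternative matters"]

; Hypothetical header with a "novel" line between known ones
header: "Return-path: <a@example.com>^/Date: today^/X-Something: 1^/To: bob^/From: alice^/"

; Hypothetical name: known fields only
strict-rule: [
    "Date:" copy m-date to newline |
    "To:" copy m-to to newline |
    "From:" copy m-from to newline
]
; Hypothetical name: same fields plus a catch-all that consumes an unknown line
tolerant-rule: [
    "Date:" copy m-date to newline |
    "To:" copy m-to to newline |
    "From:" copy m-from to newline |
    to newline
]

m-date: m-to: m-from: none
parse/all header [some [thru "^/" strict-rule]]
print [mold m-date mold m-to]                 ; prints: " today" none

m-date: m-to: m-from: none
parse/all header [tolerant-rule some [thru "^/" tolerant-rule]]
print [mold m-date mold m-to mold m-from]     ; prints: " today" " bob" " alice"
```

The loop still terminates: `thru "^/"` always advances, and at the end of the input even `to newline` fails, so `some` stops.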
Brett
20-Mar-2005
[148]
If you are testing "^/" I would think that you need to use parse/all.


You may find my script helpful for visualising the effect of your 
rules:


http://www.rebol.org/cgi-bin/cgiwrap/rebol/documentation.r?script=parse-analysis-view.r
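A short REBOL 2 sketch of Brett's point. The `/all` refinement is documented as "Parses all chars including spaces"; without it, whitespace (spaces, tabs, newlines) is treated as a separator between tokens, so rules that test for a literal `" "` or `"^/"` may not behave as written. The sample string is hypothetical.

```rebol
REBOL [Title: "parse vs parse/all on whitespace"]

text: "a b c"

; Without /all, whitespace between tokens is skipped rather than matched
probe parse text ["a" "b" "c"]

; With /all every character, including whitespace, must be matched
probe parse/all text ["a" "b" "c"]           ; prints: false
probe parse/all text ["a" " " "b" " " "c"]   ; prints: true
```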
Vincent
20-Mar-2005
[149]
oops - you're right, I missed the big one.
Graham
20-Mar-2005
[150]
so, is PCRE easier to understand ??
Tomc
20-Mar-2005
[151]
$&^*&#&%(*_&$*@#@