
World: r3wp

[Parse] Discussion of PARSE dialect

Graham
20-Mar-2005
[133]
actually I use this: parse msg [copy header thru {^M^/^M^/} copy body to end]
Tomc
20-Mar-2005
[134]
the last line-matching rule in header-rule should be: | to newline
Graham
20-Mar-2005
[135]
sorry?
Tomc
20-Mar-2005
[136x2]
so you do not break out of the rules before you reach the end of the header
if you came across a novel header line before you came across the To: line, you would not get to the To: line
Graham
20-Mar-2005
[138x2]
header-rule: [
    thru "Date:" copy m-date to newline |
    thru "From:" copy m-from to newline |
    thru "Subject:" copy m-subject to newline |
    thru "To:" copy m-to to newline
]
m-subject: m-date: m-from: m-to: none

parse header [header-rule some [ thru "^/" header-rule]]
that's what I have at present ...
Tomc
20-Mar-2005
[140]
just making it explicit
Graham
20-Mar-2005
[141x2]
I should remove those "thru"s I've got there.
this header-rule should now be applied each time I get  a "^/" ...
Tomc
20-Mar-2005
[143]
and if you had a header with a line that did not begin with Date, From, Subject or To then you could prematurely break out of header-rule before you got all your bits
Graham
20-Mar-2005
[144]
How ?
Vincent
20-Mar-2005
[145]
header-rule: [
    "Date:" copy m-date to newline |
    "From:" copy m-from to newline |
    "Subject:" copy m-subject to newline |
    "To:" copy m-to to newline |
    to newline
]
m-subject: m-date: m-from: m-to: none

parse header [header-rule some [ thru "^/" header-rule]]
Graham
20-Mar-2005
[146]
oh, I see ...
Vincent
20-Mar-2005
[147]
else
Date: ...
X-Something: ...   ; break the rule
To: ...
From: ...
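For readers more at home outside REBOL, the effect of the catch-all alternative can be sketched in Python. This is only an analogue, not a translation of PARSE: the function name `parse_header`, the `catch_all` flag and the sample header are made up for illustration.

```python
def parse_header(header, catch_all=True):
    """Collect Date/From/Subject/To values, one header line at a time."""
    wanted = ("Date:", "From:", "Subject:", "To:")
    found = {}
    for line in header.split("\n"):
        for name in wanted:
            if line.startswith(name):
                found[name.rstrip(":")] = line[len(name):].strip()
                break
        else:
            # None of the four alternatives matched this line.
            if not catch_all:
                # Like the rule failing: the remaining lines are never examined.
                break
            # With the catch-all ("| to newline") the line is simply skipped.
    return found


# Hypothetical header with an unexpected line before To: and From:
sample = (
    "Date: Sun, 20 Mar 2005\n"
    "X-Something: novel\n"
    "To: a@example.com\n"
    "From: b@example.com"
)
print(parse_header(sample, catch_all=False))  # only Date is collected
print(parse_header(sample, catch_all=True))   # Date, To and From are collected
```

Without the catch-all, the unknown X-Something line stops the scan, so To and From are never seen.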
Brett
20-Mar-2005
[148]
If you are testing "^/" I would think that you need to use parse/all.


You may find my script helpful for visualising the effect of your 
rules:


http://www.rebol.org/cgi-bin/cgiwrap/rebol/documentation.r?script=parse-analysis-view.r
Vincent
20-Mar-2005
[149]
oops - you're right, I missed the big one.
Graham
20-Mar-2005
[150]
so, is PCRE easier to understand ??
Tomc
20-Mar-2005
[151]
$&^*&#&%(*_&$*@#@
Graham
20-Mar-2005
[152]
looks like perl
Tomc
20-Mar-2005
[153]
that is just random chars, not a PCRE for parsing mail headers
Graham
20-Mar-2005
[154x2]
Oh :)
I was just attempting to bring the subject back on topic before I 
interrupted it.
Tomc
20-Mar-2005
[156]
that was not an interruption, more like exactly what this group is for
Graham
20-Mar-2005
[157]
since I have no idea what pcre was ..
Tomc
20-Mar-2005
[158x5]
.  match any single char but newline
*  0 or more of the preceding
()  put in var $n  [n = 1, 2, 3 ...]
/To: (.*)/
$1  holds whom the email is addressed to
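The same pattern can be tried in Python, whose `re` module follows the Perl conventions described above for `.`, `*` and `()`. This is a small sketch with a made-up header string. Group 1 plays the role of Perl's `$1`.

```python
import re

# Hypothetical header text, for illustration only
header = "From: b@example.com\nTo: a@example.com\nSubject: hello"

# "." matches any single character except newline, "*" repeats it zero or
# more times, and the parentheses capture what was matched as group 1.
match = re.search(r"To: (.*)", header)
recipient = match.group(1) if match else None
print(recipient)
```

Because `.` does not cross a newline, the capture stops at the end of the To: line. A stricter version would anchor the pattern with `^` and `re.MULTILINE` so that a line such as Reply-To: is not picked up by mistake.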
Graham
20-Mar-2005
[163]
While we're here .. what is this taint thing that Perl has, and is it a concern for Rebol ?
Tomc
20-Mar-2005
[164]
tainting forces you to consider the user's input and explicitly allow it to pass
Anton
20-Mar-2005
[165]
I think only people who miss it want it. :)
BrianW
20-Mar-2005
[166]
Taint mode tells Perl that you aren't sure whether your incoming 
data is safe. It's just a shortcut for enforcing commonsense programming.
Graham
20-Mar-2005
[167]
so, it's to prevent incoming data being executed ?
Tomc
20-Mar-2005
[168x2]
you can write a well-considered script without taint that is far more secure than a script that passes taint mode by making a simple rule that does not properly catch problems
you basically have to write a regular expression to accept user input
Vincent
20-Mar-2005
[170]
Graham: for your header, like Brett said, parse/all is needed when 
you work on strings with newlines and spaces. last line should be:
parse/all header [header-rule some [ thru "^/" header-rule]]
BrianW
20-Mar-2005
[171]
Graham, yes, but it's also used in other situations: force the programmer 
to escape HTML input before printing it back out, massaging data 
so that it's friendlier for the database, etc.
Graham
20-Mar-2005
[172]
Yeah, I got that Vincent.  Curiously though it has worked without 
it.
Tomc
20-Mar-2005
[173x2]
in your example, having a rule more like
header-rule: [
    "Date:" copy m-date date-rule |
    "From:" copy m-from email-rule |
    "Subject:" copy m-subject some alpha-num |
    "To:" copy m-to email-rule |
    to newline
]

where email-rule only matched email addresses
would be more taint-like
and being very careful to never effectively

do [ user-input]

without being sure user-input could not cause unintended side effects
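A rough Python sketch of the taint-like idea, where input is accepted only if it matches an explicit pattern and is never evaluated. The pattern and the name `untaint_email` are assumptions made up for this example. The pattern is deliberately strict and rejects many addresses that are technically valid.

```python
import re

# Hypothetical whitelist pattern: simple local part, "@", dotted host name
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def untaint_email(user_input):
    """Return the trimmed input only if all of it matches the pattern, else None."""
    text = user_input.strip()
    return text if EMAIL.fullmatch(text) else None


print(untaint_email("a@example.com"))
print(untaint_email("a@example.com; rm -rf /"))
```

The important part is `fullmatch`: a partial match would let trailing text through, which is the kind of simple rule that passes a taint check without properly catching problems.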
Chris
31-Mar-2005
[175x3]
Not quite sure what to make of the following:
>> rule: [set w 'pubDate (print w)]
== [set w 'pubDate (print w)]
>> parse [pubdate] rule
pubdate
== true
>> parse/case [pubdate] rule
pubdate
== true
First off, would the last result be a bug?
Secondly, I'd like to ensure that whether the block is [pubdate] 
or [pubDate] that 'w stores 'pubDate.  I had hoped that as 'pubDate 
is set in the rule, it might take precedence over pubdate in the 
block :^(
DideC
1-Apr-2005
[178]
I suppose /case only acts on string!
Gabriele
1-Apr-2005
[179x3]
/case only applies to strings. Chris, you can:
>> parse [pubdate] ['pubDate (w: 'pubDate print w)]
pubDate
== true
but I'm not sure you'll like it.
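The idea of matching in any case but storing one fixed spelling can be sketched outside REBOL as a lookup keyed on the lowercased word. This Python fragment is only an analogue. The table `CANONICAL` and the function `canonical_word` are hypothetical names.

```python
# Map each lowercased word to the spelling that should be stored
CANONICAL = {"pubdate": "pubDate"}


def canonical_word(word):
    """Return the canonical spelling for a known word, or the word unchanged."""
    return CANONICAL.get(word.lower(), word)


print(canonical_word("pubdate"))
print(canonical_word("pubDate"))
print(canonical_word("title"))
```

As in the rule above, the stored value comes from the rule side (the table), not from whatever spelling appeared in the input.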
Graham
9-Apr-2005
[182]
should be something easier than this 

like-i: charset [ #"1" #"l" #"L" #"I" #"i" ]
like-a: charset [ #"a" #"A" #"@"  ]
like-v: charset [ #"\" #"/" #"v" #"V" ]


cialis: [ "c" like-i like-a 2 like-i "s" ]
viagra: [ 1 2 like-v like-i like-a "gr" like-a ]

parse "\/[1-:-gr]@" [ viagra ]
parse "[c1-:-Lls]" [ cialis ]
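For comparison, the look-alike character sets map fairly directly onto regular-expression character classes. The Python sketch below mirrors the three charsets and the two word rules above. The test strings are made up, and `re.IGNORECASE` stands in for PARSE's default case-insensitive string matching.

```python
import re

# Character classes mirroring like-i, like-a and like-v
LIKE_I = r"[1lLIi]"
LIKE_A = r"[aA@]"
LIKE_V = r"[\\/vV]"

# "c" like-i like-a 2 like-i "s"
CIALIS = re.compile(rf"c{LIKE_I}{LIKE_A}{LIKE_I}{{2}}s", re.IGNORECASE)
# 1 2 like-v like-i like-a "gr" like-a
VIAGRA = re.compile(rf"{LIKE_V}{{1,2}}{LIKE_I}{LIKE_A}gr{LIKE_A}", re.IGNORECASE)

print(bool(VIAGRA.fullmatch("\\/1@gr@")))  # "\/" standing in for "v"
print(bool(CIALIS.fullmatch("c1@lls")))
print(bool(CIALIS.fullmatch("class")))
```

Whether this counts as easier is a matter of taste. The character classes are shorter to write, but the named charsets in the PARSE version are arguably easier to read.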