World: r3wp

[Parse] Discussion of PARSE dialect

Graham
20-Mar-2005
[167]
so, it's to prevent incoming data being executed ?
Tomc
20-Mar-2005
[168x2]
you can write a well-considered script without taint mode that is far 
more secure than a script that passes taint mode by using a simple 
rule that does not properly catch problems
you basically have to write a regular expression to accept user input
Vincent
20-Mar-2005
[170]
Graham: for your header, like Brett said, parse/all is needed when 
you work on strings with newlines and spaces. last line should be:
parse/all header [header-rule some [ thru "^/" header-rule]]
BrianW
20-Mar-2005
[171]
Graham, yes, but it's also used in other situations: forcing the programmer 
to escape HTML input before printing it back out, massaging data 
so that it's friendlier for the database, etc.
Graham
20-Mar-2005
[172]
Yeah, I got that Vincent.  Curiously though it has worked without 
it.
Tomc
20-Mar-2005
[173x2]
in your example, having a rule more like
header-rule: [
    ; date-rule, email-rule and alpha-num are assumed to be defined elsewhere;
    ; the capture words (date, from-addr, subject, to-addr) are illustrative
    "Date:" copy date date-rule |
    "From:" copy from-addr email-rule |
    "Subject:" copy subject some alpha-num |
    "To:" copy to-addr email-rule |
    to newline
]

where email-rule only matches email addresses
would be more taint-like
and being very careful to never effectively

do [ user-input ]

without being sure user-input could not cause unintended side effects
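The whitelist idea above (accept a field only when its value fits a strict pattern, and never evaluate the input) can be sketched outside REBOL too. This is a minimal Python sketch, not anyone's actual script: the function name `accept_header` and the patterns are hypothetical and much stricter than real mail headers.

```python
import re

# Hypothetical, deliberately strict patterns; real mail headers allow far more.
EMAIL = r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
FIELD_PATTERNS = {
    "Date": re.compile(r"\d{1,2}-[A-Za-z]{3}-\d{4}"),
    "From": re.compile(EMAIL),
    "To": re.compile(EMAIL),
    "Subject": re.compile(r"[A-Za-z0-9 ]+"),
}


def accept_header(line):
    """Return (name, value) if the line is a whitelisted field, else None."""
    name, sep, value = line.partition(":")
    pattern = FIELD_PATTERNS.get(name)
    if not sep or pattern is None:
        return None  # no colon, or a field we do not know
    value = value.strip()
    # fullmatch: the whole value must fit the pattern, not just a prefix.
    return (name, value) if pattern.fullmatch(value) else None
```

Anything that does not match is rejected rather than cleaned up, which is the point of the approach: the rule describes what is allowed, not what is forbidden.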
Chris
31-Mar-2005
[175x3]
Not quite sure what to make of the following:
>> rule: [set w 'pubDate (print w)]
== [set w 'pubDate (print w)]
>> parse [pubdate] rule
pubdate
== true
>> parse/case [pubdate] rule
pubdate
== true
First off, would the last result be a bug?
Secondly, I'd like to ensure that whether the block is [pubdate] 
or [pubDate] that 'w stores 'pubDate.  I had hoped that as 'pubDate 
is set in the rule, it might take precedence over pubdate in the 
block :^(
DideC
1-Apr-2005
[178]
I suppose /case only acts on string! values
Gabriele
1-Apr-2005
[179x3]
/case only applies to strings. Chris, you can:
>> parse [pubdate] ['pubDate (w: 'pubDate print w)]
pubDate
== true
but i'm not sure you'll like it.
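The workaround above stores the canonical spelling from the rule instead of whatever case the input used. The same idea as a small Python sketch, assuming a lookup table keyed by the lowercased word; `CANONICAL`, `canonical_word`, and the extra words `title` and `link` are made up for illustration.

```python
# Canonical spellings to store, whatever case the input used.
# "title" and "link" are illustrative extras, not from the discussion.
CANONICAL = {name.lower(): name for name in ("pubDate", "title", "link")}


def canonical_word(word):
    """Return the canonical spelling for a known word, or None if unknown."""
    return CANONICAL.get(word.lower())
```

So `canonical_word("pubdate")` and `canonical_word("PUBDATE")` both give `"pubDate"`, and an unknown word gives `None`.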
Graham
9-Apr-2005
[182x2]
should be something easier than this 

like-i: charset [ #"1" #"l" #"L" #"I" #"i" ]
like-a: charset [ #"a" #"A" #"@"  ]
like-v: charset [ #"\" #"/" #"v" #"V" ]


cialis: [ "c" like-i like-a 2 like-i "s" ]
viagra: [ 1 2 like-v like-i like-a "gr" like-a ]

parse "\/[1-:-gr]@" [ viagra ]
parse "[c1-:-Lls]" [ cialis ]
hmm.. altme converts my double quote to a single quote
Gabriele
9-Apr-2005
[184]
maybe use   charset "1lLIi"    to avoid that much typing ;)
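For comparison, the look-alike character sets can be written as regular-expression character classes. This is a rough Python sketch of the same idea, not a translation that behaves identically to the PARSE rules; the constant names are invented here.

```python
import re

# Character classes standing in for the like-i / like-a / like-v charsets.
LIKE_I = "[1lLIi]"
LIKE_A = "[aA@]"
LIKE_V = r"[\\/vV]"  # backslash, slash, v or V

# "c", one like-i, one like-a, two like-i, "s"
CIALIS = re.compile("c" + LIKE_I + LIKE_A + LIKE_I + "{2}" + "s")
# one or two like-v, one like-i, one like-a, "gr", one like-a
VIAGRA = re.compile(LIKE_V + "{1,2}" + LIKE_I + LIKE_A + "gr" + LIKE_A)
```

With these, `VIAGRA.fullmatch("\\/1@gr@")` and `CIALIS.fullmatch("c1@lLs")` both match, while an ordinary word such as `"class"` does not.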
Anton
9-Apr-2005
[185]
Graham, the link width is slightly incorrect, so it obscures half 
of the double quote, so it looks like a single.
Tomc
28-Apr-2005
[186x4]
flatten: func [b [block!] /local flat rule x token][
	flat: copy []
	rule: [
		some [
			[x: block! (parse first :x rule)] |
			[copy token any-type! (append flat token)]
		]
	]
	parse b rule
	flat
]
without the recursive call to parse
flatten: func [b [block!] /local flat rule x token][
	flat: copy []
	rule: [some [[x: block! :x into rule] | [copy token any-type! (append flat token)]]]
	parse b rule
	flat
]
a flatten that changed its block in place would be useful at times
Gregg
30-Apr-2005
[190]
Something like this? (it's not parse based though)

    flatten: func [block] [
        head forall block [
            if block? block/1 [change/part block block/1 1]
        ]
    ]
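To make the two variants being discussed concrete (build a new flat list, or splice nested lists into the original), here is a Python sketch. It is only an illustration of the two strategies, handling arbitrary nesting depth; the function names are chosen for this example.

```python
def flatten(items):
    """Return a new list with all nested lists spliced in, at any depth."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def flatten_in_place(items):
    """Splice nested lists into `items` itself and return it."""
    i = 0
    while i < len(items):
        if isinstance(items[i], list):
            # Replace the nested list with its contents and do not advance,
            # so a list that was nested deeper is examined again.
            items[i:i + 1] = items[i]
        else:
            i += 1
    return items
```

The in-place version keeps the same list object, so other references to it see the flattened result.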
Robert
5-Jun-2005
[191x2]
I have a problem with parse not terminating the parsing. Here is 
my code for parsing CamelCase words:

rebol []

; CamelCase Test


test-text: "FirstWord test. This is a CamelCase test Text. CamelCase2 is the base idea for a WiKi. CamelcasE"

upper-case: charset "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
delimiters:	charset " .,;|^-^/"
rest-chars: complement union upper-case delimiters

text: ""

parse/all/case test-text [
	some [
		copy camelcase-word [upper-case some rest-chars upper-case any rest-chars] (
			if not empty? text [?? text clear text]
			print ["CamelCase word found:" camelcase-word]
		)
		| copy flowtext [any [rest-chars | upper-case] any delimiters] (
			append text flowtext
		)
	]
]

halt
Any idea why parse doesn't return?
sqlab
5-Jun-2005
[193]
[any [rest-chars | upper-case] any delimiters] always succeeds, even 
if there is no char left at the end, because ANY can match zero times. 
But then it does not move the cursor, so SOME keeps repeating it forever.
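The same trap exists with any repetition over a pattern that can match nothing. A Python sketch of the guard, assuming a regex of the same shape as the flowtext rule (optional non-delimiters, then optional delimiters); `FLOWTEXT` and `consume_all` are names invented for this example.

```python
import re

# Same shape as the rule under discussion: every part is optional,
# so the pattern also matches the empty string.
FLOWTEXT = re.compile(r"[^ .,;|\t\n]*[ .,;|\t\n]*")


def consume_all(text):
    """Repeat FLOWTEXT over text; stop when a match makes no progress."""
    pos = 0
    chunks = []
    while True:
        match = FLOWTEXT.match(text, pos)
        if match is None or match.end() == pos:
            # Zero-width match: repeating it would never advance.
            break
        chunks.append(match.group())
        pos = match.end()
    # Second value reports whether the whole input was consumed.
    return chunks, pos == len(text)
```

Without the `match.end() == pos` check the loop would spin at the end of the input, which is the behaviour described above.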
Tomc
5-Jun-2005
[194]
rebol []

; CamelCase Test


test-text: "FirstWord test. This is a CamelCase test Text. CamelCase2 is the base idea for a WiKi. CamelcasE"

upper-case: charset "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
delimiter:	charset " .,;|^-^/"
rest-char: complement union upper-case delimiter

text: copy ""


camelcase-rule: [some [upper-case some rest-char upper-case any rest-char] delimiter]

parse/all/case test-text[
	some [ 
			copy camelcase-word  camelcase-rule
				(if not empty? text [?? text clear text]
		 		print ["CamelCase word found: " camelcase-word]
				)
			| 
			copy flowtext upper-case 
				(append text flowtext)
			|
			copy flowtext[any [rest-char | delimiter]] 
				(append text flowtext)
	]
]
halt
Graham
5-Jun-2005
[195x2]
what about camelCAse?
Personally I prefer the way mediawiki does it ... using [[ .. ]] 
... instead of having strange cases in words
Tomc
5-Jun-2005
[197]
yes, I was also concerned about  A1CamelCase  but figured Robert 
 just needed to get thru his first question first
Robert
6-Jun-2005
[198x2]
Thanks for the fix. Sometimes it helps to get some distance by asking 
others :-))
I like CamelCase words. Simple to remember and use. IIRC camelCAse 
is not a valid CamelCase word. But anyway, it depends how I teach 
my users :-))
Graham
6-Jun-2005
[200]
http://en.wikipedia.org/wiki/CamelCase... CamelCase is referred 
to as UpperCamelCase, and camelCase is referred to as lowerCamelCase
Robert
6-Jun-2005
[201]
Tom, your example doesn't terminate, like mine. The thing IMO is 
that the last Word is a CamelCase word and the 'end condition is 
somehow missed. It never reaches the halt.
sqlab
6-Jun-2005
[202]
If you do not want to change the parse rules, you can just add
	if not flowtext [halt]
before
	append text flowtext
Robert
6-Jun-2005
[203]
I can change the parse rules. This is just a test script, the rule 
needs to be included in a broader parsing engine. So, it must return 
TRUE.
Tomc
6-Jun-2005
[204]
Robert, you also have to worry about  YaBaDaBaDoCamelCases  (even and 
odd) 

to get it to return true, figure out what is left when the outermost 
some finishes.
parse ...[
	some [
		...
	]
	copy remnant to end (print remnant)
]


then make your rule consume the remnant. or, if you don't care, 
just put a
	to end
there
sqlab
7-Jun-2005
[205x2]
You can either put your parse in a catch [] and throw a true if not 
flowtext 
or something like this
parse/all/case test-text [
	some [
		copy camelcase-word [upper-case some rest-chars upper-case any rest-chars] (
			if not empty? text [?? text clear text]
			print ["CamelCase word found:" camelcase-word]
		)
		| copy flowtext [some [rest-chars | upper-case] any delimiters] (
			append text flowtext
		)
		| copy flowtext [some delimiters] (
			append text flowtext
		)
	] to end
]
addendum/corrected
	] to end (if not empty? text [?? text])
MichaelAppelmans
7-Jun-2005
[207x2]
getting the following error when running DideC's delete email script 
against a mailbox with a large number of emails (250+):
internal limit reached: parse
Near: [parse data maillist
   addr-list]
Where: parse-mail-list
is this a rebol internal limit or should i start debugging?
Graham
7-Jun-2005
[209x2]
probably a parse limitation
I think I've opened up mailboxes with over 400 emails before using 
Cerebrus' mailbox manager with no problems
MichaelAppelmans
7-Jun-2005
[211]
oh well. thanks :)
Graham
7-Jun-2005
[212]
what you could do, is extract Didier's implementation of the TOP 
command, and then get the first line of each header in your mailbox. 
 If it has the return-path set to <>, then note it in a list.  When 
finished, go thru and issue deletes on all of those.
MichaelAppelmans
7-Jun-2005
[213]
thanks! I'll have a look at that.
Gabriele
7-Jun-2005
[214]
is the To: line very, very long? there's a recursion limit in the 
parser for the address list. since you are probably not interested 
in parsing the To: header, maybe you can disable it in import-email.
Robert
8-Jun-2005
[215x2]
Hmm... my parse still doesn't terminate the 'some part. I never reach 
the end. The problem is that the rest of the string is "" and this 
seems not to be handled.
Ok, got it. Now it works.