World: r3wp

[Parse] Discussion of PARSE dialect

Graham 20-Mar-2005 [167]	so, it's to prevent incoming data being executed ?
Tomc 20-Mar-2005 [168x2]	you can write a well considered script without taint that is far more secure than a script that passes taint mode by making a simple rule that does not properly catch problems
Tomc 20-Mar-2005 [168x2]	you basically have to write a regular expression to accept user input
Vincent 20-Mar-2005 [170]	Graham: for your header, like Brett said, parse/all is needed when you work on strings with newlines and spaces. last line should be: parse/all header [header-rule some [ thru "^/" header-rule]]
BrianW 20-Mar-2005 [171]	Graham, yes, but it's also used in other situations: force the programmer to escape HTML input before printing it back out, massaging data so that it's friendlier for the database, etc.
Graham 20-Mar-2005 [172]	Yeah, I got that Vincent. Curiously though it has worked without it.
Tomc 20-Mar-2005 [173x2]	in your example, having a rule more like header-rule: [ "Date:" copy date-rule | "From:" copy email-rule | "Subject:" copy some alpha-num | "To:" copy email-rule | to newline ] where email-rule only matched email addresses would be more taint-like
Tomc 20-Mar-2005 [173x2]	and being very careful to never effectively do [ user-input] without being sure user-input could not cause unintended side effects
Chris 31-Mar-2005 [175x3]	Not quite sure what to make of the following: >> rule: [set w 'pubDate (print w)] == [set w 'pubDate (print w)] >> parse [pubdate] rule pubdate == true >> parse/case [pubdate] rule pubdate == true
	First off, would the last result be a bug?
	Secondly, I'd like to ensure that whether the block is [pubdate] or [pubDate] that 'w stores 'pubDate. I had hoped that as 'pubDate is set in the rule, it might take precedence over pubdate in the block :^(
DideC 1-Apr-2005 [178]	I suppose /case only act on string!
Gabriele 1-Apr-2005 [179x3]	/case only applies to strings. Chris, you can:
	>> parse [pubdate] ['pubDate (w: 'pubDate print w)] pubDate == true
	but i'm not sure you'll like it.
Graham 9-Apr-2005 [182x2]	should be something easier than this:
    like-i: charset [ #"1" #"l" #"L" #"I" #"i" ]
    like-a: charset [ #"a" #"A" #"@" ]
    like-v: charset [ #"\" #"/" #"v" #"V" ]
    cialis: [ "c" like-i like-a 2 like-i "s" ]
    viagra: [ 1 2 like-v like-i like-a "gr" like-a ]
    parse "\/[1-:-gr]@" [ viagra ]
    parse "[c1-:-Lls]" [ cialis ]
Graham 9-Apr-2005 [182x2]	hmm.. altme converts my double quote to a single quote
Gabriele 9-Apr-2005 [184]	maybe use charset "1lLIi" to avoid that much typing ;)
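Gabriele's suggestion might look roughly like the following. This is only a sketch, assuming REBOL 2: the three character classes are rebuilt from strings, and the two test inputs are invented for illustration rather than taken from Graham's message.

```rebol
; Character classes built from strings instead of char-by-char blocks.
like-i: charset "1lLIi"
like-a: charset "aA@"
like-v: charset "\/vV"   ; backslash is not an escape character in REBOL strings

cialis: ["c" like-i like-a 2 like-i "s"]
viagra: [1 2 like-v like-i like-a "gr" like-a]

; Hypothetical inputs, chosen only to exercise the two rules.
print parse/all "\/1@gr@" viagra
print parse/all "c1@lls" cialis
```

The rules themselves are unchanged from Graham's version; only the charset definitions are shorter.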
Anton 9-Apr-2005 [185]	Graham, the link width is slightly incorrect, so it obscures half of the double quote, so it looks like a single.
Tomc 28-Apr-2005 [186x4]	flatten: func [b [block!] /local flat][ flat: copy[] rule: [ some[ [x: block! (parse first :x rule)] | [copy token any-type! (append flat token)] ] ] parse b rule flat ]
	without the recursive call to parse
	flatten: func [b [block!] /local flat rule x][ flat: copy[] rule: [some[[x: block! :x into rule] | [copy token any-type! (append flat token)]]] parse b rule flat ]
	a flatten that changed its block in place would be useful at times
Gregg 30-Apr-2005 [190]	Something like this? (it's not parse based though) flatten: func [block] [ head forall block [ if block? block/1 [change/part block block/1 1] ] ]
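One possible shape for an in-place flatten is a loop that splices a nested block into its parent and then re-examines the same position, so blocks are unwrapped at every depth. This is a sketch assuming REBOL 2; the name `flatten-in-place` and the local `pos` are hypothetical and not from the discussion.

```rebol
flatten-in-place: func [
    "Splices nested blocks into BLOCK, modifying it (hypothetical helper)."
    block [block!]
    /local pos
][
    pos: block
    while [not tail? pos] [
        either block? first pos [
            ; Replace the one nested block with its contents and stay at
            ; the same position, in case the first spliced item is a block.
            change/part pos first pos 1
        ][
            pos: next pos
        ]
    ]
    block
]

probe flatten-in-place [[[1]] 2 [3 [4 []]]]
```

An empty nested block is simply removed, because splicing in zero items replaces one element with nothing.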
Robert 5-Jun-2005 [191x2]	I have a problem with parse not terminating the parsing. Here is my code for parsing CamelCase words:
    rebol []
    ; CamelCase Test
    test-text: "FirstWord test. This is a CamelCase test Text. CamelCase2 is the base idea for a WiKi. CamelcasE"
    upper-case: charset "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    delimiters: charset " .,;|^-^/"
    rest-chars: complement union upper-case delimiters
    text: ""
    parse/all/case test-text [
        some [
            copy camelcase-word [upper-case some rest-chars upper-case any rest-chars] (
                if not empty? text [?? text clear text]
                print ["CamelCase word found:" camelcase-word]
            )
            | copy flowtext [any [rest-chars | upper-case] any delimiters] (
                append text flowtext
            )
        ]
    ]
    halt
Robert 5-Jun-2005 [191x2]	Any idea why parse doesn't return?
sqlab 5-Jun-2005 [193]	[any [rest-chars | upper-case] any delimiters] is always true, even if there is no char left at the end. But it does not move the cursor.
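The point sqlab makes can be shown on a much smaller input. The sketch below assumes REBOL 2 and uses made-up charsets (`letters`, `spaces`) that are not part of Robert's script: a rule built only from `any` can succeed without consuming anything, so wrapping it in `some` never makes progress at the tail, whereas alternatives built from `some` must each consume at least one character.

```rebol
letters: charset [#"a" - #"z"]
spaces: charset " "

; Risky shape, left commented out: at the tail [any letters any spaces]
; still succeeds with an empty match, so SOME keeps repeating it.
; parse/all "ab cd " [some [any letters any spaces]]

; Safer shape: every alternative consumes input, so the loop stops at
; the tail and END can confirm that the whole string was consumed.
print parse/all "ab cd " [some [some letters | some spaces] end]
```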
Tomc 5-Jun-2005 [194]	rebol []
    ; CamelCase Test
    test-text: "FirstWord test. This is a CamelCase test Text. CamelCase2 is the base idea for a WiKi. CamelcasE"
    upper-case: charset "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    delimiter: charset " .,;|^-^/"
    rest-char: complement union upper-case delimiter
    text: copy ""
    camelcase-rule: [some [upper-case some rest-char upper-case any rest-char] delimiter]
    parse/all/case test-text [
        some [
            copy camelcase-word camelcase-rule (
                if not empty? text [?? text clear text]
                print ["CamelCase word found: " camelcase-word]
            )
            | copy flowtext upper-case (append text flowtext)
            | copy flowtext [any [rest-char | delimiter]] (append text flowtext)
        ]
    ]
    halt
Graham 5-Jun-2005 [195x2]	what about camelCAse?
Graham 5-Jun-2005 [195x2]	Personally I prefer the way mediawiki does it ... using [[ .. ]] ... instead of having strange cases in words
Tomc 5-Jun-2005 [197]	yes, I was also concerned about A1CamelCase but figured Robert just needed to get thru his first question first
Robert 6-Jun-2005 [198x2]	Thanks for the fix. Sometimes it helps to get some distance by asking others :-))
Robert 6-Jun-2005 [198x2]	I like CamelCase words. Simple to remember and use. IIRC camelCAse is not a valid CamelCase word. But anyway, it depends how I teach my users :-))
Graham 6-Jun-2005 [200]	http://en.wikipedia.org/wiki/CamelCase... CamelCase is referred to as UpperCamelCase, and camelCase is referred to as lowerCamelCase
Robert 6-Jun-2005 [201]	Tom, your example doesn't terminate, like mine. The thing IMO is that the last word is a CamelCase word and the 'end condition is somehow missed. It never reaches the halt.
sqlab 6-Jun-2005 [202]	If you do not want to change the parse rules, you can just add if not flowtext [halt] before append text flowtext
Robert 6-Jun-2005 [203]	I can change the parse rules. This is just a test script, the rule needs to be included in a broader parsing engine. So, it must return TRUE.
Tomc 6-Jun-2005 [204]	Robert, you also have to worry about YaBaDaBaDoCamelCases (even and odd). To get it to return true, figure out what is left when the outermost some finishes: parse ...[ some [ ... ] copy remenant to end (print remenant) ] then make your rule consume the remnant. Or, if you don't care, just put a to end there.
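Tomc's debugging trick can be tried on a toy input first. This is a minimal sketch assuming REBOL 2; the `digits` charset, the sample string and the word `remnant` are made up for illustration. Whatever the repeated rule leaves unconsumed is captured by `copy ... to end`, which also lets the parse reach the end of the input.

```rebol
digits: charset "0123456789"
remnant: none

; SOME DIGITS stops at the first non-digit; COPY ... TO END then
; captures whatever the main rule left behind.
ok?: parse/all "123abc" [some digits copy remnant to end]

print ok?
probe remnant
```

Once the leftover text is visible, the main rule can be extended to consume it, or a plain `to end` can be kept if the remainder does not matter.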
sqlab 7-Jun-2005 [205x2]	You can either put your parse in a catch [] and throw a true if not flowtext, or something like this:
    parse/all/case test-text [
        some [
            copy camelcase-word [upper-case some rest-chars upper-case any rest-chars] (
                if not empty? text [?? text clear text]
                print ["CamelCase word found:" camelcase-word]
            )
            | copy flowtext [some [rest-chars | upper-case] any delimiters] (
                append text flowtext
            )
            | copy flowtext [some delimiters] (
                append text flowtext
            )
        ]
        to end
    ]
sqlab 7-Jun-2005 [205x2]	addendum/corrected ] to end (if not empty? text [?? text])
MichaelAppelmans 7-Jun-2005 [207x2]	getting the following error when running DideC's delete email script against a mailbox with a large number of emails (250+): internal limit reached: parse Near: [parse data maillist addr-list] Where: parse-mail-list
MichaelAppelmans 7-Jun-2005 [207x2]	is this a rebol internal limit or should I start debugging?
Graham 7-Jun-2005 [209x2]	probably a parse limitation
Graham 7-Jun-2005 [209x2]	I think I've opened up mailboxes with over 400 emails before using Cerebrus' mailbox manager with no problems
MichaelAppelmans 7-Jun-2005 [211]	oh well. thanks :)
Graham 7-Jun-2005 [212]	what you could do, is extract Didier's implementation of the TOP command, and then get the first line of each header in your mailbox. If it has the return-path set to <>, then note it in a list. When finished, go thru and issue deletes on all of those.
MichaelAppelmans 7-Jun-2005 [213]	thanks! I'll have a look at that.
Gabriele 7-Jun-2005 [214]	is the To: line very, very long? there's a recursion limit in the parser for the address list. since you are probably not interested in parsing the To: header, maybe you can disable it in import-email.
Robert 8-Jun-2005 [215x2]	Hmm... my parse still does not terminate the 'some part. I never reach the end. The problem is that the rest of the string is "" and this seems not to be handled.
Robert 8-Jun-2005 [215x2]	Ok, got it. Now it works.