World: r3wp

[Parse] Discussion of PARSE dialect

Graham 27-Jun-2006 [989]	Yeah ... it was a way to mark up text wherever a sequence of CAPS occurs
JaimeVargas 27-Jun-2006 [990]	Anyhow, you get the idea. This type of problem could actually use the rewrite rules engine from Gabriele. The principle is the same.
BrianH 27-Jun-2006 [991]	; Sorry, more fixes
    capitals: charset ["#"A" - #"Z"]
    alpha: charset ["#"A" - #"Z" #"a" - #"z"]
    non-alpha: complement alpha
    parse/all/case [any [
        any non-alpha a: 5 capitals any capitals b: [non-alpha | end] (
            b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
        ) :b | some alpha
    ] to end]
Graham 27-Jun-2006 [992x2]	Brian ... your rules are incorrect.
Graham 27-Jun-2006 [992x2]	you have extra " in the charset definitions
BrianH 27-Jun-2006 [994]	Right. I was running this in my head, as I don't have test data. REBOL usually catches syntax errors :)
Graham 27-Jun-2006 [995]	Actually I would like to add a parse problem to the weekly blog and get people to submit answers :)
BrianH 27-Jun-2006 [996]	I use parse quite a bit. It's funny, I've never needed the GUI of View, but I use parse daily.
Graham 27-Jun-2006 [997x2]	And give a prize for the shortest answer
Graham 27-Jun-2006 [997x2]	say a copy of Microsoft VB :)
BrianH 27-Jun-2006 [999]	By shortest, go for most efficient. Otherwise variable naming becomes an issue.
JaimeVargas 27-Jun-2006 [1000]	Shorter may not be clearer or abstract enough. I prefer something that can become an API and be reused. But we will need to exclude the rewrite rules of Gabriele. ;-)
BrianH 27-Jun-2006 [1001]	Hey, I worked on those rules, they're pretty good :)
Graham 27-Jun-2006 [1002]	shortest .. I mean the fewest words and operators, not length in characters
JaimeVargas 27-Jun-2006 [1003]	Hahaha, no offense Brian. ;-)
Graham 27-Jun-2006 [1004]	Can't use this problem though .. this group is web public!
BrianH 27-Jun-2006 [1005]	It's too simple anyways.
Graham 27-Jun-2006 [1006x2]	I am hoping that it will be instructive ... as well.
Graham 27-Jun-2006 [1006x2]	so simple things are good to start with ... we can have harder ones once we see how people respond
BrianH 27-Jun-2006 [1008]	Seriously though, three charsets and two temporary variables, there's got to be a more efficient way.
Volker 27-Jun-2006 [1009]	should non-alpha be non-capitals? (running in head too)
BrianH 27-Jun-2006 [1010x2]	; Sorry, more fixes
    capitals: charset [#"A" - #"Z"]
    alpha: charset [#"A" - #"Z" #"a" - #"z"]
    non-alpha: complement alpha
    parse/all/case [any [to alpha [
        a: 5 capitals any capitals b: [non-alpha | end] (
            b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
        ) :b | some alpha
    ]] to end]
BrianH 27-Jun-2006 [1010x2]	No, because that would allow words like this: ABCDEFghij
Volker 27-Jun-2006 [1012x2]	it would collect as long as there are capitals?
Volker 27-Jun-2006 [1012x2]	and "g" is none
BrianH 27-Jun-2006 [1014]	Well, that would be up to Graham. His original description would seem to exclude such words.
Volker 27-Jun-2006 [1015]	means all uppercase?
BrianH 27-Jun-2006 [1016]	As far as I can tell.
Graham 27-Jun-2006 [1017]	Yeah .. all uppercase ..
Volker 27-Jun-2006 [1018]	because " a: 5 capitals any capitals b:" stops at "g" and friends.
BrianH 27-Jun-2006 [1019]	More importantly, it fails at "g" and friends, backtracks and proceeds to the next alternate action, some alpha.
Volker 27-Jun-2006 [1020]	Late, but got it. It would enclose "ABCDEF" but should ignore it because of the small letters.
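The backtracking behaviour discussed here can be tried with a small script. This is a minimal sketch, assuming REBOL 2 (where `parse` still has the `/all` refinement); it follows the first version of the rule (`any non-alpha` rather than `to alpha`), with the charset quoting corrected. The `txt` sample string and the `print` at the end are illustrative additions, not part of the original messages.

```rebol
REBOL []

; Character classes as in the discussion
capitals:  charset [#"A" - #"Z"]
alpha:     charset [#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha

; Hypothetical test data; copied so the literal is not modified in place
txt: copy "say HELLO to ABCDEFghij and WORLD"

parse/all/case txt [
    any [
        any non-alpha                   ; skip separators
        [
            ; First alternate: a run of 5 or more capitals that is followed
            ; by a non-letter or the end of input
            a: 5 capitals any capitals b: [non-alpha | end] (
                b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
            ) :b                        ; resume just after the closing tag
            ; Second alternate: reached when the first fails, for example
            ; at the "g" in ABCDEFghij; consumes the whole mixed-case word
            | some alpha
        ]
    ]
    to end
]

print txt
```

With this input the intent is that HELLO and WORLD get wrapped, while ABCDEFghij is left alone because the first alternate fails at "g" and `some alpha` swallows the whole word.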
BrianH 27-Jun-2006 [1021x2]	Yup. Parse is fun.
BrianH 27-Jun-2006 [1021x2]	You can drop one charset by changing [non-alpha | end] to [alpha end skip | end | none] .
Volker 27-Jun-2006 [1023]	would alpha break work?
BrianH 27-Jun-2006 [1024]	No, that would break out of the enclosing any loop. The end skip will always fail and proceed to the next alternate.
Tomc 27-Jun-2006 [1025]	capital: charset {ABCDEFGHIJKLMNOPQRSTUVWXYZ}
    latipac: complement capital
    rule: [
        any latipac
        here: copy token some capital there:
        (all [
            4 < length? token
            insert :there "</strong>"
            insert :here "<strong>"
            there: skip :there 16
        ])
        :there
    ]
    parse/all/case txt [some rule]
Volker 27-Jun-2006 [1026]	problem is "Aa", that's a word, but not an all-uppercase word, so it should be ignored.
Tomc 27-Jun-2006 [1027]	capital: charset {ABCDEFGHIJKLMNOPQRSTUVWXYZ}
    latipac: complement capital
    ws: charset { ^/^-}
    rule: [
        any latipac
        here: copy token some capital there:
        opt [some ws (all [
            4 < length? token
            insert :there "</strong>"
            insert :here "<strong>"
            there: skip :there 16
        ])]
        :there
    ]
    parse/all/case txt [some rule]
BrianH 27-Jun-2006 [1028x2]	Fails on "aA".
BrianH 27-Jun-2006 [1028x2]	The inserts are a nice touch though.
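The insert-based idea can be combined with the earlier `[non-alpha | end]` check. The following is a rough sketch, assuming REBOL 2; the variable names `a` and `b` and the sample `txt` are illustrative. It inserts the closing tag first so that the saved start position stays valid, then skips past both tags: 8 characters for "<strong>" plus 9 for "</strong>", 17 in total.

```rebol
REBOL []

capitals:  charset [#"A" - #"Z"]
alpha:     charset [#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha

; Hypothetical test data
txt: copy "HELLO there aA WORLD"

parse/all/case txt [
    any [
        any non-alpha
        [
            a: 5 capitals any capitals b: [non-alpha | end] (
                insert b "</strong>"    ; closing tag goes in at the end of the word
                insert a "<strong>"     ; opening tag shifts the word right by 8
                b: skip b 17            ; 8 + 9 inserted characters lie before the new position
            ) :b
            | some alpha                ; mixed-case or short words are consumed whole
        ]
    ]
    to end
]

print txt
```

Because `some alpha` consumes a whole word regardless of case, a word such as "aA" is skipped as a unit rather than having its capital tail examined separately.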
Tomc 28-Jun-2006 [1030]	capital: charset {ABCDEFGHIJKLMNOPQRSTUVWXYZ}
    ws: charset { ^/^-}
    latipac: difference complement capital ws
    sub-rule: [
        some capital there: [ws | end]
        (all [
            4 < length? copy/part :here :there
            insert :there "</strong>"
            insert :here "<strong>"
            there: skip :there 17
        ])
    ]
    rule: [
        any latipac
        [some ws here: sub-rule] | [skip there:]
        :there
    ]
    parse/all/case txt [here: opt sub-rule some rule]
BrianH 28-Jun-2006 [1031]	Doesn't take into account punctuation in the ws charset. This would fail on "HELLO, WORLD!"
Tomc 28-Jun-2006 [1032]	left as an exercise for the reader
BrianH 28-Jun-2006 [1033x2]	:-)
BrianH 28-Jun-2006 [1033x2]	Of course mine doesn't handle words with apostrophes or hyphens in them either. Easy fix though, just add ' and - to the capitals charset.
Graham 28-Jun-2006 [1035]	Actually my further spec for this requires the parser to detect spaces between capitalised words :)
BrianH 28-Jun-2006 [1036]	And do what?
Graham 28-Jun-2006 [1037]	treat the two capitalised words as one so <strong>HELLO DOLLY</strong>
BrianH 28-Jun-2006 [1038]	What about "HELLO, DOLLY!" or such?