World: r3wp

[Parse] Discussion of PARSE dialect

Graham
27-Jun-2006
[989]
Yeah ... it was a way to mark up text wherever a sequence of CAPS 
occurs
JaimeVargas
27-Jun-2006
[990]
Anyhow, you get the idea. This type of problem could actually use 
the rewrite rules engine from Gabriele. The principle is the same.
BrianH
27-Jun-2006
[991]
; Sorry, more fixes
capitals: charset ["#"A" - #"Z"]
alpha: charset ["#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
parse/all/case [any [

    any non-alpha a: 5 capitals any capitals b: [non-alpha | end] (

        b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
    ) :b |
    some alpha
] to end]
Graham
27-Jun-2006
[992x2]
Brian ... your rules are incorrect.
You have extra " characters in the charset definitions.
BrianH
27-Jun-2006
[994]
Right. I was running this in my head, as I don't have test data. 
REBOL usually catches syntax errors :)
Graham
27-Jun-2006
[995]
Actually I would like to add a parse problem to the weekly blog and 
get people to submit answers :)
BrianH
27-Jun-2006
[996]
I use parse quite a bit. It's funny, I've never needed the GUI of 
View, but I use parse daily.
Graham
27-Jun-2006
[997x2]
And give a prize for the shortest answer
say a copy of Microsoft VB :)
BrianH
27-Jun-2006
[999]
By shortest, go for most efficient. Otherwise variable naming becomes 
an issue.
JaimeVargas
27-Jun-2006
[1000]
Shorter may not be clearer or abstract enough. I prefer something 
that can become an API and be reused. But we will need to exclude 
Gabriele's rewrite rules. ;-)
BrianH
27-Jun-2006
[1001]
Hey, I worked on those rules, they're pretty good :)
Graham
27-Jun-2006
[1002]
Shortest ... I mean the fewest words and operators, not the 
shortest in length.
JaimeVargas
27-Jun-2006
[1003]
Hahaha, no offense Brian. ;-)
Graham
27-Jun-2006
[1004]
Can't use this problem though .. this group is web public!
BrianH
27-Jun-2006
[1005]
It's too simple anyways.
Graham
27-Jun-2006
[1006x2]
I am hoping that it will be instructive ... as well.
so simple things are good to start with ... we can have harder ones 
once we see how people respond
BrianH
27-Jun-2006
[1008]
Seriously though, three charsets and two temporary variables, there's 
got to be a more efficient way.
Volker
27-Jun-2006
[1009]
Should non-alpha be non-capitals? (Running it in my head too.)
BrianH
27-Jun-2006
[1010x2]
; Sorry, more fixes
capitals: charset [#"A" - #"Z"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
parse/all/case txt [any [to alpha [  ; txt is the string to mark up
    a: 5 capitals any capitals b: [non-alpha | end] (
        b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
    ) :b |
    some alpha
]] to end]
No, because that would allow words like this: ABCDEFghij
Volker
27-Jun-2006
[1012x2]
it would collect as long as there are capitals?
and "g" is none
BrianH
27-Jun-2006
[1014]
Well, that would be up to Graham. His original description would 
seem to exclude such words.
Volker
27-Jun-2006
[1015]
means all uppercase?
BrianH
27-Jun-2006
[1016]
As far as I can tell.
Graham
27-Jun-2006
[1017]
Yeah .. all uppercase ..
Volker
27-Jun-2006
[1018]
because " a: 5 capitals any capitals b:" stops at "g" and friends.
BrianH
27-Jun-2006
[1019]
More importantly, it fails at "g" and friends, backtracks and proceeds 
to the next alternate action, some alpha.
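The backtracking described above can be probed in isolation. This is only a sketch for a REBOL 2 console: caps-word is a name made up here for the first alternative of the rule, and the sample strings are illustrative.

```rebol
capitals: charset [#"A" - #"Z"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha

; First alternative only: at least five capitals, then a non-letter or the end.
; (caps-word is a hypothetical name, not part of the original rule.)
caps-word: [5 capitals any capitals [non-alpha | end]]

; Try each sample against the first alternative alone, then against
; both alternatives, to see where the fallback to SOME ALPHA kicks in.
foreach sample ["ABCDEF" "ABCDEFghij" "ABCD"] [
    print [
        mold sample
        "caps-word alone:" parse/all/case sample caps-word
        "with fallback:" parse/all/case sample [caps-word | some alpha]
    ]
]
```

On "ABCDEFghij" the first alternative should fail at "g", while the version with the fallback should succeed because some alpha restarts from the beginning of the word and consumes all of it.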
Volker
27-Jun-2006
[1020]
Late, but got it. It would enclose "ABCDEF" but should ignore it 
because of the small letters.
BrianH
27-Jun-2006
[1021x2]
Yup. Parse is fun.
You can drop one charset by changing [non-alpha | end] to [alpha 
end skip | end | none] .
Volker
27-Jun-2006
[1023]
would alpha break work?
BrianH
27-Jun-2006
[1024]
No, that would break out of the enclosing any loop. The end skip 
will always fail and proceed to the next alternate.
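A small sketch of the end skip idiom, for a REBOL 2 console. The name fail-rule is made up here. The last line probes the suggested three-way replacement against a single trailing letter, which is an assumption about how it would be used rather than something tested in the thread.

```rebol
alpha: charset [#"A" - #"Z" #"a" - #"z"]

; END only matches at the tail, where SKIP has nothing left to consume,
; so the pair can never succeed. (fail-rule is a hypothetical name.)
fail-rule: [end skip]

print parse/all "" fail-rule                  ; expected: false
print parse/all "abc" fail-rule               ; expected: false
print parse/all "abc" [fail-rule | to end]    ; expected: true, via the second alternate

; Probe: does [alpha end skip | end | none] reject a following letter?
print parse/all/case "x" [[alpha end skip | end | none] to end]
```

If the last probe prints true, the trailing none alternate is accepting a position that [non-alpha | end] would have rejected, so the shorter form may need another look before one charset is dropped.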
Tomc
27-Jun-2006
[1025]
capital: charset {ABCDEFGHIJKLMNOPQRSTUVWXYZ}
latipac: complement capital
rule: [
    any latipac here:
    copy token some capital there:
    (all[ 4 < length? token
        insert :there "</strong>"
        insert :here "<strong>"
        there: skip :there 17  ; 8 + 9 characters were inserted
    ])
    :there
]
parse/all/case txt [some rule]
Volker
27-Jun-2006
[1026]
The problem is "Aa": that's a word, but not an all-uppercase word, so it 
should be ignored.
Tomc
27-Jun-2006
[1027]
capital: charset {ABCDEFGHIJKLMNOPQRSTUVWXYZ}
latipac:  complement capital
ws: charset { ^/^-}
rule: [
    any latipac here:
    copy token some capital there:
    opt [some ws
        (all[ 4 < length? token
            insert :there "</strong>"
            insert :here  "<strong>"
            there: skip :there 17]  ; 8 + 9 characters were inserted
        )
    ]
    :there
]
parse/all/case txt [some rule]
BrianH
27-Jun-2006
[1028x2]
Fails on "aA".
The inserts are a nice touch though.
Tomc
28-Jun-2006
[1030]
capital: charset {ABCDEFGHIJKLMNOPQRSTUVWXYZ}
ws: charset { ^/^-}
latipac: difference complement capital ws


sub-rule: [
	some capital there:
	[ws | end]
	(all[ 4 < length? copy/part :here :there
		insert :there "</strong>"
		insert :here  "<strong>"
		there: skip :there 17]
	)
]
rule: [
	any latipac 
	[	some ws here:
		sub-rule
	]|[skip there:]
	:there
]
parse/all/case txt [here: opt sub-rule some rule]
BrianH
28-Jun-2006
[1031]
Doesn't take into account punctuation in the ws charset. This would 
fail on "HELLO, WORLD!"
Tomc
28-Jun-2006
[1032]
left as an exercise for the reader
BrianH
28-Jun-2006
[1033x2]
:-)
Of course mine doesn't handle words with apostrophes or hyphens in 
them either. Easy fix though, just add ' and - to the capitals charset.
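The inputs raised so far ("aA", "Aa", "ABCDEFghij", "HELLO, WORLD!") can be collected into a small harness. This is a sketch for a REBOL 2 console: mark-caps is a hypothetical wrapper around the three-charset rule posted earlier, and it works on a copy so the samples are left untouched.

```rebol
capitals: charset [#"A" - #"Z"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha

; mark-caps is a hypothetical name; the rule body is the one posted above.
mark-caps: func [txt [string!] /local a b] [
    txt: copy txt    ; do not modify the caller's string
    parse/all/case txt [any [to alpha [
        a: 5 capitals any capitals b: [non-alpha | end] (
            ; replace the word with its wrapped form and resume after it
            b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
        ) :b |
        some alpha
    ]] to end]
    txt
]

; Print each sample next to its marked-up result.
foreach sample ["aA" "Aa" "ABCDEFghij" "HELLO, WORLD!"] [
    print [mold sample "->" mold mark-caps sample]
]
```

Mixed-case words should come back unchanged, and both words in "HELLO, WORLD!" should be wrapped because the comma and the exclamation mark fall in non-alpha.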
Graham
28-Jun-2006
[1035]
Actually my further spec for this requires the parser to detect spaces 
between capitalised words :)
BrianH
28-Jun-2006
[1036]
And do what?
Graham
28-Jun-2006
[1037]
treat the two capitalised words as one so <strong>HELLO DOLLY</strong>
BrianH
28-Jun-2006
[1038]
What about "HELLO, DOLLY!" or such?