World: r3wp

[Parse] Discussion of PARSE dialect

Graham
27-Jun-2006
[1007]
so simple things are good to start with ... we can have harder ones 
once we see how people respond
BrianH
27-Jun-2006
[1008]
Seriously though, three charsets and two temporary variables, there's 
got to be a more efficient way.
Volker
27-Jun-2006
[1009]
should non-alpha be non-capitals? (running in head too)
BrianH
27-Jun-2006
[1010x2]
; Sorry, more fixes
capitals: charset [#"A" - #"Z"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha
parse/all/case txt [any [to alpha [  ; txt: the input string (name assumed)
    a: 5 capitals any capitals b: [non-alpha | end] (
        b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
    ) :b |
    some alpha
]] to end]
No, because that would allow words like this: ABCDEFghij
Volker
27-Jun-2006
[1012x2]
it would collect as long as there are capitals?
and "g" is none
BrianH
27-Jun-2006
[1014]
Well, that would be up to Graham. His original description would 
seem to exclude such words.
Volker
27-Jun-2006
[1015]
means all uppercase?
BrianH
27-Jun-2006
[1016]
As far as I can tell.
Graham
27-Jun-2006
[1017]
Yeah .. all uppercase ..
Volker
27-Jun-2006
[1018]
because " a: 5 capitals any capitals b:" stops at "g" and friends.
BrianH
27-Jun-2006
[1019]
More importantly, it fails at "g" and friends, backtracks and proceeds 
to the next alternate action, some alpha.
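A minimal sketch of the backtracking described here, reusing the charsets from the rule above. The helper name which and the result words are assumptions made for illustration; they do not appear in the original code.

```rebol
capitals: charset [#"A" - #"Z"]
alpha: charset [#"A" - #"Z" #"a" - #"z"]
non-alpha: complement alpha

; Hypothetical helper: reports which alternate matched from the start of a word.
which: func [word [string!] /local result] [
    result: none
    parse/all/case word [
        ; first alternate: five or more capitals, then a non-letter or the end
        5 capitals any capitals [non-alpha | end] (result: 'all-caps) to end
        ; second alternate: tried from the same starting position if the first fails
        | some alpha (result: 'plain) to end
    ]
    result
]

probe which "ABCDEF"      ; all-caps
probe which "ABCDEFghij"  ; plain
```

In the second call the first alternate consumes "ABCDEF", fails at "g", and parse restarts the second alternate from the beginning of the word, so nothing gets wrapped.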
Volker
27-Jun-2006
[1020]
Late, but got it. it would enclose "ABCDEF" but should ignore it 
because of the small letters..
BrianH
27-Jun-2006
[1021x2]
Yup. Parse is fun.
You can drop one charset by changing [non-alpha | end] to [alpha 
end skip | end | none] .
Volker
27-Jun-2006
[1023]
would alpha break work?
BrianH
27-Jun-2006
[1024]
No, that would break out of the enclosing any loop. The end skip 
will always fail and proceed to the next alternate.
Tomc
27-Jun-2006
[1025]
capital: charset {ABCDEFGHIJKLMNOPQRSTUVWXYZ}
latipac: complement capital
rule: [
    any latipac here:
    copy token some capital there:
    (all[ 4 < length? token
        insert :there "</strong>"
        insert :here "<strong>"
        there: skip :there 17  ; 8 + 9 characters were inserted
    ])
    :there
]
parse/all/case txt [some rule]
Volker
27-Jun-2006
[1026]
problem is "Aa", that's a word, but not an all-uppercase word. so it should 
be ignored.
Tomc
27-Jun-2006
[1027]
capital: charset {ABCDEFGHIJKLMNOPQRSTUVWXYZ}
latipac:  complement capital
ws: charset { ^/^-}
rule: [
    any latipac here:
    copy token some capital there:
    opt [some ws
        (all[ 4 < length? token
            insert :there "</strong>"
            insert :here  "<strong>"
            there: skip :there 17]
        )
    ]
    :there
]
parse/all/case txt [some rule]
BrianH
27-Jun-2006
[1028x2]
Fails on "aA".
The inserts are a nice touch though.
Tomc
28-Jun-2006
[1030]
capital: charset {ABCDEFGHIJKLMNOPQRSTUVWXYZ}
ws: charset { ^/^-}
latipac: difference complement capital ws


sub-rule: [
	some capital there:
	[ws | end]
	(all[ 4 < length? copy/part :here :there
		insert :there "</strong>"
		insert :here  "<strong>"
		there: skip :there 17]
	)
]
rule: [
	any latipac 
	[	some ws here:
		sub-rule
	]|[skip there:]
	:there
]
parse/all/case txt [here: opt sub-rule some rule]
BrianH
28-Jun-2006
[1031]
Doesn't take into account punctuation in the ws charset. This would 
fail on "HELLO, WORLD!"
Tomc
28-Jun-2006
[1032]
left as an exercise for the reader
BrianH
28-Jun-2006
[1033x2]
:-)
Of course mine doesn't handle words with apostrophes or hyphens in 
them either. Easy fix though, just add ' and - to the capitals charset.
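A sketch of that fix, assuming the apostrophe and hyphen are simply added to the capitals charset as literal characters:

```rebol
; capitals extended with ' and - so that words such as DON'T count as one word
capitals: charset [#"A" - #"Z" #"'" #"-"]

probe parse/all/case "DON'T" [some capitals]       ; true
probe parse/all/case "WELL-KNOWN" [some capitals]  ; true
probe parse/all/case "Don't" [some capitals]       ; false
```

One side effect of this shortcut is that a run such as "-----" would also count as five capitals, so a stricter rule might be needed for real text.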
Graham
28-Jun-2006
[1035]
Actually my further spec for this requires the parser to detect spaces 
between capitalised words :)
BrianH
28-Jun-2006
[1036]
And do what?
Graham
28-Jun-2006
[1037]
treat the two capitalised words as one so <strong>HELLO DOLLY</strong>
BrianH
28-Jun-2006
[1038]
What about "HELLO, DOLLY!" or such?
Graham
28-Jun-2006
[1039]
I think that punctuation is part of a word
BrianH
28-Jun-2006
[1040]
For that matter, what about words in quotes?
Graham
28-Jun-2006
[1041]
only if capitalised
BrianH
28-Jun-2006
[1042]
So, no difference.
Graham
28-Jun-2006
[1043x6]
I'll explain the purpose of all this.
A person is writing a text file.  It has headings which are denoted 
by caps, and terminating in ":".
But some headings are two or more words ... with the last terminating 
in ":" only.
Words inside the text, even in caps should not normally be highlighted.
that's the more complete spec.
Anyway, i have a working version now :)
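Graham's working version is not shown in the thread. The following is only a rough sketch of one way the fuller spec could be approached, using the same change/part technique as the earlier rule. The names gap, heading and mark-headings are made up for illustration, and it assumes heading words are separated by single spaces.

```rebol
capitals: charset [#"A" - #"Z"]
gap: charset " "

; a heading: one or more all-caps words, separated by single spaces, ending in ":"
heading: [some capitals any [gap some capitals] #":"]

; Hypothetical helper: wraps each heading in <strong> tags, modifying txt in place.
mark-headings: func [txt [string!] /local a b] [
    parse/all/case txt [
        any [
            a: heading b: (
                b: change/part a rejoin ["<strong>" copy/part a b "</strong>"] b
            ) :b
            | skip  ; not a heading here, move on one character
        ]
    ]
    txt
]

print mark-headings "PLAN: buy MILK. NEXT STEPS: rest"
```

Capitalised words that do not lead up to a ":" (MILK in the sample) are left alone, and a multi-word heading is wrapped as a single unit. The sketch does not check that a heading starts at a word boundary.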
BrianH
28-Jun-2006
[1049]
Well, I hope I helped :)
Graham
28-Jun-2006
[1050]
Yep .. thanks all.
Tomc
28-Jun-2006
[1051]
replace/all txt "</strong> <strong>" " "
[unknown: 9]
28-Jun-2006
[1052]
What is the best description of Parse?  I would like to point some 
people to Parse as an example of the power of Rebol
Henrik
28-Jun-2006
[1053]
reichart, I wrote one in the wikibook, don't know if it's useful.
[unknown: 9]
28-Jun-2006
[1054]
Since you wrote one, do you know of a better one?  This is not a 
reflection on yours, but it is a great way to know what you considered 
the next best thing.
Tomc
28-Jun-2006
[1055x2]
salvation from regular expressions
I may have added some to the rebol wikibook