World: r3wp

[Parse] Discussion of PARSE dialect

BrianH
27-Jun-2006
[1014]
Well, that would be up to Graham. His original description would 
seem to exclude such words.
Volker
27-Jun-2006
[1015]
means all uppercase?
BrianH
27-Jun-2006
[1016]
As far as I can tell.
Graham
27-Jun-2006
[1017]
Yeah .. all uppercase ..
Volker
27-Jun-2006
[1018]
because " a: 5 capitals any capitals b:" stops at "g" and friends.
BrianH
27-Jun-2006
[1019]
More importantly, it fails at "g" and friends, backtracks and proceeds 
to the next alternate action, some alpha.
Volker
27-Jun-2006
[1020]
Late, but got it. It would enclose "ABCDEF" but should ignore it 
because of the small letters.
BrianH
27-Jun-2006
[1021x2]
Yup. Parse is fun.
You can drop one charset by changing [non-alpha | end] to [alpha 
end skip | end | none] .
Volker
27-Jun-2006
[1023]
would alpha break work?
BrianH
27-Jun-2006
[1024]
No, that would break out of the enclosing all loop. The end skip 
will always fail and proceed to the next alternate.
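
(Illustrative sketch, not part of the original posts, assuming REBOL 2 parse semantics: `end` matches only at the tail of the input and `skip` needs one more character, so the pair can never succeed together and parse falls through to the next alternate.)

```rebol
; "abc" consumes the whole input, END matches at the tail,
; then SKIP fails because nothing is left to skip.
print parse/all "abc" ["abc" end skip]            ; false

; With an alternate present, the failure just hands control to it.
print parse/all "abc" ["abc" end skip | to end]   ; true
```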
Tomc
27-Jun-2006
[1025]
capital: charset {ABCDEFGHIJKLMNOPQRSTUVWXYZ}
latipac: complement capital
rule: [
    any latipac here:
    copy token some capital there:
    (all[ 4 < length? token
        insert :there "</strong>"
        insert :here "<strong>"
        there: skip :there 16
    ])
    :there
]
parse/all/case txt [some rule]
Volker
27-Jun-2006
[1026]
The problem is "Aa": that's a word, but not an all-uppercase word, so it should 
be ignored.
Tomc
27-Jun-2006
[1027]
capital: charset {ABCDEFGHIJKLMNOPQRSTUVWXYZ}
latipac:  complement capital
ws: charset { ^/^-}
rule: [
    any latipac here:
    copy token some capital there:
    opt [some ws
        (all[ 4 < length? token
            insert :there "</strong>"
            insert :here  "<strong>"
            there: skip :there 16]
        )
    ]
    :there
]
parse/all/case txt [some rule]
BrianH
27-Jun-2006
[1028x2]
Fails on "aA".
The inserts are a nice touch though.
Tomc
28-Jun-2006
[1030]
capital: charset {ABCDEFGHIJKLMNOPQRSTUVWXYZ}
ws: charset { ^/^-}
latipac: difference complement capital ws


sub-rule: [
	some capital there:
	[ws | end]
	(all[ 4 < length? copy/part :here :there
		insert :there "</strong>"
		insert :here  "<strong>"
		there: skip :there 17]
	)
]
rule: [
	any latipac 
	[	some ws here:
		sub-rule
	]|[skip there:]
	:there
]
parse/all/case txt [here: opt sub-rule some rule]
BrianH
28-Jun-2006
[1031]
Doesn't take into account punctuation in the ws charset. This would 
fail on "HELLO, WORLD!"
Tomc
28-Jun-2006
[1032]
left as an exercise for the reader
BrianH
28-Jun-2006
[1033x2]
:-)
Of course mine doesn't handle words with apostrophes or hyphens in 
them either. Easy fix though, just add ' and - to the capitals charset.
Graham
28-Jun-2006
[1035]
Actually my further spec for this requires the parser to detect spaces 
between capitalised words :)
BrianH
28-Jun-2006
[1036]
And do what?
Graham
28-Jun-2006
[1037]
treat the two capitalised words as one so <strong>HELLO DOLLY</strong>
BrianH
28-Jun-2006
[1038]
What about "HELLO, DOLLY!" or such?
Graham
28-Jun-2006
[1039]
I think that punctuation is part of a word
BrianH
28-Jun-2006
[1040]
For that matter, what about words in quotes?
Graham
28-Jun-2006
[1041]
only if capitalised
BrianH
28-Jun-2006
[1042]
So, no difference.
Graham
28-Jun-2006
[1043x6]
I'll explain the purpose of all this.
A person is writing a text file.  It has headings which are denoted 
by caps, and terminating in ":".
But some headings are two or more words ... with the last terminating 
in ":" only.
Words inside the text, even in caps should not normally be highlighted.
that's the more complete spec.
Anyway, i have a working version now :)
BrianH
28-Jun-2006
[1049]
Well, I hope I helped :)
Graham
28-Jun-2006
[1050]
Yep .. thanks all.
Tomc
28-Jun-2006
[1051]
replace/all txt "</strong> <strong>" " "
[unknown: 9]
28-Jun-2006
[1052]
What is the best description of Parse?  I would like to point some 
people to Parse as an example of the power of Rebol
Henrik
28-Jun-2006
[1053]
reichart, I wrote one in the wikibook, don't know if it's useful.
[unknown: 9]
28-Jun-2006
[1054]
Since you wrote one, do you know of a better one?  This is not a 
reflection on yours, but it is a great way to know what you considered 
the next best thing.
Tomc
28-Jun-2006
[1055x2]
salvation from regular expressions
I may have added some to the REBOL wikibook
BrianH
29-Jun-2006
[1057x3]
To use the simpler of the CS terms:


Parse is a rule-based, recursive-descent string and structure parser 
with backtracking. It is not a parser generator (like Lex/Yacc) or 
compiler (like most regex engines) - the engine follows the rules 
directly. Since Parse is recursive-descent it can handle patterns 
that regular expressions wouldn't be able to. Since Parse backtracks 
it can handle patterns that ordinary recursive-descent parsers can't.


Basically, it puts the text and structure processing abilities of 
Perl 5 to shame, let alone those of the lesser regex engines.


In theory, Perl 6 has caught up with REBOL, but Perl 6 only exists 
in theory for now. By the time it becomes actual, REBOL should surpass 
it (especially if I have anything to say about it).
It's pretty easy to demonstrate patterns that regular expressions 
can't handle. It's only somewhat difficult to demonstrate patterns 
that can't be handled by a recursive descent parser without backtracking 
or unlimited lookahead.


I have never run into a pattern that can't be handled by Parse in 
theory - its only limits are in implementation (available memory 
and recursion depth). I am not qualified to describe its limits. 
Still, you have to be careful about how you write the rules or they 
will trip you up.
A little dry as explanations go, I suppose. You may have better luck 
by showing some magic parse code tricks :)
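
(Illustrative sketch, not part of the original posts: balanced parentheses are a standard example of a pattern that classic regular expressions cannot match but a recursive rule can. The rule name `balanced` is made up for this example, and REBOL 2 parse semantics are assumed.)

```rebol
; The rule refers to itself by name; the word is looked up each time
; the rule is evaluated, so the recursion needs no special setup.
balanced: [any [#"(" balanced #")"]]

print parse/all "(()())" balanced    ; true
print parse/all "(()"    balanced    ; false (one paren is never closed)
```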
Volker
29-Jun-2006
[1060]
Somewhat buzzy: It's a simplified compiler-compiler. It could be used 
to build a Java compiler (i.e. for such complex syntax), but it's also as 
easy as regex for simpler things, and still readable. (Less buzzy: 
not always that easy, due to the poorer lookahead.)
BrianH
29-Jun-2006
[1061]
Volker, it's more like it can do what a compiler-compiler can do 
without needing to compile :)

And backtracking is about the same as unlimited lookahead, but more 
powerful.
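
(Illustrative sketch, not part of the original posts: two alternates that share an arbitrarily long prefix, so a fixed amount of lookahead cannot choose between them up front. The names `digit` and `rule` are made up for this example, and REBOL 2 parse semantics are assumed.)

```rebol
digit: charset "0123456789"

; The first alternate consumes all the digits, fails on "px",
; then parse backtracks to the start and tries the second alternate.
rule: [some digit "px" | some digit "em"]

print parse/all "12345em" rule   ; true
print parse/all "12345pt" rule   ; false (neither alternate matches)
```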
[unknown: 9]
29-Jun-2006
[1062]
Thanks Brian, but as is the theme with questions I ask, I don't ask 
for myself, but rather so that the "world" can learn what "we" know. 
So perhaps you should add your 2 cents to Henrik's and Tom's in 
the public forum of the Wikibook.
Volker
29-Jun-2006
[1063]
The compiling is no big argument, as compiler-compilers are for compiled 
languages anyway ;) The point is, you can easily mix a grammar and actions 
for semantics.