World: r3wp
[Parse] Discussion of PARSE dialect
Graham 13-Aug-2005 [303] | sometimes I find it easier to change the data than to change the rule :) |
BrianW 13-Aug-2005 [304x2] | Hey, as long as it works. |
Working on a textile parser over here to build my 'parse skills and make it easier to build my website with Rebol | |
Graham 13-Aug-2005 [306] | not that you shouldn't do it, but I use http://www.rebol.it/~steel/retools/remark/ |
BrianW 13-Aug-2005 [307] | I'm <>-phobic ;-) |
Graham 13-Aug-2005 [308] | Remark takes care of all of those taggy things |
BrianW 13-Aug-2005 [309] | and all of my pages are already in textile format, and I think a few of my friends would be more interested in Rebol if I had a textile parser for them |
Graham 13-Aug-2005 [310] | what's textile ? A type of fabric ? |
BrianW 13-Aug-2005 [311] | http://hobix.com/textile/ |
Graham 13-Aug-2005 [312] | A structured text variant ... |
BrianW 13-Aug-2005 [313x2] | yep |
I like some of the different structured text formatting systems | |
shadwolf 14-Aug-2005 [315] | Volker thank you it works great now and the code rule is tiny ;) |
Volker 14-Aug-2005 [316] | :) |
BrianW 18-Aug-2005 [317] | Any parse suggestions for trying to find #"(" without a matching #")" in text that might also have proper pairs of parens? |
Henrik 18-Aug-2005 [318] | you probably need to count them and see where you end up after finding all parens. I'm not sure if it can be used to see which are missing... |
BrianW 18-Aug-2005 [319] | That would probably work fine. This is for the textile parser, where a declaration like "p(." means a paragraph with a left margin of 1em, repeated for additional ems of margin. Counting will be quite useful. |
Henrik 18-Aug-2005 [320] | count one up on #"(" and one down on #")". If correct, the end result is zero. |
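The counting idea Henrik describes could be sketched roughly as follows. This is only an illustration: `paren-balance`, `depth` and `other` are hypothetical names that do not come from the thread, and the sketch assumes REBOL 2 parse behaviour.

```rebol
REBOL [Title: "Paren balance sketch (hypothetical helper)"]

; Hypothetical helper: count up on "(" and down on ")".
; The result is the number of "(" minus the number of ")".
paren-balance: func [text [string!] /local depth other] [
    depth: 0
    other: complement charset "()"      ; every character that is not a paren
    parse/all text [
        any [
            "(" (depth: depth + 1)
            | ")" (depth: depth - 1)
            | some other
        ]
    ]
    depth
]

print paren-balance "p((. two ems of margin"   ; prints 2
print paren-balance "see (note) here"          ; prints 0
```

A result of zero only says that the totals agree. If the order matters as well, checking whether `depth` ever drops below zero inside the rule would catch a ")" that appears before its "(".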
BrianW 18-Aug-2005 [321x3] | thanks |
Perfect, Henrik. That took me exactly where I needed to go for this feature. | |
Gonna have to work on my test-simple.r script soon to provide better summaries. The number of tests that are passing in this thing is getting rather large! | |
BrianW 22-Aug-2005 [324] | Any tips on how to convert " *text* " to " <strong>text</strong>"? |
Sunanda 22-Aug-2005 [325] | One way:
    replace text "*" <strong>
    replace text "*" </strong>
If there are multiple pairs of "*", repeat in a loop until the length no longer changes. |
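Sunanda's repeat-until-the-length-stops-changing suggestion might look something like this. It is a minimal sketch: the sample string and the word `len` are made up for illustration, and string replacements are used in place of tag values to keep the behaviour easy to follow.

```rebol
REBOL [Title: "Replace-in-a-loop sketch"]

text: copy "x *a* y *b* z"      ; sample input, invented for this sketch

until [
    len: length? text
    replace text "*" "<strong>"     ; first remaining "*" opens
    replace text "*" "</strong>"    ; next remaining "*" closes
    len = length? text              ; stop once a pass changes nothing
]

print text   ; prints x <strong>a</strong> y <strong>b</strong> z
```

As BrianW points out in the thread, an unmatched "*" defeats this approach: the first replace still fires, leaving an opening tag with no close.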
Graham 22-Aug-2005 [326x2] | You should look at make-doc text to see how it parses stuff. I believe it's a similar problem. |
source not text. | |
Geomol 22-Aug-2005 [328] | Brian, you can look at my NicomDoc format: http://home.tiscali.dk/john.niclasen/nicomdoc/ Look for the 'magic' in "nicomdoc.r", where you'll find rules for such things. (I guess you have to handle multiple ****.) |
BrianW 22-Aug-2005 [329] | ah, thanks. Sunanda, that solution won't quite work if a #"*" appears without a match. I'll go look at NicomDoc |
BrianH 22-Aug-2005 [330x2] | parse/all data [any [to "*" a: skip b: to "*" c: skip d: :a (change/part a rejoin ["<strong>" copy/part b c "</strong>"] d)] to end] |
You can make it a little more complicated to add more markup types, but the basic structure is the same. The trick is the :a before the paren - otherwise it won't work, and you can crash older versions of REBOL. | |
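To make the role of `:a` easier to see, here is BrianH's one-liner laid out over several lines and run on a sample string. The sample text is an assumption for illustration; the rule itself is unchanged from the message above.

```rebol
REBOL [Title: "Position reset before change (sketch)"]

data: copy "make *this* bold"   ; sample input, invented for this sketch

parse/all data [
    any [
        to "*" a: skip b:       ; a: at the opening "*", b: just after it
        to "*" c: skip d:       ; c: at the closing "*", d: just after it
        :a                      ; move the parse position back BEFORE modifying
        (change/part a rejoin ["<strong>" copy/part b c "</strong>"] d)
    ]
    to end
]

print data   ; prints make <strong>this</strong> bold
```

Because the position is reset to `a` before the paren runs, parse is never left pointing at a saved position whose meaning has shifted once `change/part` rewrites the string. An input such as "*test" with no closing "*" is left untouched, since the second `to "*"` fails before the paren is reached.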
Tomc 22-Aug-2005 [332x2] | something along the lines of (untested) |
    ;;; make the word set more restrictive if no space etc
    ;;; but this is most permissive for your example
    word: complement charset "*"
    rule: [
        skip to "*" [copy item some word "*" (append output join[<tag> item </tag>])]
        | skip
| |
BrianW 22-Aug-2005 [334x2] | That works nicely too! I'll look more at NicomDoc later, but BrianH's tip makes tests for "*test*" and "*test" pass |
I'll have to explore Tomc's solution when I get back from my meeting. Thanks, folks | |
BrianH 22-Aug-2005 [336x5] |
    markup-chars: charset "*~"
    non-markup: complement markup-chars
    tag1: ["*" "<strong>" "~" "<i>"]
    tag2: ["*" "</strong>" "~" "</i>"]
    parse/all data [
        any non-markup
        any [
            ["*" a: skip b: to "*" c: skip d: |
             "~" a: skip b: to "~" c: skip d:
            ] :a (
                change/part a rejoin [
                    select tag1 copy/part a b
                    copy/part b c
                    select tag2 copy/part c d
                ] d
            )
            any non-markup
        ] to end
    ]
No nesting, but with a little recursion and different start and end tags, this can be adapted to handle that too. | |
If you want to determine whether there have been any replacements, change the second any to some, and parse will return true only when replacements have been made. Be careful to avoid use of the markup characters in your replacement text. | |
Whoops, an error. Change:
    ["*" a: skip b: to "*" c: skip d: |
     "~" a: skip b: to "~" c: skip d:
    ] :a (
to:
    [a: "*" b: to "*" c: skip d: |
     a: "~" b: to "~" c: skip d:
    ] :a (
Silly me :( | |
Tomc 22-Aug-2005 [341] |
    w: complement charset "*"
    rule: [
        to "*" here: "*"
        opt [
            copy item some w "*" there:
            (change/part :here join "" [<strong> item </strong>] :there)
        ]
    ]
    parse/all str [some rule]
BrianH 22-Aug-2005 [342] | Tomc, that will crash older versions of REBOL, and not work on newer versions. You need to reset the parse position to before the change, before the paren where you make the change. Otherwise parse will be referencing a point off the end of the string at the end of the paren, before you can reset it. This used to crash REBOL so bad the interpreter disappeared. |
Tomc 22-Aug-2005 [343] | brianh please supply a str that fails on current versions, so I can see what you mean |
BrianH 22-Aug-2005 [344] | To fix your example, put a :here after the first there: in your rule. |
Tomc 22-Aug-2005 [345] | Still haven't found a string that fails; I'm trying all the combos of *'s at the beginning, end, middle ... |
BrianH 22-Aug-2005 [346] | In your case you might not have a crash, because you are replacing a short text with a longer one. Still, it's good to remember that bug for future reference. It really tripped me up when I first came across it, back when it still used to crash REBOL. |
Tomc 22-Aug-2005 [347] | Yes, shortening the string you are parsing would pull the rug out from under the interpreter (and I was aware that the string was being lengthened). Note: setting the parse pointer back to :here will position you before the "*". You may be better off with :here skip, to guarantee progress in case the change fails. |
BrianH 22-Aug-2005 [348x5] | OK, I tried this:
    parse "abc" [to "bc" a: "bc" (change/part a "b" 2)]
It returns true on View 1.3 and Core 2.6, but false on View 1.2 and Core 2.5.0. |
If the change fails it will throw an error. The trick is to put off the paren performing the change until you have gone through enough rules to ensure that the paren contents will succeed. | |
Remember, for many platforms, Core 2.5.0 is the current version. | |
Here's a simplified version of my example that can handle multiple instances of multiple markup types and be adapted to different end tags (thanks Tomc for the idea!):
    markup-chars: charset "*~"
    non-markup: complement markup-chars
    tag1: ["*" "<strong>" "~" "<i>"]
    tag2: ["*" "</strong>" "~" "</i>"]
    parse/all data [
        any non-markup
        any [
            ; This next block can be generated if you have many markup types...
            [a: copy b "*" copy c to "*" copy d "*" e: |
             a: copy b "~" copy c to "~" copy d "~" e:
            ] :a (change/part a rejoin [tag1/:b c tag2/:d] e)
            any non-markup
        ] to end
    ]
| |
Tomc: "you may be better off with :here skip to guarantee progress" Put the skip after the paren and I may agree with you there. Of course, you would then skip the number of chars in the replacement text. | |