World: r3wp
[Parse] Discussion of PARSE dialect
Ladislav 27-Apr-2011 [5690] | Yes, John, handling of such values has been discussed a while ago. That is why in R3 the QUOTE directive has been defined. |
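The values John asked about are not shown here. Suppose they are values PARSE would otherwise read as rule syntax, such as integers, which R3 treats as repeat counts. The minimal R3 sketch below shows QUOTE matching such values literally:

```rebol
REBOL [Title: "QUOTE sketch (R3 only)"]

; In R3, QUOTE makes the next value match literally instead of
; being interpreted as rule syntax (here: integers, not repeat counts)
print parse [1 2] [quote 1 quote 2]   ; first value must be 1, second 2
print parse [2 2] [quote 1 quote 2]   ; first value is 2, so no match
```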
Geomol 28-Apr-2011 [5691x2] | Nice! |
In http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse/Parse_expressions#Parse_idioms the idiom
Description: "Range of times operator"
Operation: a: [m n b]
Idiom: a: [m b (k: n - m) [k [b | c: fail] | :c]]
only seems to hold when n >= m. When n < m, PARSE works as if the rule were a: [n b] | |
Ladislav 28-Apr-2011 [5693x4] | That is somewhat surprising, do you see any difference? |
(I don't) | |
aha, sorry, you are right | |
Corrected, should be better now. | |
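This is a minimal sketch of the wikibook's range idiom with m = 2 and n = 4. With n >= m, the idiom and a: [m n b] agree. FAIL is an R3 keyword, but it may not be available in R2. To keep the sketch usable in both, it uses a hypothetical `no-match: [end skip]` rule (the END SKIP trick Steeve uses later in this log), which can never succeed. The names `no-match` and `b` and the sample data are assumptions made for illustration.

```rebol
REBOL [Title: "Range-of-times idiom sketch"]

; Hypothetical stand-in for FAIL: END SKIP can never match
no-match: [end skip]

m: 2  n: 4      ; match B at least M and at most N times
b: ['x]         ; the rule to repeat (matches the word x)

; A: [M N B] expressed as "exactly M times, then up to N - M more":
; if B fails inside the K loop, C records the position and the
; alternative :C resumes from there instead of failing the whole rule
a: [m b (k: n - m) [k [b | c: no-match] | :c]]

print parse [x x] a          ; exactly m
print parse [x x x] a
print parse [x x x x] a      ; exactly n
print parse [x] a            ; fewer than m
print parse [x x x x x] a    ; more than n
```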
Sunanda 29-Apr-2011 [5697] | Can an R2 parse expert help me with an efficient parse, please?
I've got a set of bbcode-type tags, e.g.:
tags: ["[a]" "[b]" "[cc]"]
And I've got a data string that includes those (and other) tags, e.g.:
data: "xxxx[a]aa aa[b]xxxx[a] yyyy[d]yyy[cc]dd[e]ddd[b][A]zz[zz"
What I'd like is the data string split at the designated tags, e.g.:
["[a]" "aa aa" "[b]" "xxxx" "[a]" " yyyy[d]yyy" "[cc]" "dd[e]ddd" "[b]" "" "[A]" "zz[zz"]
Thanks! |
Maxim 29-Apr-2011 [5698] | rebol []
=tags=: ["[a]" | "[b]" | "[cc]"]
data: "xxxx[a]aa aa[b]xxxx[a] yyyy[d]yyy[cc]dd[e]ddd[b][A]zz[zz"
blk: []
parse/all data [
    start: any [
        ; at a wanted tag: store the text before it, then the tag itself
        here: copy tag =tags= there: (
            append blk copy/part start here
            append blk tag
        ) start:
        ; otherwise advance one character
        | skip
    ]
    ; store whatever follows the last wanted tag
    (append blk start)
]
?? blk
ask "" |
Steeve 29-Apr-2011 [5699] | It should be better with a [to "["] in the right place. |
Maxim 29-Apr-2011 [5700x2] | No, notice that he's not loading all [tags], just the ones he really wants. |
(maybe I misunderstood why you'd want a [ to "[" ] :-) | |
Steeve 29-Apr-2011 [5702x2] | Even so, just replace skip with skip opt to "[" (not tested). |
better in the sense: faster | |
Sunanda 29-Apr-2011 [5704] | Steeve, that looks good, thanks! The only difference from my "expected results" is that you've also returned the "pre-tag" "xxxx". That's okay: incidental issues like that are completely negotiable in the search for a solution. |
Steeve 29-Apr-2011 [5705x2] | [skip to "[" | to end] should be even better. |
[skip to "[" | end skip] saves an extra loop iteration by exiting with a fail. | |
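This R2 sketch folds Steeve's last suggestion into Maxim's rule. It assumes R2's default case-insensitive string matching, which is what lets "[A]" count as a wanted tag. The function name `split-at-tags` is hypothetical. The END SKIP alternative can never match, so once no "[" is left the ANY loop stops. The text after the last wanted tag is then appended.

```rebol
REBOL [Title: "Split a string at selected tags (R2 sketch)"]

=tags=: ["[a]" | "[b]" | "[cc]"]

; Hypothetical helper: returns a block alternating text and matched tags
split-at-tags: func [data [string!] /local blk start here tag] [
    blk: copy []
    parse/all data [
        start: any [
            ; wanted tag: keep the preceding text, then the tag as written
            here: copy tag =tags= (
                append blk copy/part start here
                append blk tag
            ) start:
            ; Steeve's speed-up: jump straight to the next "[";
            ; END SKIP never matches, so the loop ends when none is left
            | [skip to "[" | end skip]
        ]
        ; whatever follows the last wanted tag
        (append blk copy start)
    ]
    blk
]

probe split-at-tags "xxxx[a]aa aa[b]xxxx[a] yyyy[d]yyy[cc]dd[e]ddd[b][A]zz[zz"
```

Unlike Sunanda's expected result, this returns the leading "xxxx" as its first element, as Maxim's original does.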
Maxim 29-Apr-2011 [5707] | Sunanda, regarding the first element, I thought it was a typo on your part ;-) |
Sunanda 29-Apr-2011 [5708] | :) In the real-life app, I'd insert a dummy tag at the start to hoover up any pre-tag data. |
Steeve 29-Apr-2011 [5709] | Maxim can alter his parser to avoid such a hack; it's an easy task :-) |
Geomol 29-Apr-2011 [5710] | In R2:
>> parse [a b c [d e f] g h i] [to [d e f] mark: (probe mark) to end]
[[d e f] g h i]
== true
Here the block after TO isn't a sub-rule, but a value to search for (a block of words). Doing the same in R3:
>> parse [a b c [d e f] g h i] [to [d e f] mark: (probe mark) to end]
** Script error: PARSE - invalid rule or usage of rule: e
Is the block a sub-rule here? I've tried to search the docs, but haven't found an explanation. |
BrianH 29-Apr-2011 [5711x3] | TO and THRU were changed to support multi-rules, so they aren't really comparable to their R2 versions. And there are some bugs in the implementation where some rules that don't match the acceptable syntax are just treated as not matching instead of triggering an error the way they should. This has made it difficult to properly document their current behavior. |
PARSE is definitely something I wish was more open, because there are bugs I would like to fix. | |
I think that there is no direct equivalent in R3 to R2's TO/THRU inline block. R3's TO/THRU inline block treats the block as its sub-dialect for TO/THRU multi, and that doesn't allow complex values or more than one value in a single alternate. The direct R3 equivalent of what you are requesting would be this, but it doesn't work:
>> parse [a b c [d e f] g h i] [to [[d e f]] mark: (probe mark) to end]
** Script error: PARSE - invalid rule or usage of rule: [d e f]
Instead you have to do a trick with to block! in a loop, then match the block against quote [d e f] explicitly and keep looking if it doesn't match. It's annoying. | |
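The following is a minimal R3-only sketch of the workaround BrianH describes, written as a recursive rule rather than an explicit loop. It jumps to the next block with TO BLOCK!, tests that block with QUOTE, and skips past it to keep looking if it is not [d e f]. The rule name `thru-def` and the extra [x y] block in the data are hypothetical.

```rebol
REBOL [Title: "Find a literal block value with PARSE (R3 sketch)"]

; Stand-in for R2's  THRU [d e f]  on block input: jump to the next
; block, then either match [d e f] literally (MARK keeps its position)
; or skip that block and recurse to keep looking
thru-def: [to block! [mark: quote [d e f] | skip thru-def]]

; MARK ends up where R2's  TO [d e f]  would have stopped
print parse [a b c [x y] [d e f] g h i] [thru-def (probe mark) to end]
```

Recursion depth grows with the number of non-matching blocks skipped, which should be fine for ordinary data. If no block matches, TO BLOCK! eventually fails and so does the whole rule.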
Geomol 29-Apr-2011 [5714] | "PARSE is definitely something I wish was more open"
I have done a bit of work on a function version of PARSE. Maybe having PARSE as a normal REBOL function could help in fixing bugs? My version is not quite ready to publish. Are there a set of PARSE tests somewhere that I could test my version against? I would prefer R2 tests to start with. I'm doing my own tests, but maybe we have a more complete set of tests somewhere, like in the R3-alpha world (I think that was the name), where we did a lot of tests on different things. |
onetom 29-Apr-2011 [5715] | I would be happy to use a function! version of PARSE, since I have never had to do time-critical parsing. |
Maxim 29-Apr-2011 [5716x2] | did you do any kind of speed differences? |
(tests) | |
Geomol 29-Apr-2011 [5718x3] | Not yet, but maybe I could do a quick test... |
>> dt [loop 100000 [bparse [a b c] ['a 'b 'c]]]
== 0:00:00.965689
>> dt [loop 100000 [parse [a b c] ['a 'b 'c]]]
== 0:00:00.235949
bparse is my block parse function. | |
>> dt [loop 10000 [bparse [a b c a b c] [2 thru 'b 'c]]]
== 0:00:00.133237
>> dt [loop 10000 [parse [a b c a b c] [2 thru 'b 'c]]]
== 0:00:00.029891
So bparse is slower by a factor of 4 or so (about 4.1 and 4.5 in these two runs). | |
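Geomol's DT measures how long a block takes to evaluate. For a console that lacks it, the minimal stand-in below may help. Its definition is an assumption and not necessarily the DT Geomol used. It relies on NOW/PRECISE and DIFFERENCE, so the result is only as precise as the system clock.

```rebol
REBOL [Title: "Minimal DT stand-in"]

; Hypothetical timing helper: evaluate BLOCK, return the elapsed time!
dt: func [block [block!] /local start] [
    start: now/precise
    do block
    difference now/precise start
]

print dt [loop 100000 [parse [a b c] ['a 'b 'c]]]
```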
Maxim 29-Apr-2011 [5721] | not bad actually. |
Ladislav 30-Apr-2011 [5722] | Geomol: "Are there a set of PARSE tests somewhere, that I could test my version against?" - there are the core tests at https://github.com/rebolsource/rebol-test , that contain a couple of PARSE tests in the functions/series/parse.r section. It would be nice if you added some tests. |
Geomol 30-Apr-2011 [5723] | Thanks, I'll look into it. |
Geomol 1-May-2011 [5724] | What's the opinion on this?
>> parse [a b] [set w ['a 'b]]
== true
>> ? w
W is a word of value: a
It seems to work the same as:
parse [a b] [set w 'a 'b]
Same in R2 and R3. |
BrianH 1-May-2011 [5725] | It looks like an error that should be triggered but isn't. SET is supposed to set the word to a single value, not a series of values; an embedded block is a single value. |
Ladislav 1-May-2011 [5726x2] | I think it is OK. Set just sets the word to the first value matched. |
I do not think it makes any sense to trigger an error. | |
BrianH 1-May-2011 [5728x2] | It doesn't make sense to trigger an error if the data is weird, but triggering errors if the rules are weird is critical for debugging, especially for generated rules. Triggered errors are the programmer's best friend - that's the R3 policy. |
For instance, R3's TO and THRU are extremely difficult to debug right now because they don't trigger most of the errors they should trigger. | |
Ladislav 1-May-2011 [5730x2] | This is a simple rule: set w rule sets the word 'w to the first value matched. No error. |
It is quite obvious what the first value matched is. | |
onetom 1-May-2011 [5732x2] | so, no way to match a complex rule? |
s/match/set | |
Ladislav 1-May-2011 [5734] | RULE might be complex, but what is so strange about setting 'w to the first value matched? |
onetom 1-May-2011 [5735x2] | It's not obvious what the first value is if RULE is defined somewhere else rather than inlined. |
Imagine I define "my own type", like address! | |
BrianH 1-May-2011 [5737] | That is not the error I was talking about. This is the error:
>> parse [a b] [set w ['a 'b]]
== true
>> ? w
W is a word of value: a
It is the attempt to set the value to a complex rule that is the error. It wouldn't be an error to do this:
parse [a b] [set w 'a 'b]
If we keep the current behavior, there needs to be a lot of strongly worded warnings about the potential gotcha in the PARSE SET docs. |
Ladislav 1-May-2011 [5738x2] | It does not matter where the RULE is defined. The first value matched is the value at the current position of the cursor, if the match occurs, that is. |
As said, it does not matter what the RULE is. The first value is the first value. | |
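The sketch below illustrates the behavior Ladislav describes, which Geomol found to be the same in R2 and R3. SET stores only the first matched value, even when the rule after it is a named rule such as onetom's hypothetical address type. If the whole matched span is wanted instead, COPY is the usual alternative. The `address` rule and the sample data are made up for illustration.

```rebol
REBOL [Title: "SET vs COPY with a complex rule"]

; Hypothetical "address" rule: a street name followed by a number
address: [string! integer!]

parse ["Main St" 12] [set w address]    ; SET: first value matched only
parse ["Main St" 12] [copy c address]   ; COPY: the whole matched span

probe w
probe c
```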