World: r3wp


[Parse] Discussion of PARSE dialect

Geomol 27-Apr-2011 [5687]	Found a trick to parse integers in blocks. Let's say I want to parse this block: [year 2011] The rule can't be ['year 2011], because 2011 would then be read as a repeat count for the next element (and there is none here). So normally I would do something like ['year set y integer! ( ... )], check the y variable and create a fail rule in case it's not 2011. But this is the trick: >> parse [year 2011] ['year 1 1 2011] == true Two numbers mean: repeat the next pattern between min and max times (here exactly once), and in this case the pattern can be an integer itself.
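For comparison, here is a minimal R2 sketch of the "normal" approach Geomol describes, next to the repeat-count trick. The function names and the check word are illustrative only; [] is used as a rule that always succeeds without consuming input, and [end skip] as a rule that can never match.

```rebol
REBOL [Title: "Matching a literal integer in a block: two R2 approaches (sketch)"]

; Approach 1: capture the integer, then pick a rule that succeeds or fails.
; [] always succeeds; [end skip] can never match (END then SKIP at the tail).
year-is-2011?: func [data [block!] /local y check] [
    parse data [
        'year set y integer!
        (check: either y = 2011 [[]] [[end skip]])
        check
    ]
]

; Approach 2: the repeat-count trick; 1 1 2011 means "exactly one 2011".
year-is-2011-trick?: func [data [block!]] [
    parse data ['year 1 1 2011]
]

print year-is-2011? [year 2011]        ; true
print year-is-2011? [year 2010]        ; false
print year-is-2011-trick? [year 2011]  ; true
print year-is-2011-trick? [year 2010]  ; false
```

The trick avoids the extra variable and the paren, at the cost of a less obvious rule.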
onetom 27-Apr-2011 [5688]	:) nice
Gregg 27-Apr-2011 [5689]	I wouldn't call it a trick, John, just non-obvious syntax. I haven't used it much, but I wrote a func a long time ago when I needed it for something. literalize-int-rules: func [template /local rule mark] [ ; Turn a single integer value into a quantity-of-one integer ; rule for parse (e.g. 1 becomes 1 1 1, 4 becomes 1 1 4). rule: [ any [ into rule | mark: integer! (insert mark [1 1]) 2 skip | skip ] ] parse template rule template ]
Ladislav 27-Apr-2011 [5690]	Yes, John, handling of such values was discussed a while ago. That is why the QUOTE directive was defined in R3.
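A minimal sketch of the QUOTE keyword Ladislav mentions, assuming R3 PARSE (per his note, the keyword was introduced in R3, so this is not expected to work in R2):

```rebol
REBOL [Title: "R3: matching a literal integer with QUOTE (sketch)"]

; QUOTE makes the next value match literally instead of being read
; as a repeat count, so no 1 1 prefix is needed.
print parse [year 2011] ['year quote 2011]   ; true
print parse [year 2010] ['year quote 2011]   ; false
```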
Geomol 28-Apr-2011 [5691x2]	Nice!
	In http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse/Parse_expressions#Parse_idioms the idiom with Description: "Range of times operator", Operation: a: [m n b], Idiom: a: [m b (k: n - m) [k [b | c: fail] | :c]] only seems to hold when n >= m. When n < m, parse works as if the rule were a: [n b]
Ladislav 28-Apr-2011 [5693x4]	That is somewhat surprising. Do you see any difference?
	(I don't)
	aha, sorry, you are right
	Corrected, should be better now.
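As a reference point for the idiom above, a small sketch of the m n range operator itself. It only covers the n >= m case; the n < m behavior Geomol reports is not shown here.

```rebol
REBOL [Title: "The m n range operator in PARSE (sketch)"]

; [2 4 'x] matches at least 2 and at most 4 occurrences of the word x.
print parse [x x x] [2 4 'x]        ; true:  3 is within 2..4
print parse [x] [2 4 'x]            ; false: fewer than 2
print parse [x x x x x] [2 4 'x]    ; false: a 5th x is left unmatched
```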
Sunanda 29-Apr-2011 [5697]	Can an R2 parse expert help me with an efficient parse, please? I've got a set of bbcode-type tags, e.g.: tags: [ "[a]" "[b]" "[cc]" ] And I've got a data string that includes those (and other) tags, e.g.: data: "xxxx[a]aa aa[b]xxxx[a] yyyy[d]yyy[cc]dd[e]ddd[b][A]zz[zz" What I'd like is the data string split at the designated tags, e.g.: [ "[a]" "aa aa" "[b]" "xxxx" "[a]" " yyyy[d]yyy" "[cc]" "dd[e]ddd" "[b]" "" "[A]" "zz[zz" ] Thanks!
Maxim 29-Apr-2011 [5698]	rebol [] =tags=: [ "[a]" | "[b]" | "[cc]" ] data: "xxxx[a]aa aa[b]xxxx[a] yyyy[d]yyy[cc]dd[e]ddd[b][A]zz[zz" blk: [] parse/all data [ start: any [ here: copy tag =tags= there: ( append blk copy/part start here append blk tag ) start: | skip ] (append blk start) ] ?? blk ask ""
Steeve 29-Apr-2011 [5699]	It should be faster if you include [ to "[" ] at the right place.
Maxim 29-Apr-2011 [5700x2]	No, notice that he isn't matching every [tag], just the ones he really wants.
	(Maybe I misunderstood why you'd want a [ to "[" ] :-)
Steeve 29-Apr-2011 [5702x2]	Even so, just replace [skip] with [skip opt to "["] (not tested).
	Better in the sense of faster.
Sunanda 29-Apr-2011 [5704]	Steeve, that looks good, thanks! The only difference from my "expected results" is that you've also returned the "pre-tag" "xxxx"... That's okay: incidental issues like that are completely negotiable in the search for a solution.
Steeve 29-Apr-2011 [5705x2]	[skip to "[" | to end] should be even better.
	[skip to "[" | end skip] saves an extra loop iteration by exiting with a fail.
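A sketch combining Maxim's rule with Steeve's suggestion to jump ahead with to "[" instead of stepping one character at a time, wrapped in a function. The names tag-rule and split-at-tags are hypothetical. It is written for R2 (parse/all) and relies on PARSE's default case-insensitive string matching, so that "[A]" is caught by "[a]".

```rebol
REBOL [Title: "Split a string at selected tags (R2 sketch)"]

; Tags to split at. String matching in PARSE is case-insensitive
; unless /case is used, so "[a]" also matches "[A]".
tag-rule: ["[a]" | "[b]" | "[cc]"]

split-at-tags: func [
    "Return a block alternating text and the tags that end it."
    data [string!]
    /local out start here tag
][
    out: copy []
    parse/all data [
        start: any [
            ; A wanted tag: emit the text since the last tag, then the tag.
            here: copy tag tag-rule (
                append out copy/part start here
                append out tag
            ) start:
            ; Anything else: step one character, then jump to the next "["
            ; (or to the end if there is none).
            | skip [to "[" | to end]
        ]
    ]
    append out copy start   ; text after the last tag
    out
]

probe split-at-tags "xxxx[a]aa aa[b]xxxx[a] yyyy[d]yyy[cc]dd[e]ddd[b][A]zz[zz"
```

For Sunanda's data this should give the requested block, with the leading "xxxx" as its first element, as in Maxim's version. Appending copies (copy/part, copy start) rather than series positions keeps the result independent of the original string.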
Maxim 29-Apr-2011 [5707]	Sunanda, regarding the first element, I thought it was a typo on your part ;-)
Sunanda 29-Apr-2011 [5708]	:) In the real-life app, I'd insert a dummy tag at the start to hoover up any pre-tag data.
Steeve 29-Apr-2011 [5709]	Maxim could alter his parser to avoid such a hack; it's an easy task :-)
Geomol 29-Apr-2011 [5710]	In R2: >> parse [a b c [d e f] g h i] [to [d e f] mark: (probe mark) to end] [[d e f] g h i] == true Here the block after TO isn't a sub-rule, but a value to search for (a block of words). Doing the same in R3: >> parse [a b c [d e f] g h i] [to [d e f] mark: (probe mark) to end] ** Script error: PARSE - invalid rule or usage of rule: e Is the block a sub-rule here? I've tried to search the docs, but haven't found an explanation.
BrianH 29-Apr-2011 [5711x3]	TO and THRU were changed to support multi-rules, so they aren't really comparable to their R2 versions. And there are some bugs in the implementation where some rules that don't match the acceptable syntax are just treated as not matching instead of triggering an error the way they should. This has made it difficult to properly document their current behavior.
	PARSE is definitely something I wish was more open, because there are bugs I would like to fix.
	I think that there is no direct equivalent in R3 to R2's TO/THRU inline block. R3's TO/THRU inline block treats the block as its sub-dialect for TO/THRU multi, and that doesn't allow complex values or more than one value in a single alternate. The direct R3 equivalent of what you are requesting would be this, but it doesn't work: >> parse [a b c [d e f] g h i] [to [[d e f]] mark: (probe mark) to end] ** Script error: PARSE - invalid rule or usage of rule: [d e f] Instead you have to do a trick with to block! in a loop and then match the block to quote [d e f] explicitly, keeping looking if it doesn't match. It's annoying.
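A sketch of the workaround BrianH describes, assuming R3 PARSE: TO block! jumps to the next block value, QUOTE compares it literally, and any block that doesn't match is stepped over. find-def is a hypothetical name, and this has not been checked against any particular R3 alpha build.

```rebol
REBOL [Title: "R3: find a literal block value inside a block (sketch)"]

; Returns the position of the first [d e f] value in data, or none.
find-def: func [data [block!] /local mark here] [
    mark: none
    parse data [
        any [
            to block! [
                ; Literal match: remember only the first hit.
                here: quote [d e f] (if none? mark [mark: here])
                ; Some other block: step over it and keep looking.
                | skip
            ]
        ]
        to end
    ]
    mark
]

probe find-def [a b c [d e f] g h i]
probe find-def [a [x] b]
```

The loop ends when TO block! finds no further block, so every block is visited at most once.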
Geomol 29-Apr-2011 [5714]	"PARSE is definitely something I wish was more open": I have done a bit of work on a function version of PARSE. Maybe having PARSE as a normal REBOL function could help in fixing bugs? My version is not quite ready to publish. Are there a set of PARSE tests somewhere that I could test my version against? I would prefer R2 tests to start with. I'm doing my own tests, but maybe we have a more complete set of tests somewhere, like in the R3-alpha world (I think that was the name), where we did a lot of tests on different things.
onetom 29-Apr-2011 [5715]	I would be happy to use a function! version of PARSE, since I've never had to do time-critical parsing.
Maxim 29-Apr-2011 [5716x2]	Did you do any kind of speed tests?
Geomol 29-Apr-2011 [5718x3]	Not yet, but maybe I could do a quick test...
	>> dt [loop 100000 [bparse [a b c] ['a 'b 'c]]] == 0:00:00.965689 >> dt [loop 100000 [parse [a b c] ['a 'b 'c]]] == 0:00:00.235949 bparse is my block parse function.
	>> dt [loop 10000 [bparse [a b c a b c] [2 thru 'b 'c]]] == 0:00:00.133237 >> dt [loop 10000 [parse [a b c a b c] [2 thru 'b 'c]]] == 0:00:00.029891 So bparse is slower by a factor of 4 or so.
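dt appears to be R3's delta-time helper. For timing the same loops in R2, a minimal stand-in could look like the following; it is a hypothetical sketch that measures wall-clock time with now/precise, so results will vary between runs and machines.

```rebol
REBOL [Title: "A minimal dt (delta-time) helper for R2 (sketch)"]

; Hypothetical stand-in for R3's DT: evaluate a block and return
; the elapsed wall-clock time as a time! value.
dt: func [block [block!] /local start] [
    start: now/precise
    do block
    difference now/precise start
]

t-native: dt [loop 100000 [parse [a b c] ['a 'b 'c]]]
print ["native parse:" t-native]
```

Geomol's bparse isn't published, so only the native side of the comparison is shown.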
Maxim 29-Apr-2011 [5721]	not bad actually.
Ladislav 30-Apr-2011 [5722]	Geomol: "Are there a set of PARSE tests somewhere, that I could test my version against?" - there are the core tests at https://github.com/rebolsource/rebol-test, which contain a couple of PARSE tests in the functions/series/parse.r section. It would be nice if you added some tests.
Geomol 30-Apr-2011 [5723]	Thanks, I'll look into it.
Geomol 1-May-2011 [5724]	What's the opinion on this? >> parse [a b] [set w ['a 'b]] == true >> ? w W is a word of value: a It seems to work the same as: parse [a b] [set w 'a 'b] Same in R2 and R3.
BrianH 1-May-2011 [5725]	It seems like an error that is improperly not triggered. SET is supposed to set to a single value, not a series of values - an embedded block is a single value.
Ladislav 1-May-2011 [5726x2]	I think it is OK. Set just sets the word to the first value matched.
	I do not think it makes any sense to trigger an error.
BrianH 1-May-2011 [5728x2]	It doesn't make sense to trigger an error if the data is weird, but triggering errors if the rules are weird is critical for debugging, especially for generated rules. Triggered errors are the programmer's best friend - that's the R3 policy.
	For instance, R3's TO and THRU are extremely difficult to debug right now because they don't trigger most of the errors they should trigger.
Ladislav 1-May-2011 [5730x2]	This is a simple rule: set w rule sets the word 'w to the first value matched. No error.
	It is quite obvious what the first value matched is.
onetom 1-May-2011 [5732x2]	So there's no way to SET a complex rule?
Ladislav 1-May-2011 [5734]	RULE might be complex, but what is so strange about setting 'w to the first value matched?
onetom 1-May-2011 [5735x2]	It's not obvious what the first value is if 'rule is defined somewhere else rather than inlined.
	Imagine I define "my own type", like address!
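On onetom's question: as Ladislav describes, and as Geomol observed in both R2 and R3 at the time, SET keeps only the first value a rule matched, while COPY captures the whole matched span. A sketch with a hypothetical address rule standing in for "my own type":

```rebol
REBOL [Title: "SET vs COPY with a named rule (sketch)"]

; Hypothetical "address" rule: a street string followed by a house number.
address: [string! integer!]
data: ["Main Street" 42]

; SET keeps only the first value the rule matched...
parse data [set addr address]
probe addr    ; "Main Street"

; ...while COPY captures everything the rule matched, as a block.
parse data [copy addr address]
probe addr    ; ["Main Street" 42]
```

So for composite "types", COPY is the way to capture the whole match; BrianH's point above is that SET with a multi-value rule arguably ought to raise an error instead.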