r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

Geomol
27-Apr-2011
[5687]
Found a trick to parse integers in blocks. Let's say I want to parse
this block: [year 2011]

The rule can't be ['year 2011], because PARSE reads 2011 there as a
repeat count for the next element (and there is none here). So normally
I would do something like ['year set y integer! ( ... )], check the y
variable, and create a fail rule in case it's not 2011. But this is
the trick:

>> parse [year 2011] ['year 1 1 2011]
== true


Two numbers mean "match the next pattern between m and n times", and
in this case the pattern can be an integer itself, which is then matched
literally.
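For comparison, a minimal R2 sketch of the set-and-check approach described above (the names check-year and guard are hypothetical). The paren picks a guard rule: an empty block, which always succeeds, or [end skip], which always fails.

```rebol
rebol []

; Set-and-check: capture the integer, then choose a guard rule.
; [] always succeeds; [end skip] always fails, because END only
; matches at the tail and SKIP can never match there.
check-year: func [blk [block!] /local y guard] [
    parse blk [
        'year set y integer!
        (guard: either y = 2011 [[]] [[end skip]])
        guard
    ]
]

print check-year [year 2011]   ; true
print check-year [year 2010]   ; false

; The trick from the post does the same check inline
print parse [year 2011] ['year 1 1 2011]   ; true
```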
onetom
27-Apr-2011
[5688]
:) nice
Gregg
27-Apr-2011
[5689]
I wouldn't call it a trick, John, just a non-obvious syntax. I haven't
used it much, but I wrote a func a long time ago when I needed it
for something.

literalize-int-rules: func [template /local mark rule] [
; Turn a single integer value into a quantity-of-one integer
; rule for parse (e.g. 1 becomes 1 1 1, 4 becomes 1 1 4).
	rule: [
		any [
			into rule
			| mark: integer! (insert mark [1 1]) 2 skip 
			| skip
		]
	]
	parse template rule
	template
]
Ladislav
27-Apr-2011
[5690]
Yes, John, handling of such values has been discussed a while ago. 
That is why in R3 the QUOTE directive has been defined.
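A minimal sketch of the QUOTE keyword applied to Geomol's example, assuming R3 PARSE (R2 has no QUOTE). QUOTE takes the next value literally, so 2011 is matched as a value instead of being read as a repeat count.

```rebol
; R3 only: QUOTE makes the next value a literal to match
print parse [year 2011] ['year quote 2011]   ; true
print parse [year 2010] ['year quote 2011]   ; false
```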
Geomol
28-Apr-2011
[5691x2]
Nice!
In http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse/Parse_expressions#Parse_idioms

The idiom
Description: "Range of times operator"
Operation: 	a: [m n b]
Idiom:	a: [m b (k: n - m) [k [b | c: fail] | :c]]


only seems to hold when n >= m. When n < m, parse behaves as if
the rule were
a: [n b]
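To make the range form concrete for the well-behaved case m <= n, a small check (works the same in R2 and R3): [m n b] matches b at least m and at most n times, and PARSE still has to reach the end of the input.

```rebol
; [2 3 'x] matches between two and three 'x words
print parse [x x] [2 3 'x]       ; true
print parse [x x x] [2 3 'x]     ; true
print parse [x] [2 3 'x]         ; false: fewer than 2
print parse [x x x x] [2 3 'x]   ; false: a 4th x is left over
```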
Ladislav
28-Apr-2011
[5693x4]
That is somewhat surprising, do you see any difference?
(I don't)
aha, sorry, you are right
Corrected, should be better now.
Sunanda
29-Apr-2011
[5697]
Can an R2 parse expert help me with an efficient parse, please?

I've got a set of bbcode-type tags, eg:
    tags: [ "[a]" "[b]" "[cc]" ] 
    

And I've got a data string that includes those (and other) tags, 
eg:

    data: "xxxx[a]aa aa[b]xxxx[a] yyyy[d]yyy[cc]dd[e]ddd[b][A]zz[zz"


What I'd like is the data string split at the designated tags, eg:

    [ "[a]" "aa aa" "[b]" "xxxx" "[a]" " yyyy[d]yyy" "[cc]" "dd[e]ddd" 
    "[b]" "" "[A]" "zz[zz" ]
    
Thanks!
Maxim
29-Apr-2011
[5698]
rebol []

=tags=: [ "[a]" | "[b]" | "[cc]" ] 
    
data: "xxxx[a]aa aa[b]xxxx[a] yyyy[d]yyy[cc]dd[e]ddd[b][A]zz[zz"


blk: []
parse/all data [
	start:
	any [
		here: copy tag =tags= there: (
			append blk copy/part start here
			append blk tag
		) start:

		| skip
	]
	(append blk start)
]

?? blk

ask ""
Steeve
29-Apr-2011
[5699]
should be better including [ to "[" ] at the right place
Maxim
29-Apr-2011
[5700x2]
No, notice that he's not matching all [tags], just those he really
wants.
(maybe I misunderstood why you'd want a [ to "[" ] :-)
Steeve
29-Apr-2011
[5702x2]
Even so, just replace
>skip
by 
> skip opt to "["
(not tested)
better in the sense: faster
Sunanda
29-Apr-2011
[5704]
Steeve, that looks good, thanks!


Only difference from my "expected results" is that you've also returned 
the "pre-tag" "xxxx" .... that's okay -- incidental issues like that 
are completely negotiable in the search for a solution.
Steeve
29-Apr-2011
[5705x2]
>[skip to "[" | to end]
should be even better
> [skip to "[" | end skip]
skip an extra loop by exiting with a fail
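A sketch combining Maxim's parser with Steeve's speed-up, wrapped in a function (split-at-tags is a hypothetical name) and assuming R2, where PARSE/ALL matches strings case-insensitively by default, so "[a]" also picks up "[A]". At a non-tag position, SKIP steps one character and TO "[" jumps straight to the next candidate tag; TO END covers a tail with no "[" left, so the loop finishes at the end of the input.

```rebol
rebol []

; Tags to split on (case-insensitive match under R2 PARSE/ALL)
=tags=: ["[a]" | "[b]" | "[cc]"]

split-at-tags: func [data [string!] /local blk start here tag] [
    blk: copy []
    parse/all data [
        start:
        any [
            here: copy tag =tags= (
                append blk copy/part start here  ; text before this tag
                append blk tag                   ; the tag itself
            ) start:
            ; not at a wanted tag: step one char, then jump to the next "["
            | skip [to "[" | to end]
        ]
        (append blk copy start)  ; text after the last tag
    ]
    blk
]

probe split-at-tags "xxxx[a]aa aa[b]xxxx[a] yyyy[d]yyy[cc]dd[e]ddd[b][A]zz[zz"
```

As Sunanda points out, the first element is the text before the first tag ("xxxx" here).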
Maxim
29-Apr-2011
[5707]
sunanda, wrt the first element, I thought it was a typo on your part
 ;-)
Sunanda
29-Apr-2011
[5708]
:) --- in the real-life app, I'd insert a dummy tag at the start 
to hoover up any pre-tag data.
Steeve
29-Apr-2011
[5709]
Maxim can alter his parser to avoid such a hack, an easy task :-)
Geomol
29-Apr-2011
[5710]
In R2:

>> parse [a b c [d e f] g h i] [to [d e f] mark: (probe mark) to end]
[[d e f] g h i]
== true


Here the block after TO isn't a sub-rule, but a value to search for 
(a block of words). Doing the same in R3:

>> parse [a b c [d e f] g h i] [to [d e f] mark: (probe mark) to end]
** Script error: PARSE - invalid rule or usage of rule: e


Is the block a sub-rule here? I've tried to search the docs, but 
haven't found an explanation.
BrianH
29-Apr-2011
[5711x3]
TO and THRU were changed to support multi-rules, so they aren't really 
comparable to their R2 versions. And there are some bugs in the implementation 
where some rules that don't match the acceptable syntax are just 
treated as not matching instead of triggering an error the way they 
should. This has made it difficult to properly document their current 
behavior.
PARSE is definitely something I wish was more open, because there 
are bugs I would like to fix.
I think that there is no direct equivalent in R3 to R2's TO/THRU 
inline block. R3's TO/THRU inline block treats the block as its sub-dialect 
for TO/THRU multi, and that doesn't allow complex values or more 
than one value in a single alternate. The direct R3 equivalent of 
what you are requesting would be this, but it doesn't work:

>> parse [a b c [d e f] g h i] [to [[d e f]] mark: (probe mark) to end]
** Script error: PARSE - invalid rule or usage of rule: [d e f]

Instead you have to do a trick with to block! in a loop and then 
match the block to quote [d e f] explicitly, keeping looking if it 
doesn't match. It's annoying.
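A sketch of that R3 workaround, assuming QUOTE can match a block value literally as described above: TO block! stops before each block value, QUOTE tests it against [d e f], and SKIP steps past a block that doesn't match. The intent is for mark to end up at the same position as in the R2 example; pos and mark are illustrative names.

```rebol
; R3 only: locate the block value [d e f] inside a block
mark: none
parse [a b c [d e f] g h i] [
    any [
        to block! [
            pos: quote [d e f] (mark: pos) to end  ; found: record it, finish
            | skip                                  ; other block: step over it
        ]
    ]
]
probe mark
```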
Geomol
29-Apr-2011
[5714]
PARSE is definitely something I wish was more open

I have done a bit of work on a function version of PARSE. Maybe having
PARSE as a normal REBOL function could help in fixing bugs? My version
is not quite ready to publish. Is there a set of PARSE tests somewhere
that I could test my version against? I would prefer R2 tests to
start with. I'm doing my own tests, but maybe we have a more complete
set of tests somewhere, like in the R3-alpha world (I think that was
the name), where we did a lot of tests on different things.
onetom
29-Apr-2011
[5715]
I would be happy to use a function! version of PARSE since i never 
had to do time critical parsing.
Maxim
29-Apr-2011
[5716x2]
did you do any kind of speed differences?
(tests)
Geomol
29-Apr-2011
[5718x3]
not yet, I maybe could do a quick test...
>> dt [loop 100000 [bparse [a b c] ['a 'b 'c]]]
== 0:00:00.965689
>> dt [loop 100000 [parse [a b c] ['a 'b 'c]]] 
== 0:00:00.235949

bparse is my block parse function.
>> dt [loop 10000 [bparse [a b c a b c] [2 thru 'b 'c]]]
== 0:00:00.133237
>> dt [loop 10000 [parse [a b c a b c] [2 thru 'b 'c]]] 
== 0:00:00.029891

So bparse is slower by a factor of 4 or so.
Maxim
29-Apr-2011
[5721]
not bad actually.
Ladislav
30-Apr-2011
[5722]
Geomol: "Are there a set of PARSE tests somewhere, that I could test 
my version against?" - there are the core tests at

https://github.com/rebolsource/rebol-test


which contain a couple of PARSE tests in the functions/series/parse.r
section. It would be nice if you added some tests.
Geomol
30-Apr-2011
[5723]
Thanks, I'll look into it.
Geomol
1-May-2011
[5724]
What's the opinion on this?

>> parse [a b] [set w ['a 'b]]
== true
>> ? w                      
W is a word of value: a

It seems to work the same as: parse [a b] [set w 'a 'b]
Same in R2 and R3.
BrianH
1-May-2011
[5725]
It seems like an error that is improperly not triggered. SET is supposed 
to set to a single value, not a series of values - an embedded block 
is a single value.
Ladislav
1-May-2011
[5726x2]
I think it is OK. Set just sets the word to the first value matched.
I do not think it makes any sense to trigger an error.
BrianH
1-May-2011
[5728x2]
It doesn't make sense to trigger an error if the data is weird, but 
triggering errors if the rules are weird is critical for debugging, 
especially for generated rules. Triggered errors are the programmer's 
best friend - that's the R3 policy.
For instance, R3's TO and THRU are extremely difficult to debug right 
now because they don't trigger most of the errors they should trigger.
Ladislav
1-May-2011
[5730x2]
This is a simple rule:

set w rule

sets the word 'w to the first value matched. No error.
It is quite obvious what the first value matched is.
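A small sketch (R2 and R3) contrasting SET, which keeps only the first value matched by a rule, with COPY, which captures everything the rule matched. COPY is the usual way to grab the whole match of a complex rule; w and c are just example words.

```rebol
parse [a b] [set w ['a 'b]]
probe w    ; a

parse [a b] [copy c ['a 'b]]
probe c    ; [a b]
```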
onetom
1-May-2011
[5732x2]
so, no way to match a complex rule?
s/match/set
Ladislav
1-May-2011
[5734]
RULE might be complex, but what is so strange about setting 'w to 
the first value matched?
onetom
1-May-2011
[5735x2]
it's not transparent what is the 1st value if 'rule is defined somewhere 
else and not inlined
imagine, i define "my own type", like address!