World: r3wp

[Parse] Discussion of PARSE dialect

BrianH
29-Apr-2011
[5711x3]
TO and THRU were changed to support multi-rules, so they aren't really 
comparable to their R2 versions. And there are some bugs in the implementation 
where some rules that don't match the acceptable syntax are just 
treated as not matching instead of triggering an error the way they 
should. This has made it difficult to properly document their current 
behavior.
PARSE is definitely something I wish was more open, because there 
are bugs I would like to fix.
I think that there is no direct equivalent in R3 to R2's TO/THRU 
inline block. R3's TO/THRU inline block treats the block as its sub-dialect 
for TO/THRU multi, and that doesn't allow complex values or more 
than one value in a single alternate. The direct R3 equivalent of 
what you are requesting would be this, but it doesn't work:

>> parse [a b c [d e f] g h i] [to [[d e f]] mark: (probe mark) to end]
** Script error: PARSE - invalid rule or usage of rule: [d e f]

Instead you have to use a trick: run to block! in a loop, match the 
block against quote [d e f] explicitly, and keep looking if it 
doesn't match. It's annoying.
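
The workaround described above can be sketched outside REBOL. This is a minimal Python model, not PARSE code: find_nested is a hypothetical helper name, and Python lists stand in for REBOL blocks. It shows the "skip to the next nested block, compare it literally, keep looking on mismatch" loop.

```python
def find_nested(data, target):
    """Return the index of the first nested list equal to target, or None."""
    for i, item in enumerate(data):
        # Counterpart of "to block!": ignore anything that is not a nested list.
        if not isinstance(item, list):
            continue
        # Counterpart of "quote [d e f]": compare the nested list literally.
        if item == target:
            return i
        # A nested list that does not match: keep looking.
    return None


data = ["a", "b", "c", ["d", "e", "f"], "g", "h", "i"]
pos = find_nested(data, ["d", "e", "f"])
print(data[pos:])  # the input from the matched block onward
```

The mark: position in the original rule corresponds to pos here, and the remaining input is data[pos:].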
Geomol
29-Apr-2011
[5714]
"PARSE is definitely something I wish was more open"

I have done a bit of work on a function version of PARSE. Maybe having 
PARSE as a normal REBOL function could help in fixing bugs? My version 
is not quite ready to publish. Are there a set of PARSE tests somewhere, 
that I could test my version against? I would prefer R2 tests to 
start with. I'm doing my own tests, but maybe we have a more complete 
set of tests somewhere, like in the R3-alpha world (I think that was 
the name), where we did a lot of tests on different things.
onetom
29-Apr-2011
[5715]
I would be happy to use a function! version of PARSE since I never 
had to do time-critical parsing.
Maxim
29-Apr-2011
[5716x2]
did you do any kind of speed differences?
(tests)
Geomol
29-Apr-2011
[5718x3]
not yet, I maybe could do a quick test...
>> dt [loop 100000 [bparse [a b c] ['a 'b 'c]]]
== 0:00:00.965689
>> dt [loop 100000 [parse [a b c] ['a 'b 'c]]] 
== 0:00:00.235949

bparse is my block parse function.
>> dt [loop 10000 [bparse [a b c a b c] [2 thru 'b 'c]]]
== 0:00:00.133237
>> dt [loop 10000 [parse [a b c a b c] [2 thru 'b 'c]]] 
== 0:00:00.029891

So a factor of 4 or so.
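
As a quick check of that estimate, the ratios can be computed from the four timings quoted above. This is only arithmetic on the posted numbers; the dictionary labels are made up for readability.

```python
# (bparse seconds, native parse seconds), copied from the timings above
timings = {
    "['a 'b 'c], 100000 loops": (0.965689, 0.235949),
    "[2 thru 'b 'c], 10000 loops": (0.133237, 0.029891),
}

# Slowdown of the function version relative to native PARSE
ratios = {label: bparse / native for label, (bparse, native) in timings.items()}

for label, ratio in ratios.items():
    print(f"{label}: bparse takes {ratio:.2f} times as long")
```

Both ratios come out between 4 and 4.5.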
Maxim
29-Apr-2011
[5721]
not bad actually.
Ladislav
30-Apr-2011
[5722]
Geomol: "Are there a set of PARSE tests somewhere, that I could test 
my version against?" - there are the core tests at

https://github.com/rebolsource/rebol-test

which contain a couple of PARSE tests in the functions/series/parse.r 
section. It would be nice if you added some tests.
Geomol
30-Apr-2011
[5723]
Thanks, I'll look into it.
Geomol
1-May-2011
[5724]
What's the opinion on this?

>> parse [a b] [set w ['a 'b]]
== true
>> ? w                      
W is a word of value: a

It seems to work the same as: parse [a b] [set w 'a 'b]
Same in R2 and R3.
BrianH
1-May-2011
[5725]
It seems like an error that is improperly not triggered. SET is supposed 
to set to a single value, not a series of values - an embedded block 
is a single value.
Ladislav
1-May-2011
[5726x2]
I think it is OK. Set just sets the word to the first value matched.
I do not think it makes any sense to trigger an error.
BrianH
1-May-2011
[5728x2]
It doesn't make sense to trigger an error if the data is weird, but 
triggering errors if the rules are weird is critical for debugging, 
especially for generated rules. Triggered errors are the programmer's 
best friend - that's the R3 policy.
For instance, R3's TO and THRU are extremely difficult to debug right 
now because they don't trigger most of the errors they should trigger.
Ladislav
1-May-2011
[5730x2]
This is a simple rule:

set w rule

sets the word 'w to the first value matched. No error.
It is quite obvious what the first value matched is.
onetom
1-May-2011
[5732x2]
so, no way to match a complex rule?
s/match/set
Ladislav
1-May-2011
[5734]
RULE might be complex, but what is so strange about setting 'w to 
the first value matched?
onetom
1-May-2011
[5735x2]
it's not transparent what is the 1st value if 'rule is defined somewhere 
else and not inlined
imagine, i define "my own type", like address!
BrianH
1-May-2011
[5737]
That is not the error I was talking about. This is the error:
>> parse [a b] [set w ['a 'b]]
== true
>> ? w                      
W is a word of value: a


It is the attempt to set the value to a complex rule that is the 
error. It wouldn't be an error to do this:

parse [a b] [set w 'a 'b]

If we keep the current behavior, there needs to be a lot of strongly 
worded warnings about the potential gotcha in the PARSE SET docs.
Ladislav
1-May-2011
[5738x3]
It does not matter where the RULE is defined. The first value matched 
is the value at the current position of the cursor, if the match 
occurs, that is.
As said, it does not matter what the RULE is. The first value is 
the first value.
You cannot find anywhere a formulation like "set the value to a rule". 
That is not what can happen.
onetom
1-May-2011
[5741]
address!: [string! tuple! issue!]
parse ["cat" 1.2.3 #4] [set addr1 address!]

If I'm reading this, address! looks just like a value reference...
BrianH
1-May-2011
[5742]
It could be considered a useful feature, since the whole match needs 
to match before the SET is performed. However, the docs need to be 
*extremely* precise about this because the "set the value to a rule" 
interpretation is a common misconception among newbie PARSE writers. 
It would be good for the docs to give an example of the type of code 
this allows you to do, explaining the difference.
onetom
1-May-2011
[5743x2]
>> parse [1 2] [set x [integer! issue!]] 
== false
>> x                                    
== 1


this is not what I would expect, for sure... what are the brackets 
for if they don't have any effect?...
it gives
** Script error: x has no value
in r3 though
Geomol
1-May-2011
[5745]
But we have COPY to do what you want, if I understand:

>> address!: [string! tuple! issue!]           
== [string! tuple! issue!]
>> parse ["cat" 1.2.3 #4] [copy addr1 address!]
== true
>> addr1
== ["cat" 1.2.3 #4]
BrianH
1-May-2011
[5746]
onetom, It's not "set the variable to a rule", it is instead "match 
a rule then set the variable to the value at the current position 
in the data".
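
The difference between SET and COPY described in this exchange can be modelled in a few lines. This is a Python sketch under stated assumptions, not how PARSE is implemented: match_seq and capture are hypothetical names, a rule is simplified to a flat sequence of type checks, and Python types stand in for REBOL datatypes. It follows the description that the whole rule must match before anything is captured.

```python
def match_seq(data, rule):
    """Match a sequence of type checks from the start of data.

    Returns the number of values matched, or None if the rule fails.
    """
    if len(rule) > len(data):
        return None
    for value, expected in zip(data, rule):
        if not isinstance(value, expected):
            return None
    return len(rule)


def capture(data, rule, mode):
    """Model of SET / COPY: capture only happens if the whole rule matches."""
    matched = match_seq(data, rule)
    if matched is None:
        return None  # rule failed, so nothing is captured
    if mode == "set":
        # SET: the first value of the matched input
        # (an empty match yields None in this model, which is an assumption)
        return data[0] if matched else None
    # COPY: the whole matched part of the input
    return data[:matched]


address = [str, tuple, int]  # hypothetical stand-in for [string! tuple! issue!]
data = ["cat", (1, 2, 3), 4]

print(capture(data, address, "set"))       # cat
print(capture(data, address, "copy"))      # ['cat', (1, 2, 3), 4]
print(capture([1, 2], [int, str], "set"))  # None
```

The last call mirrors the [integer! issue!] case: the sub-rule fails on the second value, so in this model nothing is captured at all.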
Ladislav
1-May-2011
[5747]
As stated in the "Idioms" section, I think that

    a: [set b c]

shall be equivalent to:

    f: [(set/any [b] if lesser? index? e index? d [e])]
    a: [and [c d:] e: f :d]
BrianH
1-May-2011
[5748x2]
By "It could be considered a useful feature" I mean that if this 
were not allowed, you would have to write this:
    parse data [set x ['a 'b]]
like this instead:
    parse data [and ['a 'b] set x skip skip]
Sorry, that's what Ladislav said, with different phrasing.
Ladislav
1-May-2011
[5750]
Yes, Brian, you just wrote it in a simpler way
BrianH
1-May-2011
[5751x2]
I was thinking of phrasing for the examples in the docs to explain 
the SET feature.
Those full equivalences are great for someone who really needs to 
know how things work internally (such as when they need to clone 
PARSE), but you need simple examples first in docs for people who 
just want to use PARSE properly. Btw, has anyone started a set of 
full PARSE docs in DocBase? The parse project page could be raided 
for information, but it really doesn't serve as a full parse manual.
Ladislav
1-May-2011
[5753]
Do I understand correctly, that you did not read the


http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse

article yet?
BrianH
1-May-2011
[5754]
I haven't read the REBOL wikibook yet.
Ladislav
1-May-2011
[5755]
This is the formulation used to document the SET directive:


If the subexpression match succeeds, the set operation sets the given 
variable to the first matched value, while the copy operation copies 
the whole part of the input matched by the given subexpression. For 
a more detailed description see the Parse idioms section.
BrianH
1-May-2011
[5756]
Sounds accurate, if a little intimidating to newbies. I wish we had 
a really good PARSE manual that could turn newbies into experts.
Ladislav
1-May-2011
[5757]
Any newbies feeling intimidated by the formulation?
BrianH
1-May-2011
[5758]
It could be fine, depending on where you put it in the manual. The 
early parts would need to explain the terminology so the latter parts 
can use it.
Ladislav
1-May-2011
[5759]
Well, certainly it should be read, otherwise it is useless.
Geomol
1-May-2011
[5760]
PARSE in R2 seems to have less support for combined keywords than 
R3, as can be seen in this example:

>> parse [] [opt some 'a]
** Script Error: Invalid argument: some

But there is no error, when combining OPT and THRU:

>> parse [] [opt thru 'a]
== false


Should that trigger an error? If no error, it should return true, 
right?