World: r3wp

[Parse] Discussion of PARSE dialect

Ladislav
27-Apr-2011
[5616x4]
Sorry, but it is not a question of opinion.
There may be just a correct implementation or a bug.
It is quite obvious that there is no "advance one character" step. 
Every rule matching advances as many characters as the rule being 
matched prescribes. For example, when matching

    parse "aaaaa" [before: "aaa" after: to end]
    index? before ; == 1
    index? after ; == 4


the rule matches a three character string and, therefore, the correct 
position after the match is three characters past the before position 
(not one character, as you incorrectly stated).
While we are at it, the following case reveals a PARSE implementation bug, 
since the empty string should match:

    parse "aaaaa" ["" to end] ; == false
Maxim
27-Apr-2011
[5620x2]
first of all, your before/after parse rule has nothing to do with 
the to/thru handling.


yes, this second rule shows an actual bug... "" and none rules are 
conceptually equivalent.
to/thru are not matching rules, they are skipping rules.  matching 
rules always return *past* the match, not *at* the match, like to 
will do.
Ladislav
27-Apr-2011
[5622x3]
OK, checking the situation of the NONE rule:

    parse "aaaaa" [before: none after: to end]
    index? before ; == 1
    index? after ; == 1

So, the parse position before and after the match remained the same, 
not "one character past the match".
So, generally, the positions before the match and after the match 
can differ, optionally, but the difference is not prescribed to be 
exactly one character.
Maxim
27-Apr-2011
[5625x2]
no, the cursor did not advance.   there are two concepts at play here: 
the notion of index, and the notion of "slots".   a single slot has 
two positions, its start and end, but it has only one index.
a better name for slot is probably segment, just like in video 
editing.
Ladislav
27-Apr-2011
[5627]
Now, check the PARSE documentation at:

http://www.rebol.com/r3/docs/concepts/parsing-summary.html

and then we can continue the discussion
Maxim
27-Apr-2011
[5628]
a matching rule will expand the segment's area, but not its index. 
rules are stacked based on end-to segments.  if a rule has a segment 
of size 0 (as in the none rule) there is no index change in the next 
rule segment, i.e. it shares its index, since its index is the previous 
index + 0.
Ladislav
27-Apr-2011
[5629x3]
"to/thru are not matching rules"

 - sorry, once again, an incorrect opinion. They can be used to match 
 the input like every other PARSE rule.
...and they can either succeed or fail
Exactly like the idiom a: [b | skip a], which you did not even try.
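
The idiom can be tried directly. A minimal sketch, assuming a hypothetical 
subrule b that matches the string "abc" (the input string and the word pos 
are also made up for illustration):

```rebol
b: "abc"            ; hypothetical subrule, chosen for illustration
a: [b | skip a]     ; match b at the current position, else skip one character and retry
parse "xxabcyy" [a pos: to end]
index? pos          ; == 6
```

The position recorded after the match is the tail of "abc", five characters 
past the start of the input, which is the same place [thru b] would leave 
the cursor.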
Maxim
27-Apr-2011
[5632]
well... to/thru are listed alongside  skip under "skipping input" 
in both r2 and r3 docs...  they cannot use sub rules, since they 
only do a find on the input.
Ladislav
27-Apr-2011
[5633x4]
Did you really read the documentation?
You may have noticed: "thru rule" "scan forward in input for matching 
rules, advance input to tail of the match"
(no purported "one character")
And the fact that the rules are listed under "skipping input" does 
not support your incorrect claim that they "are not matching rules".
Maxim
27-Apr-2011
[5637x2]
yes, the new docs reflect what now happens in R3.
to/thru do not match subrules... they only do a find.
Ladislav
27-Apr-2011
[5639x4]
Which is the same as what happens in R2, in fact, except for the bugs.
"to/thru do not match subrules"

 - yes, that is a correct observation, although unrelated to the subject 
 of the discussion, and, actually, just a detail of the implementation, 
 that can easily change at any time, especially taking into account 
 the user preferences
Although, TO/THRU actually match subrules, when I think about it, 
just a limited set of them.
In string parsing, we can use either character subrules or string 
subrules; in block parsing, we can use value subrules, datatype subrules, 
and some other special subrules.
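
A small sketch of such limited subrules, using made-up inputs and a 
hypothetical position word pos; it assumes only the character, string, and 
datatype forms mentioned above:

```rebol
; string parsing: a character subrule and a string subrule
parse "key=value" [to #"=" pos: to end]
index? pos          ; == 4, the head of the "=" match
parse "key=value" [thru "=" pos: to end]
index? pos          ; == 5, the tail of the "=" match

; block parsing: a datatype subrule
parse [a b 42 c] [to integer! pos: to end]
index? pos          ; == 3, the position of 42
```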
Maxim
27-Apr-2011
[5643x4]
it really just does a find (even the docs use the term scan).  the 
actual things we can to/thru are exactly the same as what can be 
used in find.
the only little shortcut is that it can be used to search for multiple 
things at once.
but it's flawed, you know how.
so it's not as useful as it could be.
Geomol
27-Apr-2011
[5647x2]
Wow! And I thought, I knew about parse. :-)

A)

As I understand it, matching a rule like [end] is valid and parse 
will return true, if you're at the end, and in this special case, 
the cursor isn't advanced further (because it's the end). And I understand 
the rule [thru end] as trying to advance the cursor past the end, 
which isn't possible, so it fails.

B)

If the rule [end] advances the cursor past the end, and this is valid, 
so parse returns true, then the rule [thru end] should also return 
true. But but but this can't be the case, as then this would not 
return true:

    parse [a] [word!]

which it does. The reason it shouldn't return true is that we're 
at the end, not past the end. Unless both being at the end and being past 
the end should return true.


I guess, it's a matter of implementation (as this isn't well documented 
afaik). I prefer the A) situation, as the B) situation is more confusing. 
Don't you agree?
Think of the end of a series as an internal marker, which the user 
shouldn't see as an element in the series, if you ask me.
Ladislav
27-Apr-2011
[5649]
You are missing the point. Why don't you read the THRU documentation 
instead of speculating?
Geomol
27-Apr-2011
[5650]
Ok, I will. I did a long time ago, but maybe it changed, or I missed 
something the first time, or I forgot how it works!? :-)
Ladislav
27-Apr-2011
[5651]
See the above reference to the doc article
Geomol
27-Apr-2011
[5652x3]
Oh, I got my understanding from http://www.rebol.com/docs/core23/rebolcore-15.html#section-4
The little example there is a good way to understand it:

    page: read http://www.rebol.com/
    parse page [thru <title> copy text to </title>]
    print text
    REBOL Technologies


I was thinking R2, maybe you guys talked about R3 only? Has this 
changed?
I think, the docs match pretty well.
"thru: advance input thru a value or datatype"

You're right, taking this strictly, we should be able to advance 
thru the end. But this doesn't make much sense, so my guess is, most 
people wouldn't take this strictly when talking about the end of a series.

"The end specifies that nothing follows in the input stream. The entire 
input has been parsed."

I read it as: there isn't any more to parse. So is it possible to 
parse past the end? I would say no.
Ladislav
27-Apr-2011
[5655x2]
This is just a speculative interpretation of the text of one example. 
See the documentation.
http://www.rebol.com/r3/docs/concepts/parsing-summary.html
Geomol
27-Apr-2011
[5657]
Keywords that accept a repeat count are:
...
end


So the advancing must stop somehow, as we can parse end multiple 
times.

>> parse [a b c] [to end]      
== true
>> parse [a b c] [to end 5 end]
== true


Anyway, this may be a pointless discussion, if the language isn't 
clearly defined in such detail.
Ladislav
27-Apr-2011
[5658x8]
Why don't you read the documentation?
It is explained in there, and it is the *only* place where it is 
explained in general, not just using one example.
The fact is, that when a rule matches, the cursor may (optionally) 
advance, but it does not need to.
In case the cursor advances, the head of the rule match is distinct 
from the tail of the rule match. As opposed to that, when the cursor 
does not advance, the head of the match is identical with the tail 
of the match.
Which is the case of e.g. the above NONE rule.
But, many rules can have this property, even in R2
And, surely, one of the rules having this property is the END rule.
So, while the TO rule advances to the head of the subrule match, 
the THRU rule advances to the tail of the subrule match, which happen 
to be identical in case the subrule match does not advance the cursor.
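
The difference between the two can be observed by recording both positions 
in a single rule. A minimal sketch with a made-up input and the 
hypothetical words head-pos and tail-pos:

```rebol
parse "xxabcyy" [to "abc" head-pos: thru "abc" tail-pos: to end]
index? head-pos     ; == 3, head of the "abc" match
index? tail-pos     ; == 6, tail of the "abc" match
```

By the argument above, for a subrule that matches without consuming any 
input, the two recorded positions would coincide.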