World: r3wp

[Parse] Discussion of PARSE dialect

Geomol
1-May-2011
[5766]
And by that I mean [opt some] might mean [opt [some none]]. :-) It's 
hard, this!
Ladislav
1-May-2011
[5767]
aha, sorry, the combinations of keywords like opt thru are undocumented, 
and the behaviour really is unexpected
Geomol
1-May-2011
[5768x3]
Ok, makes sense. Problem is probably too little documentation (design) 
in the details in the first place.
I think I'll make those combinations trigger errors in my function 
version of parse. Almost done (the first version is similar to the R2 version).
Kinda same problem, when combining them this way (still in R2):

>> parse [] [thru opt 'a]
== false
>> parse [a] [thru opt 'a]
== false
BrianH
1-May-2011
[5771]
In R3 that would be an improperly untriggered error, since TO and 
THRU are defined to not take the full gamut of rules, only a subset. 
Probably the same for R2, but a different subset.
Geomol
1-May-2011
[5772x3]
Ok, version 1.0.0 of BPARSE is found here:

http://www.fys.ku.dk/~niclasen/rebol/libs/bparse.r


It's a function version of PARSE, and can only parse blocks for now, 
not strings. It can do more or less everything PARSE in R2 can do when 
parsing blocks. I've tried to trigger errors where R2 PARSE doesn't. The 
purpose is to play around with parsing to maybe make a better version 
than the native version and without bugs.
It's not as fast as the timings I gave here earlier with a very 
early version.
I've thought some more about [thru end], which returns false in the 
R2 version, but returns true in R3. My version returns false like R2, 
but I understand the R3 way better now that I've programmed it. It 
comes down to how THRU should be understood (as Ladislav also said 
something about). Do we think of

[thru rule]
as
[to rule rule] or
[to rule skip]

? If the TO keyword can handle complex rules like:
parse [a b] [to ['a 'b] ['a 'b]]

then the first might make better sense, and [thru end] should return 
true. But we can't parse like that in R2, so maybe we think of it more 
as the second, and then [thru end] should return false. But if you 
look at my version, I have to catch the special case where END 
follows THRU, so it takes more code, which isn't good.


In any case, Ladislav's suggestion to use [end skip] as a fail rule 
is much better. If you're not at the end, the first word (end) 
fails; otherwise the next one fails, as you can't skip past the end.
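
A minimal sketch of the [end skip] idiom for an R2 console. The name fail-rule is hypothetical, chosen here so it does not collide with any PARSE keyword:

```rebol
; fail-rule is a hypothetical name for the always-failing rule
fail-rule: [end skip]

; Not at the end of the input: END itself fails.
probe parse [a] [fail-rule]               ; false

; At the end: END matches, but SKIP cannot move past the tail.
probe parse [] [fail-rule]                ; false

; Typical use: reject one alternative so the next one gets tried.
probe parse [a b] ['a fail-rule | 'a 'b]  ; true
```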
BrianH
1-May-2011
[5775]
END is a zero-length repeatable rule, like NONE, so TO END and THRU 
END should be equivalent.
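
As a rough console sketch of that point (block input only; the THRU END result is left uncommented because, as reported in this thread, it differs between R2 and R3):

```rebol
; END consumes nothing, so it can match repeatedly at the tail.
probe parse [] [end end]        ; true

; TO END moves to the tail without consuming anything past it.
probe parse [a b] [to end]      ; true

; THRU END: reported above as false in R2 and true in R3.
probe parse [a b] [thru end]
```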
Geomol
1-May-2011
[5776]
Makes sense, it's just hard to grasp when you're used to how R2 parse works.
BrianH
1-May-2011
[5777]
I'd consider that an error in R2's PARSE, but not a fixable one because 
it would change the semantics.
Geomol
1-May-2011
[5778]
Now you have your own function version of parse, that you can make 
work exactly as you wish. :-) And then maybe, when you're satisfied, 
give it to Carl.


It should now also be easier to make C versions of parse for those 
who make alternatives to REBOL. At least you have a REBOL function 
to start with.
Maxim
1-May-2011
[5779]
did you try it with complex rules?
Geomol
1-May-2011
[5780]
Yes, parsing a dialect I have to produce PDF output.
Maxim
1-May-2011
[5781]
wow, cool.
Geomol
1-May-2011
[5782]
Tried it on rebps2pdf.r and the example found here:
http://www.fys.ku.dk/~niclasen/postscript/
BrianH
1-May-2011
[5783]
Having an R2-compatible PARSE that you can run in R3 would be useful 
for large sets of parse rules that you haven't had the time to migrate 
yet.
Geomol
1-May-2011
[5784]
Ah yes, good idea. Haven't thought about that yet.
Maxim
1-May-2011
[5785]
did you start work on a string parser?
Geomol
1-May-2011
[5786]
nope
Maxim
1-May-2011
[5787]
oki.
Geomol
1-May-2011
[5788]
Some day probably. Let's see, how it goes with bparse first.
BrianH
1-May-2011
[5789]
It would also be useful to have an R3-compatible PARSE for R2. And 
both for Red.
Maxim
1-May-2011
[5790x2]
bah, I'd just stick with R3 parsing for Red.  it'll be a good incentive 
for some to upgrade.
(to red or R3 parse, depending on how you see "upgrade"  ;-)
Geomol
1-May-2011
[5792]
I think about downgrading. :-) You know, keep it simple. Like dropping 
SKIP as it's the same as any-type! etc. If I want SKIP, I can just 
define it then: skip: :any-type!
Maxim
1-May-2011
[5793]
I'd drop any-type!  :-)
Geomol
1-May-2011
[5794]
Having skip as a keyword means you can't use that word as a variable.
BrianH
1-May-2011
[5795]
That doesn't work with string parsing.
Geomol
1-May-2011
[5796]
ok
BrianH
1-May-2011
[5797]
Most people tend to not use 'skip as a variable anyways, because 
of the SKIP function.
Geomol
1-May-2011
[5798]
In general I very much like the idea that many REBOL functions can 
take different datatypes and work anyway. But I was wondering: if 
parsing blocks and parsing strings are so different, should it 
be two functions?
Maxim
1-May-2011
[5799x2]
and I always prefix my rules to have them stand out from keywords.
nah, it would just use up another word.  there is no ambiguity in 
the case of parse, unlike, let's say, ADD, where the same datatype 
may mean two things.
BrianH
1-May-2011
[5801x2]
For the mezzanine version, two functions might be better, though 
they can share code in the same module. Maybe just have one exported 
word for a dispatch function though.
(or the context equivalent of modules for R2)
Geomol
1-May-2011
[5803x2]
yes
When programming it, I also wondered, why the or keyword is | and 
not OR. Do you know the reason?
BrianH
1-May-2011
[5805]
Parsing tradition. And it's not really OR, it's backtracking alternation.
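
A small sketch of what "backtracking alternation" means in practice, assuming block parsing in an R2 console:

```rebol
; The first alternative matches 'a and then fails on 'c. PARSE goes back to
; where the alternatives began and tries the second one from there.
probe parse [a b] ['a 'c | 'a 'b]   ; true

; Alternatives are tried left to right and the first match wins:
; 'a succeeds, so 'a 'b is never tried and b is left unparsed.
probe parse [a b] ['a | 'a 'b]      ; false
```

The second case is where | differs from a logical OR: one of the alternatives would match the whole input, yet the parse still fails.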
Geomol
1-May-2011
[5806]
Right, I just wondered, since REBOL calls e.g. floats decimals etc.: 
there are many attempts to make the language more humane.
BrianH
1-May-2011
[5807]
Considering that the space character is the closest thing to AND 
if | is OR, we should consider ourselves to have gotten off lucky 
:)
Geomol
1-May-2011
[5808]
parse [a b c] ['aAND'bAND'cEND]
hmm, yeah, you've got a point.
BrianH
1-May-2011
[5809]
We used up that luck though when we called the lookahead-match operation 
AND, and the lookahead-non-match operation NOT.
Geomol
1-May-2011
[5810]
& and ! maybe?
BrianH
1-May-2011
[5811]
We're probably fine with the wording we got. Though strangely enough, 
| is the ELSE of the IF operation. ELSE is a more descriptive name 
for | than OR in general.
Ladislav
1-May-2011
[5812]
Geomol:

    [to rule skip]

does not mean the same as

    [thru rule]

, as can be demonstrated when comparing the behaviour of

   thru rule

for 

    rule = "abc"

It is quite a surprise for me, that you don't see the difference.
Geomol
2-May-2011
[5813]
In R2 parsing a block:

>> parse ["abc"] [to "abc" skip]
== true
>> parse ["abc"] [thru "abc"]   
== true


I know it's different when parsing a string instead of a block. 
My comparison of [thru rule] to the alternatives was meant as a loose 
comparison, not to be taken literally. It's easy to think of THRU 
as working this way, because it does in many cases, hence the confusion.
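
The string case, as a minimal R2 console sketch. With string input SKIP advances a single character, so the two forms no longer agree:

```rebol
; Block input: "abc" is a single element, so SKIP steps over all of it.
probe parse ["abc"] [to "abc" skip]   ; true

; String input: TO stops in front of "abc", SKIP consumes only the "a".
probe parse "abc" [to "abc" skip]     ; false, "bc" is left over

; THRU moves past the whole match.
probe parse "abc" [thru "abc"]        ; true
```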
Ladislav
2-May-2011
[5814]
because it does in many cases

 - should rather be "because THRU is so limited, that it is unable 
 to handle many cases"
Geomol
2-May-2011
[5815]
yeah :)