World: r3wp

[Parse] Discussion of PARSE dialect

Maxim
1-May-2011
[5790x2]
bah, I'd just stick with R3 parsing for Red.  it'll be a good incentive 
for some to upgrade.
(to red or R3 parse, depending on how you see "upgrade"  ;-)
Geomol
1-May-2011
[5792]
I think about downgrading. :-) You know, keep it simple. Like dropping 
SKIP as it's the same as any-type! etc. If I want SKIP, I can just 
define it then: skip: :any-type!
Maxim
1-May-2011
[5793]
I'd drop any-type!  :-)
Geomol
1-May-2011
[5794]
Having skip as a keyword means you can't use that word as a variable.
BrianH
1-May-2011
[5795]
That doesn't work with string parsing.
Geomol
1-May-2011
[5796]
ok
BrianH
1-May-2011
[5797]
Most people tend to not use 'skip as a variable anyways, because 
of the SKIP function.
Geomol
1-May-2011
[5798]
I in general very much like the idea, that many rebol functions can 
take different datatypes and work anyway. But I was thinking, if 
parsing blocks and parsing strings is so different, that it should 
be two functions?
Maxim
1-May-2011
[5799x2]
and I always prefix my rules to have them stand out from keywords.
nah, it would just use up another word.  there is no ambiguity in 
the case of parse, as there is with, let's say, ADD, where the same 
datatype may mean two things.
BrianH
1-May-2011
[5801x2]
For the mezzanine version, two functions might be better, though 
they can share code in the same module. Maybe just have one exported 
word for a dispatch function though.
(or the context equivalent of modules for R2)
Geomol
1-May-2011
[5803x2]
yes
When programming it, I also wondered, why the or keyword is | and 
not OR. Do you know the reason?
BrianH
1-May-2011
[5805]
Parsing tradition. And it's not really OR, it's backtracking alternation.
Geomol
1-May-2011
[5806]
Right, just wondered, now that REBOL calls e.g. floats decimals etc. 
There are many attempts to make the language more humane.
BrianH
1-May-2011
[5807]
Considering that the space character is the closest thing to AND 
if | is OR, we should consider ourselves to have gotten off lucky 
:)
Geomol
1-May-2011
[5808]
parse [a b c] ['aAND'bAND'cEND]
hmm, yeah, you've got a point.
BrianH
1-May-2011
[5809]
We used up that luck though when we called the lookahead-match operation 
AND, and the lookahead-non-match operation NOT.
Geomol
1-May-2011
[5810]
& and ! maybe?
BrianH
1-May-2011
[5811]
We're probably fine with the wording we got. Though strangely enough, 
| is the ELSE of the IF operation. ELSE is a more descriptive name 
for | than OR in general.
Ladislav
1-May-2011
[5812]
Geomol:

    [to rule skip]

does not mean the same as

    [thru rule]

, as can be demonstrated when comparing the behaviour of

   thru rule

for 

    rule = "abc"

It is quite a surprise for me, that you don't see the difference.
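
Note: a minimal console sketch of that difference, assuming string 
parsing and a made-up input "xxabc" (not from the thread). TO stops at 
the start of the match, so a single SKIP consumes only one character 
of "abc", while THRU consumes the whole match:

```rebol
>> rule: "abc"
== "abc"
>> parse "xxabc" [to rule skip]    ; leaves "bc" unparsed
== false
>> parse "xxabc" [thru rule]       ; consumes all of "abc"
== true
```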
Geomol
2-May-2011
[5813]
In R2 parsing a block:

>> parse ["abc"] [to "abc" skip]
== true
>> parse ["abc"] [thru "abc"]   
== true


I know, it's different when parsing a string instead of a block. 
My comparison of [thru rule] to the alternatives was meant as a loose 
comparison, not to be taken literally. So it's easy to think of THRU 
to work this way, because it does in many cases, therefore the confusion.
Ladislav
2-May-2011
[5814]
because it does in many cases

 - should rather be "because THRU is so limited, that it is unable 
 to handle many cases"
Geomol
2-May-2011
[5815]
yeah :)
Ladislav
2-May-2011
[5816]
But, the recursive description:

    a: [b | skip a]

is quite natural.
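
Note: a small sketch of the recursive rule on a string, assuming b is 
the literal "abc" and that the inputs are made up for illustration. 
Each failed attempt at b skips one element and tries again, so a 
behaves like a THRU that accepts any rule b:

```rebol
>> b: "abc"
== "abc"
>> a: [b | skip a]
== [b | skip a]
>> parse "xxabc" a     ; b fails twice, then matches at "abc"
== true
>> parse "xxab" a      ; input runs out before b matches
== false
```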
Geomol
2-May-2011
[5817]
Yes, and that should work in all cases, if the b rule is found, complex 
or not. And this will return true, if b is END, because  END is a 
repeatable rule (you can't go past it with SKIP).


NONE is also repeatable, and if you look in the code, I have to take 
care of this separately too. This means we can't parse a none value of 
datatype none! by using the NONE keyword, but we can by using a datatype:

>> parse reduce [none] [none] 
== false
>> parse reduce [none] [none!]
== true


So it raises the question, if the NONE keyword should be there? What 
are the consequences, if we drop NONE as a keyword? And are there 
other repeatable rules beside END and NONE? In R2 or R3.
Ladislav
2-May-2011
[5818]
The "empty string rule" (represented by the NONE keyword in REBOL) 
is absolutely necessary to have. All other members of the Top Down 
Parsing Language family have it as well.
Geomol
2-May-2011
[5819]
Ok, what is a good source of information to read about parsing in 
general? The Top Down Parsing Language family etc.?
Ladislav
2-May-2011
[5820]
You can find something in the Wikipedia:

http://en.wikipedia.org/wiki/Parsing_expression_grammar

http://en.wikipedia.org/wiki/Top-down_parsing_language
Geomol
2-May-2011
[5821]
Is the "empty string rule" covered by putting a | without anything 
after it? Like in:

>> parse [] ['a |] 
== true
>> parse [] ['a | none]
== true
Ladislav
2-May-2011
[5822]
Hmm, as it looks, we could do without the empty string, we could 
use the rule like:

    empty: []
Geomol
2-May-2011
[5823]
It could be interesting to create an absolutely minimal PARSE function, 
that can handle all we expect from such a function but with as little 
code as possible (as few keywords as possible).
Ladislav
2-May-2011
[5824x2]
For strings, the

    empty: ""

should work as well, but it does not.
Another variant that comes to mind is

    empty: quote ()
Geomol
2-May-2011
[5826]
From your idioms it can also be seen, that OPT can be dropped easily.
Ladislav
2-May-2011
[5827]
BTW (looks a bit unlucky to me), do you know that in REBOL the NONE 
rule can fail?
Geomol
2-May-2011
[5828]
Can't remember. Give me an example.
Ladislav
2-May-2011
[5829]
Nevermind, I do not remember. The NONE rule is described in the wikibook, 
so it can be found in there, I guess.
Geomol
2-May-2011
[5830]
Maybe the last section here:

http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse/Parse_expressions#Troubleshooting
Ladislav
2-May-2011
[5831x3]
That is not related
Nevertheless, I messed it up. The NONE rule probably cannot fail, 
but it can consume some input.
(which does not look good as well)
Geomol
2-May-2011
[5834]
With bparse, this hangs:

bparse [a b c] [some [none]]

but it can be stopped by hitting <Esc>.
Ladislav
2-May-2011
[5835x2]
Yes, but that is OK, it is just an infinite cycle
Nobody should expect an infinite cycle to stop.
Geomol
2-May-2011
[5837x3]
It can't be stopped using PARSE, it seems.
In parse, NONE is a keyword unless it comes after TO or THRU, then 
it's looked up.

>> parse [#[none!]] [none]     	; as a keyword
== false
>> parse [#[none!]] [thru none]	; looked up
== true

Same behaviour in R2 and R3.
Maybe it would be a good idea to make all these combinations trigger 
an invalid argument error?

any end
some end
opt end
into end
set end ...
copy end ...
thru end

and then only let
to end
be valid.