World: r3wp
[Parse] Discussion of PARSE dialect
Ladislav 2-May-2011 [5814] | because it does in many cases - should rather be "because THRU is so limited, that it is unable to handle many cases" |
Geomol 2-May-2011 [5815] | yeah :) |
Ladislav 2-May-2011 [5816] | But, the recursive description: a: [b | skip a] is quite natural. |
Geomol 2-May-2011 [5817] | Yes, and that should work in all cases, if the b rule is found, complex or not. And this will return true, if b is END, because END is a repeatable rule (you can't go past it with SKIP). NONE is also repeatable, and if you look in the code, I have to take care of this too separately. This means we can't parse a none value (datatype none!) by using the NONE keyword, but we can by using a datatype: >> parse reduce [none] [none] == false >> parse reduce [none] [none!] == true So it raises the question of whether the NONE keyword should be there. What are the consequences, if we drop NONE as a keyword? And are there other repeatable rules besides END and NONE, in R2 or R3? |
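The recursive description a: [b | skip a] and the remark about END can be modelled outside REBOL. The following is a minimal Python sketch, not PARSE's actual implementation; the names rule_a, lit and at_end are hypothetical, and a rule is assumed to be a function that returns the new input position on success or None on failure.

```python
def rule_a(b, data, pos=0):
    """Model of the recursive rule  a: [b | skip a]."""
    end = b(data, pos)
    if end is not None:          # first alternative: b matches here
        return end
    if pos >= len(data):         # second alternative: skip fails at the tail
        return None
    return rule_a(b, data, pos + 1)   # skip one item, then try a again


def lit(value):
    """Hypothetical rule matching one literal item."""
    def rule(data, pos):
        if pos < len(data) and data[pos] == value:
            return pos + 1
        return None
    return rule


def at_end(data, pos):
    """Model of END: succeeds only at the tail and consumes nothing."""
    return pos if pos == len(data) else None


print(rule_a(lit("c"), ["a", "b", "c", "d"]))  # 3
print(rule_a(lit("x"), ["a", "b", "c"]))       # None
print(rule_a(at_end, ["a", "b", "c"]))         # 3
```

In this model the END case succeeds because b is tried before skip at every position, including the tail, where skip itself can no longer advance.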
Ladislav 2-May-2011 [5818] | The "empty string rule" (represented by the NONE keyword in REBOL) is absolutely necessary to have. All other members of the Top Down Parsing Language family have it as well. |
Geomol 2-May-2011 [5819] | Ok, what is a good source of information to read about parsing in general? The Top Down Parsing Language family etc.? |
Ladislav 2-May-2011 [5820] | You can find something on Wikipedia: http://en.wikipedia.org/wiki/Parsing_expression_grammar http://en.wikipedia.org/wiki/Top-down_parsing_language |
Geomol 2-May-2011 [5821] | Is the "empty string rule" covered by putting a | without anything after it? Like in: >> parse [] ['a |] == true >> parse [] ['a | none] == true |
Ladislav 2-May-2011 [5822] | Hmm, as it looks, we could do without the empty string, we could use the rule like: empty: [] |
Geomol 2-May-2011 [5823] | It could be interesting to create an absolutely minimal PARSE function that can handle all we expect from such a function, but with as little code as possible (as few keywords as possible). |
Ladislav 2-May-2011 [5824x2] | For strings, the empty: "" should work as well, but it does not. |
Another variant that comes to mind is empty: quote () | |
Geomol 2-May-2011 [5826] | From your idioms it can also be seen, that OPT can be dropped easily. |
Ladislav 2-May-2011 [5827] | BTW (it looks a bit unlucky to me), do you know that in REBOL the NONE rule can fail? |
Geomol 2-May-2011 [5828] | Can't remember. Give me an example. |
Ladislav 2-May-2011 [5829] | Nevermind, I do not remember. The NONE rule is described in the wikibook, so it can be found in there, I guess. |
Geomol 2-May-2011 [5830] | Maybe the last section here: http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse/Parse_expressions#Troubleshooting |
Ladislav 2-May-2011 [5831x3] | That is not related |
Nevertheless, I messed it up. The NONE rule probably cannot fail, but it can consume some input. | |
(which does not look good as well) | |
Geomol 2-May-2011 [5834] | With bparse, this hangs: bparse [a b c] [some [none]] but it can be stopped by hitting <Esc>. |
Ladislav 2-May-2011 [5835x2] | Yes, but that is OK, it is just an infinite cycle |
Nobody should expect an infinite cycle to stop. | |
Geomol 2-May-2011 [5837x3] | It can't be stopped using PARSE, it seems. |
In parse, NONE is a keyword unless it comes after TO or THRU, then it's looked up. >> parse [#[none!]] [none] ; as a keyword == false >> parse [#[none!]] [thru none] ; looked up == true Same behaviour in R2 and R3. | |
Maybe it would be a good idea to make all these combinations trigger an invalid argument error? [any end], [some end], [opt end], [into end], [set end ...], [copy end ...], [thru end], and then only let [to end] be valid. | |
BrianH 2-May-2011 [5840x2] | [set var end] sets the var to none; [copy var end] sets to none in R2, the empty string/block in R3; [thru end] doesn't match, so it should just get a warning in case the rules were written to expect that; [opt end] is definitely legit; perhaps [any end] and [some end] should get warnings for R2, but keep in mind that rules like [any [end]] and [some [end]] are much more common, have the same effect, and are more difficult to detect; [into end] properly triggers an error in R2 and R3 because the end is not in a block, while [into [end]] is legit and safe. |
So you want to allow COPY, SET and OPT. Warn about THRU (because of the bug), ANY and SOME, because of R3 compatibility. Trigger an error for INTO if its argument rule isn't a block or a word referring to a block, but nothing special if that rule is END. | |
Geomol 4-May-2011 [5842x2] | [any end] and [some end] As we don't have warnings, I suggest these produce errors. They can produce endless loops, and that should be pointed out in the docs, if they don't produce errors. [opt end] Yes, it's legit, but what's the point of this combination? At best, the programmer knows what she is doing, and the combination will do nothing other than slow the program down. At worst, the programmer misinterprets this combination, and since it doesn't produce an error or anything, it's a source of confusion. I suggest making it produce an error. [into end] Produces an error today, so fine. [set end ...] and [copy end ...] I wasn't thinking of [set var end], but about setting a var named end to something, like [set end integer!]. The problem with this is that now the var, end, can be used and looks exactly like the keyword, end, maybe leading to confusion. But on second thought, maybe this being allowed is ok. [thru end] Making this produce an error will solve the problem with the confusion around what this combination means. And in the first place, it's a bad way to produce a 'fail' rule (in R2; in R3 it has the value true, and parsing continues). It's slow compared to e.g. [end skip]. |
These are just suggestions to make a better PARSE. I've learnt that it's a good idea to not allow most combinations of keywords in R2 parse. Another example: >> parse [] [opt into ['a]] == false >> bparse [] [opt into ['a]] ** User Error: Invalid argument: into The PARSE result is wrong, as I see it. My BPARSE produces an error. Better? | |
Ladislav 4-May-2011 [5844x4] | "[any end] and [some end] As we don't have warnings, I suggest these produce errors." - it is impossible to trigger errors every time an infinite loop is encountered - this case has been discussed and the solution was found already |
[opt end] ...I suggest to make it produce an error. - not reasonable, the rule *is* legitimate, as you noted | |
What you suggest is just a bunch of exceptions in the behaviour, which is always bad | |
You should rather look up how the "infinite loop problem" when using ANY and SOME was solved | |
Geomol 4-May-2011 [5848x2] | Here: http://www.rebol.com/r3/docs/concepts/parsing-summary.html#section-11 "Input position must change". And the solution was to invent a new keyword, WHILE. Hm... |
I try to keep it simple. | |
Ladislav 4-May-2011 [5850] | This is much simpler than your exception: - actually working, your exception does not - not slowing down parsing |
Geomol 4-May-2011 [5851] | ok :) |
Ladislav 4-May-2011 [5852x2] | As to the WHILE keyword: some people may never use it, being content with SOME and AND as they work in R3 |
I mean SOME and ANY | |
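The "input position must change" check and the WHILE variant mentioned above can be illustrated with a small model. This is a Python sketch under assumptions, not the R3 source: any_of, empty and lit are hypothetical names, a rule returns the new position or None, and max_iters is a safety cap added only so the demonstration terminates.

```python
def any_of(rule, require_progress=True):
    """Model of a repeat-zero-or-more loop over `rule`.

    require_progress=True  models ANY with the position check: the loop
                           stops when an iteration succeeds without advancing.
    require_progress=False models a WHILE-like loop that stops only when
                           the rule fails.
    """
    def run(data, pos, max_iters=1000):
        for _ in range(max_iters):
            new_pos = rule(data, pos)
            if new_pos is None:                       # rule failed: loop ends
                return pos
            if require_progress and new_pos == pos:   # matched, but no advance
                return pos
            pos = new_pos
        raise RuntimeError("no termination within max_iters iterations")
    return run


def empty(data, pos):
    """Model of the empty rule (NONE): always succeeds, consumes nothing."""
    return pos


def lit(value):
    """Hypothetical rule matching one literal item."""
    def rule(data, pos):
        if pos < len(data) and data[pos] == value:
            return pos + 1
        return None
    return rule


print(any_of(lit("a"))(["a", "a", "b"], 0))   # 2
print(any_of(empty)(["a", "b", "c"], 0))      # 0
```

With require_progress=False the same empty rule never fails and never advances, so the loop in this model runs until the cap raises RuntimeError. That corresponds to the hang reported for bparse [a b c] [some [none]].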
BrianH 4-May-2011 [5854] | If you're going to make a better parse, it might be good to take into account the efforts that have already started to improve it in R3. The R3 improvements need a little work in some cases, but the thought that went into the process is quite valuable. [set end ...] or [copy end ...]: In R3, using any PARSE keyword (not just 'end) in a rule for other reasons triggers an error. >> parse [a] [set end skip] ** Script error: PARSE - command cannot be used as variable: end [any end] or [some end]: What Ladislav said. [opt end]: The point of the combination is [opt [end (do something)]]. [opt anything] is no more useless than [opt end]. Don't exclude something that has no effect just for that reason. Remember, [none] has no effect as well, but it's still valuable for making rules more readable. |
onetom 12-May-2011 [5855] | >> parse/all "/docs/rfq/" "/" == ["" "docs" "rfq"] shouldn't this be either ["docs" "rfq"] or ["" "docs" "rfq" ""] for the sake of consistency? |
Maxim 12-May-2011 [5856] | yes it should. :-( |
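For comparison, the two consistent results onetom lists are what a conventional delimiter split produces. The snippet below uses Python's built-in str.split purely as a reference point; it says nothing about how parse/all is implemented.

```python
path = "/docs/rfq/"

# Keep every field, including the empty ones before the first
# and after the last delimiter.
all_fields = path.split("/")
print(all_fields)        # ['', 'docs', 'rfq', '']

# Drop empty fields entirely.
non_empty = [field for field in path.split("/") if field]
print(non_empty)         # ['docs', 'rfq']
```

The result reported above, ["" "docs" "rfq"], keeps the leading empty field but drops the trailing one, so it matches neither convention.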
Geomol 13-May-2011 [5857] | Maxim, you asked for a function version of string parse. Was that because of situations like this? |
Maxim 13-May-2011 [5858] | It's because I do A LOT more parsing on strings than on blocks.... One of the reasons is that Carl won't allow us to ignore commas in string data, so the vast majority of data which could be read directly by rebol is incompatible. This is still one of my pet peeves in rebol. Trying to be pure, sometimes, just isn't useful in real life. PARSE is about managing external data, and I hate the fact that PARSE isn't trying to be friendly with the vast majority of data out there. |
Geomol 13-May-2011 [5859] | Do you mean, you want to be able to parse like this? >> parse [hello, world!] [2 word!] |
Maxim 13-May-2011 [5860x2] | It's happened often, yes. Less lately, since I'm dealing more with XML and less with raw data. |
more like: parse load/all "hello, world!" [2 word!] | |
Geomol 13-May-2011 [5862x2] | I have wondered sometimes what effects it would have if such commas were just ignored. We need commas in numbers, but maybe commas could just be ignored besides that. |
So do you suggest, load/all "hello, world!" should return [hello world!] ? (Notice no comma.) | |