World: r3wp
[Parse] Discussion of PARSE dialect
Pekr 3-Oct-2009 [4401x3] | I hope we get rest too ... USE, OF, LIMIT look all interesting. |
BrianH: has Carl noticed N BREAK? It is not in the priority list, and it could escape Carl's radar, no? | |
I added it to the priority list too .... | |
Ladislav 3-Oct-2009 [4404] | Re N BREAK: I don't think that even BREAK is "organic" to PARSE; N BREAK is even more of a mess. |
Steeve 3-Oct-2009 [4405x2] | And you all missed my (N Fail) proposal. |
I just rewrote the math expressions resolver.
digit: charset "0123456789"
num: [some digit opt [#"." any digit]]
term: [num | #"(" any lv1 term #")" | #"-" any lv3 term]
calc: [
    remove [copy num1 term copy op skip copy num2 term]
    (expr: do reform select [
        "+" [num1 op num2]
        "-" [num1 op num2]
        "*" [num1 op num2]
        "/" [num1 op num2]
        "^^" [num1 "**" num2]
        "%" [num1 "//" num2]
    ] op)
    stay insert expr
    (probe e)
]
lv4: [term #"%" term then fail | break | calc]
lv3: [any lv4 term #"^^" any lv4 term then fail | break | calc]
lv2: [any lv3 term [#"*" | #"/"] any lv3 term then fail | break | calc]
lv1: [any lv2 term [#"+" | #"-"] any lv2 term then fail | break | calc]
I just think it's more clear like that. Moreover, it's prepared to use the further AND command. Because this nasty trick i use: [rule THEN FAIL | BREAK | calc] will be replaced by: [AND rule calc] | |
Pekr 4-Oct-2009 [4407] | What is your take on simple mode parsing? It is handy for simple CSV parsing, and the idiom is common: parse/all row ";" The trouble is that if there is no data in the last column, parse mistakenly makes the resulting block shorter, so you have to use the common idiom: rec: parse/all append row ";" ";" I always wondered whether it could be regarded as a parse bug. |
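The idiom Pekr describes can be sketched as follows. This assumes simple (string-delimiter) PARSE mode with the /ALL refinement, as in R2; `row` is a hypothetical record invented for illustration, and no output is claimed beyond what the message above reports.

```rebol
row: "a;b;"     ; hypothetical record whose last column is empty

; Simple-mode split on ";". According to the report above, the empty
; last column is not represented, so the block comes out shorter.
probe parse/all row ";"

; The workaround idiom: append one more delimiter before splitting.
; COPY is added here so the original ROW is left unchanged.
probe parse/all append copy row ";" ";"
```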
Henrik 4-Oct-2009 [4408x2] | I wonder now if PARSE could automatically discern newlines, rather than having to deal with that in your parser. It would be cool, if strings could be considered line-based without specifically having to code for that. |
PARSE/LINES ? Maybe not. | |
Pekr 4-Oct-2009 [4410x4] | what would be the advantage? |
btw - remember we have deline/enlice natives in R3 now ... | |
enline | |
those should replace read/lines iirc | |
Henrik 4-Oct-2009 [4414x3] | the advantage would be to avoid skipping newlines. now that I think of it, you don't want it if you want to parse across a newline, but you wouldn't do that for CSV parsing. |
enline and deline will help somewhat. | |
well, my argument seems to be weak. but now the idea is there for further study. :-) | |
Pekr 4-Oct-2009 [4417] | Ladislav - in a comment to ticket #1248, you write: According to the documentation, that can be found in http://www.rebol.net/wiki/Parse_Project parse "b" [not #"a"] yields FALSE correctly. If you want to obtain TRUE, you can try e.g.: parse "b" [not #"a" to end] My question is - what is the advantage of not advancing the input when the rule matches? It does not look natural, and I would expect it to match the rule and hence move past it: >> parse "b" [not #"a" ??] end!: "b" == false ... as can be seen, it does not advance ... |
Steeve 4-Oct-2009 [4418] | I see, but it's impossible to advance, I guess. NOT (as a pre-rule) is applied to the result of the following rule. So #"a" fails (it does not advance at all). Then NOT #"a" reverses the result: FAIL becomes MATCH. That's all. |
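A minimal sketch of the behaviour under discussion, assuming an R3 build in which NOT is available as a PARSE keyword. The first and third expressions are the ones quoted from the ticket (reported there as FALSE and TRUE); the middle one is an added illustration of consuming the character explicitly.

```rebol
; NOT succeeds because #"a" fails to match, but the position stays in
; front of "b", so PARSE has not reached the end of the input.
probe parse "b" [not #"a"]

; Added illustration: SKIP consumes the character after the check.
probe parse "b" [not #"a" skip]

; The variant suggested in the ticket comment.
probe parse "b" [not #"a" to end]
```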
Ladislav 4-Oct-2009 [4419x4] | What is the advantage?: 1) By not consuming input, this is a direct inversion of the rule. Example: parse "a" [not end ...] is a meaningful rule, and it is quite trivial to see that any rule consuming input would not be a direct inversion of this rule. NOT SOMETHING actually means that at the current input position the SOMETHING rule shall not match. That does not give us any information that NOT should skip any input (how far should it?). 2) This version of NOT is compatible with PEG. 3) It is consistent with the AND operation: [AND rule] is equivalent to [NOT [NOT rule]] |
Yet another example: [NOT skip] is equivalent to the [END] rule and is meaningful only, when NOT does not skip any input | |
...I would expect it to match the rule and hence move past it... - that is trivially wrong. If the RULE matches, the [NOT RULE] cannot match, therefore it cannot even advance. The only case, when (theoretically) we could think of advancing is, when the rule does not match. But then, it is not known, how far. | |
Some may prefer Steeve's explanation, which looks very good to me. | |
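The two equivalences Ladislav mentions can be tried side by side. This is only a sketch: it assumes a build in which both NOT and the proposed AND keyword are implemented, and the test inputs are made up for illustration.

```rebol
; [NOT skip] succeeds only where SKIP would fail, that is, at the end
; of the input, so these two are expected to agree.
probe parse "" [not skip]
probe parse "" [end]

; [AND rule] and [NOT [NOT rule]] both test the rule without consuming
; input, so SKIP is still needed to move past the "a".
probe parse "a" [and #"a" skip]
probe parse "a" [not [not #"a"] skip]
```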
Maxim 5-Oct-2009 [4423] | Pekr, I had the same initial reaction, then realized that it would not be consistent wrt fail or no fail... when NOT would succeed a match (and fail the rule), the input would be beyond what the NOT is useful for. When I started thinking about it, if you really want you can simply use a set-word/get-word pair to advance when the NOT finds a match to ignore a rule, but then it's like not using 'NOT in the first place, so it's pointless :-) |
Pekr 5-Oct-2009 [4424] | not advancing NOT is not that much useful imo. I know that it can't be technically done, but anyway ... |
Steeve 6-Oct-2009 [4425] | Guys, i think your opinion about NOT is a little harsh. In the case of complementing a charset, you just have to SKIP after the NOT rule. In the other cases, not advancing is of better use. At least that's what I see while rewriting some scripts as I do now. |
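Steeve's point about complemented charsets can be sketched like this, assuming an R3 build with the NOT keyword. `digit`, `non-digit` and the input string are illustrative names, not taken from any script mentioned here.

```rebol
digit: charset "0123456789"

; Classic approach: build the complemented set once and match it directly.
non-digit: complement digit
probe parse "abc" [some non-digit]

; With NOT: check that the next character is not a digit, then SKIP it.
; Both rules are expected to accept the same input.
probe parse "abc" [some [not digit skip]]
```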
Pekr 6-Oct-2009 [4426] | The case is that advancing can't be done, in fact. It is just some psychological aspect which leads some of us to think that the output should be advanced, because we are looking for the complementing feature ... |
Steeve 6-Oct-2009 [4427x2] | Ok, but could you give a case (other than complemented charsets, which can be easily skipped) where you found that advancing is more convenient with a complemented rule? I mean, I can't conceive that such a complemented rule would actually be easier to read. |
Or easier to write... | |
PeterWood 6-Oct-2009 [4429x2] | Skip is very slow compared with complemented charsets. |
If you are parsing a string, skip only advances one character at a time. When I wrote make-word-list for rebol.org, using a complemented charset gave a big performance improvement. Though you may be able to suggest better ways to optimise it - http://www.rebol.org/view-script.r?script=make-word-list.r | |
Steeve 6-Oct-2009 [4431] | I can have a look, but the purpose of NOT is not to have better perfs than a complemented charset, but to allow some simplification when writing rules. Actually, it's the case with most of the other improvements: easier to write, not inevitably faster. And don't forget that safe complemented charsets in R3 are a pain in the ass to construct, because of UTF-8 |
PeterWood 6-Oct-2009 [4432x2] | Which is why I was disappointed that I apparently misunderstood from Carl's blog: Changes that are critical, but not highly complicated. For example, providing a NOT command seems easy enough, and it is now critical because using complemented charsets is problematic (due to the Unicode enhancements). |
This is the link to the blog - http://www.rebol.net/r3blogs/0155.html | |
Steeve 6-Oct-2009 [4434x2] | Well, I saw your script. I don't know if it can be faster; I can only say I would have written it differently. Probably using parse and load/next for all normal rebol values. I can see that your rule for matching binaries is wrong, because ["#{" thru #"}"] is wrong (what if the binary contains the #"^}" char?) |
Forget my last remark, a binary can't contain #"^}" but a multi string yes | |
BrianH 6-Oct-2009 [4436x2] | Peter, Steeve, the original problem that started the parse proposals was the problem of complimenting charsets. However, it quickly changed to improving PARSE in general. Then, while we were waiting for the parse proposals to come up on the todo list, we came up with a better solution to complimenting charsets, which is not yet implemented and which is not limited to PARSE. |
Or maybe it's complementing? How would you compliment a charset? Say it has good coverage? | |
Pekr 6-Oct-2009 [4438] | BrianH: what is the method to get a complementing-like feature back? I haven't seen anything discussed in that manner yet ... |
BrianH 6-Oct-2009 [4439x2] | Using a bit in the charset that would mark it as "complemented", and then all of its matching algorithms would do an internal not. |
The discussion was primarily in the comments of the relevant tickets in CureCode, though there was some in R3 chat. Not recently. | |
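The "complemented" bit BrianH describes could be modelled roughly as below. This is a hypothetical sketch, not the R3 implementation: `make-cset` and `cset-match?` are invented helper names, and the flag is an ordinary object field rather than a bit inside the bitset.

```rebol
; Hypothetical helper: store only the listed characters plus a flag.
make-cset: func [chars [string!] negated [logic!]] [
    make object! [
        bits: charset chars     ; only the listed characters are stored
        negated?: negated       ; stands in for the "complemented" bit
    ]
]

; Hypothetical helper: the "internal not" applied at match time.
cset-match?: func [cs [object!] ch [char!]] [
    either find cs/bits ch [not cs/negated?] [cs/negated?]
]

non-digit: make-cset "0123456789" true
probe cset-match? non-digit #"a"
probe cset-match? non-digit #"5"
```

In this model the stored set never grows when it is complemented; only the flag changes, and the membership test is inverted when the flag is set.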
Pekr 6-Oct-2009 [4441] | If parse was already rewritten from the ground-up, couldn't it be directly adapted to allow streamed parsing? :-) You said it could be added later, when rewrite happens, so ... |
BrianH 6-Oct-2009 [4442] | Yeah, but by "later" I meant after 3.0 comes out. I haven't even discussed with Carl what would be involved yet. |
Pekr 6-Oct-2009 [4443] | ok |
BrianH 6-Oct-2009 [4444x2] | I want to write more port code first and refine the model based on what I learn. |
Plus, Carl's rewrite didn't change the basic algorithm of PARSE, just some details. I don't yet know whether my port PARSE is that easy. | |
Maxim 7-Oct-2009 [4446x2] | even with a complement bit, if you use a few chained unions & complements, etc, you'll eventually need to bake it... in the end, all the complement bit is useful for is to keep the charset below half the maximum size of the complete encoding. |
(which is still pretty big AFAICT) | |
BrianH 12-Oct-2009 [4448] | Behavior of BREAK, ANY and SOME decided, finally: http://www.rebol.net/r3blogs/0270.html |
Steeve 12-Oct-2009 [4449x2] | Yesssss, the return of BREAK !!!! |
I definitely didn't love the previous way to exit from a loop | |