World: r3wp
[Parse] Discussion of PARSE dialect
Maxim 17-May-2009 [3772] | the header-lbl rule in my example could be changed so it matches up to the first colon, but then, there is a flaw in that the text can also include something that LOOKS like a header and then you can have a stray value in the object... in the original example data you posted... this would be hard to tackle... Penicillin - allergy: |
Graham 17-May-2009 [3773x2] | That was my original way of doing things. |
I built the rule from the object and then parsed the data .. but my way relied on the headers being in the correct order. | |
Maxim 17-May-2009 [3775] | I started on Steeve's course and had similar new-line issues, which is why I decided to parse line by line. |
Steeve 17-May-2009 [3776x3] | Can't the headers be prefixed? It would be so easy to handle... |
Parsing line by line is not the solution (nor the problem) there. Everything you can do line by line can be folded into a single parsing flow. It's just a matter of your skill in using parse. | |
I have seen many people propose parsing line by line in many topics here. I don't get it. It's slower and wastes memory for nothing. They seem to be afraid of using any/some parsing loops; I don't understand why. | |
Maxim 17-May-2009 [3779] | it's just MUCH easier doing it line by line because the context of the parse isn't the same. a parse rule going astray in multi-line doesn't react the same as for a single line, which has a context of "this has a header" | "this doesn't". I'm not saying my solution can't be done using only one parse, only that the rules are that much simpler. in my first tests, handling the first and last headers needed special treatment, ultimately forcing me to add new rules, and generally making the whole thing much more complex. |
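A rough sketch of the two styles being argued over, assuming REBOL 2 and made-up `header: value` data (the data from earlier in the thread is not shown here, so `data`, `label` and `value` are illustrative names only). It is not either poster's actual code.

```rebol
REBOL []

; Hypothetical sample data: one "header: value" pair per line.
data: {Name: John Doe^/Allergy: Penicillin^/}

; Style 1 (single flow): one PARSE call, an ANY loop consumes a line per pass.
parse/all data [
    any [
        copy label to ":" skip any " "
        copy value to newline skip
        (print ["single pass:" label "=" value])
    ]
]

; Style 2 (line by line): split on newlines first, then run a small rule
; on each line. The rule can only fail or succeed within that one line.
foreach line parse/all data "^/" [
    if parse/all line [copy label to ":" skip any " " copy value to end] [
        print ["line by line:" label "=" value]
    ]
]
```

In the single-flow version, a line with no colon lets `to ":"` run on into the following line, which is the kind of stray match that has to be guarded with extra rules. In the line-by-line version the same line simply fails its own small parse. Note that the simple string split used in style 2 has its own quirks with quoted data, as discussed further down in this thread.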
Steeve 17-May-2009 [3780] | i never had to cut data into lines when parsing, and i will never have to |
Maxim 17-May-2009 [3781x2] | Steeve, I did a 4000 line parse rule... outperforming C code. but I'm pragmatic. if the rules are going to be 50% smaller and 100% bug free, then that's the better solution. |
I find parse is very suited to very complex systems. strangely, the more complex the rules, the better they are at being parsed. | |
Steeve 17-May-2009 [3783] | I don't get your point, I've done a lot of parsing scripts too. I never saw that it could be bug free or smaller using parse line by line. It's just wasting time and memory. |
Maxim 17-May-2009 [3784] | it took me about 30 seconds to solve it with lines. with a single parse rule, after 15m I was still trying to corner a simple detail that meant rewriting the whole set of rules, or adding a new rule, just for one specific situation. Had I started with another rule setup, I'd have encountered another nagging situation (like the one yours has stumbled upon). my time / hour is worth more than 2 milliseconds of my computer consuming 1/4 watt of electricity. Using 500 bytes more of RAM that is recycled also isn't worth consideration. like I said, I'm pragmatic, that's all there is to it. |
Steeve 17-May-2009 [3785] | My... The problem in the method I proposed has nothing to do with the line by line approach. Can't you figure that out? It's only because I try to recognise headers without knowing them. I can rewrite your solution without using your line by line approach in 5 minutes (you didn't do yours in 30 secs btw). It will be smaller and faster than yours. But I don't see the point, I thought anyone could figure that out. |
Maxim 17-May-2009 [3786x2] | sorry... I should have been more precise: solved <> writing down the code. it did take me a bit more time to write it down, test, clean and submit it than to solve it. |
on the other hand, I do see that parsing in the rebol scene seems to be a cause for bragging rights. it's a complex system reserved for a select few who have spent time and effort learning how to come to grips with it. looking at working rules makes it seem simple, but the deeper knowledge of how it works ... really isn't. | |
Graham 17-May-2009 [3788] | Given time and repeated use, parse should be able to be learnt by most programmers .. but many of us use it infrequently, and so don't retain the skills we might have learnt. |
Steeve 17-May-2009 [3789] | You may be right on that point. I think many rebolers who are well versed in most practices don't use parse at its full power, whereas parse is the most powerful feature in Rebol to my mind. |
Graham 17-May-2009 [3790x4] | I've come across too many situations where parse has broken on me .... |
because the rules I wrote weren't comprehensive enough | |
block parsing can never be used on real world data ... | |
( exaggeration on my part ) | |
Steeve 17-May-2009 [3794] | ;-) |
Maxim 17-May-2009 [3795x2] | remark v1: uses series handling, funcs, and a lot of code to get it to work. probably about 200 lines. remark v2: 20 line parse rule + 5 line stack context object. v2 is 50 times faster, and does twice as much, while being much more flexible in many api aspects. parse is powerful, but it took me 4 years to understand parse well enough to rewrite remark. |
block parsing really is only to create friends in the rebol community ;-) | |
Steeve 17-May-2009 [3797x2] | ahah |
or enemies | |
Maxim 17-May-2009 [3799] | enemies? |
Graham 17-May-2009 [3800] | I think he means energetic discussions |
Steeve 17-May-2009 [3801x2] | yes we fight by throwing snippets |
take that ! >>forever [wait 1000000] | |
Maxim 17-May-2009 [3803] | ok, well we're still friends then, since this was string parsing ;-D |
Henrik 31-May-2009 [3804] | I haven't kept up with the latest parse bugs, but I was wondering about this: >> parse/all {"abc","def"^/"ghi","jkl"} "^/" == ["abc" {,"def"} "ghi" {,"jkl"}] According to my logical sense, it should only split at the newline. |
Maxim 31-May-2009 [3805] | strange bug |
Henrik 31-May-2009 [3806] | it's the quotes that do it: >> parse/all {""""""} "^/" == ["" "" ""] |
Graham 31-May-2009 [3807] | parsing thru quotes is always problematic |
Maxim 31-May-2009 [3808x2] | it's as if it's temporarily switching to block mode within string mode parsing. :-( |
IIRC carl once said that the simple rule parse was meant to be used to parse CSV... so that might explain it. | |
Henrik 31-May-2009 [3810x2] | strangely enough, it makes parsing CSV with quotes much more difficult, so I had to work around it. |
for proper CSV parsing, we'll need some good functions for R3/Plus instead of trying to do some crappy stuff with PARSE directly. | |
Chris 31-May-2009 [3812] | The only place it seems to be useful is for parsing search or tag strings >> parse {painting "mona lisa" art} none == ["painting" "mona lisa" "art"] But having simple mode act as 'split (in the absence of a 'split function) would be of more value. It's particularly irksome that you can't easily 'split using newlines... |
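A minimal sketch of a newline-only split written with a block rule instead of the simple string mode, assuming REBOL 2 semantics. `split-lines` is a hypothetical helper named here for illustration; it is neither the R3 SPLIT function nor the wikibook version linked below.

```rebol
REBOL []

; Hypothetical helper: split a string at newlines only,
; leaving quotes and commas untouched.
split-lines: func [text [string!] /local out line] [
    out: copy []
    parse/all text [
        any [
            copy line to newline skip
            ; In R2, COPY sets the word to none on an empty match,
            ; so fall back to an empty string.
            (append out any [line copy ""])
        ]
        copy line to end
        (append out any [line copy ""])
    ]
    out
]

probe split-lines {"abc","def"^/"ghi","jkl"}
```

With Henrik's input this should produce one string per line, quotes and commas included. A trailing newline in the input yields a final empty string, which may or may not be what a caller wants.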
Tomc 2-Jun-2009 [3813] | I am in favor of having a simple split function if it helps rationalize parse |
Ladislav 2-Jun-2009 [3814] | Simple split: check http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse#Simply_split |
Henrik 2-Jun-2009 [3815] | That is not simple. :-) |
Ladislav 2-Jun-2009 [3816x2] | what? just use it |
;-) | |
Henrik 2-Jun-2009 [3818x2] | anyhoo, SPLIT could be backported from R3, if BrianH has not already done that. |
although with upcoming parse changes it might need to be rewritten. SPLIT is rather big. | |
BrianH 2-Jun-2009 [3820] | I haven't gone over the code in SPLIT yet. Something about the API seems wrong, though not as bad as FORMAT. Once it is more settled I'll backport SPLIT to R2 and put it in R2-Forward. |
Pekr 5-Jun-2009 [3821] | I am trying to create a primitive script which investigates user/group/system rights on our filesystem (no Identity Management system here). The trouble is that MS programmers probably have some weak days too :-) They forgot to add one stupid newline to the output of ICACLS, so I get the following kind of outputs: L:\Sprava\Personalni usek WALMARK\RUR:(OI)(CI)(F) L:\Sprava\Personalni usek (OI)(CI)(F) L:\Sprava\Personalni usek NT AUTHORITY\RUR:(OI)(CI)(F) L:\Sprava\Personalni usek BUILTIN\RUR:(OI)(CI)(F) I need to come up with rules which will allow me to separate the path from the first user/group/rights info. The problem is that space is a regular character in a path. So how to easily create a rule for the above cases? The path is - "L:\Sprava\Personalni usek" |
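One possible angle, sketched under a loud assumption: the script already knows the path it passed to ICACLS, so the path can be matched literally and never has to be recognised by its shape. `known-path`, `line` and `rights` are illustrative names, and this is not a reply from the thread.

```rebol
REBOL []

; Assumption: the path is known to the caller because the script
; itself handed it to ICACLS.
known-path: "L:\Sprava\Personalni usek"
line: "L:\Sprava\Personalni usek WALMARK\RUR:(OI)(CI)(F)"

rights: none
either parse/all line [known-path any " " copy rights to end] [
    ; RIGHTS holds whatever follows the path and its separating spaces.
    print ["rights:" rights]
][
    print "line does not start with the known path"
]
```

Because the string in `known-path` is matched as a literal, the space inside the path is no longer ambiguous. If the path is not known in advance, this sketch does not help and a different rule would be needed.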