World: r3wp

[Parse] Discussion of PARSE dialect

Steeve
17-May-2009
[3777x2]
Parsing line by line is not the solution (nor the problem) there.

Everything you can do line by line can be rolled into a single parsing 
flow. It's just a matter of your skill in using parse.
I have seen many people proposing to parse line by line in many topics 
here.
I don't get it. 
It's slower and wastes memory for nothing.

They seem to be afraid of using any/some parsing loops; I don't 
understand why.
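
As a rough illustration of a single parsing flow over multi-line input, here is a minimal sketch assuming REBOL 2 string parsing. The "key: value" input format and the words text, out, key and value are made up for the example; they are not the data discussed in this thread.

```rebol
; Hypothetical input: one "key: value" pair per line, each ending in a newline.
text: {name: foo^/size: 10^/}
out: copy []

; One parse call walks every line; ANY repeats the per-line rule
; until it no longer matches.
parse/all text [
    any [
        copy key to ":" skip        ; text before the colon, then skip the colon
        any " "                     ; optional spaces after the colon
        copy value to newline skip  ; rest of the line, then skip the newline
        (append out reduce [key value])
    ]
]

probe out   ; keys and values collected in order
```

The same work done line by line would first split the text and then run a smaller rule on each piece; which form is easier to maintain is exactly what is being argued here.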
Maxim
17-May-2009
[3779]
It's just MUCH easier to do it line by line because the context 
of the parse isn't the same.  A parse rule going astray over multiple 
lines doesn't react the same way as one over a single line, which has a 
context of "this has a header" | "this doesn't".


I'm not saying my solution can't be done using only one parse, only 
that the rules are that much simpler.  In my first tests, handling 
the first and last headers needed special treatment, ultimately forcing 
me to add new rules and generally making the whole thing much more complex.
Steeve
17-May-2009
[3780]
I never had to cut data into lines when parsing, and I never will 
have to.
Maxim
17-May-2009
[3781x2]
Steeve, I did a 4000-line parse rule... outperforming C code.  But 
I'm pragmatic.  If the rules are going to be 50% smaller and 100% 
bug free, then that's the better solution.
I find parse is very well suited to very complex systems.  Strangely, 
the more complex the rules, the better they are at being parsed.
Steeve
17-May-2009
[3783]
I don't get your point; I've done a lot of parsing scripts too. 

I never saw that it could be bug free or smaller when parsing line by 
line.
It's just wasting time and memory.
Maxim
17-May-2009
[3784]
It took me about 30 seconds to solve it with lines.  With a single 
parse rule, after 15 minutes I was still trying to corner a simple detail 
that meant rewriting the whole set of rules, or adding a new rule, just 
for one specific situation.  Had I started with another rule setup, 
I'd have encountered another nagging situation (like the one yours has 
stumbled upon).


An hour of my time is worth more than 2 milliseconds of my computer 
consuming 1/4 watt of electricity.  Using 500 more bytes of RAM that 
is recycled also isn't worth consideration.

Like I said, I'm pragmatic, that's all there is to it.
Steeve
17-May-2009
[3785]
My...

The problem in the method I proposed has nothing to do with the line 
by line approach.

Can't you figure that out? It's only because I try to recognise headers 
without knowing them.


I can rewrite your solution without using your line by line approach 
in 5 minutes (you didn't do yours in 30 secs, btw).
It will be smaller and faster than yours.
But I don't see the interest; I thought anyone could figure that out.
Maxim
17-May-2009
[3786x2]
Sorry... I should have been more precise:


solved <> writing down the code.  It did take me a bit more time 
writing it down than solving, testing, cleaning and submitting 
it.
On the other hand, I do see that parsing in the REBOL scene seems 
to be a cause for bragging rights.  It's a complex system reserved 
for a select few who have spent time and effort learning how to come 
to grips with it.


Looking at working rules makes it seem simple, but the deeper knowledge 
of how it works ... really isn't.
Graham
17-May-2009
[3788]
Given time and repeated use, parse should be able to be learnt by 
most programmers .. but many of us use it infrequently, and so don't 
retain the skills we might have learnt.
Steeve
17-May-2009
[3789]
You may be right on that point.

I think many rebolers who are well versed in most practices don't 
use parse at its full power.
Whereas parse is the most powerful feature in Rebol, to my mind.
Graham
17-May-2009
[3790x4]
I've come across too many situations where parse has broken on me 
....
because the rules I wrote weren't comprehensive enough
block parsing can never be used on real world data ...
( exaggeration on my part )
Steeve
17-May-2009
[3794]
;-)
Maxim
17-May-2009
[3795x2]
remark v1: uses series handling, funcs, and a lot of code to get 
it to work.  

Probably about 200 lines.

remark v2:   20-line parse rule + 5-line stack context object.


v2 is 50 times faster, and does twice as much, while being much more 
flexible in many API aspects.


Parse is powerful, but it took me 4 years to understand parse well 
enough to rewrite remark.
block parsing really is only to create friends in the rebol community 
 ;-)
Steeve
17-May-2009
[3797x2]
ahah
or enemies
Maxim
17-May-2009
[3799]
enemies?
Graham
17-May-2009
[3800]
I think he means energetic discussions
Steeve
17-May-2009
[3801x2]
yes we fight by throwing snippets
take that !
>>forever [wait 1000000]
Maxim
17-May-2009
[3803]
ok, well we're still friends then, since this was string parsing 
 ;-D
Henrik
31-May-2009
[3804]
I haven't kept up with the latest parse bugs, but I was wondering 
about this:

>> parse/all {"abc","def"^/"ghi","jkl"} "^/"
== ["abc" {,"def"} "ghi" {,"jkl"}]


According to my logical sense, it should only split at the newline.
Maxim
31-May-2009
[3805]
strange bug
Henrik
31-May-2009
[3806]
it's the quotes that do it:

>> parse/all {""""""} "^/"
== ["" "" ""]
Graham
31-May-2009
[3807]
parsing thru quotes is always problematic
Maxim
31-May-2009
[3808x2]
It's as if it's temporarily switching to block mode within string 
mode parsing.  :-(
IIRC Carl once said that the simple rule parse was meant to be used 
to parse CSV... so that might explain it.
Henrik
31-May-2009
[3810x2]
strangely enough, it makes parsing CSV with quotes much more difficult, 
so I had to work around it.
for proper CSV parsing, we'll need some good functions for R3/Plus 
instead of trying to do some crappy stuff with PARSE directly.
Chris
31-May-2009
[3812]
The only place it seems to be useful is for parsing search or tag 
strings

	>> parse {painting "mona lisa" art} none
	== ["painting" "mona lisa" "art"]


But having simple mode act as 'split (in the absence of a 'split 
function) would be of more value.  It's particularly irksome that 
you can't easily 'split using newlines...
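
A minimal sketch of splitting at newlines only, using an explicit rule instead of simple mode so that quotes get no special treatment. It assumes REBOL 2 string parsing; split-lines is a hypothetical helper name, not a built-in.

```rebol
; SPLIT-LINES is a hypothetical helper name, not a built-in.
split-lines: func [
    "Splits a string at newline characters only."
    text [string!]
    /local out line
][
    out: copy []
    parse/all text [
        ; every line that is terminated by a newline
        ; (COPY may yield none for an empty line, hence the ANY fallback)
        any [copy line to newline skip (append out any [line copy ""])]
        ; whatever follows the last newline, if anything
        copy line to end (if line [append out line])
    ]
    out
]

probe split-lines {"abc","def"^/"ghi","jkl"}
```

With the string from the report above, this should give two items, each still containing its quotes and comma.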
Tomc
2-Jun-2009
[3813]
I am in favor of having a simple split function if it helps rationalize 
parse
Ladislav
2-Jun-2009
[3814]
Simple split: check http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse#Simply_split
Henrik
2-Jun-2009
[3815]
That is not simple. :-)
Ladislav
2-Jun-2009
[3816x2]
what? just use it
;-)
Henrik
2-Jun-2009
[3818x2]
anyhoo, SPLIT could be backported from R3, if BrianH has not already 
done that.
although with upcoming parse changes it might need to be rewritten. 
SPLIT is rather big.
BrianH
2-Jun-2009
[3820]
I haven't gone over the code in SPLIT yet. Something about the API 
seems wrong, though not as bad as FORMAT. Once it is more settled 
I'll backport SPLIT to R2 and put it in R2-Forward.
Pekr
5-Jun-2009
[3821]
I am trying to create a primitive script which investigates user/group/system 
rights on our filesystem (no Identity Management system here). The 
trouble is that MS programmers probably have some weak days too 
:-) They forgot to add one stupid newline to the output of ICACLS, 
so I get the following kind of output:

L:\Sprava\Personalni usek WALMARK\RUR:(OI)(CI)(F)
L:\Sprava\Personalni usek (OI)(CI)(F)
L:\Sprava\Personalni usek NT AUTHORITY\RUR:(OI)(CI)(F)
L:\Sprava\Personalni usek BUILTIN\RUR:(OI)(CI)(F)


I need to come up with rules which will allow me to separate the path 
from the first user/group/rights info. The problem is that a space 
is a regular character in a path. So how do I easily create a rule for the 
above cases? The path is - "L:\Sprava\Personalni usek"
BrianH
5-Jun-2009
[3822]
If you know the path ahead of time you can skip past its length plus 
one, then start parsing.
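
A minimal sketch of that idea, assuming REBOL 2 string parsing and assuming the path is already known. The words known-path, acl-line, account and rights are illustrative names; the sample line is one of those quoted above.

```rebol
; Illustrative values only: the path must be known ahead of time.
known-path: "L:\Sprava\Personalni usek"
acl-line: "L:\Sprava\Personalni usek WALMARK\RUR:(OI)(CI)(F)"

account: rights: none
parse/all acl-line [
    known-path " "            ; the known path plus the separating space
    copy account to ":" skip  ; account name up to the colon
    copy rights to end        ; remaining permission flags
]

print [account rights]
```

Matching the path as a literal is equivalent to skipping its length plus one. A line with no colon after the path, such as the second sample line, makes this rule fail, so it would need its own alternative.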
Pekr
5-Jun-2009
[3823x2]
No, I have a few megabytes, produced by one call to the ICACLS command line 
.... but never mind - ICACLS is not a good tool. I just wanted to use 
REBOL here. I will have to start using VBScript for such stuff ...
The programmer who did the output has to be pretty much an idiot though 
...
BrianH
5-Jun-2009
[3825]
Agreed. I mean, each line starts with a path - is it the same path 
every time, or a different one?
Pekr
5-Jun-2009
[3826]
different one ...