r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

Graham
17-May-2009
[3790x4]
I've come across too many situations where parse has broken on me 
....
because the rules I wrote weren't comprehensive enough
block parsing can never be used on real world data ...
( exaggeration on my part )
Steeve
17-May-2009
[3794]
;-)
Maxim
17-May-2009
[3795x2]
remark v1: uses series handling, funcs, and a lot of code to get 
it to work.  

probably about 200 lines.

remark v2:   20-line parse rule + 5-line stack context object.


v2 is 50 times faster and does twice as much, while being much more 
flexible in many API aspects.


parse is powerful, but it took me 4 years to understand parse well 
enough to rewrite remark.
block parsing really is only to create friends in the rebol community 
 ;-)
Steeve
17-May-2009
[3797x2]
ahah
or enemies
Maxim
17-May-2009
[3799]
enemies?
Graham
17-May-2009
[3800]
I think he means energetic discussions
Steeve
17-May-2009
[3801x2]
yes we fight by throwing snippets
take that !
>>forever [wait 1000000]
Maxim
17-May-2009
[3803]
ok, well we're still friends then, since this was string parsing 
 ;-D
Henrik
31-May-2009
[3804]
I haven't kept up with the latest parse bugs, but I was wondering 
about this:

>> parse/all {"abc","def"^/"ghi","jkl"} "^/"
== ["abc" {,"def"} "ghi" {,"jkl"}]


According to my logical sense, it should only split at the newline.
Maxim
31-May-2009
[3805]
strange bug
Henrik
31-May-2009
[3806]
it's the quotes that do it:

>> parse/all {""""""} "^/"
== ["" "" ""]
Graham
31-May-2009
[3807]
parsing thru quotes is always problematic
Maxim
31-May-2009
[3808x2]
it's as if it's temporarily switching to block mode within string-mode 
parsing.  :-(
IIRC Carl once said that simple-rule parse was meant to be used 
to parse CSV... so that might explain it.
Henrik
31-May-2009
[3810x2]
strangely enough, it makes parsing CSV with quotes much more difficult, 
so I had to work around it.
for proper CSV parsing, we'll need some good functions for R3/Plus 
instead of trying to do some crappy stuff with PARSE directly.
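
For comparison, a minimal Python sketch (not REBOL, and not something proposed in this thread) of the behaviour Henrik expected from his example: a plain split touches only the newline, while a dedicated CSV reader deals with the quotes and commas as a separate step. The variable names are illustrative only.

```python
import csv
import io

# Henrik's sample: two records separated by a single newline.
text = '"abc","def"\n"ghi","jkl"'

# Splitting on the newline alone leaves quotes and commas untouched.
lines = text.split("\n")

# A CSV reader strips the quotes and splits each record on commas.
rows = list(csv.reader(io.StringIO(text)))

print(lines)
print(rows)
```

Keeping the two steps apart is the point: the line splitter never has to know about quoting rules.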
Chris
31-May-2009
[3812]
The only place it seems to be useful is for parsing search or tag 
strings

	>> parse {painting "mona lisa" art} none
	== ["painting" "mona lisa" "art"]


But having simple mode act as 'split (in the absence of a 'split 
function) would be of more value.  It's particularly irksome that 
you can't easily 'split using newlines...
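
A rough Python illustration of the two behaviours Chris is contrasting, assuming the standard `shlex` module as a stand-in for the quote-aware simple mode and a hypothetical `split_on` helper for the plain 'split he would prefer. It is a sketch of the distinction, not a description of how PARSE is implemented.

```python
import shlex

def split_on(text, delimiter):
    """Split only at the delimiter; quotes get no special treatment."""
    return text.split(delimiter)

# Quote-aware tokenising, useful for search or tag strings.
tags = shlex.split('painting "mona lisa" art')

# Plain splitting: a string with no newline comes back as one piece.
parts = split_on('""""""', "\n")

print(tags)
print(parts)
```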
Tomc
2-Jun-2009
[3813]
I am in favor of having a simple split function if it helps rationalize 
parse
Ladislav
2-Jun-2009
[3814]
Simple split: check http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse#Simply_split
Henrik
2-Jun-2009
[3815]
That is not simple. :-)
Ladislav
2-Jun-2009
[3816x2]
what? just use it
;-)
Henrik
2-Jun-2009
[3818x2]
anyhoo, SPLIT could be backported from R3, if BrianH has not already 
done that.
although with upcoming parse changes it might need to be rewritten. 
SPLIT is rather big.
BrianH
2-Jun-2009
[3820]
I haven't gone over the code in SPLIT yet. Something about the API 
seems wrong, though not as bad as FORMAT. Once it is more settled 
I'll backport SPLIT to R2 and put it in R2-Forward.
Pekr
5-Jun-2009
[3821]
I am trying to create a primitive script which investigates user/group/system 
rights on our filesystem (no Identity Management system here). The 
trouble is that MS programmers probably have some weak days too 
:-) They forgot to add one stupid newline to the output of ICACLS, 
so I get the following kind of output:

L:\Sprava\Personalni usek WALMARK\RUR:(OI)(CI)(F)
L:\Sprava\Personalni usek (OI)(CI)(F)
L:\Sprava\Personalni usek NT AUTHORITY\RUR:(OI)(CI)(F)
L:\Sprava\Personalni usek BUILTIN\RUR:(OI)(CI)(F)


I need to come up with rules which will allow me to separate the path 
from the first user/group/rights info. The problem is that a space 
is a regular character in a path. So how can I easily create a rule 
for the above cases? The path is - "L:\Sprava\Personalni usek"
BrianH
5-Jun-2009
[3822]
If you know the path ahead of time you can skip past its length plus 
one, then start parsing.
Pekr
5-Jun-2009
[3823x2]
no, I have a few megabytes, produced by one call to the ICACLS command line 
.... but never mind - ICACLS is not a good tool. I just wanted to use 
REBOL here. I will have to start using VBScript for such stuff ...
The programmer who did the output has to be pretty much an idiot though 
...
BrianH
5-Jun-2009
[3825]
Agreed. I mean, each line starts with a path - is it the same path 
every time, or a different one?
Pekr
5-Jun-2009
[3826x3]
different one ...
I don't want to put output here, as this group is web public ...
ICACLS L:\my-path\*. /T > result.txt ...... /T means recursion ... 
so it was easy job at first sight ...
Ladislav
5-Jun-2009
[3829]
but, how do you *know* where the path ends, then?
Pekr
5-Jun-2009
[3830]
exactly :-) That is why I see it as a bug on the programmer's side. 
OK, here's one example:

L:\Some-path\Some subidr name here WALMARK\RUR:(OI)(CI)(F)
                          BUILTIN\Administrators:(OI)(CI)(F)
                          WALMARK\User1:(CI)(RX)
                          WALMARK\Some group:(OI)(CI)(M)
BrianH
5-Jun-2009
[3831]
What info do you need, the path or what comes after it? If the data 
after it is only a limited set of possible answers, you can try to 
skip to those in turn.
Pekr
5-Jun-2009
[3832]
So I start from the right, building a longer rule such as [rights-section | 
domain-section user-section rights-section]
Ladislav
5-Jun-2009
[3833]
...makes no sense to define a rule, if you don't actually know where 
the path ends, as I see it
Pekr
5-Jun-2009
[3834]
There is one exception - "NT AUTHORITY" ... I would break both hands 
of the designer who allowed this one exception - a space in a domain 
name is not normally allowed :-)
BrianH
5-Jun-2009
[3835]
parse/all/case line [[to "WALMARK" | to "BUILTIN"] a: (do something)]
Ladislav
5-Jun-2009
[3836]
aha, so you actually know where the path ends? You didn't say
BrianH
5-Jun-2009
[3837]
Or to "NT AUTHORITY"
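
The "skip to a known domain" idea can be sketched in Python as follows. The domain list and the function name are hypothetical, and the sketch assumes a domain is always preceded by a space and followed by a backslash. Unlike ordered alternatives, which take the first alternative that matches anywhere in the line, this version cuts at the earliest marker.

```python
# Hypothetical list of domains; extend it to match the real data.
KNOWN_DOMAINS = ("WALMARK", "BUILTIN", "NT AUTHORITY")

def split_at_domain(line, domains=KNOWN_DOMAINS):
    """Split a line into (path, rest) at the earliest ' DOMAIN\\' marker."""
    hits = [line.find(" " + d + "\\") for d in domains]
    hits = [i for i in hits if i != -1]
    if not hits:
        return line, ""          # no known domain on this line
    cut = min(hits)
    return line[:cut], line[cut + 1:]

print(split_at_domain(r"L:\Sprava\Personalni usek NT AUTHORITY\RUR:(OI)(CI)(F)"))
```

A directory whose name happens to be a known domain and follows a space would still fool it, which is the ambiguity Ladislav points out.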
Pekr
5-Jun-2009
[3838x2]
But you can define the following rule: 

domain-chars: charset [#"A" - #"Z" "-"]
domain-rule: [
    "NT AUTHORITY\" (domain: "NT AUTHORITY")
    |
     copy domain some domain-chars "\"  
]


domain-user-rights: [rights-rule | domain-rule user-rule rights-rule]
So except for NT AUTHORITY, there can't be any space. So I cover 
the case where there are only rights on the first line, (OI)(CI) etc., 
and the second case - DOMAIN\USER-GROUP:(RIGHTS)
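
The same two cases can be written as a regular expression. This is a Python sketch rather than a PARSE rule, under these assumptions: domains other than NT AUTHORITY use only A-Z and "-", the rights are one or more parenthesised groups of capital letters (commas allowed), and the path is the shortest prefix that lets the rest of the line match. `ACL_LINE` and `parse_acl_line` are illustrative names.

```python
import re

# Assumed line shape: "<path> [<DOMAIN>\<name>:]<rights>"
ACL_LINE = re.compile(
    r"^(?P<path>.*?) "                          # shortest path that lets the rest match
    r"(?:(?P<domain>NT AUTHORITY|[A-Z-]+)\\"    # the one domain with a space, else A-Z and "-"
    r"(?P<name>[^:]+):)?"                       # user or group name, up to the colon
    r"(?P<rights>(?:\([A-Z,]+\))+)$"            # e.g. (OI)(CI)(F)
)

def parse_acl_line(line):
    """Return path, domain, name and rights, or None if the line does not fit."""
    m = ACL_LINE.match(line.rstrip())
    return m.groupdict() if m else None

for sample in (
    r"L:\Sprava\Personalni usek WALMARK\RUR:(OI)(CI)(F)",
    r"L:\Sprava\Personalni usek (OI)(CI)(F)",
    r"L:\Sprava\Personalni usek NT AUTHORITY\RUR:(OI)(CI)(F)",
):
    print(parse_acl_line(sample))
```

When only rights follow the path, `domain` and `name` come back as `None`. A path containing a space followed by an all-capitals component and a colon later in the line could still be split in the wrong place.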