World: r3wp

[Parse] Discussion of PARSE dialect

Paul
5-Jun-2009
[3889]
Pekr, to avoid subdirectories with spaces you can use this 
instead of my earlier example:


copy/part path find/reverse find/reverse find/reverse find path "(" "\" " " " "
Pekr
5-Jun-2009
[3890]
Paul - that will not work, because there is one exception - NT AUTHORITY, 
which contains a space ...
BrianH
5-Jun-2009
[3891]
Which is a keyword. BUILTIN is another keyword.
Pekr
5-Jun-2009
[3892]
But there can also be any domain name, not just a keyword ....
BrianH
5-Jun-2009
[3893]
Ah, but the list of domain names in your network is a fixed list. 
You can use that list to generate the look-for-a-domain rule.
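
A minimal sketch of generating such a rule from a fixed list. The list contents and the names known-domains and domain-rule are assumptions for illustration, not taken from the discussion:

```rebol
; Hypothetical list of the domains known on the network (an assumption).
known-domains: ["NT AUTHORITY" "BUILTIN" "EXAMPLE-DOM"]

domain: none
domain-rule: copy []
foreach name known-domains [
    ; separate the alternatives with | (but not before the first one)
    if not empty? domain-rule [append domain-rule [|]]
    ; each alternative matches NAME\ literally, then records the name
    append domain-rule reduce [
        join name "\"
        to paren! compose [domain: (name)]
    ]
]

; try the generated rule on a sample DOMAIN\USER:(RIGHT) entry
parse/all "NT AUTHORITY\SYSTEM:(F)" [domain-rule to end]
print domain
```

Because every alternative is a literal string, a name containing a space (such as NT AUTHORITY) needs no special case.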
Paul
5-Jun-2009
[3894]
Right Pekr, forgot about that.
Pekr
5-Jun-2009
[3895]
I got it working. I use the following trick - I identify the DOMAIN\USER:(RIGHT) 
or (RIGHT) sections first. Then I put markers around them and 
catch the rest with skip. The file is "clean", so what I actually 
skip is either spaces or a path. I check that in the emit function:

emit: does [
 if find tmp: trim copy/part p-start p-end ":\" [path: tmp]
 print [path domain user rights]
]

;--- rules - spaces, tabs, newlines

spacer-chars: charset [#" " #"^-" #"^/"]
spacers: [some spacer-chars]

;--- user-rights rules
;--- would be easier, if filesystem would not allow () ...

right-char: charset [#"A" - #"Z"]
right-rule: ["(" 1 2 right-char ")" ]

rights-rule: [r-start: some right-rule r-end: (rights: copy/part r-start r-end)]

;--- rule to identify user part

user-chars: complement charset {".,;:\/*}
user-rule: [copy user some user-chars ":" ]


;--- rule to identify domain - I expect it to be typed in CAPITALS, 
;--- can contain "-"
;--- the exception is "NT AUTHORITY" - contains space

domain-chars: charset [#"A" - #"Z" "-"]
domain-rule: [
    "NT AUTHORITY\" (domain: "NT AUTHORITY")
    |
     copy domain some domain-chars "\"  
]


;--- rules for combinations of: rights only (RIGHT), or DOMAIN\USER:(RIGHT)
domain-user-rights: [
     rights-rule 
    |
     domain-rule
     user-rule
     rights-rule
]


parse/all str: read from-file [p-start: any [p-end: domain-user-rights (emit) p-start: | skip] to end]
Paul
5-Jun-2009
[3896x3]
lcase: charset "abcdefghijklmnopqrstuvwxyz"

copy/part path find/reverse find/reverse find/reverse find/reverse find path "(" "\" " " lcase " "
I'm assuming all usernames will be lowercase.
Windows doesn't use case-sensitive usernames.
BrianH
5-Jun-2009
[3899]
No, but they are case-preserving.
Paul
5-Jun-2009
[3900]
Even for the output of icacls?
BrianH
5-Jun-2009
[3901]
Probably. Only the domains are uppercased.
Paul
5-Jun-2009
[3902x2]
Then with the find command I don't think it will be possible.
At least not without a table lookup.
Graham
14-Jun-2009
[3904]
What's the most economical way to do this?  I have lines of text, 
and I want to classify each line.  So, if I find the word "tablet" 
in it, I class this as a U2, and if I find "capsule", it's AV.

I can do a sequence of finds inside a case statement, or I can use 
a parse.  But in the first instance I have multiple find statements, 
and in the latter I have multiple assignments in my code.
Gregg
14-Jun-2009
[3905]
Economical how, in space, speed, or complexity? e.g., repeated FINDs 
can seem inelegant, but are easy to understand and maintain.

If lines match more than one rule, what is the desired behavior?


Assuming you have test data, and if performance is the key, have 
you done any quick tests?
Graham
15-Jun-2009
[3906x3]
elegant in looks :)
the lines are so few in number that it won't make any practical difference 
... just wondering if there were a preference on how to do this without 
code duplication.
I decided that since the way I was going to use parse, or case meant 
the code was mixed in with the data .. it was better to do it differently.
Tomc
15-Jun-2009
[3909x3]
you need to maintain a map of keywords and codes; keep that in its 
own file and read it in to build your rules
sort it by keyword length, longest first, 
before building the rules 
then when codes change or more are added you just update your map
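
A minimal sketch of that approach, assuming the map is a block of [keyword code] pairs. It is written inline here; in practice it would be LOADed from its own file. The names keyword-map, keyword-rule, code and classify are hypothetical:

```rebol
; Hypothetical keyword/code map (normally read from a separate file).
keyword-map: [["tablet" "U2"] ["capsule" "AV"]]

; longest keyword first, so a long keyword is tried before a shorter one it contains
sort/compare keyword-map func [a b] [(length? first a) > (length? first b)]

; build one alternative per pair: "keyword" (code: "CODE")
keyword-rule: copy []
foreach pair keyword-map [
    if not empty? keyword-rule [append keyword-rule [|]]
    append keyword-rule reduce [
        first pair
        to paren! compose [code: (second pair)]
    ]
]

code: none   ; set by the generated rule (a global, for brevity)

classify: func [line [string!]] [
    code: none
    ; scan forward one character at a time until some keyword matches
    parse/all line [any [keyword-rule to end | skip]]
    code
]

print classify "Take one tablet daily"
```

The classification code then lives only in the data: adding a keyword means adding a pair to the map, with no new FIND or assignment in the program. A line that matches no keyword yields none.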
Graham
15-Jun-2009
[3912]
basically what I ended up doing  :)
PeterWood
16-Jun-2009
[3913]
I'm puzzled about the different results when using [to end end] and 
[thru end]. Anybody know why?

 >> parse "123456789" [to end end]
== true


>> parse "123456789" [thru end]

== false
Maxim
16-Jun-2009
[3914x5]
note: 

parse "123456789" [to end]
== true
this has also puzzled me, since:

>> parse "123456789" [thru end here:] index? here
== 10
>> parse "123456789" [to end here:] index? here
== 10
maybe the rule thru fails because you can't actually go past the 
end.
just like this fails too. even though we are at the end:

>> print parse "123456789" [9 skip here:] index? here
true
== 10
>> print parse "123456789" [10 skip here:] index? here
false
== 10
it does make sense, and it's consistent with parse... it only returns 
true when the last rule ends exactly AT the end.
PeterWood
16-Jun-2009
[3919]
"maybe the rule thru fails because you can't actually go past the 
end" - but does [thru end] go past the end?
Maxim
16-Jun-2009
[3920]
yes it goes one past the end. it does not stop AT the end.
BrianH
16-Jun-2009
[3921]
end has no length, so to end and thru end mean the same thing.
PeterWood
16-Jun-2009
[3922]
I guess you could answer that end is past the end of the input. But 
the behavior seems inconsistent:

>> parse "123456789" [thru "8" "9" end]
== true

>> parse "123456789" [thru "9" end]
== true

>> parse "123456789" [thru end]
== false
Maxim
16-Jun-2009
[3923]
but brian, skipping past the end still puts you at the end of the 
series, but the parser knows you tried to go beyond the end... It's 
the thru which is failing, because it knows you are trying to go beyond 
the end.
PeterWood
16-Jun-2009
[3924]
It's different in R3 :-)

>> parse "123456789" [thru end]

== true
Maxim
16-Jun-2009
[3925]
thru consumes the end word, and then detects that, as a result, it 
would put you beyond the end.  
really, it's quite logical.  


but in practice, thru shouldn't complain.... because as you say, 
in this specific context, thru and to really do mean the same thing.
PeterWood
16-Jun-2009
[3926]
I prefer the R3 behaviour. I really hope that it doesn't change.
BrianH
16-Jun-2009
[3927]
I'll make sure of that, Peter.
PeterWood
16-Jun-2009
[3928]
Thanks.
Ladislav
16-Jun-2009
[3929]
yes, Peter, I am sure R3 behaviour is correct
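
For rules that need to give the same answer in R2 and R3, a small sketch of two ways to reach the end without relying on [thru end]. The R2 result for [to end] is shown in the sessions above; that both rules behave the same way in R3 is an assumption here:

```rebol
; [to end] stops AT the end of the input, which R2 accepts
print parse "123456789" [to end]

; alternative without TO/THRU: consume every character, then require END
print parse "123456789" [any skip end]
```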
BrianH
23-Jun-2009
[3930x2]
In R2:

>> parse/all { X X  XX X X} [(prin 'a) some [(prin 'b) "X" (prin 'c) [(prin 'd) "X" (prin 'e) | (prin 'f) skip (prin 'g)] (prin 'h) | (prin 'i) skip (prin 'j)] (prin 'k)]
abijbcdfghbcdfghbijbcdehbijbcdfghbcdfijbik== true

In R3:

>> parse/all { X X  XX X X} [(prin 'a) some [(prin 'b) "X" (prin 'c) [(prin 'd) "X" (prin 'e) | (prin 'f) skip (prin 'g)] (prin 'h) | (prin 'i) skip (prin 'j)] (prin 'k)]
abijbcdfghbcdfghbijbcdehbijbcdfghbcdfijk== true


In both cases the fij near the end should be fgh - a bug in 
PARSE.
Never mind, I missed that the last X is at the end of the string. 
No bugs.
shadwolf
30-Jun-2009
[3932x2]
the more i try to understand parse the less i understand it
i want to try to make a tutorial about parse but my knowledge of 
it is poor 

so as we have a wiki we could start a project to write documentation 
with the goal of making people understand what the point of 
parse is
Sunanda
30-Jun-2009
[3934]
Parse question on stackoverflow (unanswered as yet:)

http://stackoverflow.com/questions/1060727/rebol-parse-dealing-with-whitespace-and-copy-var
BrianH
30-Jun-2009
[3935]
Answered :)
Sunanda
30-Jun-2009
[3936]
Fast!
shadwolf
30-Jun-2009
[3937]
http://www.rebolfrance.info/articles/allaboutparse
documentation to collect everything we want to know about parse
BrianH
30-Jun-2009
[3938]
There's also a lot of documentation about parse's behavior (in theory) 
at the beginning of the Parse Proposals page on DocBase.