World: r3wp

Join the discussions in the REBOL3 world...

[Parse] Discussion of PARSE dialect

Paul 5-Jun-2009 [3889]	;Pekr, to avoid problems with subdirectories that contain spaces, you can use this instead of my earlier example: copy/part path find/reverse find/reverse find/reverse find path "(" "\" " " " "
Pekr 5-Jun-2009 [3890]	Paul - that will not work, because there is one exception - NT AUTHORITY, which contains a space ...
BrianH 5-Jun-2009 [3891]	Which is a keyword. BUILTIN is another keyword.
Pekr 5-Jun-2009 [3892]	But there can be also any domain name, not just keyword ....
BrianH 5-Jun-2009 [3893]	Ah, but the list of domain names in your network is a fixed list. You can use that list to generate the look-for-a-domain rule.
Paul 5-Jun-2009 [3894]	Right Pekr, forgot about that.
Pekr 5-Jun-2009 [3895]	I got it working. I use the following trick: I identify the DOMAIN\USER:(RIGHT) or (RIGHT) sections first, then put markers around them and catch the rest with SKIP. The file is "clean", so what I skip is either spaces or the path. I do the check in the emit function:

emit: does [
    if find tmp: trim copy/part p-start p-end ":\" [path: tmp]
    print [path domain user rights]
]

;--- rules - spaces, tabs, newlines
spacer-chars: charset [#" " #"^-" #"^/"]
spacers: [some spacer-chars]

;--- user-rights rules
;--- would be easier if the filesystem did not allow () ...
right-char: charset [#"A" - #"Z"]
right-rule: ["(" 1 2 right-char ")"]
rights-rule: [r-start: some right-rule r-end: (rights: copy/part r-start r-end)]

;--- rule to identify the user part
user-chars: complement charset {".,;:\/*}
user-rule: [copy user some user-chars ":"]

;--- rule to identify the domain - I expect it to be typed in CAPITALS, can contain "-"
;--- the exception is "NT AUTHORITY", which contains a space
domain-chars: charset [#"A" - #"Z" "-"]
domain-rule: [
    "NT AUTHORITY\" (domain: "NT AUTHORITY")
    | copy domain some domain-chars "\"
]

;--- rules for combinations of: rights only (RIGHT), or DOMAIN\USER:(RIGHT)
domain-user-rights: [
    rights-rule
    | domain-rule user-rule rights-rule
]

parse/all str: read from-file [
    p-start: any [
        p-end: domain-user-rights (emit) p-start:
        | skip
    ]
    to end
]
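A minimal sketch of BrianH's suggestion from earlier in this thread: because the domains on a given network form a fixed list, the domain part of Pekr's rule could be generated from that list instead of hard-coding "NT AUTHORITY" and guessing the rest from uppercase characters. The domain list and the helper name make-domain-rule are hypothetical, the code targets R2 PARSE/ALL, and it has not been run against real icacls output.

```rebol
REBOL [Title: "Sketch: build the domain rule from a fixed list of domains"]

; Hypothetical list of the domains that appear in the icacls output.
; On a given network this list is fixed, so the rule can be generated.
domain-names: ["NT AUTHORITY" "BUILTIN" "MYDOMAIN"]

; Hypothetical helper. Produces alternatives of the form:
;   "NT AUTHORITY\" (domain: "NT AUTHORITY") | "BUILTIN\" (domain: "BUILTIN") | ...
make-domain-rule: func [names [block!] /local rule][
    rule: copy []
    foreach name names [
        ; separate alternatives with the | keyword (appends the word |)
        if not empty? rule [append rule [|]]
        ; literal text to match, e.g. "BUILTIN\"
        append rule join name "\"
        ; /only keeps the paren as a single value instead of splicing it
        append/only rule to paren! compose [domain: (name)]
    ]
    rule
]

; Could stand in for the hand-written domain-rule above
domain-rule: make-domain-rule domain-names

; Usage: split one DOMAIN\USER:(RIGHTS) entry; /all keeps the space in
; "NT AUTHORITY" significant instead of skipping whitespace
domain: user: rights: none
parse/all "NT AUTHORITY\SYSTEM:(OI)(CI)(F)" [
    domain-rule copy user to ":" skip copy rights to end
]
print [domain user rights]   ; prints: NT AUTHORITY SYSTEM (OI)(CI)(F)
```

Generating the rule keeps spaces in domain names from being a special case: any name in the list, with or without spaces, is matched literally.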
Paul 5-Jun-2009 [3896x3]	lcase: charset "abcdefghijklmnopqrstuvwxyz" copy/part path find/reverse find/reverse find/reverse find/reverse find path "(" "\" " " lcase " "
	I'm assuming all usernames will be lowercase.
	Windows doesn't use case specific usernames.
BrianH 5-Jun-2009 [3899]	No, but they are case-preserving.
Paul 5-Jun-2009 [3900]	Even for the output of icacls?
BrianH 5-Jun-2009 [3901]	Probably. Only the domains are uppercased.
Paul 5-Jun-2009 [3902x2]	Then with the find command I don't think it will be possible.
	At least not without a table lookup.
Graham 14-Jun-2009 [3904]	What's the most economical way to do this? I have lines of text, and I want to classify each line. So if I find the word "tablet" in it, I class it as U2, and if I find "capsule", it's AV. I can do a sequence of FINDs inside a CASE statement, or I can use PARSE. But in the first case I have multiple FIND statements, and in the latter I have multiple assignments in my code.
Gregg 14-Jun-2009 [3905]	Economical how, in space, speed, or complexity? e.g., repeated FINDs can seem inelegant, but are easy to understand and maintain. If lines match more than one rule, what is the desired behavior? Assuming you have test data, and if performance is the key, have you done any quick tests?
Graham 15-Jun-2009 [3906x3]	elegant in looks :)
	the lines are so few in number that it won't make any practical difference ... just wondering if there was a preferred way to do this without code duplication.
	I decided that since the way I was going to use PARSE or CASE meant the code was mixed in with the data, it was better to do it differently.
Tomc 15-Jun-2009 [3909x3]	You need to maintain the map of keywords and codes in its own file, and read it in to build your rules.
	Sort it by keyword length, longest first,
	before building the rules. Then when codes change or more are added, you just update your map.
Graham 15-Jun-2009 [3912]	Basically what I ended up doing :)
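A minimal sketch of Tomc's approach, written in the repeated-FIND style Gregg mentions, so the keyword data stays out of the code. The map holds only the two examples from Graham's question; in practice it would be loaded from its own file. The names keyword-map, pairs and classify are hypothetical, and the code targets R2.

```rebol
REBOL [Title: "Sketch: classify lines from a keyword map"]

; Hypothetical keyword -> code map. Tomc suggests keeping it in its own
; file; it is inlined here so the sketch runs on its own.
keyword-map: [
    "tablet"  "U2"
    "capsule" "AV"
]

; Turn the flat map into [keyword code] pairs, longest keyword first,
; so a longer keyword is tried before a shorter one it contains.
pairs: copy []
foreach [keyword code] keyword-map [append/only pairs reduce [keyword code]]
sort/compare pairs func [a b][(length? first a) > (length? first b)]

; Returns the code of the first keyword found in the line, or NONE.
; FIND on strings is case-insensitive by default.
classify: func [line [string!]][
    foreach pair pairs [
        if find line first pair [return second pair]
    ]
    none
]

print classify "Take one tablet twice daily"   ; U2
print classify "Capsule, 20 mg"                ; AV
probe classify "Apply cream"                   ; none
```

Adding or changing a code then only means editing the map; the classification logic stays the same.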
PeterWood 16-Jun-2009 [3913]	I'm puzzled by the different results when using [to end end] and [thru end]. Does anybody know why?
>> parse "123456789" [to end end]
== true
>> parse "123456789" [thru end]
== false
Maxim 16-Jun-2009 [3914x5]	note: parse "123456789" [to end] == true
	this has also puzzled me, since:
>> parse "123456789" [thru end here:] index? here
== 10
>> parse "123456789" [to end here:] index? here
== 10
	maybe the rule thru fails because you can't actually go past the end.
	Just like this fails too, even though we are at the end:
>> print parse "123456789" [9 skip here:] index? here
true
== 10
>> print parse "123456789" [10 skip here:] index? here
false
== 10
	It does make sense, and it's consistent with parse: it only returns true when the last rule ends exactly AT the end.
PeterWood 16-Jun-2009 [3919]	"maybe the rule thru fails because you can't actually go past the end" - but does [thru end] go past the end?
Maxim 16-Jun-2009 [3920]	yes it goes one past the end. it does not stop AT the end.
BrianH 16-Jun-2009 [3921]	end has no length, so to end and thru end mean the same thing.
PeterWood 16-Jun-2009 [3922]	I guess you could answer that end is past the end of the input. But the behavior seems inconsistent:
>> parse "123456789" [thru "8" "9" end]
== true
>> parse "123456789" [thru "9" end]
== true
>> parse "123456789" [thru end]
== false
Maxim 16-Jun-2009 [3923]	but Brian, skipping past the end still puts you at the end of the series, yet the parser knows you tried to go beyond the end... It's the THRU which is failing, because it knows you are trying to go beyond the end.
PeterWood 16-Jun-2009 [3924]	It's different in R3 :-)
>> parse "123456789" [thru end]
== true
Maxim 16-Jun-2009 [3925]	THRU consumes the END word and then detects that, as a result, it would put you beyond the end. Really, it's quite logical. But in practice THRU shouldn't complain... because, as you say, in this specific context TO and THRU really do mean the same thing.
PeterWood 16-Jun-2009 [3926]	I prefer the R3 behaviour. I really hope that it doesn't change.
BrianH 16-Jun-2009 [3927]	I'll make sure of that, Peter.
PeterWood 16-Jun-2009 [3928]	Thanks.
Ladislav 16-Jun-2009 [3929]	yes, Peter, I am sure R3 behaviour is correct
BrianH 23-Jun-2009 [3930x2]	In R2:
>> parse/all { X X XX X X} [(prin 'a) some [(prin 'b) "X" (prin 'c) [(prin 'd) "X" (prin 'e) | (prin 'f) skip (prin 'g)] (prin 'h) | (prin 'i) skip (prin 'j)] (prin 'k)]
abijbcdfghbcdfghbijbcdehbijbcdfghbcdfijbik== true
In R3:
>> parse/all { X X XX X X} [(prin 'a) some [(prin 'b) "X" (prin 'c) [(prin 'd) "X" (prin 'e) | (prin 'f) skip (prin 'g)] (prin 'h) | (prin 'i) skip (prin 'j)] (prin 'k)]
abijbcdfghbcdfghbijbcdehbijbcdfghbcdfijk== true
In both cases the fij near the end should be fgh - a bug in PARSE.
	Never mind, I missed that the last X is at the end of the string. No bugs.
shadwolf 30-Jun-2009 [3932x2]	The more I try to understand parse, the less I understand it.
	I want to try to make a tutorial about parse, but my knowledge of it is poor. Since we have a wiki, we could start a project to write documentation with the goal of making people understand what parse is good for.
Sunanda 30-Jun-2009 [3934]	Parse question on stackoverflow (unanswered as yet:) http://stackoverflow.com/questions/1060727/rebol-parse-dealing-with-whitespace-and-copy-var
BrianH 30-Jun-2009 [3935]	Answered :)
Sunanda 30-Jun-2009 [3936]	Fast!
shadwolf 30-Jun-2009 [3937]	http://www.rebolfrance.info/articles/allaboutparse - documentation to collect everything we want to know about parse
BrianH 30-Jun-2009 [3938]	There's also a lot of documentation about parse's behavior (in theory) at the beginning of the Parse Proposals page on DocBase.