World: r3wp

[Parse] Discussion of PARSE dialect

Ladislav 14-Nov-2011 [5928x4]	Sorry for not continuing with it, Sunanda, but when I gave it a second thought, it did not look like a possible speed-up could be worth the source code complication.
	Another Parse discussion subject: It looked to me like a good idea to be able, in one Parse pass, to sometimes match some strings in a case-sensitive way and other strings in a case-insensitive way. This is not possible using the /CASE refinement, since the refinement makes all comparisons case-sensitive, or, if not used, all comparisons are case-insensitive. Wouldn't it be good to be able to adjust the comparison sensitivity on-the-fly during parsing?
	I think that it should not be overly complicated to achieve the goal, e.g. by using a CASE keyword in PARSE.
	(for switching to case-sensitive mode, and e.g. a NO-CASE for switching to case-insensitive mode)
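The all-or-nothing behaviour of /CASE described above can be seen in a short session. This is a minimal sketch assuming a REBOL 2 interpreter (the /ALL refinement is used so that whitespace handling plays no role); the input string "HTTPok" is made up for illustration.

```rebol
; Without /CASE, every string comparison in the rule ignores case.
print parse/all "HTTPok" ["http" "OK"]        ; true

; With /CASE, every string comparison in the rule is exact.
print parse/all/case "HTTPok" ["http" "OK"]   ; false
print parse/all/case "HTTPok" ["HTTP" "ok"]   ; true

; What cannot be expressed in one pass: "HTTP" matched exactly while
; "ok" is matched loosely. The line below only spells out the kind of
; rule the proposal is after. CASE / NO-CASE are proposed keywords,
; not valid PARSE syntax in the interpreter assumed here.
; parse/all "HTTPok" [case "HTTP" no-case "OK"]
```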
BrianH 14-Nov-2011 [5932x4]	How about a CASE operation that applies to the next rule, which could be a block? No NO-CASE operation required, and better to integrate with backtracking.
	It would be a modifier, like OPT or 1.
	While we're at it, the KEEP operation from Topaz would be useful. I use PARSE wrapped in COLLECT, calling KEEP in parens, quite a bit.
	You'd miss the /into option for incremental collecting and preallocation, but at least you wouldn't need to BIND/copy your rules.
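The pattern of wrapping PARSE in COLLECT and calling KEEP from parens might look like the sketch below. It assumes a REBOL 2 release that ships the COLLECT mezzanine; the input string and the words `digit`, `n` and `runs` are illustrative only.

```rebol
digit: charset "0123456789"

runs: collect [
    parse/all "abc56xyz7" [
        any [
            copy n some digit (keep n)   ; hand each run of digits to KEEP
            | skip                       ; ignore any other character
        ]
    ]
]

probe runs   ; == ["56" "7"]
```

The rule block is written inline here, so it is part of the body that COLLECT binds KEEP into. As far as I understand the remark about BIND/copy, rules defined elsewhere and referenced by word would not see KEEP without being rebound first, which is the inconvenience a native KEEP keyword would remove.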
Ladislav 14-Nov-2011 [5936]	"How about a CASE operation that applies to the next rule, which could be a block? No NO-CASE operation required" - that is an error; even in that case you would need NO-CASE.
BrianH 14-Nov-2011 [5937]	OK, but you wouldn't need NO-CASE to end a CASE. It would be another modifier, not a mode. Modes like that don't work with backtracking very well. So it would be like this: case ["a" no-case "b" "c"], not like this: case "a" no-case "b" case "c" no-case. The two directives would be implemented as flags, like NOT.
Ladislav 14-Nov-2011 [5938]	"OK, but you wouldn't need NO-CASE to end a CASE." - What I did propose was just the existence of such keywords; the exact implementation should be the one that is the simplest to implement, which may well be the one you mention.
BrianH 14-Nov-2011 [5939]	OK, cool. You have to be careful with the "mode" term though. That tripped up some of the last round of parse proposals, such as REVERSE.
Ladislav 14-Nov-2011 [5940]	Hmm, REVERSE has more issues, I think
BrianH 14-Nov-2011 [5941]	The biggest of which is that it hasn't been implemented yet :(
Ladislav 14-Nov-2011 [5942x2]	Well, I am not pushing for it.
	But, CASE should be a simpler case ;-)
BrianH 14-Nov-2011 [5944]	I liked it at the time, at least the bounded modifier version, but of the unimplemented proposals it's not my highest priority.
Ladislav 14-Nov-2011 [5945]	OK, so, do you think I should put the CASE proposal (mentioning your variant) to the article?
BrianH 14-Nov-2011 [5946x4]	Sure :)
	We really should go over that article and note which of the proposals were implemented, in which version, and which were denied and why.
	article -> page
	It's especially important to document the denied proposals, since the reasons for their denial would be instructive.
Ladislav 14-Nov-2011 [5950]	Will have a look, and, will also use one ticket to let Carl know.
BrianH 14-Nov-2011 [5951]	What do you think of the KEEP operation from Topaz? A good idea, or out of scope for PARSE?
Ladislav 14-Nov-2011 [5952x2]	BTW, the limitation of CASE to just the next rule is not exactly necessary. I would like to point you e.g. to the description of the #localize-on / #localize-off user-defined directive pair, which is defined so that it will not have any problem with multitasking or recursion, yet the directives are not limited to just the subsequent value. (Robert plans to publish the source code and the documentation soon)
	Regarding a KEEP keyword: may be a reasonable addition. I surely prefer KEEP, when choosing between KEEP and CHANGE.
BrianH 14-Nov-2011 [5954x3]	I would definitely not make that choice. I need CHANGE too, and the full version with the value you're changing to be an expression in a paren - the last part of the proposal that isn't implemented yet. That's at the top of my list.
	Ladislav, multitasking and recursion is not the same thing as backtracking. We already have backtracking bugs, we don't need to mandate more.
	(bad English grammar day)
Ladislav 15-Nov-2011 [5957x4]	"I need CHANGE too, and the full version with the value you're changing to be an expression in a paren" - this changing during parsing is known to be O(n), i.e. highly inefficient. For any serious code it is a disaster.
	Anyway, I am happy this does not influence my code
	Regarding CASE and backtracking: it is not a problem when the effect of the keyword is limited to the nearest enclosing block.
	(which is exactly the case of the #localize-on / -off directives as well)
BrianH 15-Nov-2011 [5961x2]	O(n) isn't bad if n is small, especially compared to other parts of the process. Most of my apps are bound by database or filesystem speed.
	Backtracking often happens within blocks too, but yes, that does limit the scope of the problems caused (it doesn't eliminate the problem, it just limits its scope). Mode operations also don't interact well with flow control operations like OPT, NOT and AND. What would NOT CASE mean if CASE has effect on subsequent code without being tied to it? As a comparison, NOT CASE "a" has a much clearer meaning.
Gregg 15-Nov-2011 [5963]	I like the idea of a CASE option. There haven't been many times I've needed it, but a few. Other things are higher on my priority list for R3, but I wouldn't complain if this made its way in there.
Ladislav 15-Nov-2011 [5964]	Hmm, to not complicate matters, and hoping that it is the simpler variant, I modified the CASE/NO-CASE proposal to use the CASE RULE and NO-CASE RULE syntax, since it really looks simpler to implement than other possible alternatives.
Endo 1-Dec-2011 [5965]	I want to keep the digits and remove all the rest: t: "abc56xyz" parse/all t [some [digit (prin "d") | x: (prin "." remove x)]] print head t This does the work but never finishes. If I add a "skip" to the second part, the result is "b56y". How do I do it?
Geomol 1-Dec-2011 [5966]	Alternative not using parse:
    >> t: "abc56xyz"
    == "abc56xyz"
    >> non-digit: ""
    == ""
    >> for c #"a" #"z" 1 [append non-digit c]
    == "abcdefghijklmnopqrstuvwxyz"
    >> for c #"A" #"Z" 1 [append non-digit c]
    == {abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ}
    >> trim/with t non-digit
    == "56"
Endo 1-Dec-2011 [5967]	Nice way, thank you. But still curious about how to do it with parse.
Gabriele 1-Dec-2011 [5968x2]
    >> s: "abc56xyz"
    == "abc56xyz"
    >> digit: charset "1234567890"
    == make bitset! #{
    000000000000FF03000000000000000000000000000000000000000000000000
    }
    >> non-digit: complement digit
    == make bitset! #{
    FFFFFFFFFFFF00FCFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    }
    >> parse/all s [(o: copy "") any [mk1: some digit mk2: (insert/part tail o mk1 mk2) | some non-digit]] o
    == "56"
	(mm, not sure why the copy/paste was messed up. i hope you get the idea anyway.)
Endo 1-Dec-2011 [5970x2]	I just did the same thing: t: "abc56xyz" parse/all t [some [x: non-digit (prin first x remove x x: back x) :x | skip]] head t
	A bit clearer: t: "abc56xyz" parse/all t [some [x: non-digit (x: back remove x) :x | skip]] head t
Gabriele 1-Dec-2011 [5972]	note that copying the whole thing is probably faster than removing multiple times. also, doing several chars at once instead of one at a time is faster.
Endo 1-Dec-2011 [5973x2]	It depends on the input, but if it's a long text with many multiple chars to insert/remove your way will be faster. Thanks
	Oh, I think there is no need for "back": t: "abc56xyz" parse/all t [some [x: non-digit (remove x) :x | skip]] head t
Dockimbel 1-Dec-2011 [5975]	Endo: in your first attempt, your second rule in the SOME block is not making the input advance when the end of the string is reached, because (remove "") == "", so it enters an infinite loop. A simple fix could be: t: "abc56xyz" parse/all t [any [digit (prin "d") | x: skip (prin "." remove x) :x]] (remember to correctly reset the input cursor when modifying the parsed series). As others have suggested, there are more optimal ways to achieve this trimming.
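The fix described above (consume one character with SKIP, remove it, then reset the input position) can be wrapped in a small function. This is a sketch assuming REBOL 2; the name `keep-digits` and the local word `pos` are hypothetical, and the progress printing from the original rule is left out.

```rebol
digit: charset "0123456789"

keep-digits: func [
    "Removes every non-digit character from a string in place (hypothetical helper)."
    str [string!]
    /local pos
][
    parse/all str [
        any [
            digit                            ; digits stay; input advances normally
            | pos: skip (remove pos) :pos    ; drop one char, then rewind to where it was
        ]
    ]
    str
]

print keep-digits copy "abc56xyz"   ; prints 56
```

Because SKIP fails at the end of the input, the second alternative cannot match there, so the loop always terminates.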
Endo 1-Dec-2011 [5976x2]	Strange: I tried to remove the whole part at once, but it's slower than the other:
    aaa: [t: "abc56def7" parse/all t [some [x: some non-digit y: (remove/part x y) :x | skip]] head t]
    bbb: [t: "abc56def7" parse/all t [some [x: non-digit (remove x) :x | skip]] head t]
    >> benchmark2 aaa bbb ;(executes block 10'000'000 times.)
    Execution time for the #1 job: 0:00:11.719
    Execution time for the #2 job: 0:00:11.265
    #1 is slower than #2 by factor ~ 1.04030181979583
	Doc: Thank you. I tried to do it that way (advancing the series position) but couldn't. I may add some more things, so I wish to do it with parse instead of other ways. And I want to learn parse more :) Thanks to all!