World: r3wp

Join the discussions in the REBOL3 world...

[Parse] Discussion of PARSE dialect

older newer	first last
Rebolek 28-May-2007 [1876]	nice to see char is faster than string, so I'm not going to rewrite this ;) Anyway, new version with BrianH's version of regex and bitset-to-string and with endless loop in tail-parse fixed is posted on http://bolek.techno.cz/reb/regex.r
Anton 28-May-2007 [1877]	That's funny, I thought we had determined that string was faster than char.
BrianH 28-May-2007 [1878x2]	It's probably close enough to ignore either way.
BrianH 28-May-2007 [1878x2]	Rebolek, I gather you made the parse go in reverse to handle rules like "a+a" better. How does your reverse code handle "aa+", or "aa+a" - same problem?
Dockimbel 28-May-2007 [1880]	Here's another benchmark: >> data: head insert/dup make string! 10'000'000 #"a" 10'000'000 >> t0: now/time/precise loop 10 [parse data [some "a"]] now/time/precise - t0 == 0:00:06.078 >> t0: now/time/precise loop 10 [parse data [some #"a"]] now/time/precise - t0 == 0:00:04.296 Running this test several times shows that char! matching is, in average, 30 % faster than string! matching.
BrianH 28-May-2007 [1881x2]	Well there you go. That's different numbers than last time, but more dramatic. It's just a #, easy fix :)
BrianH 28-May-2007 [1881x2]	Thanks for answering that question though.
Sunanda 28-May-2007 [1883]	Just tried your benchmark on my machine.....I get similar % faster for char! matching. Though the elapse times vary depending on the version of REBOL.
Dockimbel 28-May-2007 [1884]	Didn't want to sound "dramatic", but just wanted to provide a more accurate measure. Sure whatever datatype is used (char! or string!) in regex.r, that won't change much the overall speed. ;-)
BrianH 28-May-2007 [1885x2]	I'm more concerned about the first and follow sets - doing that wrong could mean a slowdown of orders of magnitude.
BrianH 28-May-2007 [1885x2]	Actually, big O slowdowns.
Rebolek 28-May-2007 [1887]	BrianH: "How does your reverse code handle "aa+", or "aa+a"" - damn... :) Anyway, today I was thinking about different way how to do it, had no time to code it yet. It's slow, I know...
BrianH 28-May-2007 [1888x2]	Are you familiar with the theories behind parser generators? That is what you are doing. I studied the theories in college - hence the talks about first and follow sets.
BrianH 28-May-2007 [1888x2]	I've been thinking about ways to do this too - you got me going. I was also thinking of a way to cache charsets for later reuse.
Maxim 28-May-2007 [1890x3]	didn't you guys try using a charset? I would guess its an order of magnitude faster still... no?
	well, checking with one letter didn't give any changes (bitset and char seem the same) but just change that with an OR of two letter and the char goes to 4 times slower and the bitset stays exactly the same... so do use bitsets as often as you can.
	>> t0: now/time/precise loop 10 [parse data [some #"a"]] now/time/precise - t0 == 0:00:06.389 >> t0: now/time/precise loop 10 [parse data [some [#"b" \| #"a"]]] now/time/precise - t0 == 0:00:25.496 >> b: charset "ab" >> t0: now/time/precise loop 10 [parse data [some b]] now/time/precise - t0 == 0:00:06.379
Rebolek 29-May-2007 [1893]	I'm not sure how can I test for end in case like this: r: ["a" "b" "c" end] parse "abc" [some [r/1 (r: next r)]] anybody?
Oldes 29-May-2007 [1894]	r: ["a" "b" "c" break] parse "abc" [some [r/1 (r: next r)]]
Rebolek 29-May-2007 [1895]	great, thanks
Oldes 29-May-2007 [1896]	Here are C sources for Regex engine used in MySQL5.0.9 http://leithal.cool-tools.co.uk/sourcedoc/mysql509/html/dir_000079.html
Rebolek 29-May-2007 [1897x5]	So I left tail-parse becuase of the "aa+" problem mentioned by BrianH. I wrote 'lazy-parse that works different and it seems to handle both cases "a+a" and "aa+" well.
	>> lazy-parse "aaa" [[#"a"][some #"a"][#"a"]] == true >> lazy-parse "aaa" [[#"a"][some #"a"]] == true >> lazy-parse "aaa" [[some #"a"][#"a"]]
	lazy-parse: func [string rules /local parsed?][ ;does not support ANY yet (wrong results) parsed?: false rules: copy rules parse string [ some [ b: rules/1 e: ( parsed?: true rules: next rules if empty? e [ either empty? rules [ parsed?: true rules: copy [break] ] [ e: next b if empty? e [parsed?: false] ] ] ) :e ] ] parsed? ]
	Warning: this does not suport ANY rule yet.
	Grrr, I should make more test before posting...forget last four posts
Rebolek 31-May-2007 [1902]	'regset is now on rebol.org, rest of parser is still in works.
btiffin 31-May-2007 [1903]	Rebolek...You posted the magic prime 797'th Script. Well Done! :)
Rebolek 31-May-2007 [1904]	Hm thanks, but I was aiming for 800th one... ;)
btiffin 31-May-2007 [1905]	Yep...I've been waiting... :)
Rebolek 7-Jun-2007 [1906]	just a quick idea: FORALL is implemented as mezzanine function. It calls FORSKIP which is mezzanine also. As you can see, it's probably not the fastest method. So here's the idea. Cannnot be FORALL rewritten to make it faster and is it possible to use PARSE to do this? So I tried and came up with this simple function: parall: func [ 'word body /local data ][ data: get :word parse data compose/deep [ some [(to set-word! word) any-type! (to paren! [do bind body :word])] ] ] (parall is just a shortcut for parse version of forall). this is very simple function written in five minutes and not very well checked, it needs some additional work (eg. it does not return same value as 'forall etc). So let's do some test (using Ladislav's %timblk.r): >> n: copy [] repeat i 100 [append n i] == [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 4... >> time-block [forall n [b: n/1 + 1]] 0.05 == 3.623046875E-4 >> time-block [parall n [b: n/1 + 1]] 0.05 == 3.814697265625E-6 >> 3.62e-4 / 3.81e-6 == 95.0131233595801 95x faster? whooo.... and what about bigger block? >> n: copy [] repeat i 10000 [append n i] == [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 4... >> time-block [forall n [b: n/1 + 1]] 0.05 == 3.540625E-2 >> time-block [parall n [b: n/1 + 1]] 0.05 == 3.7994384765625E-6 >> 3.54e-2 / 3.8e-6 == 9315.78947368421 9000x ? omg... comments?
Gabriele 7-Jun-2007 [1907]	the devil is in the details... you need to handle break :)
Rebolek 7-Jun-2007 [1908]	oh yes, break...didn't use it for years, so almost completely forget about it :)
BrianH 7-Jun-2007 [1909x4]	Don't forget continue.
	First tip: You don't have to bind the body - it just uses the existing binding of the word.
	Netx tip: What native does forskip decompose down to? Try using that first.
	; Try against this, the forskip code with the skip part taken out forall: func [ "Evaluates a block for every value in a series." [catch throw] 'word [word!] {Word set to each position in series and changed as a result} body [block!] "Block to evaluate each time" /local orig result ][ if not any [ series? get word port? get word ] [throw make error! {forall expected word argument to refer to a series or port!}] orig: get word while [any [not tail? get word (set word orig false)]] [ set/any 'result do body set word next get word get/any 'result ] ]
Rebolek 7-Jun-2007 [1913x2]	BrianH: I know it can be done using while loop in forskip, it was more an experiment on using parse as foreach/forall kind of function.
Rebolek 7-Jun-2007 [1913x2]	to see how fast it can be
BrianH 7-Jun-2007 [1915x3]	Well, if you can figure out how to make it handle break and continue, let us know.
	Or for that matter, ports.
	If you want to test the speed of parse, replace the any-type! with a skip - the forall you are comparing it to doesn't do that test.
Chris 7-Jun-2007 [1918]	What is 'continue?
BrianH 7-Jun-2007 [1919]	Sort of like the opposite of break. It breaks one iteration of a loop and continues at the top of the next iteration. Many languages have it, as does REBOL 3.
Gabriele 8-Jun-2007 [1920]	notice that, forall was initially done that way, and converted to use forskip for simplicity (ie. avoiding having the same code twice in the source)
Rebolek 8-Jun-2007 [1921]	I tried to enclose parse in loop 1 [] and it seems to handle break. I guess you'll probably prove me wrong, Brian :) parall: func [ 'word body /loc data ][ loop 1 [ data: get :word parse data compose/deep [ some [(to set-word! word) skip (to paren! [do body])] ] ] ] re: continue - this is not r3 ;) >> n: [1 2 3 4 5] parall n [if n/1 = 4 [break/return "break"] if n/1 > 4 [print "bad"]] == "break"
Gabriele 8-Jun-2007 [1922x4]	good :) now... if Ladislav was here he'd probably find at least ten wrong things in that function. ;)
	you need to add [throw] for example, and return the last return value of the body
	but... as long as we're not replacing forall... this could be a very fast alternative to use in cases when you don't need all the features...
	and actually... i'd call parse directly in that case ;)
older newer	first last