World: r4wp

[Rebol School] REBOL School

older newer	first last
Endo 9-Aug-2012 [760]	Here is the benchmark results (execution time for 1.000.000 calls) >> benchmark [rle [a a a b b c A A a a]] ;BrianH (the old one, not the crashing ones) == 0:00:29.765 >> benchmark [rle [a a a b b c A A a a]] ;endo == 0:00:32.953
DocKimbel 9-Aug-2012 [761]	Endo: you should rather bench on one long series rather than 1M times on a small one in order to avoid function calls overhead and get a more fair comparison. When I try with a 1M size string with random a,b,c chars, my parse solution is twice faster than the mezz one (Brian's one is crashing so can't test it). I was expecting a greater difference though.
Steeve 9-Aug-2012 [762]	... test
Sunanda 9-Aug-2012 [763]	Talking of test, I am trying to write a simple function that checks if a data item matches a rebol datatype, so for example: print is-it-a? "number?" "12.5" == true print is-it-a? "number?" "xxx" == false print is-it-a? "number?" "?" == false Except my function goes bad on that third example -- it prints the console help text. Any thoughts on how to check incoming values without executing them as code? Thanks is-it-a?: func [ data-type [string!] value [string!] ][ data-type: first load/all data-type error? try [ value: first load/all value return do reduce [data-type value] ] false ]
Steeve 9-Aug-2012 [764]	is-a: func [f v][ not not all [ f: get/any load f any-function? :f f load v ] ]
Sunanda 9-Aug-2012 [765]	Thanks Steeve, that's much more robust than my code :) Just for info ..... ....It needs some error trapping to handle un-loadable values, eg: >> is-a "number?" "33.e" ** Syntax Error: Invalid decimal -- 33.e ....And (like my code) it's not so good with 'true and 'false owing to the way REBOL works: >> is-a "logic?" "true" == false But it'll do the job!
BrianH 9-Aug-2012 [766x4]	Weird, I found the bug in R2 parse that causes the crash: >> parse "" [integer!] == false >> parse "" [unset!] == true ; should be false
	Two rules it is then. This doesn't crash, and is optimized for strings while we're at it. It's probably slower when doing blocks than Doc's, but it handles all datatypes: rle: func [s [series!] /local out emit pos1 pos2 cont][ out: make block! 1 emit: [(repend out [offset? :pos1 :pos2 first :pos1])] parse/case/all :s pick [[ any [pos1: skip (cont: first :pos1) any cont pos2: emit] ] [ any [ pos1: unset! any unset! pos2: emit \| pos1: skip some [ end pos2: emit break \| pos2: unset! :pos2 emit break \| pos2: skip ( cont: unless strict-equal? first :pos1 first :pos2 [[end skip]] ) cont \| pos2: emit break ] ] ]] any-string? :s out ] It also works around the strict-not-equal? bug in pre-2.7.7 R2, and using FIRST instead of path access is another speedup in R2 (path access is faster in R3).
	Change the repend to chained inserts and it gets noticably faster, due to less mezzanine overhead: rle: func [s [series!] /local out emit pos1 pos2 cont][ out: make block! 2 emit: [(out: insert/only insert out offset? :pos1 :pos2 first :pos1)] parse/case/all :s pick [[ any [pos1: skip (cont: first :pos1) any cont pos2: emit] ] [ any [ pos1: unset! any unset! pos2: emit \| pos1: skip some [ end pos2: emit break \| pos2: unset! :pos2 emit break \| pos2: skip ( cont: unless strict-equal? first :pos1 first :pos2 [[end skip]] ) cont \| pos2: emit break ] ] ]] any-string? :s head out ]
	You can get rid of this line too for a slight speedup: end pos2: emit break \| It's a leftover when I was trying to work around the bug in PARSE.
Maxim 9-Aug-2012 [770]	providing an optional output buffer (which the user can pre-allocate to some ideal size) would make a VERY big difference on large inputs. usually, when it goes into the hundreds of thousands, repetitive series re-allocation on growing mutable series, will kill any kind of optimisation you can dream of. rle: func [s [series!] /into out [block!] /local out emit pos1 pos2 cont][ out: any [ out make block! 2 ] ... ] this is especially effective on repetitive calls to the above function and using clear on the given buffer so that it auto-grows to an optimal size and is fast on later calls. just today, I was doing some encryption benchmarking and when I hit strings larger than 1MB it was taking several minutes... thats until I realized that it was my dataset generator (a looped string insert) which was taking 98% of the cpu time. !
BrianH 9-Aug-2012 [771x3]	If you follow the /into option standard you can do chained calls to RLE too: rle: func [s [series!] /into out [any-block!] /local emit pos1 pos2 cont][ unless into [out: make block! 2] emit: [(out: insert/only insert :out offset? :pos1 :pos2 first :pos1)] parse/case/all :s pick [[ any [pos1: skip (cont: first :pos1) any cont pos2: emit] ] [ any [ pos1: unset! any unset! pos2: emit \| pos1: skip some [ pos2: unset! :pos2 emit break \| pos2: skip ( cont: unless strict-equal? first :pos1 first :pos2 [[end skip]] ) cont \| pos2: emit break ] ] ]] any-string? :s either into [:out] [head :out] ]
	>> head rle/into "ddeeee" rle/into "aaabbc" make block! 10 == [3 #"a" 2 #"b" 1 #"c" 2 #"d" 4 #"e"]
	Functions with /into work like INSERT when chained together. This works with other block types too, though OFFSET? will be slow with the list! type.
Maxim 9-Aug-2012 [774]	now that's a pretty nice RLE encoder. :-)
BrianH 9-Aug-2012 [775x3]	R3 version, same /into option: rle: funct ["Run length encode" s [series!] /into output [any-block!]] [ unless into [output: make block! 2] x: none r: either any-block? :s [qr: copy [quote 1] [(qr/2: :x) any qr]] [[any x]] parse/case :s [any [pos1: set x skip r pos2: ( output: reduce/into [subtract index? :pos2 index? :pos1 :x] :output )]] either into [:output] [head :output] ]
	Sorry, same unset problems, have to use POKE: rle: funct ["Run length encode" s [series!] /into output [any-block!]] [ unless into [output: make block! 2] x: none r: either any-block? :s [qr: copy [quote 1] [(poke qr 2 :x) any qr]] [[any x]] parse/case :s [any [pos1: set x skip r pos2: ( output: reduce/into [subtract index? :pos2 index? :pos1 :x] :output )]] either into [:output] [head :output] ]
	>> mold/all rle reduce [() () 'a () 1] == "[2 #[unset!] 1 a 1 #[unset!] 1 1]"
Endo 10-Aug-2012 [778x3]	It is what I like about this community :) I knew that when I write a RLE function, BrianH will come up a much better version. Doc and others joined as well and now we have a very good function. Just like the CSV tools. Thanks.
	Ehm.. what about the decoder? how do I decode unset! values? I was using somthing like: decode-rle: func [b /local r] [r: copy [] foreach [x y] b [loop x [append r y]]]
	decode-rle: func [b /local r i] [ i: 0 r: make block! foreach [x y] b [i: i + x] ;better for big blocks? foreach [x y] b [loop x [append r y]] ]
BrianH 10-Aug-2012 [781x3]	In mezzanine style: decode-rle: func [ "Decode a run length encoded block" rle [any-block!] "Block of [integer value]" /into "Insert into a buffer instead (returns position after insert)" output [series!] "The output buffer (modified)" /local x ] [ unless into [ x: 0 foreach [i v] :rle [x: x + :i] output: make block! x ] foreach [i v] :rle [output: insert/only/dup :output get/any 'v :i] either into [:output] [head :output] ] Instead of testing for strict format compliance of the input block, it uses get-words to keep people from sneaking in functions and then passes the length value to + and INSERT/dup, counting on the type tests of those functions to do the screening for us.
	You're right, having the make block! take the foreach expression as a parameter is safe; I forgot that make block! can take none as a parameter.
	That should work in R3 as well. Though FORSKIP might be faster than FOREACH in R3, the simplicity of the code might be worth it.
Maxim 10-Aug-2012 [784]	I like the detail of using :i to prevent function hacking. I should use it more often.
BrianH 10-Aug-2012 [785x5]	Trick I picked up when securing the mezzanines. It's slightly faster to evaluate too since it does less work.
	The reason I use :output there isn't to prevent function hacking, it's to prevent converting lit-path! values to the path! type.
	Sometimes you want to allow someone to pass in functions and then let them evaluate, as long as you have a good semantic model for what is supposed to happen and are careful about how you call them. The ARRAY, EXTRACT and REPLACE functions in R3 and R2 2.7.7+ are a good example of this.
	Slightly more optimal version for R3, taking advantage of how get-words and get-paths mean GET/any, and how FORSKIP is faster than FOREACH:
	decode-rle: func [ "Decode a run length encoded block" rle [any-block!] "Block of [integer value]" /into "Insert into a buffer instead (returns position after insert)" output [series!] "The output buffer (modified)" /local x ] [ unless into [ x: 0 output: make block! forskip rle 2 [x: x + :rle/1] ] forskip rle 2 [output: insert/only/dup :output :rle/2 :rle/1] either into [:output] [head :output] ]
Maxim 10-Aug-2012 [790]	these would be nice funcs to add to mezz in R3 and R2-forwards
BrianH 10-Aug-2012 [791x2]	Darn, just found a bug in ARRAY for R2 and R3. Litwords are converted to words and litpaths are converted to paths.
BrianH 10-Aug-2012 [791x2]	This is so obscure that I doubt it has affected any existing code though.
Arnold 10-Aug-2012 [793x2]	Discovered that d: 1.1.1 then d/1/2: 0 and d:/1/3: 0 and then d/1/2: d/1/3: 1 results in d == 1.1.0 ?? This keeps me just inches away from releasing my script before my holiday.
Arnold 10-Aug-2012 [793x2]	Only thing to add is 1 small function to reduce the moves when the king is under attack. I discovered some weird VID behaviour too where alert boxes have strange formats they inherited from earlier defines fields.
Steeve 10-Aug-2012 [795]	An alternative for R3 (strings and blocks) rle: func [s /local p e o][ o: copy [] parse/case s [ any [ p: skip any [e: if (p/1 == e/1) skip] (repend o [offset? p e p/1]) ] ] o ]
BrianH 11-Aug-2012 [796x9]	Steeve, that's basically the same as my R2 RLE's block rule, but with the IF workaround replaced with IF. It has a few gotchas: - Executes function values in block data - Doesn't handle unset! or error! values - Converts lit-paths to paths and lit-words to words before comparison and again before putting in the output. - Lots of intermediate block creation overhead - Considers bindings of words when comparing them, not just case-sensitive spelling The first 3 can be handled by using :p/1 and :e/1 instead of p/1 and e/1, and the fourth by using REDUCE/into instead of REPEND. The last one can't be handled by any built-in function or operator in R3 (see http://issue.cc/r3/1834for details) but you could do a combination of functions and operators to get case-sensitive comparison without considering bindings. PARSE/case's QUOTE operation is the fastest method for doing that at the moment. Nice job on neatly bypassing the relaxed bounds checking of R3 blocks. Though the if (p/1 == e/1) would succeed if p/1 is none and e is at the end of the block, the skip would still fail. That trick saves one e: operation.
	R3's == operator handles unset and error values better than R2's though, which is why the explicit unset! testing in the rule can be removed.
	Having a strict line of progression for the R3 equalities turned out to be a bad idea, since the binding check seems to be tripping up case checks.
	Unfortunately, R3 development was put on hold before that could be fixed.
	If you want to consider binding when doing your comparisons, perhaps for more lossless in-memory compression, then Steeve's IF == method is the way to go. If you want true lossless compression then you could even use =? to make sure that only runs of exact references compress.
	The advantages of == or =? comparison over PARSE QUOTE would be lost if you serialize the data and save it to a file or send it over a network. REBOL syntax doesn't keep track of those distinctions.
	The PARSE IF method does let you add a /compare function option though, so you can be as specific as you want. Instead of if (:p/1 == :e/1) you would do if (apply :f [:p/1 :e/1]) then pass :== or :strict-equal? as a parameter..
	Here's a version of my last one above, but with Steeve's trick adapted to make a /compare option. It defaults to its old case-sensitive behavior. rle: func [ "Run length encode to series of [length value]" s [series!] "The series to encode" /into {Insert into a buffer instead (returns position after insert)} output [any-block!] "The output buffer (modified)" /compare "Comparator function for equvilance" comparator [any-function!] /local x r qr b e ] [ unless into [output: make block! 2] x: none r: case [ compare [[any [e: if (apply :comparator [:x :e/1]) skip]]] any-string? :s [[any x]] 'else [qr: copy [quote 1] [(poke qr 2 :x) any qr] ] parse/case :s [any [b: set x skip r e: ( output: reduce/into [offset? :b :e :x] :output )]] either into [:output] [head :output] ]
	Whoops, forgot a bracket: rle: func [ "Run length encode to series of [length value]" s [series!] "The series to encode" /into {Insert into a buffer instead (returns position after insert)} output [any-block!] "The output buffer (modified)" /compare "Comparator function for equvilance" comparator [any-function!] /local x r qr b e ] [ unless into [output: make block! 2] x: none r: case [ compare [[any [e: if (apply :comparator [:x :e/1]) skip]]] any-string? :s [[any x]] 'else [qr: copy [quote 1] [(poke qr 2 :x) any qr]] ] parse/case :s [any [b: set x skip r e: ( output: reduce/into [offset? :b :e :x] :output )]] either into [:output] [head :output] ]
Steeve 11-Aug-2012 [805]	There's no need to store lengths of 1 in the output. Length of 1 can be infered during decoding. The compression ratio woulb be better.
BrianH 11-Aug-2012 [806x2]	Length of 1 can't be inferred in the decoding, not for blocks. >> rle [1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 6 6 6 6 6 6] == [1 1 2 2 3 3 4 4 5 5 6 6]
BrianH 11-Aug-2012 [806x2]	Unless you make it so it treats integers specially. This would slow down the encoder and decoder, but reduce the compressed size.
Steeve 11-Aug-2012 [808]	and... you can't compress numbers anymore... forget it
BrianH 11-Aug-2012 [809]	You could reduce the compressed size of a string-specific RLE by putting runs of singletons into strings, like this: >> rle "Hello World!" == ["He" 2 #"l" "o World!"]
older newer	first last