World: r4wp
[Rebol School] REBOL School
older newer | first last |
Endo 9-Aug-2012 [760] | Here is the benchmark results (execution time for 1.000.000 calls) >> benchmark [rle [a a a b b c A A a a]] ;BrianH (the old one, not the crashing ones) == 0:00:29.765 >> benchmark [rle [a a a b b c A A a a]] ;endo == 0:00:32.953 |
DocKimbel 9-Aug-2012 [761] | Endo: you should rather bench on one long series rather than 1M times on a small one in order to avoid function calls overhead and get a more fair comparison. When I try with a 1M size string with random a,b,c chars, my parse solution is twice faster than the mezz one (Brian's one is crashing so can't test it). I was expecting a greater difference though. |
Steeve 9-Aug-2012 [762] | ... test |
Sunanda 9-Aug-2012 [763] | Talking of test, I am trying to write a simple function that checks if a data item matches a rebol datatype, so for example: print is-it-a? "number?" "12.5" == true print is-it-a? "number?" "xxx" == false print is-it-a? "number?" "?" == false Except my function goes bad on that third example -- it prints the console help text. Any thoughts on how to check incoming values without executing them as code? Thanks is-it-a?: func [ data-type [string!] value [string!] ][ data-type: first load/all data-type error? try [ value: first load/all value return do reduce [data-type value] ] false ] |
Steeve 9-Aug-2012 [764] | is-a: func [f v][ not not all [ f: get/any load f any-function? :f f load v ] ] |
Sunanda 9-Aug-2012 [765] | Thanks Steeve, that's much more robust than my code :) Just for info ..... ....It needs some error trapping to handle un-loadable values, eg: >> is-a "number?" "33.e" ** Syntax Error: Invalid decimal -- 33.e ....And (like my code) it's not so good with 'true and 'false owing to the way REBOL works: >> is-a "logic?" "true" == false But it'll do the job! |
BrianH 9-Aug-2012 [766x4] | Weird, I found the bug in R2 parse that causes the crash: >> parse "" [integer!] == false >> parse "" [unset!] == true ; should be false |
Two rules it is then. This doesn't crash, and is optimized for strings while we're at it. It's probably slower when doing blocks than Doc's, but it handles all datatypes: rle: func [s [series!] /local out emit pos1 pos2 cont][ out: make block! 1 emit: [(repend out [offset? :pos1 :pos2 first :pos1])] parse/case/all :s pick [[ any [pos1: skip (cont: first :pos1) any cont pos2: emit] ] [ any [ pos1: unset! any unset! pos2: emit | pos1: skip some [ end pos2: emit break | pos2: unset! :pos2 emit break | pos2: skip ( cont: unless strict-equal? first :pos1 first :pos2 [[end skip]] ) cont | pos2: emit break ] ] ]] any-string? :s out ] It also works around the strict-not-equal? bug in pre-2.7.7 R2, and using FIRST instead of path access is another speedup in R2 (path access is faster in R3). | |
Change the repend to chained inserts and it gets noticably faster, due to less mezzanine overhead: rle: func [s [series!] /local out emit pos1 pos2 cont][ out: make block! 2 emit: [(out: insert/only insert out offset? :pos1 :pos2 first :pos1)] parse/case/all :s pick [[ any [pos1: skip (cont: first :pos1) any cont pos2: emit] ] [ any [ pos1: unset! any unset! pos2: emit | pos1: skip some [ end pos2: emit break | pos2: unset! :pos2 emit break | pos2: skip ( cont: unless strict-equal? first :pos1 first :pos2 [[end skip]] ) cont | pos2: emit break ] ] ]] any-string? :s head out ] | |
You can get rid of this line too for a slight speedup: end pos2: emit break | It's a leftover when I was trying to work around the bug in PARSE. | |
Maxim 9-Aug-2012 [770] | providing an optional output buffer (which the user can pre-allocate to some ideal size) would make a VERY big difference on large inputs. usually, when it goes into the hundreds of thousands, repetitive series re-allocation on growing mutable series, will kill any kind of optimisation you can dream of. rle: func [s [series!] /into out [block!] /local out emit pos1 pos2 cont][ out: any [ out make block! 2 ] ... ] this is especially effective on repetitive calls to the above function and using clear on the given buffer so that it auto-grows to an optimal size and is fast on later calls. just today, I was doing some encryption benchmarking and when I hit strings larger than 1MB it was taking several minutes... thats until I realized that it was my dataset generator (a looped string insert) which was taking 98% of the cpu time. ! |
BrianH 9-Aug-2012 [771x3] | If you follow the /into option standard you can do chained calls to RLE too: rle: func [s [series!] /into out [any-block!] /local emit pos1 pos2 cont][ unless into [out: make block! 2] emit: [(out: insert/only insert :out offset? :pos1 :pos2 first :pos1)] parse/case/all :s pick [[ any [pos1: skip (cont: first :pos1) any cont pos2: emit] ] [ any [ pos1: unset! any unset! pos2: emit | pos1: skip some [ pos2: unset! :pos2 emit break | pos2: skip ( cont: unless strict-equal? first :pos1 first :pos2 [[end skip]] ) cont | pos2: emit break ] ] ]] any-string? :s either into [:out] [head :out] ] |
>> head rle/into "ddeeee" rle/into "aaabbc" make block! 10 == [3 #"a" 2 #"b" 1 #"c" 2 #"d" 4 #"e"] | |
Functions with /into work like INSERT when chained together. This works with other block types too, though OFFSET? will be slow with the list! type. | |
Maxim 9-Aug-2012 [774] | now that's a pretty nice RLE encoder. :-) |
BrianH 9-Aug-2012 [775x3] | R3 version, same /into option: rle: funct ["Run length encode" s [series!] /into output [any-block!]] [ unless into [output: make block! 2] x: none r: either any-block? :s [qr: copy [quote 1] [(qr/2: :x) any qr]] [[any x]] parse/case :s [any [pos1: set x skip r pos2: ( output: reduce/into [subtract index? :pos2 index? :pos1 :x] :output )]] either into [:output] [head :output] ] |
Sorry, same unset problems, have to use POKE: rle: funct ["Run length encode" s [series!] /into output [any-block!]] [ unless into [output: make block! 2] x: none r: either any-block? :s [qr: copy [quote 1] [(poke qr 2 :x) any qr]] [[any x]] parse/case :s [any [pos1: set x skip r pos2: ( output: reduce/into [subtract index? :pos2 index? :pos1 :x] :output )]] either into [:output] [head :output] ] | |
>> mold/all rle reduce [() () 'a () 1] == "[2 #[unset!] 1 a 1 #[unset!] 1 1]" | |
Endo 10-Aug-2012 [778x3] | It is what I like about this community :) I knew that when I write a RLE function, BrianH will come up a much better version. Doc and others joined as well and now we have a very good function. Just like the CSV tools. Thanks. |
Ehm.. what about the decoder? how do I decode unset! values? I was using somthing like: decode-rle: func [b /local r] [r: copy [] foreach [x y] b [loop x [append r y]]] | |
decode-rle: func [b /local r i] [ i: 0 r: make block! foreach [x y] b [i: i + x] ;better for big blocks? foreach [x y] b [loop x [append r y]] ] | |
BrianH 10-Aug-2012 [781x3] | In mezzanine style: decode-rle: func [ "Decode a run length encoded block" rle [any-block!] "Block of [integer value]" /into "Insert into a buffer instead (returns position after insert)" output [series!] "The output buffer (modified)" /local x ] [ unless into [ x: 0 foreach [i v] :rle [x: x + :i] output: make block! x ] foreach [i v] :rle [output: insert/only/dup :output get/any 'v :i] either into [:output] [head :output] ] Instead of testing for strict format compliance of the input block, it uses get-words to keep people from sneaking in functions and then passes the length value to + and INSERT/dup, counting on the type tests of those functions to do the screening for us. |
You're right, having the make block! take the foreach expression as a parameter is safe; I forgot that make block! can take none as a parameter. | |
That should work in R3 as well. Though FORSKIP might be faster than FOREACH in R3, the simplicity of the code might be worth it. | |
Maxim 10-Aug-2012 [784] | I like the detail of using :i to prevent function hacking. I should use it more often. |
BrianH 10-Aug-2012 [785x5] | Trick I picked up when securing the mezzanines. It's slightly faster to evaluate too since it does less work. |
The reason I use :output there isn't to prevent function hacking, it's to prevent converting lit-path! values to the path! type. | |
Sometimes you want to allow someone to pass in functions and then let them evaluate, as long as you have a good semantic model for what is supposed to happen and are careful about how you call them. The ARRAY, EXTRACT and REPLACE functions in R3 and R2 2.7.7+ are a good example of this. | |
Slightly more optimal version for R3, taking advantage of how get-words and get-paths mean GET/any, and how FORSKIP is faster than FOREACH: | |
decode-rle: func [ "Decode a run length encoded block" rle [any-block!] "Block of [integer value]" /into "Insert into a buffer instead (returns position after insert)" output [series!] "The output buffer (modified)" /local x ] [ unless into [ x: 0 output: make block! forskip rle 2 [x: x + :rle/1] ] forskip rle 2 [output: insert/only/dup :output :rle/2 :rle/1] either into [:output] [head :output] ] | |
Maxim 10-Aug-2012 [790] | these would be nice funcs to add to mezz in R3 and R2-forwards |
BrianH 10-Aug-2012 [791x2] | Darn, just found a bug in ARRAY for R2 and R3. Litwords are converted to words and litpaths are converted to paths. |
This is so obscure that I doubt it has affected any existing code though. | |
Arnold 10-Aug-2012 [793x2] | Discovered that d: 1.1.1 then d/1/2: 0 and d:/1/3: 0 and then d/1/2: d/1/3: 1 results in d == 1.1.0 ?? This keeps me just inches away from releasing my script before my holiday. |
Only thing to add is 1 small function to reduce the moves when the king is under attack. I discovered some weird VID behaviour too where alert boxes have strange formats they inherited from earlier defines fields. | |
Steeve 10-Aug-2012 [795] | An alternative for R3 (strings and blocks) rle: func [s /local p e o][ o: copy [] parse/case s [ any [ p: skip any [e: if (p/1 == e/1) skip] (repend o [offset? p e p/1]) ] ] o ] |
BrianH 11-Aug-2012 [796x9] | Steeve, that's basically the same as my R2 RLE's block rule, but with the IF workaround replaced with IF. It has a few gotchas: - Executes function values in block data - Doesn't handle unset! or error! values - Converts lit-paths to paths and lit-words to words before comparison and again before putting in the output. - Lots of intermediate block creation overhead - Considers bindings of words when comparing them, not just case-sensitive spelling The first 3 can be handled by using :p/1 and :e/1 instead of p/1 and e/1, and the fourth by using REDUCE/into instead of REPEND. The last one can't be handled by any built-in function or operator in R3 (see http://issue.cc/r3/1834for details) but you could do a combination of functions and operators to get case-sensitive comparison without considering bindings. PARSE/case's QUOTE operation is the fastest method for doing that at the moment. Nice job on neatly bypassing the relaxed bounds checking of R3 blocks. Though the if (p/1 == e/1) would succeed if p/1 is none and e is at the end of the block, the skip would still fail. That trick saves one e: operation. |
R3's == operator handles unset and error values better than R2's though, which is why the explicit unset! testing in the rule can be removed. | |
Having a strict line of progression for the R3 equalities turned out to be a bad idea, since the binding check seems to be tripping up case checks. | |
Unfortunately, R3 development was put on hold before that could be fixed. | |
If you *want* to consider binding when doing your comparisons, perhaps for more lossless in-memory compression, then Steeve's IF == method is the way to go. If you want true lossless compression then you could even use =? to make sure that only runs of exact references compress. | |
The advantages of == or =? comparison over PARSE QUOTE would be lost if you serialize the data and save it to a file or send it over a network. REBOL syntax doesn't keep track of those distinctions. | |
The PARSE IF method does let you add a /compare function option though, so you can be as specific as you want. Instead of if (:p/1 == :e/1) you would do if (apply :f [:p/1 :e/1]) then pass :== or :strict-equal? as a parameter.. | |
Here's a version of my last one above, but with Steeve's trick adapted to make a /compare option. It defaults to its old case-sensitive behavior. rle: func [ "Run length encode to series of [length value]" s [series!] "The series to encode" /into {Insert into a buffer instead (returns position after insert)} output [any-block!] "The output buffer (modified)" /compare "Comparator function for equvilance" comparator [any-function!] /local x r qr b e ] [ unless into [output: make block! 2] x: none r: case [ compare [[any [e: if (apply :comparator [:x :e/1]) skip]]] any-string? :s [[any x]] 'else [qr: copy [quote 1] [(poke qr 2 :x) any qr] ] parse/case :s [any [b: set x skip r e: ( output: reduce/into [offset? :b :e :x] :output )]] either into [:output] [head :output] ] | |
Whoops, forgot a bracket: rle: func [ "Run length encode to series of [length value]" s [series!] "The series to encode" /into {Insert into a buffer instead (returns position after insert)} output [any-block!] "The output buffer (modified)" /compare "Comparator function for equvilance" comparator [any-function!] /local x r qr b e ] [ unless into [output: make block! 2] x: none r: case [ compare [[any [e: if (apply :comparator [:x :e/1]) skip]]] any-string? :s [[any x]] 'else [qr: copy [quote 1] [(poke qr 2 :x) any qr]] ] parse/case :s [any [b: set x skip r e: ( output: reduce/into [offset? :b :e :x] :output )]] either into [:output] [head :output] ] | |
Steeve 11-Aug-2012 [805] | There's no need to store lengths of 1 in the output. Length of 1 can be infered during decoding. The compression ratio woulb be better. |
BrianH 11-Aug-2012 [806x2] | Length of 1 can't be inferred in the decoding, not for blocks. >> rle [1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 6 6 6 6 6 6] == [1 1 2 2 3 3 4 4 5 5 6 6] |
Unless you make it so it treats integers specially. This would slow down the encoder and decoder, but reduce the compressed size. | |
Steeve 11-Aug-2012 [808] | and... you can't compress numbers anymore... forget it |
BrianH 11-Aug-2012 [809] | You could reduce the compressed size of a string-specific RLE by putting runs of singletons into strings, like this: >> rle "Hello World!" == ["He" 2 #"l" "o World!"] |
older newer | first last |