• Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r4wp

[Rebol School] REBOL School

Endo
9-Aug-2012
[760]
Here is the benchmark results (execution time for 1.000.000 calls)


>> benchmark [rle [a a a b b c A A a a]] ;BrianH (the old one, not 
the crashing ones)
== 0:00:29.765

>> benchmark [rle [a a a b b c A A a a]] ;endo
== 0:00:32.953
DocKimbel
9-Aug-2012
[761]
Endo: you should rather bench on one long series rather than 1M times 
on a small one in order to avoid function calls overhead and get 
a more fair comparison. When I try with a 1M size string with random 
a,b,c chars, my parse solution is twice faster than the mezz one 
(Brian's one is crashing so can't test it). I was expecting a greater 
difference though.
Steeve
9-Aug-2012
[762]
... test
Sunanda
9-Aug-2012
[763]
Talking of test, I am trying to write a simple function that checks 
if a data item matches a rebol datatype, so for example:

    print is-it-a? "number?" "12.5"
    == true
    print is-it-a? "number?" "xxx"
    == false
    print is-it-a? "number?" "?"
    == false


Except my function goes bad on that third example -- it prints the 
console help text.


Any thoughts on how to check incoming values without executing them 
as code? Thanks

    is-it-a?: func [
       data-type [string!]
       value [string!]
    ][
      data-type: first load/all data-type  
      error? try [
          value: first load/all value 
          return do reduce [data-type value]
      ]
      false
    ]
Steeve
9-Aug-2012
[764]
is-a: func [f v][
	not not all [
		f: get/any load f
		any-function? :f
		f load v
	]
]
Sunanda
9-Aug-2012
[765]
Thanks Steeve, that's much more robust than my code :)

Just for info .....


....It needs some error trapping to handle un-loadable values, eg:
    >> is-a "number?" "33.e"
    ** Syntax Error: Invalid decimal -- 33.e


....And (like my code) it's not so good with 'true and 'false owing 
to the way REBOL works:
    >> is-a "logic?" "true"
    == false

But it'll do the job!
BrianH
9-Aug-2012
[766x4]
Weird, I found the bug in R2 parse that causes the crash:
>> parse "" [integer!]
== false
>> parse "" [unset!]
== true  ; should be false
Two rules it is then. This doesn't crash, and is optimized for strings 
while we're at it. It's probably slower when doing blocks than Doc's, 
but it handles all datatypes:

rle: func [s [series!] /local out emit pos1 pos2 cont][
	out: make block! 1
	emit: [(repend out [offset? :pos1 :pos2 first :pos1])]
	parse/case/all :s pick [[
		any [pos1: skip (cont: first :pos1) any cont pos2: emit]
	] [
		any [
			pos1: unset! any unset! pos2: emit |
			pos1: skip some [
				end pos2: emit break |
				pos2: unset! :pos2 emit break |
				pos2: skip (

     cont: unless strict-equal? first :pos1 first :pos2 [[end skip]]
				) cont |
				pos2: emit break
			]
		]
	]] any-string? :s
	out
]


It also works around the strict-not-equal? bug in pre-2.7.7 R2, and 
using FIRST instead of path access is another speedup in R2 (path 
access is faster in R3).
Change the repend to chained inserts and it gets noticably faster, 
due to less mezzanine overhead:

rle: func [s [series!] /local out emit pos1 pos2 cont][
	out: make block! 2

 emit: [(out: insert/only insert out offset? :pos1 :pos2 first :pos1)]
	parse/case/all :s pick [[
		any [pos1: skip (cont: first :pos1) any cont pos2: emit]
	] [
		any [
			pos1: unset! any unset! pos2: emit |
			pos1: skip some [
				end pos2: emit break |
				pos2: unset! :pos2 emit break |
				pos2: skip (

     cont: unless strict-equal? first :pos1 first :pos2 [[end skip]]
				) cont |
				pos2: emit break
			]
		]
	]] any-string? :s
	head out
]
You can get rid of this line too for a slight speedup:
				end pos2: emit break |

It's a leftover when I was trying to work around the bug in PARSE.
Maxim
9-Aug-2012
[770]
providing an optional output buffer  (which the user can pre-allocate 
to some ideal size)  would make a VERY big difference on large inputs.


usually, when it goes into the hundreds of thousands, repetitive 
series re-allocation on growing mutable series,  will kill any kind 
of optimisation you can dream of. 


rle: func [s [series!]   /into out  [block!]  /local out emit pos1 
pos2 cont][
	out: any [ out     make block! 2 ]
 ...
]


this is especially effective on repetitive calls to the above function 
and using clear on the given buffer so that it auto-grows to an optimal 
size and is fast on later calls.


just today, I was doing some encryption benchmarking and when I hit 
strings larger than 1MB it was taking several minutes... thats until 
I realized that it was my dataset generator (a looped string insert) 
which was taking 98% of the cpu time.  !
BrianH
9-Aug-2012
[771x3]
If you follow the /into option standard you can do chained calls 
to RLE too:


rle: func [s [series!] /into out [any-block!] /local emit pos1 pos2 
cont][
	unless into [out: make block! 2]

 emit: [(out: insert/only insert :out offset? :pos1 :pos2 first :pos1)]
	parse/case/all :s pick [[
		any [pos1: skip (cont: first :pos1) any cont pos2: emit]
	] [
		any [
			pos1: unset! any unset! pos2: emit |
			pos1: skip some [
				pos2: unset! :pos2 emit break |
				pos2: skip (

     cont: unless strict-equal? first :pos1 first :pos2 [[end skip]]
				) cont |
				pos2: emit break
			]
		]
	]] any-string? :s
	either into [:out] [head :out]
]
>> head rle/into "ddeeee" rle/into "aaabbc" make block! 10
== [3 #"a" 2 #"b" 1 #"c" 2 #"d" 4 #"e"]
Functions with /into work like INSERT when chained together. This 
works with other block types too, though OFFSET? will be slow with 
the list! type.
Maxim
9-Aug-2012
[774]
now that's a pretty nice RLE encoder.     :-)
BrianH
9-Aug-2012
[775x3]
R3 version, same /into option:


rle: funct ["Run length encode" s [series!] /into output [any-block!]] 
[
	unless into [output: make block! 2] x: none

 r: either any-block? :s [qr: copy [quote 1] [(qr/2: :x) any qr]] 
 [[any x]]
	parse/case :s [any [pos1: set x skip r pos2: (

  output: reduce/into [subtract index? :pos2 index? :pos1 :x] :output
	)]]
	either into [:output] [head :output]
]
Sorry, same unset problems, have to use POKE:


rle: funct ["Run length encode" s [series!] /into output [any-block!]] 
[
	unless into [output: make block! 2] x: none

 r: either any-block? :s [qr: copy [quote 1] [(poke qr 2 :x) any qr]] 
 [[any x]]
	parse/case :s [any [pos1: set x skip r pos2: (

  output: reduce/into [subtract index? :pos2 index? :pos1 :x] :output
	)]]
	either into [:output] [head :output]
]
>> mold/all rle reduce [() () 'a () 1]
== "[2 #[unset!] 1 a 1 #[unset!] 1 1]"
Endo
10-Aug-2012
[778x3]
It is what I like about this community :)

I knew that when I write a RLE function, BrianH will come up a much 
better version. Doc and others joined as well and now we have a very 
good function. Just like the CSV tools. Thanks.
Ehm.. what about the decoder? how do I decode unset! values?
I was using somthing like:

decode-rle: func [b /local r] [r: copy [] foreach [x y] b [loop x 
[append r y]]]
decode-rle: func [b /local r i] [

 i: 0  r: make block! foreach [x y] b [i: i + x] ;better for big blocks?
 foreach [x y] b [loop x [append r y]]
]
BrianH
10-Aug-2012
[781x3]
In mezzanine style:

decode-rle: func [
	"Decode a run length encoded block"
	rle [any-block!] "Block of [integer value]"

 /into "Insert into a buffer instead (returns position after insert)"
	output [series!] "The output buffer (modified)"
	/local x
] [
	unless into [
		x: 0  foreach [i v] :rle [x: x + :i]  output: make block! x
	]

 foreach [i v] :rle [output: insert/only/dup :output get/any 'v :i]
	either into [:output] [head :output]
]


Instead of testing for strict format compliance of the input block, 
it uses get-words to keep people from sneaking in functions and then 
passes the length value to + and INSERT/dup, counting on the type 
tests of those functions to do the screening for us.
You're right, having the make block! take the foreach expression 
as a parameter is safe; I forgot that make block! can take none as 
a parameter.
That should work in R3 as well. Though FORSKIP might be faster than 
FOREACH in R3, the simplicity of the code might be worth it.
Maxim
10-Aug-2012
[784]
I like the detail of using  :i   to prevent function hacking.   I 
should use it more often.
BrianH
10-Aug-2012
[785x5]
Trick I picked up when securing the mezzanines. It's slightly faster 
to evaluate too since it does less work.
The reason I use :output there isn't to prevent function hacking, 
it's to prevent converting lit-path! values to the path! type.
Sometimes you want to allow someone to pass in functions and then 
let them evaluate, as long as you have a good semantic model for 
what is supposed to happen and are careful about how you call them. 
The ARRAY, EXTRACT and REPLACE functions in R3 and R2 2.7.7+ are 
a good example of this.
Slightly more optimal version for R3, taking advantage of how get-words 
and get-paths mean GET/any, and how FORSKIP is faster than FOREACH:
decode-rle: func [
	"Decode a run length encoded block"
	rle [any-block!] "Block of [integer value]"

 /into "Insert into a buffer instead (returns position after insert)"
	output [series!] "The output buffer (modified)"
	/local x
] [
	unless into [
		x: 0  output: make block! forskip rle 2 [x: x + :rle/1]
	]
	forskip rle 2 [output: insert/only/dup :output :rle/2 :rle/1]
	either into [:output] [head :output]
]
Maxim
10-Aug-2012
[790]
these would be nice funcs to add to mezz in R3 and R2-forwards
BrianH
10-Aug-2012
[791x2]
Darn, just found a bug in ARRAY for R2 and R3. Litwords are converted 
to words and litpaths are converted to paths.
This is so obscure that I doubt it has affected any existing code 
though.
Arnold
10-Aug-2012
[793x2]
Discovered that d: 1.1.1 then d/1/2: 0 and d:/1/3: 0 and then d/1/2: 
d/1/3: 1 results in d == 1.1.0 ??

This keeps me just inches away from releasing my script before my 
holiday.
Only thing to add is 1 small function to reduce the moves when the 
king is under attack. I discovered some weird VID behaviour too where 
alert boxes have strange formats they inherited from earlier defines 
fields.
Steeve
10-Aug-2012
[795]
An alternative for R3 (strings and blocks)
rle: func [s /local p e o][
	o: copy []
	parse/case s [
		any [

   p: skip any [e: if (p/1 == e/1) skip] (repend o [offset? p e p/1])
		]
	]
	o
]
BrianH
11-Aug-2012
[796x9]
Steeve, that's basically the same as my R2 RLE's block rule, but 
with the IF workaround replaced with IF. It has a few gotchas:
- Executes function values in block data
- Doesn't handle unset! or error! values

- Converts lit-paths to paths and lit-words to words before comparison 
and again before putting in the output.
- Lots of intermediate block creation overhead

- Considers bindings of words when comparing them, not just case-sensitive 
spelling


The first 3 can be handled by using :p/1 and :e/1 instead of p/1 
and e/1, and the fourth by using REDUCE/into instead of REPEND. The 
last one can't be handled by any built-in function or operator in 
R3 (see http://issue.cc/r3/1834for details) but you could do a combination 
of functions and operators to get case-sensitive comparison without 
considering bindings. PARSE/case's QUOTE operation is the fastest 
method for doing that at the moment.


Nice job on neatly bypassing the relaxed bounds checking of R3 blocks. 
Though the if (p/1 == e/1) would succeed if p/1 is none and e is 
at the end of the block, the skip would still fail. That trick saves 
one e: operation.
R3's == operator handles unset and error values better than R2's 
though, which is why the explicit unset! testing in the rule can 
be removed.
Having a strict line of progression for the R3 equalities turned 
out to be a bad idea, since the binding check seems to be tripping 
up case checks.
Unfortunately, R3 development was put on hold before that could be 
fixed.
If you *want* to consider binding when doing your comparisons, perhaps 
for more lossless in-memory compression, then Steeve's IF == method 
is the way to go. If you want true lossless compression then you 
could even use =? to make sure that only runs of exact references 
compress.
The advantages of == or =? comparison over PARSE QUOTE would be lost 
if you serialize the data and save it to a file or send it over a 
network. REBOL syntax doesn't keep track of those distinctions.
The PARSE IF method does let you add a /compare function option though, 
so you can be as specific as you want. Instead of if (:p/1 == :e/1) 
you would do if (apply :f [:p/1 :e/1]) then pass :== or :strict-equal? 
as a parameter..
Here's a version of my last one above, but with Steeve's trick adapted 
to make a /compare option. It defaults to its old case-sensitive 
behavior.

rle: func [
	"Run length encode to series of [length value]"
	s [series!] "The series to encode"

 /into {Insert into a buffer instead (returns position after insert)}
	output [any-block!] "The output buffer (modified)"
	/compare "Comparator function for equvilance"
	comparator [any-function!]
	/local x r qr b e
] [
	unless into [output: make block! 2] x: none
	r: case [
		compare [[any [e: if (apply :comparator [:x :e/1]) skip]]]
		any-string? :s [[any x]]
		'else [qr: copy [quote 1] [(poke qr 2 :x) any qr]
	]
	parse/case :s [any [b: set x skip r e: (
		output: reduce/into [offset? :b :e :x] :output
	)]]
	either into [:output] [head :output]
]
Whoops, forgot a bracket:

rle: func [
	"Run length encode to series of [length value]"
	s [series!] "The series to encode"

 /into {Insert into a buffer instead (returns position after insert)}
	output [any-block!] "The output buffer (modified)"
	/compare "Comparator function for equvilance"
	comparator [any-function!]
	/local x r qr b e
] [
	unless into [output: make block! 2] x: none
	r: case [
		compare [[any [e: if (apply :comparator [:x :e/1]) skip]]]
		any-string? :s [[any x]]
		'else [qr: copy [quote 1] [(poke qr 2 :x) any qr]]
	]
	parse/case :s [any [b: set x skip r e: (
		output: reduce/into [offset? :b :e :x] :output
	)]]
	either into [:output] [head :output]
]
Steeve
11-Aug-2012
[805]
There's no need to store lengths of 1 in the output.
Length of 1 can be infered during decoding.
The compression ratio woulb be better.
BrianH
11-Aug-2012
[806x2]
Length of 1 can't be inferred in the decoding, not for blocks.
>> rle [1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 6 6 6 6 6 6]
== [1 1 2 2 3 3 4 4 5 5 6 6]
Unless you make it so it treats integers specially. This would slow 
down the encoder and decoder, but reduce the compressed size.
Steeve
11-Aug-2012
[808]
and... you can't compress numbers anymore... forget it
BrianH
11-Aug-2012
[809]
You could reduce the compressed size of a string-specific RLE by 
putting runs of singletons into strings, like this:
>> rle "Hello World!"
== ["He" 2 #"l" "o World!"]