• Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r4wp

[Rebol School] REBOL School

BrianH
9-Aug-2012
[769]
You can get rid of this line too for a slight speedup:
				end pos2: emit break |

It's a leftover when I was trying to work around the bug in PARSE.
Maxim
9-Aug-2012
[770]
providing an optional output buffer  (which the user can pre-allocate 
to some ideal size)  would make a VERY big difference on large inputs.


usually, when it goes into the hundreds of thousands, repetitive 
series re-allocation on growing mutable series,  will kill any kind 
of optimisation you can dream of. 


rle: func [s [series!]   /into out  [block!]  /local out emit pos1 
pos2 cont][
	out: any [ out     make block! 2 ]
 ...
]


this is especially effective on repetitive calls to the above function 
and using clear on the given buffer so that it auto-grows to an optimal 
size and is fast on later calls.


just today, I was doing some encryption benchmarking and when I hit 
strings larger than 1MB it was taking several minutes... thats until 
I realized that it was my dataset generator (a looped string insert) 
which was taking 98% of the cpu time.  !
BrianH
9-Aug-2012
[771x3]
If you follow the /into option standard you can do chained calls 
to RLE too:


rle: func [s [series!] /into out [any-block!] /local emit pos1 pos2 
cont][
	unless into [out: make block! 2]

 emit: [(out: insert/only insert :out offset? :pos1 :pos2 first :pos1)]
	parse/case/all :s pick [[
		any [pos1: skip (cont: first :pos1) any cont pos2: emit]
	] [
		any [
			pos1: unset! any unset! pos2: emit |
			pos1: skip some [
				pos2: unset! :pos2 emit break |
				pos2: skip (

     cont: unless strict-equal? first :pos1 first :pos2 [[end skip]]
				) cont |
				pos2: emit break
			]
		]
	]] any-string? :s
	either into [:out] [head :out]
]
>> head rle/into "ddeeee" rle/into "aaabbc" make block! 10
== [3 #"a" 2 #"b" 1 #"c" 2 #"d" 4 #"e"]
Functions with /into work like INSERT when chained together. This 
works with other block types too, though OFFSET? will be slow with 
the list! type.
Maxim
9-Aug-2012
[774]
now that's a pretty nice RLE encoder.     :-)
BrianH
9-Aug-2012
[775x3]
R3 version, same /into option:


rle: funct ["Run length encode" s [series!] /into output [any-block!]] 
[
	unless into [output: make block! 2] x: none

 r: either any-block? :s [qr: copy [quote 1] [(qr/2: :x) any qr]] 
 [[any x]]
	parse/case :s [any [pos1: set x skip r pos2: (

  output: reduce/into [subtract index? :pos2 index? :pos1 :x] :output
	)]]
	either into [:output] [head :output]
]
Sorry, same unset problems, have to use POKE:


rle: funct ["Run length encode" s [series!] /into output [any-block!]] 
[
	unless into [output: make block! 2] x: none

 r: either any-block? :s [qr: copy [quote 1] [(poke qr 2 :x) any qr]] 
 [[any x]]
	parse/case :s [any [pos1: set x skip r pos2: (

  output: reduce/into [subtract index? :pos2 index? :pos1 :x] :output
	)]]
	either into [:output] [head :output]
]
>> mold/all rle reduce [() () 'a () 1]
== "[2 #[unset!] 1 a 1 #[unset!] 1 1]"
Endo
10-Aug-2012
[778x3]
It is what I like about this community :)

I knew that when I write a RLE function, BrianH will come up a much 
better version. Doc and others joined as well and now we have a very 
good function. Just like the CSV tools. Thanks.
Ehm.. what about the decoder? how do I decode unset! values?
I was using somthing like:

decode-rle: func [b /local r] [r: copy [] foreach [x y] b [loop x 
[append r y]]]
decode-rle: func [b /local r i] [

 i: 0  r: make block! foreach [x y] b [i: i + x] ;better for big blocks?
 foreach [x y] b [loop x [append r y]]
]
BrianH
10-Aug-2012
[781x3]
In mezzanine style:

decode-rle: func [
	"Decode a run length encoded block"
	rle [any-block!] "Block of [integer value]"

 /into "Insert into a buffer instead (returns position after insert)"
	output [series!] "The output buffer (modified)"
	/local x
] [
	unless into [
		x: 0  foreach [i v] :rle [x: x + :i]  output: make block! x
	]

 foreach [i v] :rle [output: insert/only/dup :output get/any 'v :i]
	either into [:output] [head :output]
]


Instead of testing for strict format compliance of the input block, 
it uses get-words to keep people from sneaking in functions and then 
passes the length value to + and INSERT/dup, counting on the type 
tests of those functions to do the screening for us.
You're right, having the make block! take the foreach expression 
as a parameter is safe; I forgot that make block! can take none as 
a parameter.
That should work in R3 as well. Though FORSKIP might be faster than 
FOREACH in R3, the simplicity of the code might be worth it.
Maxim
10-Aug-2012
[784]
I like the detail of using  :i   to prevent function hacking.   I 
should use it more often.
BrianH
10-Aug-2012
[785x5]
Trick I picked up when securing the mezzanines. It's slightly faster 
to evaluate too since it does less work.
The reason I use :output there isn't to prevent function hacking, 
it's to prevent converting lit-path! values to the path! type.
Sometimes you want to allow someone to pass in functions and then 
let them evaluate, as long as you have a good semantic model for 
what is supposed to happen and are careful about how you call them. 
The ARRAY, EXTRACT and REPLACE functions in R3 and R2 2.7.7+ are 
a good example of this.
Slightly more optimal version for R3, taking advantage of how get-words 
and get-paths mean GET/any, and how FORSKIP is faster than FOREACH:
decode-rle: func [
	"Decode a run length encoded block"
	rle [any-block!] "Block of [integer value]"

 /into "Insert into a buffer instead (returns position after insert)"
	output [series!] "The output buffer (modified)"
	/local x
] [
	unless into [
		x: 0  output: make block! forskip rle 2 [x: x + :rle/1]
	]
	forskip rle 2 [output: insert/only/dup :output :rle/2 :rle/1]
	either into [:output] [head :output]
]
Maxim
10-Aug-2012
[790]
these would be nice funcs to add to mezz in R3 and R2-forwards
BrianH
10-Aug-2012
[791x2]
Darn, just found a bug in ARRAY for R2 and R3. Litwords are converted 
to words and litpaths are converted to paths.
This is so obscure that I doubt it has affected any existing code 
though.
Arnold
10-Aug-2012
[793x2]
Discovered that d: 1.1.1 then d/1/2: 0 and d:/1/3: 0 and then d/1/2: 
d/1/3: 1 results in d == 1.1.0 ??

This keeps me just inches away from releasing my script before my 
holiday.
Only thing to add is 1 small function to reduce the moves when the 
king is under attack. I discovered some weird VID behaviour too where 
alert boxes have strange formats they inherited from earlier defines 
fields.
Steeve
10-Aug-2012
[795]
An alternative for R3 (strings and blocks)
rle: func [s /local p e o][
	o: copy []
	parse/case s [
		any [

   p: skip any [e: if (p/1 == e/1) skip] (repend o [offset? p e p/1])
		]
	]
	o
]
BrianH
11-Aug-2012
[796x9]
Steeve, that's basically the same as my R2 RLE's block rule, but 
with the IF workaround replaced with IF. It has a few gotchas:
- Executes function values in block data
- Doesn't handle unset! or error! values

- Converts lit-paths to paths and lit-words to words before comparison 
and again before putting in the output.
- Lots of intermediate block creation overhead

- Considers bindings of words when comparing them, not just case-sensitive 
spelling


The first 3 can be handled by using :p/1 and :e/1 instead of p/1 
and e/1, and the fourth by using REDUCE/into instead of REPEND. The 
last one can't be handled by any built-in function or operator in 
R3 (see http://issue.cc/r3/1834for details) but you could do a combination 
of functions and operators to get case-sensitive comparison without 
considering bindings. PARSE/case's QUOTE operation is the fastest 
method for doing that at the moment.


Nice job on neatly bypassing the relaxed bounds checking of R3 blocks. 
Though the if (p/1 == e/1) would succeed if p/1 is none and e is 
at the end of the block, the skip would still fail. That trick saves 
one e: operation.
R3's == operator handles unset and error values better than R2's 
though, which is why the explicit unset! testing in the rule can 
be removed.
Having a strict line of progression for the R3 equalities turned 
out to be a bad idea, since the binding check seems to be tripping 
up case checks.
Unfortunately, R3 development was put on hold before that could be 
fixed.
If you *want* to consider binding when doing your comparisons, perhaps 
for more lossless in-memory compression, then Steeve's IF == method 
is the way to go. If you want true lossless compression then you 
could even use =? to make sure that only runs of exact references 
compress.
The advantages of == or =? comparison over PARSE QUOTE would be lost 
if you serialize the data and save it to a file or send it over a 
network. REBOL syntax doesn't keep track of those distinctions.
The PARSE IF method does let you add a /compare function option though, 
so you can be as specific as you want. Instead of if (:p/1 == :e/1) 
you would do if (apply :f [:p/1 :e/1]) then pass :== or :strict-equal? 
as a parameter..
Here's a version of my last one above, but with Steeve's trick adapted 
to make a /compare option. It defaults to its old case-sensitive 
behavior.

rle: func [
	"Run length encode to series of [length value]"
	s [series!] "The series to encode"

 /into {Insert into a buffer instead (returns position after insert)}
	output [any-block!] "The output buffer (modified)"
	/compare "Comparator function for equvilance"
	comparator [any-function!]
	/local x r qr b e
] [
	unless into [output: make block! 2] x: none
	r: case [
		compare [[any [e: if (apply :comparator [:x :e/1]) skip]]]
		any-string? :s [[any x]]
		'else [qr: copy [quote 1] [(poke qr 2 :x) any qr]
	]
	parse/case :s [any [b: set x skip r e: (
		output: reduce/into [offset? :b :e :x] :output
	)]]
	either into [:output] [head :output]
]
Whoops, forgot a bracket:

rle: func [
	"Run length encode to series of [length value]"
	s [series!] "The series to encode"

 /into {Insert into a buffer instead (returns position after insert)}
	output [any-block!] "The output buffer (modified)"
	/compare "Comparator function for equvilance"
	comparator [any-function!]
	/local x r qr b e
] [
	unless into [output: make block! 2] x: none
	r: case [
		compare [[any [e: if (apply :comparator [:x :e/1]) skip]]]
		any-string? :s [[any x]]
		'else [qr: copy [quote 1] [(poke qr 2 :x) any qr]]
	]
	parse/case :s [any [b: set x skip r e: (
		output: reduce/into [offset? :b :e :x] :output
	)]]
	either into [:output] [head :output]
]
Steeve
11-Aug-2012
[805]
There's no need to store lengths of 1 in the output.
Length of 1 can be infered during decoding.
The compression ratio woulb be better.
BrianH
11-Aug-2012
[806x2]
Length of 1 can't be inferred in the decoding, not for blocks.
>> rle [1 2 2 3 3 3 4 4 4 4 5 5 5 5 5 6 6 6 6 6 6]
== [1 1 2 2 3 3 4 4 5 5 6 6]
Unless you make it so it treats integers specially. This would slow 
down the encoder and decoder, but reduce the compressed size.
Steeve
11-Aug-2012
[808]
and... you can't compress numbers anymore... forget it
BrianH
11-Aug-2012
[809]
You could reduce the compressed size of a string-specific RLE by 
putting runs of singletons into strings, like this:
>> rle "Hello World!"
== ["He" 2 #"l" "o World!"]
Steeve
11-Aug-2012
[810x2]
LZ77 could replace RLE. It would do RLE + patterns compression
I don't think it would be hard to code with parse
BrianH
11-Aug-2012
[812x2]
Agreed. Not that many repetive runs of characters in string data, 
so a better compression method would be preferable.
RLE is better for image data. Any takers?
Steeve
11-Aug-2012
[814x2]
I've some code when I studied 8bit computer data crunchers
But I didn't use parse at that time
BrianH
11-Aug-2012
[816]
RLE might help for binary data too, including that reduced encoding 
I mentioned for strings above.
Sujoy
13-Aug-2012
[817]
Thanks for that Ladislav (median calculation)
GrahamC
14-Aug-2012
[818]
>> do http://reb4.me/r/altjson.r
connecting to: reb4.me
Script: "REBOL <-> JSON" (15-Jul-2011)
>> j: to-json make object! [ b: none ]
== {{"b":null}}