r3wp [groups: 83 posts: 189283]
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r3wp

[Profiling] Rebol code optimisation and algorithm comparisons.

Maxim
29-Oct-2009
[36x3]
the main one being that foreach is actually the fastest series iterator!
and remove-each is 90 times faster if it always return true rather 
than false !
(probably exponentially faster as the series grows)
Maxim
30-Oct-2009
[39x2]
wow I'm already at 7kb of output text with notes and proper header 
... I haven't done half the tests yet!
did you know that FOR is 60x ... let me write that out ... SIXTY 
TIMES slower than REPEAT  !!!
Geomol
30-Oct-2009
[41]
Yeah, there's often a huge difference between a mezzanine function 
and a native. In R2, FOR is mezz, REPEAT is native.
Maxim
30-Oct-2009
[42x10]
the comment above about remove-each is false... it was a coding error.
but I'm discovering a lot of discrepancies in things like string 
vs block speed of certain loops... 
and a lot of other neat things like:

pick series 1   

is  15% faster  than

not tail? series
1000 < i: i + 1     is     10%  faster than    (i: i + 1) > 1000
and its not because of the paren... I checked that....
(i: i + 1) > 1000       same speed as      i: i + 1   i > 1000
profiling almost done... my machine has been looping series and indexes 
non-stop for about 8 hours now  :-)


be ready for the most in-depth analysis on loops ever done for R2 
 ;-)
will be nice to do the same exercise on R3
See who is the overall winner in this REBOL iterator slug fest analysis!!!
   

over 8 hours of practically non-stop cpu cycling over a wide variety 
of exit conditions, datasets and ALL iterators in rebol 2 

(loop, repeat, for, forever, foreach, remove-each, forskip, forall, 
while, until )

20 kb of data, statistics, comments and test details.

INVALUABLE data for people wanting to optimize their REBOL code.


http://www.pointillistic.com/open-REBOL/moa/scream/rebol-iterator-comparison.txt
I would a few peer reviews so I can continue to evolve this document 
in order to make it as precise/usefull for everyone.
would *like*
Steeve
30-Oct-2009
[52x3]
A thing should be noted.
repeat and foreach do a bind/copy of the evaluated block.

Even if they are the fastest loops, they should be not used too intensivly 
because they will polluate the memory.

It's particularly sensitive for graphics applications or services 
that linger in memory. 


So, that's why  I advise to use only LOOP, WHILE and UNTIL for intensive 
repeated loopings, if you don't want to blow up the memory used by 
your app.
Your bench doesn''t take in account the time taken by the GC to recycle 
 the memory.
Some functions polluate the memory some other not.
You should add the time needed to recycle after each test.
but perhaps i'm wrong, you take it in account
Maxim
30-Oct-2009
[55x5]
thanks steeve, I'm accumulating all comments

First revision of the benchmarks will include:
	-RAM stats

 -empty vs filled-up loops.  many words and a single func with the 
 same content called from the loop
	-GC de-activated tests + recycle time stats
as noted in the document test notes:

I specifically didn't do any GC control, cause I wanted, at this 
point, to see how the loops react under normal rebol execution.  

the GC normally is pretty aggressive and when you look at the tests, 
most loops roll for several hundred thousands times, so the GC will 
have kicked-in... if it can.
I did note, that there is a HUGE memory leak which occured probably 
in the actual benchmark procedure itself.


although I keep no reference to any of the data or transient test 
blocks and funcs, they are kept somewhere, and my rebol.exe process 
keeps growing and growing.... I caught it at 500MB !! but it didn't 
do any difference in actual speeds... after a few tests.... cause 
i was a bit scared.
this will also have to be investigated further (the leak)
I tried manually recycling... but it didn't do anything.
Steeve
30-Oct-2009
[60]
what do you mean ?, it does it here:


>> recycle s: stats loop 1000000 [foreach a [1 2 3][a: a]] print 
stats - s recycle print stats - s
1569504  ;memory allocated by the loop
-320          ; after the recycle
Maxim
30-Oct-2009
[61]
>> stats
== 541502965
>> recycle
>> stats
== 272784493


but that's just for about 10 % of the tests... the more tests I do 
the more ram stays "stuck" somewhere inside the interpreter.
Steeve
30-Oct-2009
[62x3]
yes, i noticed that too, it's a probem with R2
R3 is better with that
and if you activate recycle/on, does that make any difference ?
Maxim
30-Oct-2009
[65]
I think R2 GC can't determine co-dependent unused references... in 
some situations.
ex:
blk: reduce [ a: context [b: none] b: context [c: a] a/b: b ]
blk: none


in this case both a and b point to each other, and clearing blk doesn't 
tell a or b that they aren't used anymore... that is my guess.
Steeve
30-Oct-2009
[66]
yep, but your tests seem not having such cases
Maxim
30-Oct-2009
[67x3]
I reduce a block which is the test... and since foreach copy/deep, 
and there is NO word ever refering to the content of the refered 
block, I think the contents of the blocks prevent the blocks and 
the data they contain from being collected... 


the block contains words which are not GC counted as zero reference, 
so nothing gets de-allocated...

that's just my guess.
not sure I'm making sense... in how I explain it.
in any case I want to build a single script which does all the tests, 
statistics, and eventually graphics and html pages of all results 
in one (VERY) long process.

so I can better control how the tests are done and prevent automated 
test creation as I am doing now.
sqlab
30-Oct-2009
[70]
M: looks more like 50 thousands than 50 millions for repeat.
So take care of your powers of 10
Maxim
30-Oct-2009
[71x3]
as indicated in the document introductions... the repeat test is:

loop ops [repeat i 1000 []]   

so with 100000 ops taking near 2 seconds.  we end up with:

100,000 * 1000 / 2 = (50 million loops / second)
but I did miscalculate the remove-each MB/second create/scan/erase 
cycle... its not 100MB... its 10MB.
updated document.
BrianH
30-Oct-2009
[74]
Thanks for the info, Maxim. We can do a little deduction from that 
data to guess how REBOL is implemented. The scientific method :)
Maxim
30-Oct-2009
[75]
for my 3D engine, this base line test was neccessary.  I need to 
squeze every hz out of rebol... its nice to see how some exit conditions 
are 10-15% faster in some equivalebt tests...  who would have tought 
that 'PICK was faster than NOT TAIL  ?  :-/
BrianH
30-Oct-2009
[76]
For instance, 1000 > i would be faster than i < 1000 because ops 
redirect to actions, and actions dispatch based on the type of their 
first argument. If the first argument is a literal value, the action 
of that type can be called directly. If it is a referenced value, 
it woulkd have to be dereferenced first, which is apparently slower.


As for PICK being faster than NOT TAIL?, that is one native compared 
to two, with interpreter overhead between the two. Low-level natives 
like PICK, NOT and TAIL? don't do much in comparison with the interpreter 
overhead. Large numbers of small operations tend to be slower than 
small numbers of large operations, if the amount of work done is 
comparable. This is why structure manipulation is faster in REBOL 
than simple math.
Maxim
30-Oct-2009
[77x2]
better described than I would put it, but those where my assumptions... 
I will include this info in my next revision of the document.
I intend to devote a whole site to these tests eventually.  with 
a very extensive and comprehensive set of test functions and statistics.
BrianH
30-Oct-2009
[79]
With the typos fixed I hope :)
Maxim
30-Oct-2009
[80x2]
you learn a lot by doing it.
with the tests I did, I think I can probably optimise liquid by at 
least 20% just by changing the loops and changing none of the algorithms 
or features.


I am about to create a second reference liquid node type. which will 
be completely compatible from the outside, but with less features 
inside.  I expect to DOUBLE its processing capacity.
BrianH
30-Oct-2009
[82x2]
Code converted from R2 to R3 tends to need reoptimizing - the balance 
of what is native vs what is mezz has changed, usually for the better. 
All of the loop functions are native now, for instance. Also, some 
natives have been converted to actions, and vice versa.
The plus side is that what you would just do, before you try to optimize, 
tends to be faster in R3 than it is in R2.
Maxim
30-Oct-2009
[84]
yeah amd with the faster fuunction calling in R3 liquid should get 
an immediate boost in speed... it does A LOT of tiny function calls 
to manage the graph as if they where first level types with accessors 
instead of using a controler which loops over the graphs and limits 
flexibility.
Maxim
17-May-2010
[85]
.