r3wp [groups: 83 posts: 189283]

World: r3wp

[Core] Discuss core issues

Andreas
17-May-2010
[16752x4]
interestingly enough, a naive c extension is only negligibly faster
indices?-null 	 0:00:00.184183
indices?-p1 	 0:00:01.082892
indices?-p2 	 0:00:01.523015
indices?-u1 	 0:00:00.347117
indices?-u2 	 0:00:00.345846
indices?-u3 	 0:00:00.346959
indices?-ext 	 0:00:00.329520
null just creates a result block, to demonstrate that roughly 50% of 
the runtime of the fastest versions is mem allocation for the 10m result 
array (so that's where one should really spend time optimising).


p1 is ladislav's `1 1 value` parse, p2 is pekr's `quote (value)` parse, 
u1/2/3 are the until-based versions shown above, ext is a naive C 
extension.
the kernel of the C extension:

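    /* allocate the result block for the worst case: every remaining element matches */
    result = RXI_MAKE_BLOCK(series_n - series_i);
    result_n = 0;
    for (; series_i < series_n; ++series_i) {
        /* fetch the element and its datatype */
        elem_type = RXI_GET_VALUE(series, series_i, &elem);

        /* match on the raw 64-bit payload and on the datatype */
        if (elem.int64 == value.int64 && elem_type == value_type) {
            /* store the 1-based index of the match */
            result_v.int64 = series_i + 1;
            RXI_SET_VALUE(result, result_n++, result_v, RXT_INTEGER);
        }
    }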
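
A minimal standalone sketch of the same loop, for anyone who wants to try the idea without the extension API. It is not Andreas's code: `cell_t` and `indices_of` are hypothetical names, and plain C arrays stand in for REBOL series. Like the kernel above, it sizes the result for the worst case and stores 1-based indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a REBOL value: a type tag plus a 64-bit payload. */
typedef struct {
    int     type;
    int64_t int64;
} cell_t;

/* Collect the 1-based positions of every cell equal to `value`.
   The result buffer is sized for the worst case (every cell matches).
   Returns the number of matches; the caller frees *out.
   On allocation failure *out is NULL and 0 is returned. */
size_t indices_of(const cell_t *series, size_t n, cell_t value, int64_t **out)
{
    int64_t *result;
    size_t result_n = 0;

    assert(series != NULL || n == 0);

    result = malloc((n ? n : 1) * sizeof *result);
    *out = result;
    if (result == NULL)
        return 0;

    for (size_t i = 0; i < n; ++i) {
        /* compare payload first, then the type tag, as in the kernel above */
        if (series[i].int64 == value.int64 && series[i].type == value.type)
            result[result_n++] = (int64_t)i + 1;
    }
    return result_n;
}
```

The worst-case allocation is exactly the cost the "null" timing isolates: it is paid even when only a handful of elements match.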
Maxim
17-May-2010
[16756x2]
I've just finished my tests... I've got a keyed search func which 
returns the exact same results as feach but  20 times faster!

I'll put the whole script in the profiling group... it  has several 
dataset creations for comparison and includes a clean run-time printout.
all speeds are dependent on data... so YMMV
Paul
17-May-2010
[16758x4]
what is all this?  What are you guys testing?
what is all this?  What are you guys testing?
what is all this?  What are you guys testing?
weird, I only hit enter once.
Maxim
17-May-2010
[16762]
looking at  possible search optimising within rebol, for big data 
sets.
Paul
17-May-2010
[16763]
you mean searching a block such as [1 "this" 2 "that" 3 "more"] 
 etc..?
Terry
18-May-2010
[16764]
ideally, a large block of  key/values like ["key1" "value 1" "key 
2" "value 2"] with the ability to use pattern matching on keys or 
values... but FAST
Ladislav
18-May-2010
[16765]
Terry: "foreach is the winner speed wise.. as a bonus, If i use foreach, 
I don't need the index?" - unbelievable, how you compare apples and 
oranges without noticing
Terry
18-May-2010
[16766x2]
It's all about the goal, Lad... apples, oranges.. unripened bananas... 
i don't care
I was debating the merits of Rebol to the Redis group, and they said 
the same thing.. I said "Rebol + Cheyenne" is so much faster than 
Redis + PHP + Apache.. and they said "I'm comparing apples to oranges"

What? Apples? Oranges?  It's the RESULT i'm interested in. In that 
case it was Redis pulling 7200 values from 100,000 keys per second 
vs Rebol pulling millions per second.
Ladislav
18-May-2010
[16768x2]
Terry: "I don't care" - you should, since you are comparing speed 
of code adhering to different specifications. If you really want 
to find the fastest code for a given specification, that is not the 
way to take.
You are certainly entitled to do whatever you like, but saying "foreach 
is the winner speed wise..." is wrong, since you did not allow parse 
to do what you allowed foreach to do.
Terry
18-May-2010
[16770]
fair enough
Pekr
18-May-2010
[16771]
Interesting - "ladislav's original parse example is much faster for 
me, than pekr's "quote"-based one" - then why on my machine was it 
otherwise? What could be the technical reason?
Ladislav
18-May-2010
[16772]
different OS, different processor, maybe even different R3?
Maxim
18-May-2010
[16773]
all my tests are being done on R2, for the record.
Pekr
18-May-2010
[16774]
btw - will there be any difference between: 

result: make block! length? series
and
result: make series 0


I mean - e.g. not preallocating a large enough series (make series 0) 
will be slowed down by GC constantly increasing the size?
Maxim
18-May-2010
[16775x2]
so far, preallocating too large a buffer takes much more time... but 
look at my funcs... I do even better  :-)

result: clear []


which re-uses the same block over and over, unless ladislav knows 
otherwise, the memory for that block isn't released by clear, only 
its internal size is reset to 0  which is why clear is so fast AFAIK.
so it will grow to match the needs of the dataset, optimising itself 
in size within a few searches and then not re-allocating itself too 
often.
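
A rough C sketch of the buffer-reuse idea described above, assuming (as Maxim does, AFAIK) that clearing resets only the length and keeps the allocation. `buf_t`, `buf_push`, `buf_clear` and `buf_free` are made-up names for illustration, not anything from REBOL's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    long  *items;
    size_t len;   /* elements in use */
    size_t cap;   /* elements allocated */
} buf_t;

/* Reset the length but keep the allocation for the next search. */
void buf_clear(buf_t *b)
{
    b->len = 0;
}

/* Append one element, doubling the capacity when full.
   Returns 0 on success, -1 if memory could not be obtained. */
int buf_push(buf_t *b, long v)
{
    if (b->len == b->cap) {
        size_t new_cap = b->cap ? b->cap * 2 : 4;
        long *p = realloc(b->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        b->items = p;
        b->cap = new_cap;
    }
    b->items[b->len++] = v;
    return 0;
}

/* Release the storage when the buffer is no longer needed. */
void buf_free(buf_t *b)
{
    free(b->items);
    b->items = NULL;
    b->len = 0;
    b->cap = 0;
}
```

After the first search the capacity has grown to fit the data, so a second search of similar size appends without any further reallocation.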
Ladislav
18-May-2010
[16777]
there are many important differences:

*make series 0 does not necessarily make a block

*regarding the allocation length - if you can estimate reliably the 
necessary length, then you are better off
Maxim
18-May-2010
[16778]
with 10 million records, I estimated my dense datasets at about 120000 
records and did a few tests, they were MUCH slower than using 
result: clear [ ]
Ladislav
18-May-2010
[16779]
but, make block! 1000 is certainly slower than make block! 0 if it 
turns out, that you need only 10 elements, e.g.
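
To put numbers on the trade-off being discussed, here is a small C sketch that counts how often a growable array has to enlarge its storage for a given initial capacity. The doubling growth policy and the names `vec_t`, `vec_push` and `grows_needed` are assumptions for illustration; REBOL's actual series growth strategy may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    long  *items;
    size_t len, cap;
    size_t grow_count;  /* how many times the storage was enlarged */
} vec_t;

/* Start with room for initial_cap elements (0 means no allocation yet). */
int vec_init(vec_t *v, size_t initial_cap)
{
    v->items = initial_cap ? malloc(initial_cap * sizeof *v->items) : NULL;
    if (initial_cap && v->items == NULL)
        return -1;
    v->len = 0;
    v->cap = initial_cap;
    v->grow_count = 0;
    return 0;
}

/* Append one element, doubling the capacity when full. */
int vec_push(vec_t *v, long x)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 1;
        long *p = realloc(v->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->items = p;
        v->cap = new_cap;
        v->grow_count++;
    }
    v->items[v->len++] = x;
    return 0;
}

/* Append n elements and report how many enlargements were needed.
   Returns (size_t)-1 if memory could not be obtained. */
size_t grows_needed(size_t initial_cap, size_t n)
{
    vec_t v;
    size_t grows;

    if (vec_init(&v, initial_cap) != 0)
        return (size_t)-1;
    for (size_t i = 0; i < n; ++i) {
        if (vec_push(&v, (long)i) != 0) {
            free(v.items);
            return (size_t)-1;
        }
    }
    grows = v.grow_count;
    free(v.items);
    return grows;
}
```

With this policy, `grows_needed(0, 10)` enlarges the storage 5 times while `grows_needed(1000, 10)` never does, but the latter pays for 1000 slots up front to hold 10 elements. Which side wins depends on how reliable the size estimate is.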
Maxim
18-May-2010
[16780x2]
(estimated 120000 results)
thing is many times, if not most of the times, you don't need to 
copy the result as a new block, and that also saves A LOT of ram 
in my tests...


overall, about 500MB of RAM were saved by not pre-allocating large 
buffers
Ladislav
18-May-2010
[16782]
memory reuse: yes, if it is possible at all (it is like optimizing 
GC by hand), but that can be used only in special circumstances
Maxim
18-May-2010
[16783x2]
for searches it's usually ok... since the result is often used only 
to cycle over and create other data.
and in any case, you can copy the result of the search, at which 
point you have a perfect size and as little wasted RAM as possible.
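
The "copy the result to get a perfect size" step, sketched in C under the assumption that the search wrote its matches into an oversized scratch buffer. `copy_exact` is a hypothetical helper, not a REBOL function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copy the first `len` elements of a (possibly oversized) scratch buffer
   into a freshly allocated array holding exactly `len` elements.
   Returns NULL if len is 0 or the allocation fails; the caller frees it. */
long *copy_exact(const long *scratch, size_t len)
{
    long *out;

    if (len == 0)
        return NULL;
    out = malloc(len * sizeof *out);
    if (out != NULL)
        memcpy(out, scratch, len * sizeof *out);
    return out;
}
```

The scratch buffer stays available for the next search, and only results that need to be kept pay for a right-sized copy.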
Pekr
18-May-2010
[16785]
Max - where do I get the dataset from, if I would try to rewrite 
your find-fast into a version using 'parse? :-) Do you generate one?
Maxim
18-May-2010
[16786x2]
look in profiling, there is a full script with verbose printing and 
everything you need, just replace the loop in one of the funcs  :-)
you can easily compare your results with the current best ... I'll 
be happy if you can beat the ultimate-find and give the exact same 
feature...

searching on any field of a record and return the whole record.
Henrik
18-May-2010
[16788]
"overall, about 500MB of RAM were saved by not pre-allocating large 
buffers"

 <- hmm... I thought the allocation did not necessarily mean usage, 
 only that GC is simpler, or is it different under Unix and Windows?
Maxim
18-May-2010
[16789x2]
I use windows task manager to look at ram use... the peak was 900MB 
and average was 700MB... removing pre-allocation it went down to 
350MB with peaks of ~500MB IIRC
linux usually has more precise RAM reports AFAIK.
Henrik
18-May-2010
[16791x2]
ok, let's say you allocate 2 GB, if you can, does Windows start to 
swap?
because if Windows only reports allocation and not actual use, then 
the task manager doesn't report true usage.
Maxim
18-May-2010
[16793x5]
the process manager reports  a few different values, current, swapped, 
peak, and some more obscure ones.
if an application allocates and reserves 2GB I really don't care 
if it's only using 10MB of it... my system is clogged and it's not 
the OS's fault.
though I did a special of XP install which forces the OS NEVER to 
swap... and XP tends to be MUCH smoother because of it.
(special install of XP)  playing around with some obscure registry 
keys.
though for these tests, no swapping occurred.
Henrik
18-May-2010
[16798x2]
I recently watched a talk by Poul-Henning Kamp, author of Varnish, 
who talked about how many people misunderstand how memory allocation 
works in modern OS'es. Since he's a FreeBSD kernel developer, he 
has some bias, but he made some interesting points in that memory 
allocation is nearly free in various unixes, but most people ignore 
that and only allocate perhaps just enough, or less than, what they need.
Whether this can be translated directly to REBOL, I don't know.
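
A small C sketch of the allocate-versus-use distinction raised here. It reserves a large region but writes to only part of it. On many unix-like systems with demand paging, the untouched part typically costs no physical RAM until first written, but that depends on the OS and allocator and is not something this code can prove by itself. The 4096-byte page size and the name `reserve_and_touch` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed page size; real systems should query it instead. */
enum { PAGE_SIZE_ASSUMED = 4096 };

/* Reserve `total` bytes but write to only the first `used` bytes,
   one byte per assumed page. Stores the region in *out (caller frees)
   and returns the number of pages touched, or 0 if allocation fails. */
size_t reserve_and_touch(size_t total, size_t used, unsigned char **out)
{
    unsigned char *p = malloc(total);
    size_t touched = 0;

    *out = p;
    if (p == NULL)
        return 0;
    if (used > total)
        used = total;
    for (size_t off = 0; off < used; off += PAGE_SIZE_ASSUMED) {
        p[off] = 1;   /* first write is what makes the page resident */
        ++touched;
    }
    return touched;
}
```

Comparing the process's reported memory after reserving 64MB and touching 1MB of it is one way to see which number a given tool (task manager, top, etc.) is actually showing.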
Maxim
18-May-2010
[16800]
problem is when you task switch, or run several RAM intensive apps... 
they do kill each other, even on unix.
Henrik
18-May-2010
[16801]
but that's because the RAM is actually used, correct?