World: r3wp
[Core] Discuss core issues
Andreas 17-May-2010 [16752x4] | interestingly enough, a naive c extension is only negligibly faster |
indices?-null 0:00:00.184183
indices?-p1   0:00:01.082892
indices?-p2   0:00:01.523015
indices?-u1   0:00:00.347117
indices?-u2   0:00:00.345846
indices?-u3   0:00:00.346959
indices?-ext  0:00:00.329520 | |
null just creates a result block, to demonstrate that 50% of runtime is mem allocation for the 10m result array (so that's where one should really spend time optimising). p1 is Ladislav's `1 1 value` parse, p2 is Pekr's `quote (value)` parse, u1/2/3 are the until-based versions shown above, ext is a naive C extension. | |
the kernel of the C extension:
result = RXI_MAKE_BLOCK(series_n - series_i);
result_n = 0;
for (; series_i < series_n; ++series_i) {
    elem_type = RXI_GET_VALUE(series, series_i, &elem);
    if (elem.int64 == value.int64 && elem_type == value_type) {
        result_v.int64 = series_i + 1;
        RXI_SET_VALUE(result, result_n++, result_v, RXT_INTEGER);
    }
}
| |
Maxim 17-May-2010 [16756x2] | I've just finished my tests... I've got a keyed search func which returns the exact same results as foreach but 20 times faster! I'll put the whole script in the profiling group... it has several dataset creations for comparison and includes a clean run-time printout. |
all speeds are dependent on data... so YMMV | |
Paul 17-May-2010 [16758x4] | what is all this? What are you guys testing? |
Maxim 17-May-2010 [16762] | looking at possible search optimising within rebol, for big data sets. |
Paul 17-May-2010 [16763] | you mean searching a block such as [1 "this" 2 "that" 3 "more"] etc..? |
Terry 18-May-2010 [16764] | ideally, a large block of key/values like ["key1" "value 1" "key 2" "value 2"] with the ability to use pattern matching on keys or values... but FAST |
Ladislav 18-May-2010 [16765] | Terry: "foreach is the winner speed wise.. as a bonus, If i use foreach, I don't need the index?" - unbelievable how you compare apples and oranges without noticing |
Terry 18-May-2010 [16766x2] | It's all about the goal, Lad... apples, oranges.. unripened bananas... i don't care |
I was debating the merits of Rebol to the Redis group, and they said the same thing.. I said "Rebol + Cheyenne" is so much faster than Redis + PHP + Apache.. and they said I was "comparing apples to oranges" What? Apples? Oranges? It's the RESULT I'm interested in. In that case it was Redis pulling 7200 values from 100,000 keys per second vs Rebol pulling millions per second. | |
Ladislav 18-May-2010 [16768x2] | Terry: "I don't care" - you should, since you are comparing speed of code adhering to different specifications. If you really want to find the fastest code for a given specification, that is not the way to take. |
You are certainly entitled to do whatever you like, but saying "foreach is the winner speed wise..." is wrong, since you did not allow parse to do what you allowed foreach to do. | |
Terry 18-May-2010 [16770] | fair enough |
Pekr 18-May-2010 [16771] | Interesting - "Ladislav's original parse example is much faster for me than Pekr's "quote"-based one" - then why was it the other way around on my machine? What could be the technical reason? |
Ladislav 18-May-2010 [16772] | different OS, different processor, maybe even different R3? |
Maxim 18-May-2010 [16773] | all my tests are being done on R2, for the record. |
Pekr 18-May-2010 [16774] | btw - will there be any difference between:
result: make block! length? series
and
result: make series 0
I mean - will not preallocating a large enough series (make series 0) be slowed down by the GC constantly increasing the size? |
Maxim 18-May-2010 [16775x2] | so far, preallocating too large a buffer takes much more time... but look at my funcs... I do even better :-)
result: clear []
which re-uses the same block over and over. Unless Ladislav knows otherwise, the memory for that block isn't released by clear; only its internal size is reset to 0, which is why clear is so fast AFAIK. |
so it will grow to match the needs of the dataset, optimising itself in size within a few searches and then not re-allocating itself too often. | |
Ladislav 18-May-2010 [16777] | there are many important differences:
* make series 0 does not necessarily make a block
* regarding the allocation length - if you can reliably estimate the necessary length, then you are better off |
Maxim 18-May-2010 [16778] | with 10 million records, I estimated my dense datasets at about 120000 records and did a few tests; they were MUCH slower than using result: clear [] |
Ladislav 18-May-2010 [16779] | but make block! 1000 is certainly slower than make block! 0 if it turns out that you need only, e.g., 10 elements |
Maxim 18-May-2010 [16780x2] | (estimated 120000 results) |
thing is, often if not most of the time you don't need to copy the result as a new block, and that also saves A LOT of RAM in my tests... overall, about 500MB of RAM were saved by not pre-allocating large buffers | |
Ladislav 18-May-2010 [16782] | memory reuse: yes, if it is possible at all (it is like optimizing GC by hand), but that can be used only in special circumstances |
Maxim 18-May-2010 [16783x2] | for searches it's usually OK... since the result is often used only to cycle over and create other data. |
and in any case, you can copy the result of the search, at which point you have a perfect size and as little wasted RAM as possible. | |
Pekr 18-May-2010 [16785] | Max - where do I get the dataset from, if I would try to rewrite your find-fast into a version using 'parse? :-) Do you generate one? |
Maxim 18-May-2010 [16786x2] | look in profiling, there is a full script with verbose printing and everything you need, just replace the loop in one of the funcs :-) |
you can easily compare your results with the current best ... I'll be happy if you can beat ultimate-find and provide the exact same feature: searching on any field of a record and returning the whole record. | |
Henrik 18-May-2010 [16788] | overall, about 500MB of RAM were saved by not pre-allocating large buffers <- hmm... I thought the allocation did not necessarily mean usage, only that GC is simpler - or is it different under Unix and Windows? |
Maxim 18-May-2010 [16789x2] | I use windows task manager to look at ram use... the peak was 900MB and average was 700MB... removing pre-allocation it went down to 350 with peaks of ~ 500 IIRC |
linux usually has more precise RAM reports AFAIK. | |
Henrik 18-May-2010 [16791x2] | ok, let's say you allocate 2 GB (if you can) - does Windows start to swap? |
because if Windows only reports allocation and not actual use, then the task manager doesn't report true usage. | |
Maxim 18-May-2010 [16793x5] | the process manager reports a few different values, current, swapped, peak, and some more obscure ones. |
if an application allocates and reserves 2GB I really don't care if it's only using 10mb of it... my system is clogged and it's not the OS's fault. | |
though I did a special XP install which forces the OS NEVER to swap... and XP tends to be MUCH smoother because of it. | |
(special install of XP) playing around with some obscure registry keys. | |
though for these tests, no swapping occurred. | |
Henrik 18-May-2010 [16798x2] | I recently watched a talk by Poul-Henning Kamp, author of Varnish, who talked about how many people misunderstand how memory allocation works in modern OSes. Since he's a FreeBSD kernel developer he has some bias, but he made some interesting points: memory allocation is nearly free in various Unixes, yet most people ignore that and only allocate perhaps just enough, or below what they need. |
Whether this can be translated directly to REBOL, I don't know. | |
Maxim 18-May-2010 [16800] | the problem is when you task-switch, or run several RAM-intensive apps... they do kill each other, even on Unix. |
Henrik 18-May-2010 [16801] | but that's because the RAM is actually used, correct? |