World: r3wp
[Core] Discuss core issues
Maxim 17-May-2010 [16741] | (the loop is twice as fast, meaning // is MUCH faster than using mod) |
Andreas 17-May-2010 [16742] | here's a tiny bit of setup code, adjust dim1/dim2 as you wish. then find the needles in the haystack: indices? haystack needle |
Maxim 17-May-2010 [16743] | I'm working on a (bigger) script which has similar setup code... specifically meant to compare different dataset scenarios |
Andreas 17-May-2010 [16744x3] | using find/tail instead of just find speeds up things slightly: |
loop kernel becomes: | |
not if series: find/tail series value [ append result (index? series) - 1 ] | |
Maxim 17-May-2010 [16747] | ah good idea! |
Andreas 17-May-2010 [16748x2] | now, dropping the if for an all speeds things up minimally, but it cleans up the code |
indices?-3: func [series value /local result][ result: make block! length? series until [ not all [ series: find/tail series value append result (index? series) - 1 ] ] result ] | |
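A quick way to check what indices?-3 returns, as a minimal sketch: the function body is the one posted above, only reindented, and the `haystack` data is made up for illustration. It should behave the same in R2 and R3, but that is an assumption, not something tested in this thread.

```rebol
REBOL [Title: "indices?-3 usage sketch"]

; Collect the 1-based positions of every occurrence of value in series.
indices?-3: func [series value /local result] [
    result: make block! length? series
    until [
        ; all stops at the first none, so the loop ends when find/tail fails
        not all [
            series: find/tail series value
            append result (index? series) - 1
        ]
    ]
    result
]

haystack: [1 2 1 3 1]          ; made-up sample data
probe indices?-3 haystack 1    ; prints [1 3 5]
```

find/tail leaves the series just past each match, which is why the code subtracts 1 from index? to get the position of the match itself.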
Maxim 17-May-2010 [16750] | hehe... just what I am doing ;-) |
Andreas 17-May-2010 [16751x5] | ladislav's original parse example is much faster for me, than pekr's "quote"-based one |
interestingly enough, a naive c extension is only negligibly faster | |
indices?-null 0:00:00.184183 indices?-p1 0:00:01.082892 indices?-p2 0:00:01.523015 indices?-u1 0:00:00.347117 indices?-u2 0:00:00.345846 indices?-u3 0:00:00.346959 indices?-ext 0:00:00.329520 | |
null just creates a result block, to demonstrate that 50% of runtime is mem allocation for the 10m result array (so that's where one should really spend time optimising). p1 is ladislav's `1 1 value` parse, p2 is pekr's `quote (value)` parse, u1/2/3 are the until-based versions shown above, ext is a naive C extension. | |
the kernel of the C extension: result = RXI_MAKE_BLOCK(series_n - series_i); result_n = 0; for (; series_i < series_n; ++series_i) { elem_type = RXI_GET_VALUE(series, series_i, &elem); if (elem.int64 == value.int64 && elem_type == value_type) { result_v.int64 = series_i + 1; RXI_SET_VALUE(result, result_n++, result_v, RXT_INTEGER); } } | |
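Timings like the ones above can be collected with a small helper. This is only a sketch of one common approach, not the harness Andreas used: `time-it` is a hypothetical name, and the workload in the last line is a placeholder.

```rebol
REBOL [Title: "timing sketch"]

; Evaluate a block of code and return the elapsed wall-clock time as a time! value.
time-it: func [code [block!] /local start] [
    start: now/precise
    do code
    difference now/precise start
]

; placeholder workload: replace with the function under test
print time-it [loop 100000 [find [1 2 3 4 5] 5]]
```

Results depend on the machine, the interpreter version and the data, so single runs are best treated as rough comparisons.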
Maxim 17-May-2010 [16756x2] | I've just finished my tests... I've got a keyed search func which returns the exact same results as feach but 20 times faster! I'll put the whole script in the profiling group... it has several dataset creations for comparison and includes a clean run-time printout. |
all speeds are dependent on data... so YMMV | |
Paul 17-May-2010 [16758x4] | what is all this? What are you guys testing? |
what is all this? What are you guys testing? | |
what is all this? What are you guys testing? | |
weird, I only hit enter once. | |
Maxim 17-May-2010 [16762] | looking at possible search optimising within rebol, for big data sets. |
Paul 17-May-2010 [16763] | you mean searching a block such as [1 "this" 2 "that" 3 "more"] etc..? |
Terry 18-May-2010 [16764] | ideally, a large block of key/values like ["key1" "value 1" "key 2" "value 2"] with the ability to use pattern matching on keys or values... but FAST |
Ladislav 18-May-2010 [16765] | Terry: "foreach is the winner speed wise.. as a bonus, If i use foreach, I don't need the index?" - unbelievable, how you compare apples and oranges without noticing |
Terry 18-May-2010 [16766x2] | It's all about the goal, Lad... apples, oranges.. unripened bananas... i don't care |
I was debating the merits of Rebol to the Redis group, and they said the same thing.. I said "Rebol + Cheyenne" is so much faster than Redis + PHP + Apache.. and they said "I'm comparing apples to oranges" What? Apples? Oranges? It's the RESULT i'm interested in. In that case it was Redis pulling 7200 values from 100,000 keys per second vs Rebol pulling millions per second. | |
Ladislav 18-May-2010 [16768x2] | Terry: "I don't care" - you should, since you are comparing speed of code adhering to different specifications. If you really want to find the fastest code for a given specification, that is not the way to take. |
You are certainly entitled to do whatever you like, but saying "foreach is the winner speed wise..." is wrong, since you did not allow parse to do what you allowed foreach to do. | |
Terry 18-May-2010 [16770] | fair enough |
Pekr 18-May-2010 [16771] | Interesting - "ladislav's original parse example is much faster for me, than pekr's "quote"-based one" - then why on my machine was it otherwise? What could be the technical reason? |
Ladislav 18-May-2010 [16772] | different OS, different processor, maybe even different R3? |
Maxim 18-May-2010 [16773] | all my tests are being done on R2, for the record. |
Pekr 18-May-2010 [16774] | btw - will there be any difference between `result: make block! length? series` and `result: make series 0`? I mean - e.g. not preallocating a large enough series (make series 0) will be slowed down by GC constantly increasing the size? |
Maxim 18-May-2010 [16775x2] | so far, preallocating too large a buffer takes much more time... but look at my funcs... I do even better :-) `result: clear []` re-uses the same block over and over. unless ladislav knows otherwise, the memory for that block isn't released by clear, only its internal size is reset to 0, which is why clear is so fast AFAIK. |
so it will grow to match the needs of the dataset, optimising itself in size within a few searches and then not re-allocating itself too often. | |
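The buffer-reuse idiom described above can be sketched as follows. `find-all` is a hypothetical stand-in for Maxim's functions (his actual script is in the profiling group and not shown here), and the sketch relies on the R2 behaviour that a literal block inside a function body is the same block on every call.

```rebol
REBOL [Title: "clear [] reuse sketch"]

; Return the positions of value in series, reusing one result block across calls.
find-all: func [series value /local result] [
    result: clear []    ; the same literal block every call, emptied first
    while [series: find/tail series value] [
        append result (index? series) - 1
    ]
    result
]

a: find-all [1 2 1] 1
b: find-all [5 5 5] 5
probe same? a b     ; true: both words refer to the one reused block
probe a             ; [1 2 3]: the first result was overwritten by the second call

; copy when a result has to survive the next search
c: copy find-all [1 2 1] 1
probe c             ; [1 3]
```

The trade-off is the one discussed in this thread: no fresh allocation per search, but the caller must copy any result it wants to keep.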
Ladislav 18-May-2010 [16777] | there are many important differences: *make series 0 does not necessarily make a block *regarding the allocation length - if you can estimate reliably the necessary length, then you are better off |
Maxim 18-May-2010 [16778] | with 10 million records, I estimated my dense datasets at about 120000 records and did a few tests, they were MUCH slower than using result: clear [ ] |
Ladislav 18-May-2010 [16779] | but, make block! 1000 is certainly slower than make block! 0 if it turns out, that you need only 10 elements, e.g. |
Maxim 18-May-2010 [16780x2] | (estimated 120000 results) |
thing is many times, if not most of the time, you don't need to copy the result as a new block, and that also saves A LOT of ram in my tests... overall, about 500MB of RAM were saved by not pre-allocating large buffers | |
Ladislav 18-May-2010 [16782] | memory reuse: yes, if it is possible at all (it is like optimizing GC by hand), but that can be used only in special circumstances |
Maxim 18-May-2010 [16783x2] | for searches it's usually ok... since the result is often used only to cycle over and create other data. |
and in any case, you can copy the result of the search, at which point you have a perfect size and as little wasted RAM as possible. | |
Pekr 18-May-2010 [16785] | Max - where do I get the dataset from, if I would try to rewrite your find-fast into a version using 'parse? :-) Do you generate one? |
Maxim 18-May-2010 [16786x2] | look in profiling, there is a full script with verbose printing and everything you need, just replace the loop in one of the funcs :-) |
you can easily compare your results with the current best ... I'll be happy if you can beat the ultimate-find and give the exact same feature... searching on any field of a record and return the whole record. | |
Henrik 18-May-2010 [16788] | "overall, about 500MB of RAM were saved by not pre-allocating large buffers" <- hmm... I thought the allocation did not necessarily mean usage, only that GC is simpler, or is it different under Unix and Windows? |
Maxim 18-May-2010 [16789x2] | I use windows task manager to look at ram use... the peak was 900MB and average was 700MB... removing pre-allocation it went down to 350 with peaks of ~ 500 IIRC |
linux usually has more precise RAM reports AFAIK. | |