World: r3wp
[Core] Discuss core issues
older newer | first last |
Maxim 17-May-2010 [16702] | I'm just about to test it. |
Terry 17-May-2010 [16703] | moving the result: make series 0 out of the while loop had a 10 milli improvement over 10 iterations. |
Maxim 17-May-2010 [16704] | that might account for the rarity of results. if your data has a lot of occurences, then that number will increase, since your result block will grow much more. |
Gregg 17-May-2010 [16705] | You may need to move beyond brute force Terry, and set up a data structure to optimize the searching. |
Terry 17-May-2010 [16706x6] | There must be a way. An index is a symbol that represents some value. What I need is to add some metadata to that index, and make that searchable. |
the goal is a blazing key/value store that's as fast as pulling by index :) | |
I thought i had it with find/any .. but it doesn't work on blocks | |
I haven't tried it, but my guess is storing or converting values to string to use find/any on will be slower than foreach. as in foreach [key value] n [... | |
and i don't think you can search keys in hash or map! without using foreach? | |
I suppose the while loop with counter, is adding extra burden in my examples as well so real world would be faster | |
Maxim 17-May-2010 [16712] | nope... since that is 10 ops within hundreds of millions. |
Terry 17-May-2010 [16713x2] | The other thing i considered was using pairs! to as a pair of symbols, but can't search those either without foreach ie: [ 23x54 "value 1" 984x2093 "value 2"] |
Maxim, the foreach results against strings in your doc is an interesting result. | |
Maxim 17-May-2010 [16715] | right now, for sparse data, I've got an algorithm that traverses 5 times faster than your foreach example. but if its really dense, then it will be slower. |
Terry 17-May-2010 [16716] | I'm thinking 10,000,000 values min, and preferably to max memory |
Maxim 17-May-2010 [16717x3] | 10 x 10 million items, with a single value within the block... 0:00:00.234 using a 1.5GHz laptop. |
vs: 0:00:01.594 for your foreach example. | |
plus record-size is variable, and is supplied as a parameter to the function. | |
Terry 17-May-2010 [16720] | not bad.. with index? i was getting around 2.5million/sec against 100,000 |
Maxim 17-May-2010 [16721] | find-fast: func [ series value record-length i "iterations" /local result st s ][ st: now/precise while [i > 0][ result: make series 0 ;length? series s: head series until [ not if s: find s value [ either 0 = mod -1 + (index? s) record-length [ insert tail result copy/part series record-length not tail? s: next s ][ s: next s ] ] ] i: i - 1 ] print difference now/precise st print length? result result ] |
Terry 17-May-2010 [16722x2] | nice name .. |
we can call the winner find-fastest | |
Maxim 17-May-2010 [16724x2] | this can probably be optimised further... but note that the algorithm scales with density of your block |
density = ratio of matches vs size of block | |
Terry 17-May-2010 [16726x4] | can it be modified for searching keys? |
and especially pattern matching of some sort? | |
I gues find/part would work if the data were string? no? | |
On your doc, regarding foreach you mentioned.. blocks get faster as you grab more at a time, while strings slow down, go figure! but the stats look the opposite? | |
Maxim 17-May-2010 [16730x2] | if you look at the speeds, cycling through two items at a time should give roughly twice the ops/second. when you look at the results... blocks end up being faster than this, and strings are less than this. |
btw, in my find-fast above... when search matchs aprox 1/10 times, it ends up being twice as slow as the foreach... so speed will depend on the dataset, as always. | |
Andreas 17-May-2010 [16732] | maxim, your find-fast returns the values, not the indices (not sure if that was intended) |
Maxim 17-May-2010 [16733] | so does terry's |
Andreas 17-May-2010 [16734] | his "feach" yes, but not his "ind" adaptation |
Maxim 17-May-2010 [16735x2] | I'm comparing to his feach, which was faster... but if I just returned the index, it would be faster still (no copy/part). |
I'm trying out something else, which might be even faster... | |
Andreas 17-May-2010 [16737x2] | indices?-maxim-1: func [series value /local result][ result: make block! length? series until [ not if series: find series value [ append result index? series series: next series ] ] result ] |
that's your above find, stripped of the record-length stuff, and returning indices instead of the actual values | |
Maxim 17-May-2010 [16739] | btw... I discovered // instead of mod, which is twice as fast..... turns out MOD is a mezz... never noticed that. |
Andreas 17-May-2010 [16740] | prin "Initialising ... " random/seed now/precise dim1: 10'000'000 dim2: dim1 / 2 needle: random dim2 haystack: make block! dim1 loop dim1 [append haystack random dim2] print "done" |
Maxim 17-May-2010 [16741] | (the loop is twice as fast, meaning // is MUCH faster than using mod) |
Andreas 17-May-2010 [16742] | here's a tiny bit of setup code, adjust dim1/dim2 as you wish. then find the needles in the haystack: indices? haystack needle |
Maxim 17-May-2010 [16743] | I'm working on a (bigger) script which has similar setup code... specifically meant to compare different dataset scenarios |
Andreas 17-May-2010 [16744x3] | using find/tail instead of just find speeds up things slightly: |
loop kernel becomes: | |
not if series: find/tail series value [ append result (index? series) - 1 ] | |
Maxim 17-May-2010 [16747] | ah good idea! |
Andreas 17-May-2010 [16748x2] | now, dropping the if for an all, speeds up things minimally, but it clean up the code |
indices?-3: func [series value /local result][ result: make block! length? series until [ not all [ series: find/tail series value append result (index? series) - 1 ] ] result ] | |
Maxim 17-May-2010 [16750] | hehe... just what I am doing ;-) |
Andreas 17-May-2010 [16751] | ladislav's original parse example is much faster for me, than pekr's "quote"-based one |
older newer | first last |