r3wp [groups: 83 posts: 189283]
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r3wp

[Core] Discuss core issues

Maxim
17-May-2010
[16718x2]
vs: 
0:00:01.594 for your foreach example.
plus record-size is variable, and is supplied as a parameter to the 
function.
Terry
17-May-2010
[16720]
not bad.. with index? i was getting around 2.5million/sec against 
100,000
Maxim
17-May-2010
[16721]
find-fast: func [
	series
	value
	record-length
	i "iterations"
	/local result st s
][
	
	st: now/precise
	while [i > 0][
		result: make series 0 ;length? series
		s: head series
		until [
			not if s: find  s value [
				either 0 = mod -1 + (index? s) record-length [
					insert tail result copy/part series record-length
					not tail? s: next s
				][
					s: next s
				]
			]
		]
		i: i - 1
	]
	
	print difference now/precise st
	print length? result
	
	result
]
Terry
17-May-2010
[16722x2]
nice name
..
we can call the winner find-fastest
Maxim
17-May-2010
[16724x2]
this can probably be optimised further...  but note that the algorithm 
scales with density of your block
density = ratio of matches vs size of block
Terry
17-May-2010
[16726x4]
can it be modified for searching keys?
and especially pattern matching of some sort?
I gues find/part would work if the data were string? no?
On your doc, regarding foreach you mentioned.. 

blocks get faster as you grab more at a time, while strings slow 
down, go figure!
 
but the stats look the opposite?
Maxim
17-May-2010
[16730x2]
if you look at the speeds,  cycling through two items at a time should 
give roughly twice the ops/second.


when you look at the results... blocks end up being faster than this, 
and strings are less than this.
btw, in my find-fast above... when search matchs aprox 1/10 times, 
it ends up being twice as slow as the foreach... 

so speed will depend on the dataset, as always.
Andreas
17-May-2010
[16732]
maxim, your find-fast returns the values, not the indices (not sure 
if that was intended)
Maxim
17-May-2010
[16733]
so does terry's
Andreas
17-May-2010
[16734]
his "feach" yes, but not his "ind" adaptation
Maxim
17-May-2010
[16735x2]
I'm comparing to his feach, which was faster... but if I just returned 
the index, it would be faster still (no copy/part).
I'm trying out something else, which might be even faster...
Andreas
17-May-2010
[16737x2]
indices?-maxim-1: func [series value /local result][
    result: make block! length? series
    until [
        not if series: find series value [
            append result index? series
            series: next series
        ]
    ]
    result
]
that's your above find, stripped of the record-length stuff, and 
returning indices instead of the actual values
Maxim
17-May-2010
[16739]
btw... I discovered  //  instead   of   mod,   which is twice as 
fast..... turns out  MOD is a mezz... never noticed that.
Andreas
17-May-2010
[16740]
prin "Initialising ... "
random/seed now/precise
dim1: 10'000'000
dim2: dim1 / 2
needle: random dim2
haystack: make block! dim1
loop dim1 [append haystack random dim2]
print "done"
Maxim
17-May-2010
[16741]
(the loop is twice as fast, meaning // is MUCH faster than using 
 mod)
Andreas
17-May-2010
[16742]
here's a tiny bit of setup code, adjust dim1/dim2 as you wish. then 
find the needles in the haystack: indices? haystack needle
Maxim
17-May-2010
[16743]
I'm working on a (bigger) script which has similar setup code... 
specifically meant to compare different dataset scenarios
Andreas
17-May-2010
[16744x3]
using find/tail instead of just find speeds up things slightly:
loop kernel becomes:
not if series: find/tail series value [
            append result (index? series) - 1
        ]
Maxim
17-May-2010
[16747]
ah good idea!
Andreas
17-May-2010
[16748x2]
now, dropping the if for an all, speeds up things minimally, but 
it clean up the code
indices?-3: func [series value /local result][
    result: make block! length? series
    until [
        not all [
            series: find/tail series value
            append result (index? series) - 1
        ]
    ]
    result
]
Maxim
17-May-2010
[16750]
hehe... just what I am doing  ;-)
Andreas
17-May-2010
[16751x5]
ladislav's original parse example is much faster for me, than pekr's 
"quote"-based one
interestingly enough, a naive c extension is only negligibly faster
indices?-null 	 0:00:00.184183
indices?-p1 	 0:00:01.082892
indices?-p2 	 0:00:01.523015
indices?-u1 	 0:00:00.347117
indices?-u2 	 0:00:00.345846
indices?-u3 	 0:00:00.346959
indices?-ext 	 0:00:00.329520
null just creates a result block, to demonstrate that 50% of runtime 
is mem allocation for the 10m result array (so that's where one should 
really spend time optimising).


p1 is ladislav's `1 1 value` parse, p2 is pekrs `quote (value)` parse, 
u1/2/3 are the until-based versions shown above, ext is a naive C 
extension.
the kernel of the C extension:

    result = RXI_MAKE_BLOCK(series_n - series_i);
    result_n = 0;
    for (; series_i < series_n; ++series_i) {
        elem_type = RXI_GET_VALUE(series, series_i, &elem);

        if (elem.int64 == value.int64 && elem_type == value_type) {
            result_v.int64 = series_i + 1;

            RXI_SET_VALUE(result, result_n++, result_v, RXT_INTEGER);
        }
    }
Maxim
17-May-2010
[16756x2]
I've just finished my tests... I've got a keyed search func which 
returns the exact same results as feach but  20 times faster!

I'll put the whole script in the profiling group... it  has several 
dataset creations for comparison and includes a clean run-time printout.
all speeds are dependent on data... so YMMV
Paul
17-May-2010
[16758x4]
what is all this?  What are you guys testing?
what is all this?  What are you guys testing?
what is all this?  What are you guys testing?
weird, I only hit enter once.
Maxim
17-May-2010
[16762]
looking at  possible search optimising within rebol, for big data 
sets.
Paul
17-May-2010
[16763]
you means searching a block such as [1 "this" 2 "that" 3 "more"] 
 etc..?
Terry
18-May-2010
[16764]
ideally, a large block of  key/values like ["key1" "value 1" "key 
2" "value 2"] with the ability to use pattern matching on keys or 
values... but FAST
Ladislav
18-May-2010
[16765]
Terry: "foreach is the winner speed wise.. as a bonus, If i use foreach, 
I don't need the index?" - unbelievable, how you compare apples and 
oranges without noticing
Terry
18-May-2010
[16766x2]
It's all about the goal, Lad... apples, oranges.. unripened bananas... 
i don't care
I was debating the merits of Rebol to the Redis group, and they said 
the same thing.. I said "Rebol + Cheyenne" is so much faster than 
Redis + PHP + Apache.. and they said "I'm comparing apples to oranges"

What? Apples? Oranges?  It's the RESULT i'm interested in. In that 
case it's was Redis pulling 7200 values from 100,000 keys per second 
vs Rebol pulling millions per second.