r3wp [groups: 83 posts: 189283]

World: r3wp

[Core] Discuss core issues

Terry
17-May-2010
[16710x2]
and I don't think you can search keys in hash! or map! without using 
foreach?
I suppose the while loop with a counter is adding extra burden in 
my examples as well, so real-world use would be faster
Maxim
17-May-2010
[16712]
nope... since that is 10 ops within hundreds of millions.
Terry
17-May-2010
[16713x2]
The other thing I considered was using pair! values as a pair of symbols, 
but you can't search those either without foreach
i.e.: [23x54 "value 1" 984x2093 "value 2"]
Maxim, the foreach results against strings in your doc are an interesting 
result.
Maxim
17-May-2010
[16715]
right now, for sparse data, I've got an algorithm that traverses 
5 times faster than your foreach example.  but if it's really dense, 
then it will be slower.
Terry
17-May-2010
[16716]
I'm thinking 10,000,000 values min, and preferably up to max memory
Maxim
17-May-2010
[16717x3]
10 x 10 million items, with a single value within the block...  

0:00:00.234  using a 1.5GHz laptop.
vs: 
0:00:01.594 for your foreach example.
plus record-size is variable, and is supplied as a parameter to the 
function.
Terry
17-May-2010
[16720]
not bad... with index? I was getting around 2.5 million/sec against 
100,000
Maxim
17-May-2010
[16721]
find-fast: func [
	series
	value
	record-length
	i "iterations"
	/local result st s
][
	
	st: now/precise
	while [i > 0][
		result: make series 0 ;length? series
		s: head series
		until [
			not if s: find  s value [
				either 0 = mod -1 + (index? s) record-length [
					insert tail result copy/part series record-length
					not tail? s: next s
				][
					s: next s
				]
			]
		]
		i: i - 1
	]
	
	print difference now/precise st
	print length? result
	
	result
]
Terry
17-May-2010
[16722x2]
nice name
..
we can call the winner find-fastest
Maxim
17-May-2010
[16724x2]
this can probably be optimised further...  but note that the algorithm 
scales with the density of your block
density = ratio of matches to block size
Terry
17-May-2010
[16726x4]
can it be modified for searching keys?
and especially pattern matching of some sort?
I guess find/part would work if the data were a string, no?
In your doc, regarding foreach, you mentioned:

blocks get faster as you grab more at a time, while strings slow 
down, go figure!
 
but the stats look the opposite?
Maxim
17-May-2010
[16730x2]
if you look at the speeds, cycling through two items at a time should 
give roughly twice the ops/second.


when you look at the results... blocks end up being faster than this, 
and strings slower than this.
btw, in my find-fast above... when the search matches approx 1 in 
10 items, it ends up being twice as slow as the foreach... 

so speed will depend on the dataset, as always.
Andreas
17-May-2010
[16732]
maxim, your find-fast returns the values, not the indices (not sure 
if that was intended)
Maxim
17-May-2010
[16733]
so does terry's
Andreas
17-May-2010
[16734]
his "feach" yes, but not his "ind" adaptation
Maxim
17-May-2010
[16735x2]
I'm comparing to his feach, which was faster... but if I just returned 
the index, it would be faster still (no copy/part).
I'm trying out something else, which might be even faster...
Andreas
17-May-2010
[16737x2]
indices?-maxim-1: func [series value /local result][
    result: make block! length? series
    until [
        not if series: find series value [
            append result index? series
            series: next series
        ]
    ]
    result
]
that's your above find, stripped of the record-length stuff, and 
returning indices instead of the actual values
Maxim
17-May-2010
[16739]
btw... I discovered // instead of mod, which is twice as fast... 
turns out MOD is a mezz... never noticed that.
Andreas
17-May-2010
[16740]
prin "Initialising ... "
random/seed now/precise
dim1: 10'000'000
dim2: dim1 / 2
needle: random dim2
haystack: make block! dim1
loop dim1 [append haystack random dim2]
print "done"
Maxim
17-May-2010
[16741]
(the loop is twice as fast, meaning // is MUCH faster than using 
 mod)
Andreas
17-May-2010
[16742]
here's a tiny bit of setup code, adjust dim1/dim2 as you wish. then 
find the needles in the haystack: indices? haystack needle
Maxim
17-May-2010
[16743]
I'm working on a (bigger) script which has similar setup code... 
specifically meant to compare different dataset scenarios
Andreas
17-May-2010
[16744x3]
using find/tail instead of just find speeds up things slightly:
loop kernel becomes:
not if series: find/tail series value [
            append result (index? series) - 1
        ]
Maxim
17-May-2010
[16747]
ah good idea!
Andreas
17-May-2010
[16748x2]
now, dropping the if for an all speeds things up minimally, but 
it cleans up the code
indices?-3: func [series value /local result][
    result: make block! length? series
    until [
        not all [
            series: find/tail series value
            append result (index? series) - 1
        ]
    ]
    result
]
Maxim
17-May-2010
[16750]
hehe... just what I am doing  ;-)
Andreas
17-May-2010
[16751x5]
ladislav's original parse example is much faster for me than pekr's 
"quote"-based one
interestingly enough, a naive c extension is only negligibly faster
indices?-null 	 0:00:00.184183
indices?-p1 	 0:00:01.082892
indices?-p2 	 0:00:01.523015
indices?-u1 	 0:00:00.347117
indices?-u2 	 0:00:00.345846
indices?-u3 	 0:00:00.346959
indices?-ext 	 0:00:00.329520
null just creates a result block, to demonstrate that 50% of runtime 
is mem allocation for the 10m result array (so that's where one should 
really spend time optimising).


p1 is ladislav's `1 1 value` parse, p2 is pekr's `quote (value)` parse, 
u1/2/3 are the until-based versions shown above, ext is a naive C 
extension.
the kernel of the C extension:

    result = RXI_MAKE_BLOCK(series_n - series_i);
    result_n = 0;
    for (; series_i < series_n; ++series_i) {
        elem_type = RXI_GET_VALUE(series, series_i, &elem);

        if (elem.int64 == value.int64 && elem_type == value_type) {
            result_v.int64 = series_i + 1;

            RXI_SET_VALUE(result, result_n++, result_v, RXT_INTEGER);
        }
    }
Maxim
17-May-2010
[16756x2]
I've just finished my tests... I've got a keyed search func which 
returns the exact same results as feach but  20 times faster!

I'll put the whole script in the profiling group... it  has several 
dataset creations for comparison and includes a clean run-time printout.
all speeds are dependent on data... so YMMV
Paul
17-May-2010
[16758x2]
what is all this?  What are you guys testing?