World: r3wp

Join the discussions in the REBOL3 world...

[Core] Discuss core issues

older newer	first last
Maxim 17-May-2010 [16701x2]	I'm building a little example which I think will outperform your examples... I'm curious.
Maxim 17-May-2010 [16701x2]	I'm just about to test it.
Terry 17-May-2010 [16703]	moving the result: make series 0 out of the while loop had a 10 milli improvement over 10 iterations.
Maxim 17-May-2010 [16704]	that might account for the rarity of results. if your data has a lot of occurences, then that number will increase, since your result block will grow much more.
Gregg 17-May-2010 [16705]	You may need to move beyond brute force Terry, and set up a data structure to optimize the searching.
Terry 17-May-2010 [16706x6]	There must be a way. An index is a symbol that represents some value. What I need is to add some metadata to that index, and make that searchable.
	the goal is a blazing key/value store that's as fast as pulling by index :)
	I thought i had it with find/any .. but it doesn't work on blocks
	I haven't tried it, but my guess is storing or converting values to string to use find/any on will be slower than foreach. as in foreach [key value] n [...
	and i don't think you can search keys in hash or map! without using foreach?
	I suppose the while loop with counter, is adding extra burden in my examples as well so real world would be faster
Maxim 17-May-2010 [16712]	nope... since that is 10 ops within hundreds of millions.
Terry 17-May-2010 [16713x2]	The other thing i considered was using pairs! to as a pair of symbols, but can't search those either without foreach ie: [ 23x54 "value 1" 984x2093 "value 2"]
Terry 17-May-2010 [16713x2]	Maxim, the foreach results against strings in your doc is an interesting result.
Maxim 17-May-2010 [16715]	right now, for sparse data, I've got an algorithm that traverses 5 times faster than your foreach example. but if its really dense, then it will be slower.
Terry 17-May-2010 [16716]	I'm thinking 10,000,000 values min, and preferably to max memory
Maxim 17-May-2010 [16717x3]	10 x 10 million items, with a single value within the block... 0:00:00.234 using a 1.5GHz laptop.
	vs: 0:00:01.594 for your foreach example.
	plus record-size is variable, and is supplied as a parameter to the function.
Terry 17-May-2010 [16720]	not bad.. with index? i was getting around 2.5million/sec against 100,000
Maxim 17-May-2010 [16721]	find-fast: func [ series value record-length i "iterations" /local result st s ][ st: now/precise while [i > 0][ result: make series 0 ;length? series s: head series until [ not if s: find s value [ either 0 = mod -1 + (index? s) record-length [ insert tail result copy/part series record-length not tail? s: next s ][ s: next s ] ] ] i: i - 1 ] print difference now/precise st print length? result result ]
Terry 17-May-2010 [16722x2]	nice name ..
Terry 17-May-2010 [16722x2]	we can call the winner find-fastest
Maxim 17-May-2010 [16724x2]	this can probably be optimised further... but note that the algorithm scales with density of your block
Maxim 17-May-2010 [16724x2]	density = ratio of matches vs size of block
Terry 17-May-2010 [16726x4]	can it be modified for searching keys?
	and especially pattern matching of some sort?
	I gues find/part would work if the data were string? no?
	On your doc, regarding foreach you mentioned.. blocks get faster as you grab more at a time, while strings slow down, go figure! but the stats look the opposite?
Maxim 17-May-2010 [16730x2]	if you look at the speeds, cycling through two items at a time should give roughly twice the ops/second. when you look at the results... blocks end up being faster than this, and strings are less than this.
Maxim 17-May-2010 [16730x2]	btw, in my find-fast above... when search matchs aprox 1/10 times, it ends up being twice as slow as the foreach... so speed will depend on the dataset, as always.
Andreas 17-May-2010 [16732]	maxim, your find-fast returns the values, not the indices (not sure if that was intended)
Maxim 17-May-2010 [16733]	so does terry's
Andreas 17-May-2010 [16734]	his "feach" yes, but not his "ind" adaptation
Maxim 17-May-2010 [16735x2]	I'm comparing to his feach, which was faster... but if I just returned the index, it would be faster still (no copy/part).
Maxim 17-May-2010 [16735x2]	I'm trying out something else, which might be even faster...
Andreas 17-May-2010 [16737x2]	indices?-maxim-1: func [series value /local result][ result: make block! length? series until [ not if series: find series value [ append result index? series series: next series ] ] result ]
Andreas 17-May-2010 [16737x2]	that's your above find, stripped of the record-length stuff, and returning indices instead of the actual values
Maxim 17-May-2010 [16739]	btw... I discovered // instead of mod, which is twice as fast..... turns out MOD is a mezz... never noticed that.
Andreas 17-May-2010 [16740]	prin "Initialising ... " random/seed now/precise dim1: 10'000'000 dim2: dim1 / 2 needle: random dim2 haystack: make block! dim1 loop dim1 [append haystack random dim2] print "done"
Maxim 17-May-2010 [16741]	(the loop is twice as fast, meaning // is MUCH faster than using mod)
Andreas 17-May-2010 [16742]	here's a tiny bit of setup code, adjust dim1/dim2 as you wish. then find the needles in the haystack: indices? haystack needle
Maxim 17-May-2010 [16743]	I'm working on a (bigger) script which has similar setup code... specifically meant to compare different dataset scenarios
Andreas 17-May-2010 [16744x3]	using find/tail instead of just find speeds up things slightly:
	loop kernel becomes:
	not if series: find/tail series value [ append result (index? series) - 1 ]
Maxim 17-May-2010 [16747]	ah good idea!
Andreas 17-May-2010 [16748x2]	now, dropping the if for an all, speeds up things minimally, but it clean up the code
Andreas 17-May-2010 [16748x2]	indices?-3: func [series value /local result][ result: make block! length? series until [ not all [ series: find/tail series value append result (index? series) - 1 ] ] result ]
Maxim 17-May-2010 [16750]	hehe... just what I am doing ;-)
older newer	first last