r3wp [groups: 83 posts: 189283]
World: r3wp

[Core] Discuss core issues

Maxim
17-May-2010
[16702]
I'm just about to test it.
Terry
17-May-2010
[16703]
moving the result: make series 0 out of the while loop gave a 10 ms 
improvement over 10 iterations.
Maxim
17-May-2010
[16704]
that might account for the rarity of results.  if your data has a 
lot of occurrences, then that number will increase, since your result 
block will grow much more.
Gregg
17-May-2010
[16705]
You may need to move beyond brute force Terry, and set up a data 
structure to optimize the searching.
Terry
17-May-2010
[16706x6]
There must be a way.

An index is a symbol that represents some value. What I need is to 
add some metadata to that index, and make that searchable.
the goal is a blazing key/value store that's as fast as pulling by 
index
:)
I thought I had it with find/any .. but it doesn't work on blocks
I haven't tried it, but my guess is that storing or converting values 
to strings so find/any can be used on them will be slower than foreach, 
as in: foreach [key value] n [...
and I don't think you can search keys in hash! or map! without using 
foreach?
I suppose the while loop with counter is adding extra burden in 
my examples as well, so real-world use would be faster
Maxim
17-May-2010
[16712]
nope... since that is 10 ops within hundreds of millions.
Terry
17-May-2010
[16713x2]
The other thing I considered was using pair! values as pairs of symbols, 
but you can't search those either without foreach
ie: [23x54 "value 1" 984x2093 "value 2"]
Maxim, the foreach results against strings in your doc are an interesting 
result.
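A minimal foreach scan over a pair-keyed block like the one above (illustrative only):

data: [23x54 "value 1" 984x2093 "value 2"]
foreach [key value] data [
	if key = 23x54 [print value]    ; prints value 1
]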
Maxim
17-May-2010
[16715]
right now, for sparse data, I've got an algorithm that traverses 
5 times faster than your foreach example.  but if it's really dense, 
then it will be slower.
Terry
17-May-2010
[16716]
I'm thinking 10,000,000 values min, and preferably to max memory
Maxim
17-May-2010
[16717x3]
10 x 10 million items, with a single value within the block...  

0:00:00.234  using a 1.5GHz laptop.
vs: 
0:00:01.594 for your foreach example.
plus record-size is variable, and is supplied as a parameter to the 
function.
Terry
17-May-2010
[16720]
not bad.. with index? I was getting around 2.5 million/sec against 
100,000
Maxim
17-May-2010
[16721]
find-fast: func [
	series
	value
	record-length
	i "iterations"
	/local result st s
][
	st: now/precise
	while [i > 0][
		result: make series 0 ;length? series
		s: head series
		until [
			not if s: find s value [
				; only accept matches that fall on a record boundary
				either 0 = mod -1 + (index? s) record-length [
					; copy the record from the match position, not from the head
					insert tail result copy/part s record-length
					not tail? s: next s
				][
					s: next s
				]
			]
		]
		i: i - 1
	]
	print difference now/precise st
	print length? result
	result
]
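A hypothetical call, with 2-item records keyed on the first slot (data made up):

data: [1 "a" 2 "b" 1 "c"]
probe find-fast data 1 2 1    ; record-length 2, 1 iteration

Only matches landing on a record boundary (index 1, 3, 5, ...) are collected, so a matching value stored in a non-key slot is skipped over.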
Terry
17-May-2010
[16722x2]
nice name
..
we can call the winner find-fastest
Maxim
17-May-2010
[16724x2]
this can probably be optimised further...  but note that the algorithm 
scales with density of your block
density = ratio of matches vs size of block
Terry
17-May-2010
[16726x4]
can it be modified for searching keys?
and especially pattern matching of some sort?
I guess find/part would work if the data were string? no?
On your doc, regarding foreach you mentioned.. 

blocks get faster as you grab more at a time, while strings slow 
down, go figure!
 
but the stats look the opposite?
Maxim
17-May-2010
[16730x2]
if you look at the speeds, cycling through two items at a time should 
give roughly twice the ops/second.


when you look at the results... blocks end up being faster than that, 
and strings end up slower.
btw, in my find-fast above... when the search matches approx 1 in 10 
items, it ends up being twice as slow as the foreach... 

so speed will depend on the dataset, as always.
Andreas
17-May-2010
[16732]
maxim, your find-fast returns the values, not the indices (not sure 
if that was intended)
Maxim
17-May-2010
[16733]
so does Terry's
Andreas
17-May-2010
[16734]
his "feach" yes, but not his "ind" adaptation
Maxim
17-May-2010
[16735x2]
I'm comparing to his feach, which was faster... but if I just returned 
the index, it would be faster still (no copy/part).
I'm trying out something else, which might be even faster...
Andreas
17-May-2010
[16737x2]
indices?-maxim-1: func [series value /local result][
    result: make block! length? series
    until [
        not if series: find series value [
            append result index? series
            series: next series
        ]
    ]
    result
]
that's your above find, stripped of the record-length stuff, and 
returning indices instead of the actual values
Maxim
17-May-2010
[16739]
btw... I discovered // instead of mod, which is twice as fast... 
turns out MOD is a mezz... never noticed that.
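For reference, // is the native remainder op and mod is a mezzanine built on top of it; they agree for positive inputs but can differ for negative ones:

print 7 // 3      ; 1
print mod 7 3     ; 1
print -7 // 3     ; -1, remainder keeps the dividend's sign
print mod -7 3    ; 2, mod normalizes the result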
Andreas
17-May-2010
[16740]
prin "Initialising ... "
random/seed now/precise
dim1: 10'000'000
dim2: dim1 / 2
needle: random dim2
haystack: make block! dim1
loop dim1 [append haystack random dim2]
print "done"
Maxim
17-May-2010
[16741]
(the loop is twice as fast, meaning // is MUCH faster than using 
 mod)
Andreas
17-May-2010
[16742]
here's a tiny bit of setup code, adjust dim1/dim2 as you wish. then 
find the needles in the haystack: indices? haystack needle
Maxim
17-May-2010
[16743]
I'm working on a (bigger) script which has similar setup code... 
specifically meant to compare different dataset scenarios
Andreas
17-May-2010
[16744x3]
using find/tail instead of just find speeds up things slightly:
loop kernel becomes:
not if series: find/tail series value [
    append result (index? series) - 1
]
Maxim
17-May-2010
[16747]
ah good idea!
Andreas
17-May-2010
[16748x2]
now, dropping the if for an all speeds up things minimally, but 
it cleans up the code
indices?-3: func [series value /local result][
    result: make block! length? series
    until [
        not all [
            series: find/tail series value
            append result (index? series) - 1
        ]
    ]
    result
]
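Putting this together with the setup code from earlier (haystack / needle), a timed run might look like:

t: now/precise
idx: indices?-3 haystack needle
print difference now/precise t
print length? idx    ; number of matches found

Actual timings will depend on the machine and on the density of matches, as noted above.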
Maxim
17-May-2010
[16750]
hehe... just what I am doing  ;-)
Andreas
17-May-2010
[16751]
Ladislav's original parse example is much faster for me than Pekr's 
"quote"-based one