World: r3wp

[Profiling] Rebol code optimisation and algorithm comparisons.

Maxim
18-May-2010
[146]
both dense tests perform pretty much the same; the moment I convert it to a hash, it gets really slow.
Terry
18-May-2010
[147x2]
yeah, i see that too
mind you, that's pretty dense data
Maxim
18-May-2010
[149]
the strange thing is I did tests using a record size of 2, which wouldn't trigger strange misaligned key/value issues. I even removed the copy to make sure that wasn't the issue, and one test with only 400000 records took more than 4 minutes to complete vs 0.297 s for the foreach test!
Terry
18-May-2010
[150x2]
I'm looking for the integer 6... it's still cranking and I can hear my system struggling...
must be a loop error
Maxim
18-May-2010
[152]
well, the results were the same at the end... pretty weird... maybe someone has encountered this before and can explain why this happens....
Pekr
18-May-2010
[153]
Max - just a question - wouldn't using parse be faster than find/skip?
Ladislav
18-May-2010
[154]
my advice would be:

1) to test Parse as Pekr noted (traversing only the respective field)
2) to use a hash to index the respective field
Maxim
18-May-2010
[155]
I didn't do any parse test tweaks... but find/skip is very fast so far; we can skip over 100 million records within a millisecond. Not sure parse can beat that.
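For reference, a minimal comparison of the two approaches might look something like this (the table layout, sizes, and key names here are invented for illustration):

    ; hypothetical flat table: 100000 records of 2 fields [key value ...]
    data: make block! 200000
    repeat i 100000 [repend data [join "k-" i i]]

    ; find/skip treats the block as fixed-size records, so only the
    ; first field of each record is compared against the key
    t: now/precise
    pos: find/skip data "k-99999" 2
    print ["find/skip:" difference now/precise t mold copy/part pos 2]

    ; a parse rule that walks the table record by record; in block
    ; parsing a literal string matches an equal string element
    found: none
    t: now/precise
    parse data [some [pos: "k-99999" (found: pos) to end | 2 skip]]
    print ["parse:" difference now/precise t mold copy/part found 2]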
Terry
18-May-2010
[156]
Did you find a solution to the density issue Max?
Maxim
18-May-2010
[157]
nope... I'm working on urgent stuff and won't have time for a few days to put more time on this.
Steeve
18-May-2010
[158]
I haven't tested R2 in a while, but in R3, parse is faster in most cases (if you write the rules correctly)
Terry
18-May-2010
[159]
I'm wondering if it has something to do with recreating the hash 
each time a value is found?
Terry
19-May-2010
[160]
Looking at Maxim's ultimate-find (above, Monday 11:32), does anyone have an idea why, when dealing with hash!, the more matches it finds, the slower it gets?
Ladislav
19-May-2010
[161]
I think that it is quite natural. You should probably generate some random data having (approximately) similar properties to what you intend to process, and try some variant approaches to really find out which one is best for the task. Do you know that it is possible to index just a specific record field, i.e. you don't need to make a hash containing all the data from the database?
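A rough sketch of that indexing idea, with invented field names: keep the data in a plain block, and hash only one field, pairing each key with the integer position of its record.

    ; flat table of 3-field records [name age city ...]
    db: make block! 30000
    repeat i 10000 [repend db [join "name-" i i join "city-" i]]

    ; hash only the first field: each key is followed by the integer
    ; position of its record in the main block
    idx: make hash! 20000
    pos: 1
    while [pos <= length? db] [
        repend idx [pick db pos pos]
        pos: pos + 3
    ]

    ; a lookup hashes the key, then jumps straight into the block
    record: copy/part at db select idx "name-5000" 3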
Terry
19-May-2010
[162x2]
Yeah, I've tried some actual data, finding 3270 matches out of a hash that is 732981 in length.

when it's a block the search takes 0.033 s, and the same run against the hash is 0.6

but if the matches are just a few, hash is 1000x faster
Ladislav
19-May-2010
[164]
0.033 s, and the same run against the hash is 0.6
 - do you mean 0.6 s, i.e. roughly 18 times slower?
Terry
19-May-2010
[165]
yeah
Ladislav
19-May-2010
[166x2]
that is interesting, can you post your data generator?
or, do you use the real-world data?
Maxim
19-May-2010
[168]
the only thing that I'm thinking is that when the hash index changes, it's rehashing its content... which is strange.
Terry
19-May-2010
[169]
it's Maxim's ultimate-find above (and I'm using real-world data)
Maxim
19-May-2010
[170]
ladislav, there is a script earlier in this discussion which has 
a complete working example.
Ladislav
19-May-2010
[171]
aha, you are using real-world data. OK, then, you should tell me 
how many matches you see
Maxim
19-May-2010
[172x2]
(and a revision to ultimate-find, just after it)
the example shows the problem very well.
Terry
19-May-2010
[174]
3270 matches
Maxim
19-May-2010
[175]
the example creates several sets of data with different organizations and compares them all against each other.


so with that script, you should be able to do all the analysis you need.
Terry
19-May-2010
[176]
495 matches against the same 732981 hash takes only 0.003 s
Maxim
19-May-2010
[177]
above I said "it's rehashing its content... which is strange."

that is a guess... it should say:

it *might* be rehashing its content... which *would be* strange.
Ladislav
19-May-2010
[178x2]
hmm, what if the hash is optimized for unique elements?
...then you are most probably out of luck trying to use hash! for 
indexing purposes
Maxim
19-May-2010
[180]
ah yes... usually a hash will have to skip over elements which return the same hash key.


so if your table has a few thousand similar items, you aren't benefiting from the hashing... and it's quite possible that a lookup through the hash actually takes longer when it has to skip over and over (comparing data on top of the hash).


though one could argue that the speed should only be a bit slower than using a block, not this much slower... possibly related to the implementation itself.
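One way to test that guess could be a micro-benchmark along these lines (the data shape and sizes are arbitrary, and it assumes find/skip accepts a hash! the same way the scripts above use it):

    ; same content stored two ways: 10000 two-field records that all
    ; share one key, so every record is a match
    blk: make block! 20000
    repeat i 10000 [repend blk ["dup" i]]
    hsh: make hash! blk

    ; scan a series for every "dup" record and time it
    time-scan: func [series /local t pos n] [
        t: now/precise
        n: 0
        pos: series
        while [pos: find/skip pos "dup" 2] [n: n + 1 pos: skip pos 2]
        print [type? series n "matches in" difference now/precise t]
    ]

    time-scan blk
    time-scan hsh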
Terry
19-May-2010
[181]
my dilemma is indexing triples in a key/value world
Andreas
19-May-2010
[182x3]
generally speaking :) ?
you could have a look at the various dedicated triplestores available (even though many of them have a semweb/RDF/... background).
or have a look at Cassandra and/or MonetDB (without knowing anything about your intended usage)
Terry
19-May-2010
[185x3]
yeah, I've looked at a few
RDF is to XML what War and Peace is to The Cat in the Hat -- triples are working even with Maxim's code above (just not in hashes for more than a query with a single value)... but I crave the speed of index? against large datasets.
I WILL NOT STOP TILL I HAVE A FAST AND SIMPLE TRIPLE STORE!  
(sleep is my enemy)
Maxim
19-May-2010
[188]
Terry, index? is not a procedure within Rebol... it's the same as length?
it's a stored value which is simply looked up when you call index?


nothing will be as fast as index? itself; it's the "getting to" the index which consumes cycles
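For example:

    s: skip [a b c d e] 2
    print index? s    ; 3, read directly from the stored position
    print length? s   ; 3, likewise derived from stored offsets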
Steeve
19-May-2010
[189]
Where's the dilemma? You just have to maintain 3 indexes at the same time (for triples); there isn't any other choice if you're looking for read speed.
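A sketch of that three-index layout (all names invented; each index pairs a key with a block of record positions in one flat store):

    store: copy []         ; flat block of [s p o ...] triples
    by-s: make hash! []    ; subject   -> block of record positions
    by-p: make hash! []    ; predicate -> block of record positions
    by-o: make hash! []    ; object    -> block of record positions

    ; append pos to the key's slot, creating the slot on first use
    index-key: func [idx key pos /local slot] [
        either slot: select idx key [append slot pos] [
            repend idx [key reduce [pos]]
        ]
    ]

    ; every insert updates the store and all three indexes
    add-triple: func [s p o /local pos] [
        pos: 1 + length? store
        repend store [s p o]
        index-key by-s s pos
        index-key by-p p pos
        index-key by-o o pos
    ]

    add-triple "doc1" 'topic "profiling"
    add-triple "doc1" 'format "text"
    add-triple "doc2" 'topic "profiling"

    ; all triples whose predicate is 'topic, via one hash lookup
    foreach pos select by-p 'topic [probe copy/part at store pos 3]

Reads then cost one hash lookup per constrained field; the price is three index updates per insert.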
Terry
19-May-2010
[190x4]
I know... keys can be integers that are indexes of values in map! or hash!
yeah Steeve, I'm scratching out notes on that now... it's not quite as simple as it sounds
i.e. a value might be a large binary...
1 GB values as keys don't work very well.
Steeve
19-May-2010
[194]
I already told you to compute a checksum to build keys from large data; it's built into Rebol
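Something along these lines, using R2's checksum/method (the names and sizes are illustrative):

    ; key a large binary by its checksum rather than the data itself
    blob: make binary! 100000
    insert/dup blob #{AB} 100000
    key: checksum/method blob 'sha1

    store: make hash! []
    repend store [key blob]

    ; lookup: recompute the checksum and select on it
    print same? blob select store checksum/method blob 'sha1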
Terry
19-May-2010
[195]
yeah, but then you risk collisions