World: r3wp

[Profiling] Rebol code optimisation and algorithm comparisons.

Maxim
18-May-2010
[120]
zz should be 3.
Terry
18-May-2010
[121x2]
oh.. I thought zz was the length of the dataset?
ah.. that works GREAT
Maxim
18-May-2010
[123]
>> dataset: [1 "a" "B" 4 "h" "V" 1 "z" "Z" 4 "p" "d" 4 "k" "i" 4 "y" "o"]
== [1 "a" "B" 4 "h" "V" 1 "z" "Z" 4 "p" "d" 4 "k" "i" 4 "y" "o"]
>> ultimate-find dataset 4 1 3 1
ultimate find(): 1. -> 0:00   4 matches found
== [4 "h" "V" 4 "p" "d" 4 "k" "i" 4 "y" "o"]
Terry
18-May-2010
[124]
very nice
Maxim
18-May-2010
[125x2]
ultimate find(): 1. -> 0:00   1 matches found
== [1 "a" "B"]

:-)
oops, missing the command line...
Terry
18-May-2010
[127]
so if the dataset is key/value, just use 2 as the record-length
Maxim
18-May-2010
[128x2]
>> ultimate-find dataset "a" 2 3 1
ultimate find(): 1. -> 0:00   1 matches found
== [1 "a" "B"]
yep
Terry
18-May-2010
[130x3]
cool
I inserted "maxim" "age" "unknown" and appended "terry" "age" "42" into the dataset containing 6 million records..

>> ultimate-find dataset "age"  2 3 1
ultimate find(): 1. -> 0:00:00.093   2 matches found
== ["maximn" "age" "unknown" "terry" "age" "42"]
I'd say that's a respectable time... and the leading contestant :)
Maxim
18-May-2010
[133]
:-)
Terry
18-May-2010
[134x4]
now if only I were 42 again...
But wait, there's more.... 

convert dataset to hash! and run ultimate-find again!
>> ultimate-find dataset "age"  2 3 100
ultimate find():  -> 0:00   2 matches found
== ["maximn" "age" "unknown" "terry" "age" "42"]

100 iterations don't even register
1000 iterations: 0.40 s
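
For reference, a minimal sketch of the conversion step being described, assuming Rebol 2 (to-hash keeps the same flat record layout, but find and find/skip can then use hashed lookup); the last argument of ultimate-find appears to be the iteration count:

; convert the block! dataset to a hash! (same elements, hashed lookup)
dataset: to-hash dataset
ultimate-find dataset "age" 2 3 100   ; run the same search 100 times
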
Maxim
18-May-2010
[138]
OMG!
Terry
18-May-2010
[139]
exactly
Maxim
18-May-2010
[140x2]
but I'm getting an odd deadlock here on some tests... hum...
I'm getting extremely slow results on dense tests...
Terry
18-May-2010
[142x2]
interesting... I'm not too worried, as density isn't a big issue with triple stores
I'm off.. good luck with your optimizations
Maxim
18-May-2010
[144]
I'm talking like 100 times worse!  The larger the list, the worse it gets... seems like an exponential issue.
Terry
18-May-2010
[145]
that seems like an anomaly
Maxim
18-May-2010
[146]
both dense tests perform pretty much the same; the moment I convert it to a hash, it gets reallllly slow.
Terry
18-May-2010
[147x2]
yeah, I see that too
mind you, that's pretty dense data
Maxim
18-May-2010
[149]
the strange thing is I did tests using a record size of 2, which wouldn't trigger strange misaligned key/value issues.  I even removed the copy to make sure that wasn't the issue, and one test with only 400000 records took more than 4 minutes to complete vs .297 s for the foreach test!
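
A sketch of what that foreach test may have looked like, assuming a record size of 2 (key/value); scan-find is a hypothetical name, not Maxim's actual test code:

scan-find: func [series value /local results] [
    results: copy []
    ; walk the series two items at a time and collect matching pairs
    foreach [key val] series [
        if key = value [append results reduce [key val]]
    ]
    results
]
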
Terry
18-May-2010
[150x2]
I'm looking for the integer 6.. it's still cranking and I can hear my system struggling..
must be a loop error
Maxim
18-May-2010
[152]
well, the results were the same at the end... pretty weird... maybe someone has encountered this before and can explain why this happens....
Pekr
18-May-2010
[153]
Max - just a question - wouldn't using parse be faster than find/skip?
Ladislav
18-May-2010
[154]
my advice would be:

1) to test Parse as Pekr noted (traversing only the respective field)
2) to use a hash to index the respective field
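
A minimal sketch of the parse approach Pekr and Ladislav suggest, assuming a fixed record length of 3 and a match on the first field of each record; parse-find is a hypothetical name:

parse-find: func [series value /local results here] [
    results: copy []
    parse series [
        some [
            ; mark the start of each 3-item record, then test its first field
            here: 3 skip
            (if here/1 = value [append results copy/part here 3])
        ]
    ]
    results
]

On the triple dataset above, parse-find dataset 4 should return the same records as the find/skip search on field 1.
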
Maxim
18-May-2010
[155]
I didn't do any parse test tweaks... but find/skip is very fast so far; we can skip over 100 million records within a millisecond.  Not sure parse can beat that.
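
A sketch of the find/skip core that a search like ultimate-find presumably builds on: find/skip treats the series as fixed-size records and only matches at record boundaries, so it effectively searches the first field; find-all is a hypothetical name:

find-all: func [series value rec-len /local results pos] [
    results: copy []
    pos: series
    ; find/skip only stops at multiples of rec-len, i.e. record starts
    while [pos: find/skip pos value rec-len] [
        append results copy/part pos rec-len
        pos: skip pos rec-len
    ]
    results
]

On a hash! each find is a hashed lookup rather than a linear scan, which would explain why sparse searches are so fast.
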
Terry
18-May-2010
[156]
Did you find a solution to the density issue, Max?
Maxim
18-May-2010
[157]
nope... I'm working on urgent stuff and won't have time for a few days to put more time into this.
Steeve
18-May-2010
[158]
I haven't tested in R2 for a while, but in R3, parse is faster in most cases (if you write the rules correctly)
Terry
18-May-2010
[159]
I'm wondering if it has something to do with recreating the hash 
each time a value is found?
Terry
19-May-2010
[160]
Looking at Maxim's ultimate-find (above, Monday 11:32), does anyone have an idea why, when dealing with hash!, the more matches it finds, the slower it gets?
Ladislav
19-May-2010
[161]
I think that it is quite natural. You should probably generate some random data having (approximately) similar properties to what you intend to process, and try some variant approaches to really find out which one is best for the task. Do you know that it is possible to index just a specific record field, i.e. that you don't need to make a hash containing all the data from the database?
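
A minimal sketch of the field-only index Ladislav describes: instead of hashing all the data, build a separate hash! that maps each key (the first field here, for simplicity) to the positions of its records in the original block; index-field is a hypothetical name:

index-field: func [series rec-len /local idx pos key entry] [
    idx: make hash! []
    pos: series
    while [not tail? pos] [
        key: first pos
        either entry: select idx key [
            append entry index? pos        ; key seen before: add its position
        ][
            append idx key                 ; new key: start a position list
            append/only idx reduce [index? pos]
        ]
        pos: skip pos rec-len
    ]
    idx
]

Lookup then touches only the matching records, e.g.:

idx: index-field dataset 3
foreach i select idx "age" [probe copy/part at dataset i 3]
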
Terry
19-May-2010
[162x2]
Yeah, I've tried some actual data, finding 3270 matches out of a hash that is 732981 in length..

when it's a block! the search takes .033 s, and the same run against the hash! is 0.6

but if the matches are just a few, hash is 1000x faster
Ladislav
19-May-2010
[164]
.033 s, and the same run against the hash! is 0.6
 - do you mean 0.6 s, i.e. roughly 18 times slower?
Terry
19-May-2010
[165]
yeah
Ladislav
19-May-2010
[166x2]
that is interesting, can you post your data generator?
or do you use real-world data?
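
A sketch of the kind of generator being asked for, assuming triples of one integer key and two string values; make-dataset is a hypothetical name, and shrinking the 1000 below increases match density (fewer distinct keys, more matches per key):

random/seed now
make-dataset: func [records /local data] [
    data: make block! records * 3
    loop records [
        ; one record: integer key plus two random string fields
        repend data [random 1000 form random 100000 form random 100000]
    ]
    data
]

dataset: make-dataset 400000
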
Maxim
19-May-2010
[168]
the only thing that I'm thinking is that when the hash index changes, it's rehashing its content... which is strange.
Terry
19-May-2010
[169]
it's Maxim's ultimate-find above (and I'm using real-world data)