r3wp [groups: 83 posts: 189283]
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r3wp

[Profiling] Rebol code optimisation and algorithm comparisons.

Steeve
19-May-2010
[220x4]
by doing, index2/"age" you'll get all the triple having the verb 
"age"
with index3/"75", you'll get those with 75 as the target
then intersect the 2 blocks
I just think you don't get it, but I know my explanation sucks, sorry.
Terry
19-May-2010
[224]
i get it.
Sunanda
19-May-2010
[225]
Ron Everett presented  a database that did much of what you want 
at DevCon2007. The live discussion is here:

   http://www.rebol.org/aga-display-posts.r?offset=0&post=r3wp500x1919
The video of the presentation may be on qtask.
Pekr
19-May-2010
[226x2]
yeah, associative stuff :-)
Senteces from Lazysoft was another product of such kind ...
Terry
19-May-2010
[228]
i remember that
Steeve
19-May-2010
[229x2]
http://www.rebol.net/cgi-bin/r3blog.r?view=0161
And my comment...

it's remember me the IDE Plex(obsydian) in the nineties. it used 
widely the concept of triples (tuples) to modelize applications and 
databases.
Terry
19-May-2010
[231x3]
hmm, i thought i got it. :( now I'm lost in block hell
nope.. i got it again :)
Should have listened to my mother and became a lawyer.
Pekr
19-May-2010
[234]
why :-)
Terry
19-May-2010
[235x3]
Shouldn't Intersect take an block of blocks?
ie: intersect [ blk1 blk2 blk3.. ]
Works great Steeve. I haven't benchmarked it yet, but given the whole 
thing can sit in hashes or maps! , i would say the performance should 
be spectacular
Gregg
19-May-2010
[238]
Terry, I think INTERSECT is fine the way it is, and it's easy to 
wrap if you want.

    fold: func [
        series [series!]
        fn     [any-function!]
        /local value
    ][
        value: pick series 1
        foreach item next series [value: fn value item]
    ]

    intersect-all: func [
        series [block!] "A block of series values to intersect"
    ][
        fold series :intersect
    ]
Terry
19-May-2010
[239]
I knew someone would comback with a function.. might as well have 
been you Gregg.
I like PHP's array_merge()
Maxim
19-May-2010
[240]
steeve, REBOL doesn't support path with strings, and furthermore, 
it would only return the first index, if you used it within a paren.


so I'd really like you to give a small snippet of code with your 
idea, I am curious about your method... cause I don't see *how* what 
you say works.
Steeve
20-May-2010
[241]
I would use map! (or hash!) as indexes and a key would contain one 
or several triples (inside a block)
verb/"age": [ index indexn ...]
Ladislav
20-May-2010
[242]
Max, Steeve proposed the datastructure so, that the above feach operations 
become elementary ("instantaneous").
Steeve
20-May-2010
[243]
Yeah !!! :)
Terry
20-May-2010
[244x2]
Here's what I did..
ie: [["Steeve" "isa" "person"]["Steeve" "age" "42"]["Tweety" "isa" 
"Canary"]["Tweety" "age" "75"]["Chirp" "isa" "Canary"]]
i1: ["Steeve" [1 2] "Tweety" [3 4] "Chirp" [5]]
i2: ["isa" [1 3 5] "age" [2 4]]
i3: ["person" [1] "42" [2] "Canary" [3 5]]


rocket: func [it][
    st: now/precise

	while[it > 0][
		s: i1/("Steeve")
		p: i2/("isa")

		res1: intersect s p
		it: it - 1
	]
	prin join " -> " difference now/precise st
]
Steeve
20-May-2010
[246x2]
Yeah !!! You got it :)
and i1,i2,i3 should be of type hash! or map!
Terry
20-May-2010
[248x4]
This finds all "Steve isa... "

If i wanted to find all "Canaries".. 

p: i2/("isa")
v: i3/("Canary")

result intersect p v
yeah.. im just building some benchmarks with real data now
my rebol is so rusty.. how do i turn 

["2c3cukne98" "femaleend" "E1016 Camlok" "2c3cukne98" "isa" "cable" 
"2c3cukne98" "uid" "8558"... ]

into 

[["2c3cukne98" "femaleend" "E1016 Camlok"][ "2c3cukne98" "isa" "cable" 
]["2c3cukne98" "uid" "8558"]...
remold make-block reduce enlarge.. or somethin.. ?
Steeve
20-May-2010
[252]
map-each
Ladislav
20-May-2010
[253]
To make it practical for other operations too, the triple store should 
be defined so, that:


1) every triple is stored as a fourtuple, the additional field being 
a unique triple ID
2) create the main index using the triple ID, 

3) create three field indices searchable by by the field contents, 
and "inside" also by the main ID


This way, the triple storage will be moderately fast for triple removal 
as well
Izkata
20-May-2010
[254]
For R2, I wrote a function called Stagger for that:
Stagger: func [L I /local][
    forall L [
        L/1: copy/part L I 
        remove/part next L (I - 1)
    ] 
    return L
]

>> Stagger ["2c3cukne98" "femaleend" "E1016 Camlok" "2c3cukne98" 
"isa" "cable" "2c3cukne98" "uid" "8558"] 3

== [["2c3cukne98" "femaleend" "E1016 Camlok"] ["2c3cukne98" "isa" 
"cable"] ["2c3cukne98" "uid" "8558"]]
Terry
20-May-2010
[255]
cool,  thanks
Izkata
20-May-2010
[256]
AFAIK, R2 doesn't have a built in function for it (at least, I never 
ran across it)
Ladislav
20-May-2010
[257]
hmm, the Stagger function defined this way is O(N ** 2), i.e. not 
advisable for large data
Terry
20-May-2010
[258x2]
just realizing that now :)
I can hear my HD revving up
Izkata
20-May-2010
[260]
It is?  I've only used it for relatively small data, so never had 
a problem..
Steeve
20-May-2010
[261x2]
Besides you don't need to store triples as blocks (they can be flattened) 
in the triple store, then you'll just need to do some math to calculate 
the index of a triple.

The advantage is that you could still use the triple store do some 
slow searchings.
... with find
Terry
20-May-2010
[263x7]
yeah, i thought of that, but once the inference is there, not sure 
there's a need
with maybe the exception of pattern  matching?
i1 indexes / 3 is always decimal .33333
i2 indexes / 3 is always decimal .6666
i3 indexes / 3 is integer
another advantage of a flat triple store is for simple key/value 
lookups

ie: if you have a html element  <div id="20984"> where the contents 
of the div is  "pick ie 20984"
doesn't get any faster than that
And, this data is tight.. the  250,000 real world triples (manages 
a multi-million dollar electrical equipment rental biz, including 
 1500 contacts, invoicing, and 10,000s of items)  is only 9mb... 
zips to 1.3mb
that's all the data a logic for a medium sized company stored on 
a floppy.
To make it practical for other operations too, the triple store should 
be defined so, that:

1) every triple is stored as a fourtuple, the 
additional field being a unique triple ID...

Why not just use the index of the first part of the triple? 

ie: ["tweety" "isa" "canary" "Sylvester" "isa" "cat"...]


so 1 and 4 are the two triples( i1)  and the next two are i2 and 
i3

i1: [ "Tweety" [1] "Sylvester" [4][
i2: ["isa" [1 4]
i3: ["canary" [ 1] "cat" [4]]